Saturday, February 14, 2026

AI => WASM sort challenge

While AI models themselves conclude that Elon Musk is stretching reality with his proclamation that AI will generate binary "directly", technically that is already possible, just terribly inefficient.

DraganSr: Elon Musk: AI generates machine code directly? AlphaDev by Google DeepMind

Elon Musk thinks coding dies this year! - YouTube

So here is a quick experiment with the three leading AI coding models, to see how they solve a simple challenge in "one-shot binary code generation".

Prompt

generate wasm function in linear form (not s-expressions) for insert-sort in this folder, that takes an array of integers array and sorts it.
also generate run.js for calling this with sample array of intergers and prints results.
create ps1 script to convert wat2wasm, and then wasm2wat in s-expressions to make it readable.
do this as one-shot attempt.

Result: all models solved the challenge, and each took a few minutes (not fast).
Note: WAT is the text representation of WASM, a binary executable format.
run.js is used to call the binary code.

Does this make Elon's assessment right? 
No.
But it proves the concept that AI code generation in any language is real and useful.
You just need to pick the right level of abstraction.
Binary is hard, both for humans and machines.
Engineering 101 

Opus 4.6

;; insert_sort.wat — Insertion Sort in WebAssembly
;; Uses flat/linear (stack-machine) instruction style inside functions
;; The module wrapper uses S-expression syntax as required by WABT's wat2wasm

(module
  ;; Memory: 1 page (64KB), exported so JS can read/write
  (memory (export "memory") 1)

  ;; Function: sort(offset: i32, length: i32)
  ;; Performs in-place insertion sort on an i32 array in linear memory
  (func $sort (export "sort") (param $offset i32) (param $length i32)
    (local $i i32)
    (local $j i32)
    (local $key i32)
    (local $j_addr i32)
    (local $i_addr i32)

    ;; i_addr = offset + 4  (byte address of element[1])
    local.get $offset
    i32.const 4
    i32.add
    local.set $i_addr

    ;; i = 1
    i32.const 1
    local.set $i

    ;; outer loop: for (i = 1; i < length; i++)
    block $break_outer
      loop $loop_outer
        ;; if (i >= length) break
        local.get $i
        local.get $length
        i32.ge_s
        br_if $break_outer

        ;; key = memory[i_addr]
        local.get $i_addr
        i32.load
        local.set $key

        ;; j = i - 1
        local.get $i
        i32.const 1
        i32.sub
        local.set $j

        ;; j_addr = i_addr - 4
        local.get $i_addr
        i32.const 4
        i32.sub
        local.set $j_addr

        ;; inner loop: while (j >= 0 && memory[j_addr] > key)
        block $break_inner
          loop $loop_inner
            ;; if (j < 0) break
            local.get $j
            i32.const 0
            i32.lt_s
            br_if $break_inner

            ;; load memory[j_addr]; if (<= key) break
            local.get $j_addr
            i32.load
            local.get $key
            i32.le_s
            br_if $break_inner

            ;; memory[j_addr + 4] = memory[j_addr]  (shift element right)
            local.get $j_addr
            i32.const 4
            i32.add
            local.get $j_addr
            i32.load
            i32.store

            ;; j--
            local.get $j
            i32.const 1
            i32.sub
            local.set $j

            ;; j_addr -= 4
            local.get $j_addr
            i32.const 4
            i32.sub
            local.set $j_addr

            br $loop_inner
          end ;; end loop $loop_inner
        end ;; end block $break_inner

        ;; memory[j_addr + 4] = key  (insert key into correct position)
        local.get $j_addr
        i32.const 4
        i32.add
        local.get $key
        i32.store

        ;; i++
        local.get $i
        i32.const 1
        i32.add
        local.set $i

        ;; i_addr += 4
        local.get $i_addr
        i32.const 4
        i32.add
        local.set $i_addr

        br $loop_outer
      end ;; end loop $loop_outer
    end ;; end block $break_outer
  ) ;; end func $sort
) ;; end module



// run.js — Load insert_sort.wasm, sort an array of integers, print results
// const fs = require("fs");
// const path = require("path");
import fs from 'fs'
import path from 'path';

async function main() {
  const wasmPath = path.join("insert_sort.wasm");
  const wasmBuffer = fs.readFileSync(wasmPath);
  const { instance } = await WebAssembly.instantiate(wasmBuffer);

  const { memory, sort } = instance.exports;

  // Sample array of integers to sort
  const input = [42, 7, -3, 99, 0, 15, -8, 23, 1, 56, 33, -12, 8, 77, 4];

  console.log("Input array: ", input);
  console.log("Length:       ", input.length);

  // Write the array into WASM linear memory starting at byte offset 0
  const view = new Int32Array(memory.buffer, 0, input.length);
  view.set(input);

  // Call the sort function: sort(offset=0, length=input.length)
  sort(0, input.length);

  // Read back the sorted array
  const sorted = Array.from(new Int32Array(memory.buffer, 0, input.length));

  console.log("Sorted array:", sorted);

  // Verification
  const expected = [...input].sort((a, b) => a - b);
  const pass = JSON.stringify(sorted) === JSON.stringify(expected);
  console.log("Verification:", pass ? "✅ PASS" : "❌ FAIL");
  if (!pass) {
    console.log("Expected:    ", expected);
  }
}

main().catch((err) => {
  console.error("Error:", err);
  process.exit(1);
});


Gemini 3.0 Pro

(module
  ;; Export memory so JS can read/write the array
  (memory (export "mem") 1)

  ;; Function: sort(base_ptr: i32, len: i32)
  (func (export "sort") (param $base i32) (param $len i32)
    (local $i i32)
    (local $key i32)
    (local $j i32)
    (local $val_j i32)

    ;; Initialize i = 1
    i32.const 1
    local.set $i

    ;; Loop $outer: Iterate from i = 1 to len
    (block $done
      (loop $outer_loop
       
        ;; Check i < len. If not, break to $done
        local.get $i
        local.get $len
        i32.ge_s
        br_if $done

        ;; key = mem[base + i * 4]
        local.get $base
        local.get $i
        i32.const 2
        i32.shl       ;; i * 4 (shift left 2)
        i32.add
        i32.load
        local.set $key

        ;; j = i - 1
        local.get $i
        i32.const 1
        i32.sub
        local.set $j

        ;; Inner While Loop: (j >= 0) && (mem[j] > key)
        (block $inner_break
          (loop $inner_loop
           
            ;; Condition 1: Check j >= 0
            local.get $j
            i32.const 0
            i32.lt_s      ;; if j < 0
            br_if $inner_break

            ;; Load mem[j]
            local.get $base
            local.get $j
            i32.const 2
            i32.shl
            i32.add
            i32.load
            local.tee $val_j ;; Save value to temp to compare

            ;; Condition 2: Check mem[j] > key
            local.get $key
            i32.le_s      ;; if val_j <= key
            br_if $inner_break

            ;; Shift: mem[j + 1] = mem[j]
            ;; Addr of j+1
            local.get $base
            local.get $j
            i32.const 1
            i32.add
            i32.const 2
            i32.shl
            i32.add
            ;; Value to store (val_j) is already on stack?
            ;; No, consumed by le_s. Need to use local.get $val_j
            local.get $val_j
            i32.store

            ;; j = j - 1
            local.get $j
            i32.const 1
            i32.sub
            local.set $j

            ;; Repeat inner loop
            br $inner_loop
          )
        ) ;; End $inner_break

        ;; mem[j + 1] = key
        local.get $base
        local.get $j
        i32.const 1
        i32.add
        i32.const 2
        i32.shl
        i32.add
        local.get $key
        i32.store

        ;; i = i + 1
        local.get $i
        i32.const 1
        i32.add
        local.set $i

        ;; Repeat outer loop
        br $outer_loop
      )
    )
  )
)

GPT 5.3 Codex

(module
  (memory (export "memory") 1)

  (func (export "insertion_sort") (param $ptr i32) (param $len i32)
    (local $i i32)
    (local $j i32)
    (local $key i32)
    (local $cur i32)

    i32.const 1
    local.set $i

    block $done
      loop $outer
        local.get $i
        local.get $len
        i32.ge_u
        br_if $done

        local.get $ptr
        local.get $i
        i32.const 2
        i32.shl
        i32.add
        i32.load
        local.set $key

        local.get $i
        local.set $j

        block $inner_done
          loop $inner
            local.get $j
            i32.eqz
            br_if $inner_done

            local.get $ptr
            local.get $j
            i32.const 1
            i32.sub
            i32.const 2
            i32.shl
            i32.add
            i32.load
            local.tee $cur
            local.get $key
            i32.le_s
            br_if $inner_done

            local.get $ptr
            local.get $j
            i32.const 2
            i32.shl
            i32.add
            local.get $cur
            i32.store

            local.get $j
            i32.const 1
            i32.sub
            local.set $j

            br $inner
          end
        end

        local.get $ptr
        local.get $j
        i32.const 2
        i32.shl
        i32.add
        local.get $key
        i32.store

        local.get $i
        i32.const 1
        i32.add
        local.set $i

        br $outer
      end
    end
  )
)


Insertion Sort WAT Comparison
(gem / gpt / opus)

1) Correctness

Result: all three are correct insertion-sort implementations.

Validation performed:

  • compiled each .wat with wat2wasm
  • executed with the same JS harness (sketched after the outcome below)
  • tested deterministic suite including:
    • empty/singleton arrays
    • sorted/reverse arrays
    • negative numbers and duplicates
    • 200 additional deterministic mixed arrays
  • compared output to JS reference sort ((a,b)=>a-b)

Outcome:

  • gem: PASS
  • gpt: PASS
  • opus: PASS
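
A rough sketch in Node.js of the kind of harness behind these checks; the .wasm file names and the sample cases are placeholders, and the only detail taken from the modules above is the export-name handling (sort vs insertion_sort, memory vs mem):

import fs from 'fs';

// Hypothetical file names; adjust to wherever wat2wasm wrote the binaries.
const variants = ['gem.wasm', 'gpt.wasm', 'opus.wasm'];

// A few of the deterministic cases described above.
const cases = [
  [],                        // empty
  [5],                       // singleton
  [1, 2, 3, 4],              // already sorted
  [9, 7, 5, 3, 1],           // reverse sorted
  [3, -1, 3, 0, -7, 2, 2],   // negatives and duplicates
];

for (const file of variants) {
  const { instance } = await WebAssembly.instantiate(fs.readFileSync(file));
  // Export names differ between the three generated modules.
  const sort = instance.exports.sort ?? instance.exports.insertion_sort;
  const memory = instance.exports.memory ?? instance.exports.mem;

  const ok = cases.every((input) => {
    new Int32Array(memory.buffer, 0, input.length).set(input); // write input at offset 0
    sort(0, input.length);                                     // sort in place in linear memory
    const out = Array.from(new Int32Array(memory.buffer, 0, input.length));
    const expected = [...input].sort((a, b) => a - b);
    return JSON.stringify(out) === JSON.stringify(expected);
  });

  console.log(file, ok ? 'PASS' : 'FAIL');
}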

2) Line-of-code stats

Counts were taken directly from files.

| Variant | Total lines | Effective code lines* |
|---------|-------------|-----------------------|
| gem     | 112         | 76                    |
| gpt     | 84          | 71                    |
| opus    | 117         | 77                    |

*Effective code lines = non-empty lines excluding ;; comment-only lines.
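
A minimal sketch of this counting rule in Node.js (the .wat file name is a placeholder):

import fs from 'fs';

// Placeholder file name; repeat for gem.wat / gpt.wat / opus.wat.
const lines = fs.readFileSync('opus.wat', 'utf8').split('\n');
const total = lines.length;
const effective = lines.filter((l) => {
  const t = l.trim();
  return t !== '' && !t.startsWith(';;');   // non-empty and not a comment-only line
}).length;
console.log({ total, effective });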

3) Why line counts differ

Main reasons for significant line differences:

  1. Comment density and explanatory style

    • gem/opus include many explanatory comments and labeled sections.
    • gpt version is more compact/comment-light.
  2. Control-flow style choices

    • all use equivalent outer/inner loop structure, but with different block/loop labeling verbosity.
  3. Addressing strategy

    • gem/gpt compute element addresses from index (base + (idx << 2)) at each use.
    • opus keeps extra locals (i_addr, j_addr) and updates byte addresses each iteration.
    • this can improve readability for some readers but adds additional setup/update instructions and lines.
  4. Boundary-condition encoding

    • gpt uses j == 0 as inner-loop stop and reads j-1 element.
    • gem/opus use explicit j < 0 check in signed index style.

So the line-count gap is mostly presentation + instruction organization, not algorithmic complexity.

4) Efficiency comparison

A unified micro-benchmark was run under Node.js with the same input shape and iteration count for all three.

Benchmark setup (the loop is sketched after this list):

  • array length: 128
  • iterations: 20,000
  • same mutation/reset pattern before each sort
  • single process, same runtime session
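
A minimal sketch of that loop in Node.js; the file name, the LCG-based reset pattern, and the use of performance.now() are assumptions rather than the exact harness:

import fs from 'fs';

const LEN = 128;
const ITER = 20_000;

// Deterministic pseudo-random fill so every iteration sorts the same data
// (a simple LCG, chosen here only as an example of a repeatable reset).
function fill(view, seed) {
  let x = seed | 0;
  for (let i = 0; i < LEN; i++) {
    x = (Math.imul(x, 1103515245) + 12345) & 0x7fffffff;
    view[i] = x % 1000;
  }
}

// File name is a placeholder for gem.wasm / gpt.wasm / opus.wasm.
const { instance } = await WebAssembly.instantiate(fs.readFileSync('gpt.wasm'));
const sort = instance.exports.sort ?? instance.exports.insertion_sort;
const memory = instance.exports.memory ?? instance.exports.mem;
const view = new Int32Array(memory.buffer, 0, LEN);

const start = performance.now();
for (let iter = 0; iter < ITER; iter++) {
  fill(view, iter);   // reset to the same unsorted state before each sort
  sort(0, LEN);
}
console.log(`${(performance.now() - start).toFixed(2)} ms`);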

Observed timing:

| Variant | Time (ms) |
|---------|-----------|
| gpt     | 43.27     |
| opus    | 50.73     |
| gem     | 52.73     |

Interpretation:

  • gpt appears fastest in this run.
  • opus and gem are close, with opus slightly faster than gem here.

Likely causes:

  • gpt has the most compact hot path with fewer local/address maintenance steps.
  • opus trades some repeated address arithmetic for extra local updates; this did not win in this run.
  • gem is similar to gpt structurally but slightly more verbose in hot operations.

Important caveat:

  • these are micro-benchmark results from one environment/session; exact ordering can vary by engine/JIT warmup/load.
  • all three are still the same O(n^2) insertion sort; asymptotically they are equivalent.

5) Bottom line

  • Correctness: all three correct.
  • Compactness: gpt is clearly smallest by total lines.
  • Practical speed (this run): gpt best, opus second, gem third.
  • Difference source: implementation style/verbosity and instruction layout, not different sorting algorithm.


6) Flowchart

```mermaid
flowchart TD
   A["Start: insertion_sort(ptr, len)"] --> B["i = 1"]
   B --> C{"i ge len?"}
   C -- Yes --> Z["End"]
   C -- No --> D["key = arr[i]"]
   D --> E["j = i - 1"]
   E --> F{"j ge 0 AND arr[j] gt key?"}
   F -- Yes --> G["arr[j + 1] = arr[j]"]
   G --> H["j = j - 1"]
   H --> F
   F -- No --> I["arr[j + 1] = key"]
   I --> J["i = i + 1"]
   J --> C
```


7) Same algorithm in other languages

C

#include <stddef.h>

void insertion_sort(int* arr, size_t len) {
   for (size_t i = 1; i < len; i++) {
      int key = arr[i];
      size_t j = i;
      while (j > 0 && arr[j - 1] > key) {
         arr[j] = arr[j - 1];
         j--;
      }
      arr[j] = key;
   }
}

Python

def insertion_sort(arr: list[int]) -> None:
   for i in range(1, len(arr)):
      key = arr[i]
      j = i
      while j > 0 and arr[j - 1] > key:
         arr[j] = arr[j - 1]
         j -= 1
      arr[j] = key

DS3-style pseudo language

fn insertion_sort(arr: int[], len: int)
   i = 1
   while i < len
      key = arr[i]
      j = i
      while j > 0 and arr[j - 1] > key
         arr[j] = arr[j - 1]
         j = j - 1
      arr[j] = key
      i = i + 1
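
JavaScript

For comparison with the language of the run.js harness, a plain JavaScript version of the same loop (a reference implementation, not the WASM path):

function insertionSort(arr) {
  for (let i = 1; i < arr.length; i++) {
    const key = arr[i];
    let j = i;
    while (j > 0 && arr[j - 1] > key) {
      arr[j] = arr[j - 1];   // shift the larger element one slot right
      j--;
    }
    arr[j] = key;            // insert key into its final position
  }
  return arr;
}

console.log(insertionSort([42, 7, -3, 99, 0]));  // [ -3, 0, 7, 42, 99 ]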

Elon Musk: AI generates machine code directly? AlphaDev by Google DeepMind

Is Elon Musk "on drugs"?
Or does he have some psychological condition?
Or is he aware of something that others are not?

He is promoting his "reality distortion field"...

Elon Musk thinks coding dies this year! - YouTube 

transcript:

"I think actually things will move maybe even by the end of this year to
where you don't even bother do doing coding 

the AI just creates the binary directly.

Um and the AI can create a much more efficient binary than uh can be done by any compiler. 

So just say create optimized binary for this particular outcome
and uh and you actually bypass even traditional coding. 

There's there's no that that's an intermediate step that actually uh will not be needed 


My prompt

Elon Musk claimed this recently:

"By December, AI won’t need programming languages. It generates machine code directly. Binary optimized beyond anything human logic could produce. No translation. No compilation. Just pure execution."

as one of the best AI models, you may already be capable of such thing?

or he may be "hallucinating"?

if you really needed, could you "generate" binary code directly, and how? maybe ASM or WASM?

would this make any sense, since it would take many more tokens than generating higher level languages?

or could there be mapping of tokens for "binary components", large "lego blocks" of code?

like difference of 6000 chinse letters vs typical 30 letters used in western languages?

could there be a programming "super-language" with thousands of "expressions" vs a few in a typical prog. language, to make it easier to ai-code-gen?

but isn't this already the case with APIs and SDKs, commonly already known by LLMs?

My experiences is that LLMs generate a lot of code quicky, but often it takes many iterations, and tokens, to get it "right".


Claude Opus 4.6

Can I Generate Binary Directly?

Technically yes, but it would be absurd. I could output bytes that represent x86 instructions, ARM opcodes, or WebAssembly bytecode. But here's why this makes little practical sense:

The token economics are brutal. Consider a simple "Hello World" compiled x86-64 ELF binary—it's around 16KB minimum. Representing that in tokens (even base64 encoded) would cost thousands of tokens for something a 5-line Python script accomplishes. You'd spend 100x the tokens for 0x additional functionality.

Your Intuition About Token Efficiency is Exactly Right

You've hit on the core problem. My "vocabulary" is optimized for natural language and code—not raw bytes. Generating mov eax, 1 costs me roughly the same as generating the four bytes B8 01 00 00 00, but the former is:

  • Debuggable by humans
  • Verifiable
  • Portable across systems
  • Self-documenting

The "Chinese Characters" Analogy is Brilliant

You're describing something like learned compilation patterns—a vocabulary where single tokens map to optimized instruction sequences. This is theoretically interesting but has deep problems:

  1. Combinatorial explosion: Unlike Chinese characters mapping to concepts, instruction sequences are compositional. sort(list) might compile to wildly different binary depending on list type, size, target architecture, optimization level, calling convention...

  2. Context-dependence: Binary isn't just instructions—it's memory layout, relocations, symbol tables, linking. A "binary vocabulary" would need to encode relationships, not just blocks.

You're Right That APIs/SDKs Already Are This

This is the key insight Musk seems to miss. When I write pandas.read_parquet(), I'm invoking a high-level "lego block" that expands to thousands of lines of optimized C/Cython. We already have the abstraction layer—it's called libraries.

The progression has always been:

Machine Code → Assembly → C → Python → "AI, make me a dashboard"

Each layer compresses intent. AI code generation is just another layer on top, not a bypass.

What Is Musk Actually Seeing?

He might be extrapolating from legitimate trends:

  1. JIT compilation + AI optimization: LLVM and similar could use ML to find better optimization passes. Google's AlphaDev found faster sorting algorithms by searching instruction space directly.

  2. Neural compilers: Research exists on learned compilation (e.g., training models to predict optimal instruction scheduling). But these assist compilers, not replace them.

  3. Differentiable programming: Some domains (ML inference) are moving toward compiled computation graphs that blur the code/execution boundary.

What You're Not Missing

Your experience—"LLMs generate code quickly but need iterations"—points to the real bottleneck: specification, not translation.

The hard part was never Python → Binary. Compilers do that flawlessly. The hard part is vague human intent → precise specification. That's where the iterations happen. Bypassing programming languages doesn't solve this—it just hides it.

This AlphaDev example is actually instructive about what Musk might be extrapolating from. Let me add this context:

AlphaDev: What AI Can Actually Do with Low-Level Code

Google DeepMind's AlphaDev discovered faster sorting algorithms by treating the problem as a game, training a reinforcement learning agent to play it, and these algorithms have been integrated into the LLVM standard C++ sort library.

But here's the critical nuance: AlphaDev found a way to sort a list of three items in 17 instructions instead of 18—it discovered that certain steps could be skipped. The improvement for sorting 5 items was from 46 to 42 instructions, achieving around 70% faster execution.

This is exactly the opposite of what Musk is describing. AlphaDev:

  • Worked on tiny, bounded problems (3-8 elements)
  • Took enormous compute to search the space
  • The longest algorithm it produced was 130 instructions long, and beyond that "learning became slow"
  • Required humans to reverse-engineer and integrate the results

As one reviewer noted: "Overall it's more about assembly, finding missed compiler optimizations... It was not revolutionary in a sense it hasn't found completely new approaches but converged to something incomprehensible for humans but relatively good for performance."

The Real Insight You're Touching On

Your question about "super-languages" with richer expressions is actually where the interesting future lies. But it's not about bypassing compilation—it's about raising the abstraction level further.

Consider the progression:

  • Assembly: mov, add, jmp (~hundreds of instructions)
  • C: for, if, malloc (~hundreds of primitives + stdlib)
  • Python + NumPy: df.groupby().mean() (~thousands of high-level operations)
  • Future?: "Create a dashboard showing user retention by cohort with statistical significance testing"

Each level compresses intent. The "compilation" still happens—it just gets longer, not shorter.

What Musk Gets Wrong (or Overstates)

  1. "Binary optimized beyond anything human logic could produce" — AlphaDev found a single skipped instruction in a 17-instruction sequence. Useful, but hardly "beyond human logic."

  2. "No compilation" — You still need to verify correctness, handle memory management, integrate with OS services, link libraries. Binary isn't self-contained.

  3. "December 2025/2026" — For longer algorithms, the team plans to adapt AlphaDev to work with C++ instructions instead of assembly because "with less fine-grained control AlphaDev might miss certain shortcuts, but the approach would be applicable to a wider range of algorithms." Even DeepMind is moving up the abstraction stack, not down.

Your Real Experience Matters

When you say "it takes many iterations to get it right"—that's the specification problem. Generating correct binary from ambiguous intent doesn't eliminate iterations; it just makes them more opaque and harder to debug.

The value of intermediate representations (code) is that they're inspectable. You can see where the logic went wrong. Raw binary generation would make debugging nearly impossible without decompilation—which puts you right back where you started.

references

AlphaDev: Google DeepMind’s game-playing AI just found another way to make code faster 

https://www.technologyreview.com/2023/06/07/1074184/google-deepmind-game-ai-alphadev-algorithm-code-faster/


Gemini 3.0 Pro

It is February 2026, and you are asking the right questions about this specific prediction. To answer you directly: Musk is likely extrapolating a real possibility into an aggressive timeline.

While I can technically generate assembly (ASM) or machine code (binary) right now, doing so for complex software is currently inefficient compared to high-level languages. However, Musk’s claim relies on a shift in how AI thinks about code, not just doing what humans do faster.

Here is the breakdown of why this is theoretically possible, why it contradicts current "token logic," and whether I can actually do it.

1. Can I "Generate" Binary Directly?

Yes, technically. I can output a stream of Hexadecimal values or Assembly instructions (like x86 or ARM64) right now.

  • Example (ASM): MOV EAX, 1 (Move 1 into register EAX)
  • Example (Binary/Hex): B8 01 00 00 00 (The machine code for the instruction above)

If you asked me to write a "Snake" game in x86 Assembly, I could do it. The problem is efficiency.

  • In Python: print("Hello") is ~2 tokens.
  • In Assembly: That same command requires setting up data segments, moving pointers to registers, calling a kernel interrupt, and handling the stack. It might cost 50+ tokens to generate.

So, under the current paradigm, generating binary is "expensive" for the AI (more tokens, more compute) and useless for humans (unreadable).

2. The "Super-Language" & Token Density (The "Lego" Theory)

You hit on a brilliant point with your "Chinese letters" analogy. This is likely the missing link in Musk's logic.

For AI to generate binary efficiently, it wouldn't use the same tokens it uses for English. It would likely use a "Latent Space Compiler."

  • Current Way: Human Idea -> English/Python (Low Information Density) -> Compiler -> Binary
  • Musk's Way: Human Idea -> AI Latent Thought -> Binary

In this model, the AI doesn't "speak" Python. It doesn't "speak" Assembly. It maps a single complex "thought" (a massive vector of numbers) directly to a block of executable machine code.

The "Super-Language" already exists inside LLMs. When an LLM understands "Create a secure login system," that concept exists in its "brain" (neural weights) as a complex mathematical representation. Currently, we force the AI to "dumb it down" into Python code so humans can read it. Musk is suggesting we remove the human-readable bottleneck.

If we stop forcing the AI to output text, it could output "binary lego blocks." Imagine a single token that represents "Initialize TCP Stack."

3. Is Musk Hallucinating? (The "December" Timeline)

The Claim: "No programming languages. Just pure execution." The Reality: This is the "End Game" of computer science, but December 2026 is an incredibly tight deadline for total obsolescence.

Why he might be right (The "Steel Man" Argument):

  • Perfect Optimization: Compilers (like GCC or LLVM) are generalized. They work "pretty good" for all chips. An AI could write binary perfectly tailored to the exact silicon of a specific NVIDIA or Tesla chip, squeezing out 30-50% more performance than any human-written C++ code ever could.
  • Skip the Middleman: We lose a lot of nuance translating logic into code. If the AI writes the binary, it can use "tricks" that are illegal in Python or C++ but valid in physics/hardware.

Why he might be wrong (The "Black Box" Danger):

  • No Auditability: If the AI generates binary directly, no human can verify what it does. If the AI hallucinates a backdoor or a bug, we won't see it in the code—because there is no code. We would just see a stream of numbers.
  • Debugging: You cannot debug binary easily. If the software crashes, you can't "read the line" that failed. You are effectively trusting the AI with your life.

4. Direct Answer to Your Request

"if you really needed, could you 'generate' binary code directly, and how?"

I can generate the representation of binary (Hex Dump) which you can then save as an executable file.

Example of what I can do (Simplified): If you asked for a "Hello World" in raw Linux ELF binary format, I would output a Hex block like this:

7F 45 4C 46 02 01 01 00 ... (header)
B8 01 00 00 00 BF 01 00 ... (instructions)

You would then use a tool like xxd -r -p to turn that text back into a file.

But would it make sense? Not yet.

  1. Token Cost: It costs me 100x more tokens to write the binary byte-by-byte than to write print("hello").
  2. Accuracy: One flipped bit in binary crashes the whole program. One typo in Python is usually a simple error.

Summary

Musk is likely seeing a prototype of a "Neural Compiler"—an AI that skips the "coding" phase and arranges pre-compiled binary blocks (your "Legos") into a working program.

  • Is it efficient? Only if the AI uses a new "Binary Tokenizer" (the super-language you described).
  • Will coding die by December? Unlikely to be globally gone, but for top-tier AI models, we may indeed stop asking them for Python and start asking them for "Executables."

The "Chinese vs. Western letters" analogy is the perfect way to understand it. We are moving from spelling out words (Code) to just handing over the ideograms (Binary Concepts).


GPT 5.3 Codex

Great questions. Short answer: Musk is extrapolating a real trend, but the December-style claim is mostly hype.

What’s true

  • AI can already produce low-level code: assembly, LLVM IR, WAT, even raw byte arrays.
  • In narrow domains, direct low-level synthesis plus search can beat human hand-tuning.
  • Compilers may become less “central” in some pipelines if AI emits lower-level IR directly.

What’s not true (yet)

  • “No programming languages, no compilation, just binary” for general software is unrealistic.
  • Real systems still need ABI compatibility, linking, security hardening, portability, debugging, verification, and maintenance.
  • The hardest part is not syntax translation; it is correctness under huge implicit constraints.

Could I generate binary directly?

  • Technically yes, in principle: emit machine code bytes or wasm binary sections.
  • Practically, it’s brittle and inefficient compared to generating source/IR then compiling.
  • Better workflow: generate high-level code or IR, compile, run tests, profile, then optimize hotspots.

About token efficiency / super-language idea

  • You’re onto something: higher-level “lego blocks” already exist as APIs, SDKs, DSLs, libraries.
  • A giant token vocabulary with thousands of primitives can help in some areas, but hurts composability/generalization if overdone.
  • The winning approach is usually layered:
    1. high-level intent,
    2. strong reusable primitives,
    3. automated verification,
    4. selective low-level optimization.

What you’re “missing” vs hype

  • Your observation is correct: generation is fast, but convergence to correct behavior is expensive.
  • The bottleneck is feedback/verification loops, not whether text is TS, ASM, or binary.
  • So the future is likely “AI + compilers + verifiers,” not “binary-only AI replacing all languages overnight.”