Saturday, February 14, 2026

AI => WASM sort challenge

While AI models have concluded that Elon Musk is stretching reality with his proclamations that AI will generate binary "directly", technically that is already possible, while terribly inefficient. 

DraganSr: Elon Musk: AI generates machine code directly? AlphaDev by Google DeepMind

Elon Musk thinks coding dies this year! - YouTube

So here is an quick experiment with all 3 best major AI coding models to see how can they solve a simple challenge in "one shot binary code generation". 

Prompt

generate wasm function in linear form (not s-expressions) for insert-sort in this folder, that takes an array of integers array and sorts it.
also generate run.js for calling this with sample array of intergers and prints results.
create ps1 script to convert wat2wasm, and then wasm2wat in s-expressions to make it readable.
do this as one-shot attempt.

Result: all models solved the challenge, and each tool a few minutes (not fast)
Note: WAT is text representation of WASM, a binary executable format.
run.js is used to call binary code.

Does this make Elon's assessment right? 
No.
But proves the concept that AI code generation in any language is real and useful.
Just need to pick right level of abstraction. 
Binary is hard, both for humans and machines.
Engineering 101 

Opus 4.6

;; insert_sort.wat — Insertion Sort in WebAssembly
;; Uses flat/linear (stack-machine) instruction style inside functions
;; The module wrapper uses S-expression syntax as required by WABT's wat2wasm

(module
  ;; Memory: 1 page (64KB), exported so JS can read/write
  (memory (export "memory") 1)

  ;; Function: sort(offset: i32, length: i32)
  ;; Performs in-place insertion sort on an i32 array in linear memory
  (func $sort (export "sort") (param $offset i32) (param $length i32)
    (local $i i32)
    (local $j i32)
    (local $key i32)
    (local $j_addr i32)
    (local $i_addr i32)

    ;; i_addr = offset + 4  (byte address of element[1])
    local.get $offset
    i32.const 4
    i32.add
    local.set $i_addr

    ;; i = 1
    i32.const 1
    local.set $i

    ;; outer loop: for (i = 1; i < length; i++)
    block $break_outer
      loop $loop_outer
        ;; if (i >= length) break
        local.get $i
        local.get $length
        i32.ge_s
        br_if $break_outer

        ;; key = memory[i_addr]
        local.get $i_addr
        i32.load
        local.set $key

        ;; j = i - 1
        local.get $i
        i32.const 1
        i32.sub
        local.set $j

        ;; j_addr = i_addr - 4
        local.get $i_addr
        i32.const 4
        i32.sub
        local.set $j_addr

        ;; inner loop: while (j >= 0 && memory[j_addr] > key)
        block $break_inner
          loop $loop_inner
            ;; if (j < 0) break
            local.get $j
            i32.const 0
            i32.lt_s
            br_if $break_inner

            ;; load memory[j_addr]; if (<= key) break
            local.get $j_addr
            i32.load
            local.get $key
            i32.le_s
            br_if $break_inner

            ;; memory[j_addr + 4] = memory[j_addr]  (shift element right)
            local.get $j_addr
            i32.const 4
            i32.add
            local.get $j_addr
            i32.load
            i32.store

            ;; j--
            local.get $j
            i32.const 1
            i32.sub
            local.set $j

            ;; j_addr -= 4
            local.get $j_addr
            i32.const 4
            i32.sub
            local.set $j_addr

            br $loop_inner
          end ;; end loop $loop_inner
        end ;; end block $break_inner

        ;; memory[j_addr + 4] = key  (insert key into correct position)
        local.get $j_addr
        i32.const 4
        i32.add
        local.get $key
        i32.store

        ;; i++
        local.get $i
        i32.const 1
        i32.add
        local.set $i

        ;; i_addr += 4
        local.get $i_addr
        i32.const 4
        i32.add
        local.set $i_addr

        br $loop_outer
      end ;; end loop $loop_outer
    end ;; end block $break_outer
  ) ;; end func $sort
) ;; end module



// run.js — Load insert_sort.wasm, sort an array of integers, print results
// const fs = require("fs");
// const path = require("path");
import fs from 'fs'
import path from 'path';

async function main() {
  const wasmPath = path.join("insert_sort.wasm");
  const wasmBuffer = fs.readFileSync(wasmPath);
  const { instance } = await WebAssembly.instantiate(wasmBuffer);

  const { memory, sort } = instance.exports;

  // Sample array of integers to sort
  const input = [42, 7, -3, 99, 0, 15, -8, 23, 1, 56, 33, -12, 8, 77, 4];

  console.log("Input array: ", input);
  console.log("Length:       ", input.length);

  // Write the array into WASM linear memory starting at byte offset 0
  const view = new Int32Array(memory.buffer, 0, input.length);
  view.set(input);

  // Call the sort function: sort(offset=0, length=input.length)
  sort(0, input.length);

  // Read back the sorted array
  const sorted = Array.from(new Int32Array(memory.buffer, 0, input.length));

  console.log("Sorted array:", sorted);

  // Verification
  const expected = [...input].sort((a, b) => a - b);
  const pass = JSON.stringify(sorted) === JSON.stringify(expected);
  console.log("Verification:", pass ? "✅ PASS" : "❌ FAIL");
  if (!pass) {
    console.log("Expected:    ", expected);
  }
}

main().catch((err) => {
  console.error("Error:", err);
  process.exit(1);
});


Gemini 3.0 Pro

(module
  ;; Export memory so JS can read/write the array
  (memory (export "mem") 1)

  ;; Function: sort(base_ptr: i32, len: i32)
  (func (export "sort") (param $base i32) (param $len i32)
    (local $i i32)
    (local $key i32)
    (local $j i32)
    (local $val_j i32)

    ;; Initialize i = 1
    i32.const 1
    local.set $i

    ;; Loop $outer: Iterate from i = 1 to len
    (block $done
      (loop $outer_loop
       
        ;; Check i < len. If not, break to $done
        local.get $i
        local.get $len
        i32.ge_s
        br_if $done

        ;; key = mem[base + i * 4]
        local.get $base
        local.get $i
        i32.const 2
        i32.shl       ;; i * 4 (shift left 2)
        i32.add
        i32.load
        local.set $key

        ;; j = i - 1
        local.get $i
        i32.const 1
        i32.sub
        local.set $j

        ;; Inner While Loop: (j >= 0) && (mem[j] > key)
        (block $inner_break
          (loop $inner_loop
           
            ;; Condition 1: Check j >= 0
            local.get $j
            i32.const 0
            i32.lt_s      ;; if j < 0
            br_if $inner_break

            ;; Load mem[j]
            local.get $base
            local.get $j
            i32.const 2
            i32.shl
            i32.add
            i32.load
            local.tee $val_j ;; Save value to temp to compare

            ;; Condition 2: Check mem[j] > key
            local.get $key
            i32.le_s      ;; if val_j <= key
            br_if $inner_break

            ;; Shift: mem[j + 1] = mem[j]
            ;; Addr of j+1
            local.get $base
            local.get $j
            i32.const 1
            i32.add
            i32.const 2
            i32.shl
            i32.add
            ;; Value to store (val_j) is already on stack?
            ;; No, consumed by le_s. Need to use local.get $val_j
            local.get $val_j
            i32.store

            ;; j = j - 1
            local.get $j
            i32.const 1
            i32.sub
            local.set $j

            ;; Repeat inner loop
            br $inner_loop
          )
        ) ;; End $inner_break

        ;; mem[j + 1] = key
        local.get $base
        local.get $j
        i32.const 1
        i32.add
        i32.const 2
        i32.shl
        i32.add
        local.get $key
        i32.store

        ;; i = i + 1
        local.get $i
        i32.const 1
        i32.add
        local.set $i

        ;; Repeat outer loop
        br $outer_loop
      )
    )
  )
)

GPT 5.3 Codex

(module
  (memory (export "memory") 1)

  (func (export "insertion_sort") (param $ptr i32) (param $len i32)
    (local $i i32)
    (local $j i32)
    (local $key i32)
    (local $cur i32)

    i32.const 1
    local.set $i

    block $done
      loop $outer
        local.get $i
        local.get $len
        i32.ge_u
        br_if $done

        local.get $ptr
        local.get $i
        i32.const 2
        i32.shl
        i32.add
        i32.load
        local.set $key

        local.get $i
        local.set $j

        block $inner_done
          loop $inner
            local.get $j
            i32.eqz
            br_if $inner_done

            local.get $ptr
            local.get $j
            i32.const 1
            i32.sub
            i32.const 2
            i32.shl
            i32.add
            i32.load
            local.tee $cur
            local.get $key
            i32.le_s
            br_if $inner_done

            local.get $ptr
            local.get $j
            i32.const 2
            i32.shl
            i32.add
            local.get $cur
            i32.store

            local.get $j
            i32.const 1
            i32.sub
            local.set $j

            br $inner
          end
        end

        local.get $ptr
        local.get $j
        i32.const 2
        i32.shl
        i32.add
        local.get $key
        i32.store

        local.get $i
        i32.const 1
        i32.add
        local.set $i

        br $outer
      end
    end
  )
)


Insertion Sort WAT Comparison
(gem / gpt / opus)

1) Correctness

Result: all three implementations are correct insertion sort implementations.

Validation performed:

  • compiled each .wat with wat2wasm
  • executed with the same JS harness
  • tested deterministic suite including:
    • empty/singleton arrays
    • sorted/reverse arrays
    • negative numbers and duplicates
    • 200 additional deterministic mixed arrays
  • compared output to JS reference sort ((a,b)=>a-b)

Outcome:

  • gem: PASS
  • gpt: PASS
  • opus: PASS

2) Line-of-code stats

Counts were taken directly from files.

VariantTotal linesEffective code lines*
gem11276
gpt8471
opus11777

*Effective code lines = non-empty lines excluding ;; comment-only lines.

3) Why line counts differ

Main reasons for significant line differences:

  1. Comment density and explanatory style

    • gem/opus include many explanatory comments and labeled sections.
    • gpt version is more compact/comment-light.
  2. Control-flow style choices

    • all use equivalent outer/inner loop structure, but with different block/loop labeling verbosity.
  3. Addressing strategy

    • gem/gpt compute element addresses from index (base + (idx << 2)) at each use.
    • opus keeps extra locals (i_addrj_addr) and updates byte addresses each iteration.
    • this can improve readability for some readers but adds additional setup/update instructions and lines.
  4. Boundary-condition encoding

    • gpt uses j == 0 as inner-loop stop and reads j-1 element.
    • gem/opus use explicit j < 0 check in signed index style.

So the line-count gap is mostly presentation + instruction organization, not algorithmic complexity.

4) Efficiency comparison

A unified micro-benchmark was run under Node.js with same input shape and iteration count for all three.

Benchmark setup:

  • array length: 128
  • iterations: 20,000
  • same mutation/reset pattern before each sort
  • single process, same runtime session

Observed timing:

VariantTime (ms)
gpt43.27
opus50.73
gem52.73

Interpretation:

  • gpt appears fastest in this run.
  • opus and gem are close, with opus slightly faster than gem here.

Likely causes:

  • gpt has the most compact hot path with fewer local/address maintenance steps.
  • opus trades some repeated address arithmetic for extra local updates; this did not win in this run.
  • gem is similar to gpt structurally but slightly more verbose in hot operations.

Important caveat:

  • these are micro-benchmark results from one environment/session; exact ordering can vary by engine/JIT warmup/load.
  • all three are still the same O(n^2) insertion sort; asymptotically they are equivalent.

5) Bottom line

  • Correctness: all three correct.
  • Compactness: gpt is clearly smallest by total lines.
  • Practical speed (this run): gpt best, opus second, gem third.
  • Difference source: implementation style/verbosity and instruction layout, not different sorting algorithm.


```mermaid
flowchart TD
   A["Start: insertion_sort(ptr, len)"] --> B["i = 1"]
   B --> C{"i ge len?"}
   C -- Yes --> Z["End"]
   C -- No --> D["key = arr[i]"]
   D --> E["j = i - 1"]
   E --> F{"j ge 0 AND arr[j] gt key?"}
   F -- Yes --> G["arr[j + 1] = arr[j]"]
   G --> H["j = j - 1"]
   H --> F
   F -- No --> I["arr[j + 1] = key"]
   I --> J["i = i + 1"]
   J --> C
```


7) Same algorithm in other languages

C

#include <stddef.h>

void insertion_sort(int* arr, size_t len) {
   for (size_t i = 1; i < len; i++) {
      int key = arr[i];
      size_t j = i;
      while (j > 0 && arr[j - 1] > key) {
         arr[j] = arr[j - 1];
         j--;
      }
      arr[j] = key;
   }
}

Python

def insertion_sort(arr: list[int]) -> None:
   for i in range(1, len(arr)):
      key = arr[i]
      j = i
      while j > 0 and arr[j - 1] > key:
         arr[j] = arr[j - 1]
         j -= 1
      arr[j] = key

DS3-style pseudo language

fn insertion_sort(arr: int[], len: int)
   i = 1
   while i < len
      key = arr[i]
      j = i
      while j > 0 and arr[j - 1] > key
         arr[j] = arr[j - 1]
         j = j - 1
      arr[j] = key
      i = i + 1

No comments: