Token-Oriented Object Notation is a compact, human-readable encoding of the JSON data model that minimizes tokens and makes structure easy for models to follow. It's intended for LLM input as a drop-in, lossless representation of your existing JSON.
TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably. For deeply nested or non-uniform data, JSON may be more efficient.
The similarity to CSV is intentional: CSV is simple and ubiquitous, and TOON aims to keep that familiarity while remaining a lossless, drop-in representation of JSON for Large Language Models.
Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
TOON (Token-Oriented Object Notation) is a newer, leaner data format designed to be more efficient for Large Language Models (LLMs) than JSON. It cuts costs and improves speed by stripping redundant syntax (quotes, braces, commas) and using indentation and tabular layouts, which makes it a strong fit for structured, uniform data. JSON remains the standard for general APIs but is verbose for AI; TOON excels at token reduction and clarity for LLM tasks such as agent inputs, while JSON is better for complex nesting.

JSON
- Best For: General API communication, complex nested data, diverse data types, storage.
- Pros: Universal standard, flexible for varied structures.
- Cons: Verbose, costly for LLMs (every symbol is a token), repetitive keys.

TOON
- Best For: LLM prompts, structured agent inputs, product catalogs, logs (tabular/flat data).
- Pros: Highly token-efficient (30-60% fewer tokens), lower API costs, faster inference, human-readable.
- Cons: Less flexible for deeply nested data; requires strict structure (indentation-based).
Here is how specific complex types are encoded:
1. Arrays (The "Tabular" Mode)
TOON is most famous for compressing arrays of objects. Instead of repeating keys like JSON, it creates a header row.
- Uniform Arrays: If you have a list of objects with the same keys, TOON uses a table.
- Syntax: `key[count]{header1, header2}:`
- Example (a minimal encoder sketch follows these examples):

```
users[3]{id, name, role}:
  1, Alice, admin
  2, Bob, editor
  3, Charlie, viewer
```

- Simple Arrays: Lists of primitives are written inline.
- Example:

```
tags[3]: "news", "tech", "2025"
```
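To make the tabular mode concrete, here is a minimal, hypothetical encoder sketch in TypeScript. It is not the official TOON library; it assumes every item shares the keys of the first item and that no value needs quoting.

```typescript
// Minimal sketch of TOON's tabular mode (illustrative helper, not a library API).
// Assumes uniform items and values that never need quoting.
function encodeTable(key: string, items: Record<string, unknown>[]): string {
  const headers = Object.keys(items[0]);
  const head = `${key}[${items.length}]{${headers.join(", ")}}:`;
  const rows = items.map(
    (item) => "  " + headers.map((h) => String(item[h])).join(", ")
  );
  return [head, ...rows].join("\n");
}

console.log(encodeTable("users", [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "editor" },
  { id: 3, name: "Charlie", role: "viewer" },
]));
// users[3]{id, name, role}:
//   1, Alice, admin
//   2, Bob, editor
//   3, Charlie, viewer
```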
2. Nested Objects
For nested data, TOON generally abandons the table structure and switches to an indentation-based syntax (similar to YAML or Python).
- Standard Nesting: Use 2-space indentation to show hierarchy.
```
user:
  id: 123
  profile:
    name: "Alice"
  settings:
    theme: "dark"
    notifications: true
```

- Key Folding (Dot Notation): If you have a deep hierarchy with only single keys at each level, TOON allows "folding" them to save tokens (a small folding sketch follows this list).
- Instead of:

```
server:
  config:
    port: 8080
```

- You can write:

```
server.config.port: 8080
```
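Below is a simplified sketch of key folding, assuming it applies to any chain of single-key objects; the helper name is illustrative and the TOON spec's exact folding rules may differ.

```typescript
// Simplified sketch: collapse chains of single-key objects into dotted paths.
function foldKeys(obj: Record<string, unknown>, prefix = ""): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isSingleKeyObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      Object.keys(value).length === 1;
    if (isSingleKeyObject) {
      // Single-key child: keep folding downward.
      Object.assign(out, foldKeys(value as Record<string, unknown>, path));
    } else {
      // Stop folding; this value is encoded normally under the folded path.
      out[path] = value;
    }
  }
  return out;
}

foldKeys({ server: { config: { port: 8080 } } });
// => { "server.config.port": 8080 }
```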
3. Strings & Special Characters
TOON tries to avoid quotation marks to save tokens, but it has specific rules for when they must be used.
- Standard Strings: No quotes required.

```
status: active
```

- Strings with Delimiters: If a string contains the delimiter character (usually a comma `,`), it must be wrapped in quotes.

```
location: "Paris, France"
```

- Strings with Newlines: These use standard escape sequences like `\n` and must be quoted (a small quoting-rule sketch follows these examples).

```
description: "First line.\nSecond line."
```
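A sketch of this quoting rule in code. The helper name `needsQuotes` is illustrative and not part of any TOON tooling; the full spec has additional cases (the tags example above quotes "2025", for instance, so it stays a string rather than a number).

```typescript
// Illustrative helper capturing the two rules above: quote when the value
// contains the active delimiter or a newline.
function needsQuotes(value: string, delimiter = ","): boolean {
  return value.includes(delimiter) || value.includes("\n");
}

needsQuotes("active");                     // false -> status: active
needsQuotes("Paris, France");              // true  -> location: "Paris, France"
needsQuotes("First line.\nSecond line.");  // true  -> must be quoted and escaped
```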
4. Handling "Bad" Characters (Delimiter Collision)
If your data contains many commas (e.g., a list of addresses or sentences), using a comma as a separator becomes inefficient because you have to quote every string.
TOON allows you to change the delimiter to a Pipe | or Tab to avoid this. You declare the delimiter after the header.
- Example (using a pipe `|` to avoid quoting commas):

```
# Notice the '|' after the curly braces
articles[2]{id, title, summary}|:
  101 | Hello, World! | A guide to "hello, world" apps.
  102 | Data Formats | JSON, YAML, and TOON explained.
```

Because we switched to `|`, we didn't need to quote "Hello, World!" even though it contains a comma.
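Continuing the hypothetical `needsQuotes` helper from the previous section, switching the delimiter is exactly what removes the quoting requirement:

```typescript
needsQuotes("Hello, World!", ",");  // true  -> must be quoted when ',' is the delimiter
needsQuotes("Hello, World!", "|");  // false -> stays unquoted once '|' is the delimiter
```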
Summary: JSON vs. TOON for Complex Data
| Data Structure | JSON Approach | TOON Approach |
|---|---|---|
| List of Objects | [{"a":1}, {"a":2}] | list[2]{a}: 1, 2 (Tabular) |
| Nested Object | {"a": {"b": 1}} | a: b: 1 (Indented) |
| Deep Nesting | {"a": {"b": {"c": 1}}} | a.b.c: 1 (Dot Notation) |
| Text w/ Commas | "Hello, world" | "Hello, world" (Quoted) |
| Text w/ Commas (pipe delimiter) | "Hello, world" | Hello, world (unquoted, using the \| delimiter) |
Handling sparse data
This is the critical "catch" with TOON. You have correctly identified the format's weakness: tabular data requires uniformity.
If you have an array where every object has different keys (sparse data, polymorphic objects, or "100 possible keys but only 2 used"), TOON abandons the table format.
Instead, it falls back to a standard indented list (YAML-style).
1. The "Fallback" Mode (Dash-List)
When the parser (or human) sees that the data is not uniform enough for a table, it switches to using dashes (`-`) for array items. This looks almost exactly like YAML.
Scenario: You have a list of "events," but each event type has totally different fields.
In JSON:
```json
[
  {"type": "click", "x": 10, "y": 20},
  {"type": "keypress", "key": "Enter", "modifier": "Shift"},
  {"type": "scroll", "delta": 500}
]
```
In TOON (Mixed/Sparse Mode):
```
events[3]:
  - type: click
    x: 10
    y: 20
  - type: keypress
    key: Enter
    modifier: Shift
  - type: scroll
    delta: 500
```
Note: We lost the table header compression, but we still saved tokens by removing all the curly braces (`{}`), commas, and quotes.
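A sketch of the mode decision itself: treat an array as "uniform" only when every item is a flat object with the same key set and primitive values. The function name and the exact rule are assumptions for illustration, not lifted from the TOON spec.

```typescript
// Illustrative check: uniform => table mode, otherwise dash-list fallback.
function isUniform(items: Record<string, unknown>[]): boolean {
  if (items.length === 0) return false;
  const shape = Object.keys(items[0]).sort().join(",");
  return items.every(
    (item) =>
      Object.keys(item).sort().join(",") === shape &&
      Object.values(item).every((v) => v === null || typeof v !== "object")
  );
}

isUniform([{ id: 1, role: "admin" }, { id: 2, role: "editor" }]);       // true  -> table mode
isUniform([{ type: "click", x: 10 }, { type: "scroll", delta: 500 }]);  // false -> dash-list fallback
```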
2. The "Sparse Data" Problem (100 keys vs. 2 values)
If you tried to force your "100 possible keys" scenario into a TOON table, it would actually be worse than JSON because you would need to print 98 empty delimiters for every row.
The Wrong Way (Forcing a Table): This is inefficient and unreadable.
```
# Don't do this!
items[2]{id, name, ...98 other keys...}:
  1, itemA, , , , , , , ... , ,
  2, itemB, , , , , , , ... , ,
```
The Right Way (TOON Standard): TOON handles this exactly like JSON or YAML would—by only writing the keys that exist.
```
items[2]:
  - id: 1
    name: itemA
    # The other 98 null keys are simply omitted
  - id: 2
    name: itemB
    custom_field: "special"
```
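The same idea in code: before emitting a dash-list item, drop the keys whose values are null or undefined, so only the fields that actually exist get written. The helper name and sample data are illustrative, and this sketch skips the quoting rules discussed earlier.

```typescript
// Illustrative helper: filter out null/undefined fields, then emit the rest
// as one dash-list item (quoting rules omitted for brevity).
function encodeListItem(item: Record<string, unknown>, indent = "  "): string {
  const present = Object.entries(item).filter(
    ([, v]) => v !== null && v !== undefined
  );
  return present
    .map(([k, v], i) => (i === 0 ? `${indent}- ${k}: ${v}` : `${indent}  ${k}: ${v}`))
    .join("\n");
}

console.log(encodeListItem({ id: 2, name: "itemB", custom_field: "special", legacy_flag: null }));
//   - id: 2
//     name: itemB
//     custom_field: special
```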
3. Mixing Modes (Hybrid Approach)
The strength of TOON is that it is context-aware. You can have one part of your file in "Table Mode" (for dense data) and another part in "List Mode" (for sparse/complex data) in the same document.
Example:
```
# A uniform list of users (Perfect for Table)
users[3]{id, role}:
  1, admin
  2, editor
  3, viewer

# A complex, sparse audit log (Switch to List)
audit_log[2]:
  - action: login
    timestamp: 12:00
    meta:
      ip: 192.168.1.1
  - action: error
    code: 500
    details: "Database timeout"
```
Summary of Efficiency
| Data Shape | JSON | TOON (Table) | TOON (List/Dash) | Verdict |
|---|---|---|---|---|
| Dense/Uniform | Heavy (`{"k":"v"}` per item) | Best (values only) | Good (`k: v` lines) | TOON Table wins comfortably. |
| Sparse/Complex | Heavy (`{"k":"v"}` per item) | Invalid/Bad | Better (`k: v` lines) | TOON List wins (saves braces/quotes). |
So, for your specific case (complex objects, many nullable properties), TOON effectively behaves like minified YAML. You won't get the massive 50% reduction you see with tables, but you will still save roughly 15–20% in token count compared to JSON by removing the syntax noise.
TOON vs JSON5
While both TOON and JSON5 are "better versions of JSON" designed to be more human-readable, they have completely different goals.
- JSON5 is optimized for humans writing code (config files, manual data entry).
- TOON is optimized for AI models reading data (saving tokens, reducing API costs).
Here is a breakdown of how they compare.
1. Visual Comparison
The most striking difference is in how they handle lists of objects.
JSON5 (The "Developer" Format)
- Looks like JavaScript code.
- Allows comments, trailing commas, and unquoted keys.
- Still uses braces `{}` and explicit keys for every item.
```json5
{
  // Comments are allowed!
  users: [
    { id: 1, name: 'Alice', role: 'admin' },  // Keys repeated every time
    { id: 2, name: 'Bob', role: 'editor' },   // Trailing comma allowed
  ]
}
```
TOON (The "AI" Format)
- Looks like a spreadsheet or YAML.
- Removes braces and repeated keys to save tokens.
- Uses a header row for arrays.
```
# Comments are also allowed (starts with #)
users[2]{id, name, role}:
  1, Alice, admin
  2, Bob, editor
```
2. Feature Comparison Matrix
| Feature | JSON5 | TOON |
|---|---|---|
| Primary Goal | Ease of editing for humans (Config files). | Token efficiency for LLMs (API costs). |
| Keys | Can be unquoted (if valid JS identifier). | Removed entirely in tabular arrays; unquoted in objects. |
| Arrays | Standard list [{}, {}]. | Tabular with headers (massive token savings). |
| Hierarchy | Braces { }. | Indentation (YAML-style). |
| Strings | Single ' or Double "; multi-line supported. | No quotes needed (unless containing delimiters). |
| Comments | // and /* */ (Supported). | # (Supported). |
| Token Cost | High. (Repeats keys, uses braces). | Low. (30–50% fewer tokens than JSON). |
3. Deep Dive: Why they are different
The "Repeated Key" Problem
- JSON5 fails to solve the biggest inefficiency of JSON: repeating key names. If you have a list of 1,000 users, JSON5 still requires you to write `name:` 1,000 times.
- TOON solves this by defining `name` once in the header. For large datasets, this makes TOON significantly smaller (see the rough character count sketched below).
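As a rough, back-of-the-envelope illustration (character counts, not exact LLM tokens), here is what the repeated key names alone cost across 1,000 rows:

```typescript
// Rough illustration: characters spent on key names alone for 1,000 rows
// with the fields ["id", "name", "role"].
const keys = ["id", "name", "role"];
const rows = 1_000;

// JSON spends two quotes plus a colon on every key, on every row.
const jsonKeyChars = rows * keys.reduce((sum, k) => sum + k.length + 3, 0);
// TOON writes the key list once, in the table header.
const toonKeyChars = `{${keys.join(", ")}}`.length;

console.log(jsonKeyChars, toonKeyChars); // 19000 vs 16
```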
The "Parsing" Problem
- JSON5 is a strict subset of JavaScript. If you are a web developer, you already know it. It is perfect for VS Code settings or project config files.
- TOON requires a specialized parser. It is not native to browsers or Node.js. It is strictly a data-interchange format for feeding context to AI agents.
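To make the "specialized parser" point concrete, here is a toy parser for the simple tabular case with a comma delimiter. It is purely illustrative and not the official TOON library, which also handles nesting, quoting, and alternative delimiters.

```typescript
// Toy parser for `key[count]{fields}:` blocks with comma-separated rows.
// Illustrative only: no quoting, no nesting, no alternative delimiters.
function parseTable(toon: string): { key: string; items: Record<string, string>[] } | null {
  const [header, ...rows] = toon.trim().split("\n");
  const match = header.match(/^(\w+)\[(\d+)\]\{([^}]*)\}:\s*$/);
  if (!match) return null; // not a tabular block

  const [, key, count, fieldList] = match;
  const fields = fieldList.split(",").map((f) => f.trim());
  if (rows.length !== Number(count)) return null; // declared length must match row count

  const items = rows.map((row) => {
    const values = row.trim().split(",").map((v) => v.trim());
    return Object.fromEntries(fields.map((f, i) => [f, values[i]] as [string, string]));
  });
  return { key, items };
}

parseTable("users[2]{id, name, role}:\n  1, Alice, admin\n  2, Bob, editor");
// => { key: "users", items: [ { id: "1", name: "Alice", role: "admin" },
//                             { id: "2", name: "Bob", role: "editor" } ] }
```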
4. When to use which?
Use JSON5 when:
- You are writing a configuration file (e.g., `.eslintrc`, `tsconfig`).
- Humans need to edit the file manually and frequently.
- You need to "comment out" lines of data for testing.
- File size and token count are irrelevant (processing is local).
Use TOON when:
- You are sending data to GPT-4, Claude, or Llama.
- You have large lists of uniform data (SQL results, logs, product inventories).
- You are paying for API usage per token and want to cut costs by ~40%.
- You need the AI to process data faster (fewer tokens = lower latency).
