Token-Oriented Object Notation is a compact, human-readable encoding of the JSON data model that minimizes tokens and makes structure easy for models to follow. It's intended for LLM input as a drop-in, lossless representation of your existing JSON.
TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably. For deeply nested or non-uniform data, JSON may be more efficient.
The similarity to CSV is intentional: CSV is simple and ubiquitous, and TOON aims to keep that familiarity while remaining a lossless, drop-in representation of JSON for Large Language Models.
Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
TOON (Token-Oriented Object Notation) is a newer, leaner data format designed to be more efficient for Large Language Models (LLMs) than JSON. It cuts costs and improves speed by stripping redundant syntax (quotes, braces, commas) and using indentation and tabular layouts, which makes it a strong fit for structured, uniform data. JSON remains the standard for general APIs but is verbose for AI; TOON excels at token reduction and clarity for LLM tasks such as agent inputs, while JSON is better for complex nesting.

JSON
- Best For: General API communication, complex nested data, diverse data types, storage.
- Pros: Universal standard, flexible for varied structures.
- Cons: Verbose, costly for LLMs (every symbol is a token), repetitive keys.

TOON
- Best For: LLM prompts, structured agent inputs, product catalogs, logs (tabular/flat data).
- Pros: Highly token-efficient (30-60% fewer tokens), lower API costs, faster inference, human-readable.
- Cons: Less flexible for deeply nested data; requires strict structure (indentation-based).
Here is how specific complex types are encoded:
1. Arrays (The "Tabular" Mode)
TOON is most famous for compressing arrays of objects. Instead of repeating keys like JSON, it creates a header row.
- Uniform Arrays: If you have a list of objects with the same keys, TOON uses a table.
- Syntax: `key[count]{header1, header2}:`
- Example (a minimal encoder sketch follows these examples):

```
users[3]{id, name, role}:
  1, Alice, admin
  2, Bob, editor
  3, Charlie, viewer
```

- Simple Arrays: Lists of primitives are written inline.
- Example:

```
tags[3]: "news", "tech", "2025"
```
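To make the tabular mode concrete, here is a minimal, hypothetical encoder sketch in TypeScript. It is not the official TOON library; it assumes every item shares the keys of the first item and that no value needs quoting.

```typescript
// Minimal sketch of TOON's tabular mode (illustrative helper, not a library API).
// Assumes uniform items and values that never need quoting.
function encodeTable(key: string, items: Record<string, unknown>[]): string {
  const headers = Object.keys(items[0]);
  const head = `${key}[${items.length}]{${headers.join(", ")}}:`;
  const rows = items.map(
    (item) => "  " + headers.map((h) => String(item[h])).join(", ")
  );
  return [head, ...rows].join("\n");
}

console.log(encodeTable("users", [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "editor" },
  { id: 3, name: "Charlie", role: "viewer" },
]));
// users[3]{id, name, role}:
//   1, Alice, admin
//   2, Bob, editor
//   3, Charlie, viewer
```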
2. Nested Objects
For nested data, TOON generally abandons the table structure and switches to an indentation-based syntax (similar to YAML or Python).
- Standard Nesting: Use 2-space indentation to show hierarchy.
```
user:
  id: 123
  profile:
    name: "Alice"
  settings:
    theme: "dark"
    notifications: true
```

- Key Folding (Dot Notation): If you have a deep hierarchy with only single keys at each level, TOON allows "folding" them to save tokens (a small folding sketch follows this list).
- Instead of:

```
server:
  config:
    port: 8080
```

- You can write:

```
server.config.port: 8080
```
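Below is a simplified sketch of key folding, assuming it applies to any chain of single-key objects; the helper name is illustrative and the TOON spec's exact folding rules may differ.

```typescript
// Simplified sketch: collapse chains of single-key objects into dotted paths.
function foldKeys(obj: Record<string, unknown>, prefix = ""): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isSingleKeyObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      Object.keys(value).length === 1;
    if (isSingleKeyObject) {
      // Single-key child: keep folding downward.
      Object.assign(out, foldKeys(value as Record<string, unknown>, path));
    } else {
      // Stop folding; this value is encoded normally under the folded path.
      out[path] = value;
    }
  }
  return out;
}

foldKeys({ server: { config: { port: 8080 } } });
// => { "server.config.port": 8080 }
```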
3. Strings & Special Characters
TOON tries to avoid quotation marks to save tokens, but it has specific rules for when they must be used.
- Standard Strings: No quotes required.

```
status: active
```

- Strings with Delimiters: If a string contains the delimiter character (usually a comma `,`), it must be wrapped in quotes.

```
location: "Paris, France"
```

- Strings with Newlines: These use standard escape sequences like `\n` and must be quoted (a small quoting-rule sketch follows these examples).

```
description: "First line.\nSecond line."
```
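A sketch of this quoting rule in code. The helper name `needsQuotes` is illustrative and not part of any TOON tooling; the full spec has additional cases (the tags example above quotes "2025", for instance, so it stays a string rather than a number).

```typescript
// Illustrative helper capturing the two rules above: quote when the value
// contains the active delimiter or a newline.
function needsQuotes(value: string, delimiter = ","): boolean {
  return value.includes(delimiter) || value.includes("\n");
}

needsQuotes("active");                     // false -> status: active
needsQuotes("Paris, France");              // true  -> location: "Paris, France"
needsQuotes("First line.\nSecond line.");  // true  -> must be quoted and escaped
```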
4. Handling "Bad" Characters (Delimiter Collision)
If your data contains many commas (e.g., a list of addresses or sentences), using a comma as a separator becomes inefficient because you have to quote every string.
TOON allows you to change the delimiter to a Pipe | or Tab to avoid this. You declare the delimiter after the header.
- Example (using a pipe `|` to avoid quoting commas):

```
# Notice the '|' after the curly braces
articles[2]{id, title, summary}|:
  101 | Hello, World! | A guide to "hello, world" apps.
  102 | Data Formats | JSON, YAML, and TOON explained.
```

Because we switched to `|`, we didn't need to quote "Hello, World!" even though it contains a comma.
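Continuing the hypothetical `needsQuotes` helper from the previous section, switching the delimiter is exactly what removes the quoting requirement:

```typescript
needsQuotes("Hello, World!", ",");  // true  -> must be quoted when ',' is the delimiter
needsQuotes("Hello, World!", "|");  // false -> stays unquoted once '|' is the delimiter
```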
Summary: JSON vs. TOON for Complex Data
| Data Structure | JSON Approach | TOON Approach |
|---|---|---|
| List of Objects | [{"a":1}, {"a":2}] | list[2]{a}: 1, 2 (Tabular) |
| Nested Object | {"a": {"b": 1}} | a: b: 1 (Indented) |
| Deep Nesting | {"a": {"b": {"c": 1}}} | a.b.c: 1 (Dot Notation) |
| Text w/ Commas | "Hello, world" | "Hello, world" (Quoted) |
| Text w/ Commas (pipe delimiter) | "Hello, world" | Hello, world (unquoted, using the \| delimiter) |
Handling sparse data
This is the critical "catch" with TOON. You have correctly identified the format's weakness: tabular data requires uniformity.
If you have an array where every object has different keys (sparse data, polymorphic objects, or "100 possible keys but only 2 used"), TOON abandons the table format.
Instead, it falls back to a standard indented list (YAML-style).
1. The "Fallback" Mode (Dash-List)
When the parser (or human) sees that the data is not uniform enough for a table, it switches to using dashes (`-`) for array items. This looks almost exactly like YAML.
Scenario: You have a list of "events," but each event type has totally different fields.
In JSON:
```json
[
  {"type": "click", "x": 10, "y": 20},
  {"type": "keypress", "key": "Enter", "modifier": "Shift"},
  {"type": "scroll", "delta": 500}
]
```
In TOON (Mixed/Sparse Mode):
```
events[3]:
  - type: click
    x: 10
    y: 20
  - type: keypress
    key: Enter
    modifier: Shift
  - type: scroll
    delta: 500
```
Note: We lost the table header compression, but we still saved tokens by removing all the curly braces (`{}`), commas, and quotes.
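A sketch of the mode decision itself: treat an array as "uniform" only when every item is a flat object with the same key set and primitive values. The function name and the exact rule are assumptions for illustration, not lifted from the TOON spec.

```typescript
// Illustrative check: uniform => table mode, otherwise dash-list fallback.
function isUniform(items: Record<string, unknown>[]): boolean {
  if (items.length === 0) return false;
  const shape = Object.keys(items[0]).sort().join(",");
  return items.every(
    (item) =>
      Object.keys(item).sort().join(",") === shape &&
      Object.values(item).every((v) => v === null || typeof v !== "object")
  );
}

isUniform([{ id: 1, role: "admin" }, { id: 2, role: "editor" }]);       // true  -> table mode
isUniform([{ type: "click", x: 10 }, { type: "scroll", delta: 500 }]);  // false -> dash-list fallback
```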
2. The "Sparse Data" Problem (100 keys vs. 2 values)
If you tried to force your "100 possible keys" scenario into a TOON table, it would actually be worse than JSON because you would need to print 98 empty delimiters for every row.
The Wrong Way (Forcing a Table): This is inefficient and unreadable.
```
# Don't do this!
items[2]{id, name, ...98 other keys...}:
  1, itemA, , , , , , , ... , ,
  2, itemB, , , , , , , ... , ,
```
The Right Way (TOON Standard): TOON handles this exactly like JSON or YAML would—by only writing the keys that exist.
```
items[2]:
  - id: 1
    name: itemA
    # The other 98 null keys are simply omitted
  - id: 2
    name: itemB
    custom_field: "special"
```
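The same idea in code: before emitting a dash-list item, drop the keys whose values are null or undefined, so only the fields that actually exist get written. The helper name and sample data are illustrative, and this sketch skips the quoting rules discussed earlier.

```typescript
// Illustrative helper: filter out null/undefined fields, then emit the rest
// as one dash-list item (quoting rules omitted for brevity).
function encodeListItem(item: Record<string, unknown>, indent = "  "): string {
  const present = Object.entries(item).filter(
    ([, v]) => v !== null && v !== undefined
  );
  return present
    .map(([k, v], i) => (i === 0 ? `${indent}- ${k}: ${v}` : `${indent}  ${k}: ${v}`))
    .join("\n");
}

console.log(encodeListItem({ id: 2, name: "itemB", custom_field: "special", legacy_flag: null }));
//   - id: 2
//     name: itemB
//     custom_field: special
```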
3. Mixing Modes (Hybrid Approach)
The strength of TOON is that it is context-aware. You can have one part of your file in "Table Mode" (for dense data) and another part in "List Mode" (for sparse/complex data) in the same document.
Example:
```
# A uniform list of users (Perfect for Table)
users[3]{id, role}:
  1, admin
  2, editor
  3, viewer

# A complex, sparse audit log (Switch to List)
audit_log[2]:
  - action: login
    timestamp: 12:00
    meta:
      ip: 192.168.1.1
  - action: error
    code: 500
    details: "Database timeout"
```
Summary of Efficiency
| Data Shape | JSON | TOON (Table) | TOON (List/Dash) | Verdict |
|---|---|---|---|---|
| Dense/Uniform | Heavy (`{"k":"v"}` per item) | Best (values only) | Good (`k: v` lines) | TOON Table wins comfortably. |
| Sparse/Complex | Heavy (`{"k":"v"}` per item) | Invalid/Bad | Better (`k: v` lines) | TOON List wins (saves braces/quotes). |
So, for your specific case (complex objects, many nullable properties), TOON effectively behaves like minified YAML. You won't get the massive 50% reduction you see with tables, but you will still save roughly 15–20% in token count compared to JSON by removing the syntax noise.
TOON vs JSON5
While both TOON and JSON5 are "better versions of JSON" designed to be more human-readable, they have completely different goals.
- JSON5 is optimized for humans writing code (config files, manual data entry).
- TOON is optimized for AI models reading data (saving tokens, reducing API costs).
Here is a breakdown of how they compare.
1. Visual Comparison
The most striking difference is in how they handle lists of objects.
JSON5 (The "Developer" Format)
- Looks like JavaScript code.
- Allows comments, trailing commas, and unquoted keys.
- Still uses braces `{}` and explicit keys for every item.
```json5
{
  // Comments are allowed!
  users: [
    { id: 1, name: 'Alice', role: 'admin' },  // Keys repeated every time
    { id: 2, name: 'Bob', role: 'editor' },   // Trailing comma allowed
  ]
}
```
TOON (The "AI" Format)
- Looks like a spreadsheet or YAML.
- Removes braces and repeated keys to save tokens.
- Uses a header row for arrays.
```
# Comments are also allowed (starts with #)
users[2]{id, name, role}:
  1, Alice, admin
  2, Bob, editor
```
2. Feature Comparison Matrix
| Feature | JSON5 | TOON |
|---|---|---|
| Primary Goal | Ease of editing for humans (Config files). | Token efficiency for LLMs (API costs). |
| Keys | Can be unquoted (if valid JS identifier). | Removed entirely in tabular arrays; unquoted in objects. |
| Arrays | Standard list [{}, {}]. | Tabular with headers (massive token savings). |
| Hierarchy | Braces { }. | Indentation (YAML-style). |
| Strings | Single ' or Double "; multi-line supported. | No quotes needed (unless containing delimiters). |
| Comments | // and /* */ (Supported). | # (Supported). |
| Token Cost | High. (Repeats keys, uses braces). | Low. (30–50% fewer tokens than JSON). |
3. Deep Dive: Why they are different
The "Repeated Key" Problem
- JSON5 fails to solve the biggest inefficiency of JSON: repeating key names. If you have a list of 1,000 users, JSON5 still requires you to write `name:` 1,000 times.
- TOON solves this by defining `name` once in the header. For large datasets, this makes TOON significantly smaller (see the rough character count sketched below).
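As a rough, back-of-the-envelope illustration (character counts, not exact LLM tokens), here is what the repeated key names alone cost across 1,000 rows:

```typescript
// Rough illustration: characters spent on key names alone for 1,000 rows
// with the fields ["id", "name", "role"].
const keys = ["id", "name", "role"];
const rows = 1_000;

// JSON spends two quotes plus a colon on every key, on every row.
const jsonKeyChars = rows * keys.reduce((sum, k) => sum + k.length + 3, 0);
// TOON writes the key list once, in the table header.
const toonKeyChars = `{${keys.join(", ")}}`.length;

console.log(jsonKeyChars, toonKeyChars); // 19000 vs 16
```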
The "Parsing" Problem
- JSON5 is a strict subset of JavaScript. If you are a web developer, you already know it. It is perfect for VS Code settings or project config files.
- TOON requires a specialized parser. It is not native to browsers or Node.js. It is strictly a data-interchange format for feeding context to AI agents.
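To make the "specialized parser" point concrete, here is a toy parser for the simple tabular case with a comma delimiter. It is purely illustrative and not the official TOON library, which also handles nesting, quoting, and alternative delimiters.

```typescript
// Toy parser for `key[count]{fields}:` blocks with comma-separated rows.
// Illustrative only: no quoting, no nesting, no alternative delimiters.
function parseTable(toon: string): { key: string; items: Record<string, string>[] } | null {
  const [header, ...rows] = toon.trim().split("\n");
  const match = header.match(/^(\w+)\[(\d+)\]\{([^}]*)\}:\s*$/);
  if (!match) return null; // not a tabular block

  const [, key, count, fieldList] = match;
  const fields = fieldList.split(",").map((f) => f.trim());
  if (rows.length !== Number(count)) return null; // declared length must match row count

  const items = rows.map((row) => {
    const values = row.trim().split(",").map((v) => v.trim());
    return Object.fromEntries(fields.map((f, i) => [f, values[i]] as [string, string]));
  });
  return { key, items };
}

parseTable("users[2]{id, name, role}:\n  1, Alice, admin\n  2, Bob, editor");
// => { key: "users", items: [ { id: "1", name: "Alice", role: "admin" },
//                             { id: "2", name: "Bob", role: "editor" } ] }
```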
4. When to use which?
Use JSON5 when:
- You are writing a configuration file (e.g., `.eslintrc`, `tsconfig`).
- Humans need to edit the file manually and frequently.
- You need to "comment out" lines of data for testing.
- File size and token count are irrelevant (processing is local).
Use TOON when:
- You are sending data to GPT-4, Claude, or Llama.
- You have large lists of uniform data (SQL results, logs, product inventories).
- You are paying for API usage per token and want to cut costs by ~40%.
- You need the AI to process data faster (fewer tokens = lower latency).
