Here is how specific complex types are encoded:
1. Arrays (The "Tabular" Mode)
TOON is most famous for compressing arrays of objects. Instead of repeating keys like JSON, it creates a header row.
- Uniform Arrays: If you have a list of objects with the same keys, TOON uses a table.
- Syntax:
key[count]{header1, header2}:
- Example:
users[3]{id, name, role}:
  1, Alice, admin
  2, Bob, editor
  3, Charlie, viewer
- Simple Arrays: Lists of primitives are written inline.
- Example:
tags[3]: "news", "tech", "2025"
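To make the two array shapes above concrete, here is a minimal Python sketch of an encoder for them. The function names `encode_table` and `encode_inline` are made up for illustration and are not part of any TOON library; the sketch skips quoting entirely, and the `", "` separator and two-space row indent are simply choices that mirror the examples in this answer.

```python
def encode_table(key, rows, delimiter=", "):
    """Encode a list of same-keyed dicts as a header line plus one row per item."""
    fields = list(rows[0].keys())
    # Tabular mode only applies when every object has exactly the same keys.
    if any(set(row) != set(fields) for row in rows):
        raise ValueError("rows are not uniform; tabular mode does not apply")
    # Header: key[count]{field1, field2, ...}:
    lines = [f"{key}[{len(rows)}]{{{delimiter.join(fields)}}}:"]
    for row in rows:
        # Values are written in header order, so the keys never repeat.
        lines.append("  " + delimiter.join(str(row[f]) for f in fields))
    return "\n".join(lines)


def encode_inline(key, values, delimiter=", "):
    """Encode a list of primitives on a single line (no quoting in this sketch)."""
    return f"{key}[{len(values)}]: " + delimiter.join(str(v) for v in values)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "editor"},
    {"id": 3, "name": "Charlie", "role": "viewer"},
]
print(encode_table("users", users))
print(encode_inline("tags", ["news", "tech"]))
```

The key names appear once in the header regardless of how many rows follow, which is where the savings over JSON come from.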
2. Nested Objects
For nested data, TOON generally abandons the table structure and switches to an indentation-based syntax (similar to YAML or Python).
3. Strings & Special Characters
TOON tries to avoid quotation marks to save tokens, but it has specific rules for when they must be used.
- Standard Strings: No quotes required.
- Strings with Delimiters: If a string contains the delimiter character (usually a comma ,), it must be wrapped in quotes.
location: "Paris, France"
- Strings with Newlines: These use standard escape sequences like \n and must be quoted.
description: "First line.\nSecond line."
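The two quoting rules above can be sketched as a small helper. This is only an illustration: `quote_if_needed` is a hypothetical name, and it covers just the delimiter and newline cases described here, not whatever additional cases a full TOON encoder would handle.

```python
def quote_if_needed(value: str, delimiter: str = ",") -> str:
    """Return the string bare when safe, otherwise double-quoted with escapes."""
    # Only the two rules from the text: the active delimiter, or a newline.
    needs_quotes = delimiter in value or "\n" in value
    if not needs_quotes:
        return value
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')         # then embedded double quotes
        .replace("\n", "\\n")        # newlines become the two characters \n
    )
    return f'"{escaped}"'


print(quote_if_needed("Paris"))                      # Paris
print(quote_if_needed("Paris, France"))              # "Paris, France"
print(quote_if_needed("First line.\nSecond line."))  # "First line.\nSecond line."
```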
4. Handling "Bad" Characters (Delimiter Collision)
If your data contains many commas (e.g., a list of addresses or sentences), using a comma as a separator becomes inefficient because you have to quote every string.
TOON allows you to change the delimiter to a pipe | or a tab to avoid this. You declare the delimiter inside the length brackets of the array header, and the field list in the header uses the same delimiter.
- Example (using pipe | to avoid quoting commas):
articles[2|]{id|title|summary}:
  101 | Hello, World! | A guide to "hello, world" apps.
  102 | Data Formats | JSON, YAML, and TOON explained.
Because we switched to |, we didn't need to quote Hello, World! even though it contains a comma.
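One way to decide when switching delimiters is worth it is to count how many values would collide with each candidate. The sketch below assumes a hypothetical helper, `choose_delimiter`, and a simple "fewest collisions wins" heuristic; it is not how any particular TOON encoder makes the choice.

```python
def choose_delimiter(values, candidates=(",", "|", "\t")):
    """Pick the candidate delimiter that appears in the fewest values."""
    def collisions(delim):
        # Each value containing the delimiter would have to be quoted.
        return sum(1 for v in values if delim in str(v))

    # min() keeps the first candidate on ties, so comma stays the default.
    return min(candidates, key=collisions)


cells = ["Hello, World!", "Data Formats", "JSON, YAML, and TOON explained."]
print(repr(choose_delimiter(cells)))           # '|'
print(repr(choose_delimiter(["a", "b", "c"])))  # ','
```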
Summary: JSON vs. TOON for Complex Data
| Data Structure | JSON Approach | TOON Approach |
|---|---|---|
| List of Objects | [{"a":1}, {"a":2}] | list[2]{a}: 1, 2 (Tabular) |
| Nested Object | {"a": {"b": 1}} | a: b: 1 (Indented) |
| Deep Nesting | {"a": {"b": {"c": 1}}} | a.b.c: 1 (Dot Notation) |
| Text w/ Commas | "Hello, world" | "Hello, world" (Quoted) |
| Text w/ Commas (pipe delimiter) | "Hello, world" | Hello, world (Unquoted, using the pipe delimiter) |
Handling sparse data
This is the critical "catch" with TOON. You have correctly identified the format's weakness: tabular data requires uniformity.
If you have an array where every object has different keys (sparse data, polymorphic objects, or "100 possible keys but only 2 used"), TOON abandons the table format.
Instead, it falls back to a standard indented list (YAML-style).
1. The "Fallback" Mode (Dash-List)
When the encoder (or a human author) sees that the data is not uniform enough for a table, it switches to dashes - for array items. This looks almost exactly like YAML.
Scenario: You have a list of "events," but each event type has totally different fields.
In JSON:
[
  {"type": "click", "x": 10, "y": 20},
  {"type": "keypress", "key": "Enter", "modifier": "Shift"},
  {"type": "scroll", "delta": 500}
]
In TOON (Mixed/Sparse Mode):
events[3]:
  - type: click
    x: 10
    y: 20
  - type: keypress
    key: Enter
    modifier: Shift
  - type: scroll
    delta: 500
Note: We lost the table header compression, but we still saved tokens by removing all the curly braces {}, commas ,, and quotes "".
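The dash-list fallback can be sketched in a few lines of Python. `encode_list` is a hypothetical helper that assumes flat, non-empty objects with primitive values and does no quoting; it is meant only to show the shape of the output, not to be a complete encoder.

```python
def encode_list(key, items, indent="  "):
    """Encode a list of flat dicts as a dash list, one 'field: value' per line."""
    lines = [f"{key}[{len(items)}]:"]
    for item in items:
        first = True
        for field, value in item.items():
            # The first field of each item carries the dash; the rest align under it.
            prefix = f"{indent}- " if first else f"{indent}  "
            lines.append(f"{prefix}{field}: {value}")
            first = False
    return "\n".join(lines)


events = [
    {"type": "click", "x": 10, "y": 20},
    {"type": "keypress", "key": "Enter", "modifier": "Shift"},
    {"type": "scroll", "delta": 500},
]
print(encode_list("events", events))
```

Each item only lists the keys it actually has, so differently shaped objects can sit side by side in the same array.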
2. The "Sparse Data" Problem (100 keys vs. 2 values)
If you tried to force your "100 possible keys" scenario into a TOON table, it would actually be worse than JSON because you would need to print 98 empty delimiters for every row.
The Wrong Way (Forcing a Table): This is inefficient and unreadable.
items[2]{id, name, ...98 other keys...}:
  1, itemA, , , , , , , ... , ,
  2, itemB, , , , , , , ... , ,
The Right Way (TOON Standard): TOON handles this exactly like JSON or YAML would—by only writing the keys that exist.
items[2]:
  - id: 1
    name: itemA
  - id: 2
    name: itemB
    custom_field: "special"
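The arithmetic behind the "wrong way" is easy to check. The sketch below uses a hypothetical helper, `forced_table_cells`, and assumes every row's keys are drawn from the full key list; it counts how many cells a forced table would contain and how many of them would be empty.

```python
def forced_table_cells(rows, all_keys):
    """Return (total cells, empty cells) if sparse rows were forced into one table."""
    total = len(rows) * len(all_keys)        # every row needs a slot for every key
    filled = sum(len(row) for row in rows)   # slots that actually hold a value
    return total, total - filled


# 100 possible keys, but each item only uses two of them.
all_keys = ["id", "name"] + [f"key_{i}" for i in range(98)]
rows = [{"id": 1, "name": "itemA"}, {"id": 2, "name": "itemB"}]

total, empty = forced_table_cells(rows, all_keys)
print(total, empty)  # 200 196
```

With 196 of 200 cells empty, the table spends almost all of its delimiters on nothing, while the dash list writes only the four values that exist.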
3. Mixing Modes (Hybrid Approach)
The strength of TOON is that it is context-aware. You can have one part of your file in "Table Mode" (for dense data) and another part in "List Mode" (for sparse/complex data) in the same document.
Example:
users[3]{id, role}:
  1, admin
  2, editor
  3, viewer
audit_log[2]:
  - action: login
    timestamp: "12:00"
    meta:
      ip: 192.168.1.1
  - action: error
    code: 500
    details: "Database timeout"
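The per-array decision between the two modes can be sketched as a simple check. `choose_mode` is a hypothetical name, and the rule it applies (same keys everywhere, primitive values only) is an assumption based on the description above rather than a copy of any real encoder's logic.

```python
def choose_mode(items):
    """Return 'table' for uniform, flat objects; otherwise 'list'."""
    if not items or not all(isinstance(item, dict) for item in items):
        return "list"
    first_keys = set(items[0])
    # Uniform: every object has exactly the same set of keys.
    uniform = all(set(item) == first_keys for item in items)
    # Flat: no nested objects or arrays, which a single row could not hold.
    flat = all(
        not isinstance(value, (dict, list))
        for item in items
        for value in item.values()
    )
    return "table" if uniform and flat else "list"


document = {
    "users": [{"id": 1, "role": "admin"}, {"id": 2, "role": "editor"}],
    "audit_log": [
        {"action": "login", "timestamp": "12:00", "meta": {"ip": "192.168.1.1"}},
        {"action": "error", "code": 500, "details": "Database timeout"},
    ],
}
for key, items in document.items():
    print(key, choose_mode(items))
# users table
# audit_log list
```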
Summary of Efficiency
| Data Shape | JSON | TOON (Table) | TOON (List/Dash) | Verdict |
|---|---|---|---|---|
| Dense/Uniform | Heavy { "k":"v" } | Best v | Good k: v | TOON Table wins comfortably. |
| Sparse/Complex | Heavy { "k":"v" } | Invalid/Bad | Better k: v | TOON List wins (saves braces/quotes). |
So, for your specific case (complex objects, many nullable properties), TOON effectively behaves like minified YAML. You won't get the massive 50% reduction you see with tables, but you will still save roughly 15–20% in token count compared to JSON by removing the syntax noise.
TOON vs JSON5
While both TOON and JSON5 are "better versions of JSON" designed to be more human-readable, they have completely different goals.
- JSON5 is optimized for humans writing code (config files, manual data entry).
- TOON is optimized for AI models reading data (saving tokens, reducing API costs).
Here is a breakdown of how they compare.
1. Visual Comparison
The most striking difference is in how they handle lists of objects.
JSON5 (The "Developer" Format)
- Looks like JavaScript code.
- Allows comments, trailing commas, and unquoted keys.
- Still uses braces {} and explicit keys for every item.
{
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'editor' }
  ]
}
TOON (The "AI" Format)
- Looks like a spreadsheet or YAML.
- Removes braces and repeated keys to save tokens.
- Uses a header row for arrays.
users[2]{id, name, role}:
  1, Alice, admin
  2, Bob, editor
2. Feature Comparison Matrix
| Feature | JSON5 | TOON |
|---|---|---|
| Primary Goal | Ease of editing for humans (Config files). | Token efficiency for LLMs (API costs). |
| Keys | Can be unquoted (if valid JS identifier). | Removed entirely in tabular arrays; unquoted in objects. |
| Arrays | Standard list [{}, {}]. | Tabular with headers (massive token savings). |
| Hierarchy | Braces { }. | Indentation (YAML-style). |
| Strings | Single ' or Double "; multi-line supported. | No quotes needed (unless containing delimiters). |
| Comments | // and /* */ (Supported). | # (Supported). |
| Token Cost | High. (Repeats keys, uses braces). | Low. (30–50% fewer tokens than JSON). |
3. Deep Dive: Why they are different
The "Repeated Key" Problem
- JSON5 fails to solve the biggest inefficiency of JSON: repeating key names. If you have a list of 1,000 users, JSON5 still requires you to write name: 1,000 times.
- TOON solves this by defining name once in the header. For large datasets, this makes TOON significantly smaller.
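The repetition is easy to measure. The sketch below builds a JSON5-style list by hand (plain string formatting, not a JSON5 library) plus the matching TOON header, then counts how often the key shows up in each. The helper name `key_repetitions` and the generated user names are made up for the illustration.

```python
def key_repetitions(n, key="name"):
    """Count occurrences of a key in JSON5-style rows vs. a TOON table header."""
    # JSON5-style: every object spells out its keys again.
    json5_rows = [f"{{ id: {i}, name: 'user{i}' }}," for i in range(n)]
    json5_text = "\n".join(json5_rows)
    # TOON-style: the keys live in a single header line.
    toon_header = f"users[{n}]{{id, name}}:"
    return json5_text.count(f"{key}:"), toon_header.count(key)


print(key_repetitions(1000))  # (1000, 1)
```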
The "Parsing" Problem
- JSON5 is a superset of JSON whose syntax is a subset of ECMAScript 5.1. If you are a web developer, you already know it. It is well suited to hand-edited project config files.
- TOON requires a specialized parser. It is not native to browsers or Node.js. It is strictly a data-interchange format for feeding context to AI agents.
4. When to use which?
Use JSON5 when:
- You are writing a configuration file (e.g., .eslintrc, tsconfig).
- Humans need to edit the file manually and frequently.
- You need to "comment out" lines of data for testing.
- File size and token count are irrelevant (processing is local).
Use TOON when:
- You are sending data to GPT-4, Claude, or Llama.
- You have large lists of uniform data (SQL results, logs, product inventories).
- You are paying for API usage per token and want to cut costs by ~40%.
- You need the AI to process data faster (fewer tokens = lower latency).