When to Use TOON
TOON excels in specific scenarios and falls short in others. This guide helps you choose the right format for your use case.
TOON’s Sweet Spot
TOON achieves maximum efficiency with uniform arrays of objects, i.e. data that shares the same structure across items:
Uniform Structure
All objects have identical fields with primitive values—perfect for tabular format
Token Efficiency
Field names declared once instead of repeated 100 times—massive token savings
LLM Validation
The [100] length and {fields} header help models detect truncation and validate structure
CSV-like Compactness
Approaches CSV efficiency while remaining fully lossless JSON
Ideal Use Cases
1. Tabular Data for LLMs
When sending structured data to LLMs for analysis, search, or question-answering:
Analytics Data
- LLMs can easily aggregate metrics (sum revenue, count high-bounce days)
- The [60] length enables “How many days?” queries without manual counting
- Tabular format improves parsing accuracy for numeric operations
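As an illustrative sketch (hypothetical field names and values), a daily analytics table in TOON declares its fields once and streams rows beneath them:

```toon
analytics[3]{date,views,revenue,bounceRate}:
  2025-01-01,5123,1043.50,0.42
  2025-01-02,4876,987.25,0.39
  2025-01-03,5342,1120.00,0.45
```

The declared length lets a model answer “How many days?” without counting rows, and a mismatch between declared and actual row count signals truncation.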
Database Query Results
- Massive token reduction for large result sets
- Preserves type information (timestamps, booleans, nulls)
- Array length helps models understand dataset size
API Responses
- Compact representation of API paginated results
- Handles mixed data types (numbers, strings, null)
- Easy for LLMs to answer comparative queries (“Which repo has the most stars?”)
2. Mixed-Structure Documents
Data with both nested objects and tabular arrays.
Format flexibility: TOON automatically chooses the most efficient representation for each data structure: YAML-style indentation for objects, inline lists for primitive arrays, and tabular rows for uniform object arrays.
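A minimal sketch (hypothetical order data) showing all three representations in one document:

```toon
order:
  id: 1042
  customer:
    name: Ada
    tier: gold
  tags[2]: priority,gift
  items[2]{sku,qty,price}:
    A1,2,9.99
    B7,1,24.50
```

The nested customer object uses YAML-style indentation, tags is an inline primitive array, and items collapses into a tabular block.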
3. LLM-Generated Structured Output
When asking LLMs to generate structured data:
Why TOON Helps
- Explicit structure: [N] lengths and {fields} headers guide model output
- Validation: you can detect truncation by comparing actual vs. declared array length
- Parsing reliability: Tabular format reduces ambiguity compared to freeform JSON
- Token efficiency: Models generate fewer tokens for the same information
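The truncation check can be sketched in a few lines of Python. This is an illustrative validator, not part of any official TOON library, and it assumes tabular arrays whose rows sit one indentation level below the [N]{fields}: header:

```python
import re

# Matches a tabular array header such as "employees[100]{id,name,role}:"
HEADER = re.compile(r"^(\s*)\S+\[(\d+)\]\{[^}]*\}:\s*$")

def lengths_match(toon: str) -> bool:
    """Return True if every declared [N] header is followed by exactly N rows."""
    lines = toon.splitlines()
    i = 0
    while i < len(lines):
        m = HEADER.match(lines[i])
        if m:
            indent, declared = m.group(1), int(m.group(2))
            j = i + 1
            # Count rows indented one level deeper than the header.
            while j < len(lines) and lines[j].startswith(indent + "  ") and lines[j].strip():
                j += 1
            if j - i - 1 != declared:
                return False  # truncated or over-long model output
            i = j
        else:
            i += 1
    return True
```

Running this over model output catches the common failure mode where generation stops mid-array but the header still promises the full count.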
4. RAG (Retrieval-Augmented Generation)
For retrieval systems that inject data into prompts:
- Token budget optimization: more data fits in the context window
- Structure preservation: Full JSON data model support
- LLM comprehension: 76.4% accuracy vs JSON’s 75.0% in benchmarks
When NOT to Use TOON
TOON is not always the best choice. Consider alternatives in these scenarios:
1. Deeply Nested or Non-Uniform Structures
Example: Deeply Nested Config
- JSON compact: 558 tokens
- TOON: 620 tokens (+11.1%)
- JSON pretty: 911 tokens
When This Happens
- Complex configuration files with many nested levels
- Tree structures (file systems, org charts)
- Recursive data structures
- Objects with highly variable field sets
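A toy illustration of why deep nesting erases TOON’s advantage (hypothetical config; token counts vary by tokenizer). With no uniform arrays to collapse into a tabular header, every line still pays for a key and indentation:

```toon
server:
  tls:
    cert:
      path: /etc/ssl/cert.pem
      passphrase: null
```

JSON compact expresses the same data as `{"server":{"tls":{"cert":{"path":"/etc/ssl/cert.pem","passphrase":null}}}}`, and braces often tokenize more cheaply than repeated indentation, which is why JSON compact can come out ahead here.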
2. Semi-Uniform Arrays
Example: Event Logs with Mixed Structure
- JSON compact: 128,529 tokens
- TOON: 154,084 tokens (+19.9%)
3. Pure Tabular Data (CSV Territory)
Context: CSV is smaller than TOON for flat tables. TOON adds ~5-10% overhead for structural features that improve LLM reliability.
Token Comparison: Employee Records
CSV: 47,102 tokens. TOON: 49,919 tokens (+6.0%).
What TOON adds:
- Array length declaration: [100]
- Key prefix: employees
- Delimiter scoping in the header
Prefer CSV when:
- Pure tabular data with no nesting
- Token budget is extremely tight
- LLMs already understand your CSV schema
Prefer TOON when:
- You need structural validation ([N] length checking)
- Data includes nested objects or multiple arrays
- You want lossless JSON compatibility
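Side by side on a tiny hypothetical table, the overhead is exactly the header metadata:

```csv
id,name,role
1,Alice,admin
2,Bob,user
```

```toon
employees[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```

The data rows are identical; TOON spends a few extra tokens on the employees[2]{id,name,role}: header, buying length validation and a named key in return.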
4. Latency-Critical Applications
Factors Affecting Latency
Time-to-First-Token (TTFT):
- TOON’s lower token count may not always mean faster TTFT
- Model tokenizer efficiency varies by format
- Local models may optimize for common JSON patterns
Generation speed:
- Generation speed depends on model implementation
- Some models may parse JSON more efficiently
Total latency: total = TTFT + (output tokens / tokens per second). Measure both components for your exact setup.
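As a worked example with made-up numbers: at 200 ms TTFT and 50 tokens/s, a 500-token response totals 0.2 + 500/50 = 10.2 s. A sketch:

```python
def total_latency(ttft_s: float, out_tokens: int, tokens_per_s: float) -> float:
    """total = TTFT + (output tokens / generation speed), in seconds."""
    return ttft_s + out_tokens / tokens_per_s
```

Because TOON and JSON may tokenize the same payload to different counts, plug in measured token counts for each format rather than assuming fewer characters means fewer tokens.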
When Latency Matters Most
- Real-time user-facing applications
- High-throughput batch processing
- Edge deployments with constrained resources
- Applications where milliseconds count (trading, monitoring)
Recommendations:
- Profile both TOON and JSON compact in your environment
- Test with representative data samples
- Measure across different model sizes/quantization levels
- Choose the format that performs better for your specific setup
Decision Framework
Use tabular eligibility to choose the right format.
Tabular Eligibility
Tabular eligibility measures what percentage of your data can use TOON’s efficient tabular format:
100% Eligible
All arrays are uniform objects with primitive values—maximum token savings
60-80% Eligible
Most arrays are tabular—significant token savings
40-60% Eligible
Mixed structure—modest token savings, evaluate tradeoffs
0-40% Eligible
Minimal tabular data—JSON compact likely more efficient
Calculating Eligibility
An array is tabular-eligible when:
- All elements are objects
- All objects have identical field sets
- All values are primitives (no nested objects/arrays)
Example:
- 5 arrays total
- 3 are tabular-eligible
- Eligibility = 3/5 = 60%
Expected savings:
- 100% eligibility: ~60% token reduction vs JSON
- 50% eligibility: ~15% token reduction vs JSON
- 0% eligibility: +10% tokens vs JSON compact (overhead from structure)
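The three eligibility rules translate directly to code. A minimal Python sketch (illustrative, not an official tool):

```python
def is_tabular_eligible(arr) -> bool:
    """True when all elements are objects with identical field sets
    and every value is a primitive (no nested objects/arrays)."""
    if not arr or not all(isinstance(x, dict) for x in arr):
        return False
    fields = set(arr[0].keys())
    for obj in arr:
        if set(obj.keys()) != fields:
            return False
        if any(isinstance(v, (dict, list)) for v in obj.values()):
            return False
    return True

def eligibility(arrays) -> float:
    """Fraction of arrays that can use TOON's tabular form."""
    return sum(map(is_tabular_eligible, arrays)) / len(arrays)
```

Running this over the arrays in your payload gives the percentage to plug into the eligibility bands above.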
Real-World Benchmarks
Token Efficiency by Structure
From TOON’s comprehensive benchmarks:

| Dataset | Structure | Eligibility | TOON vs JSON | TOON vs JSON Compact |
|---|---|---|---|---|
| Employee records | Uniform | 100% | −60.7% | −36.9% |
| Time-series analytics | Uniform | 100% | −59.0% | −35.9% |
| GitHub repositories | Uniform | 100% | −42.3% | −23.7% |
| E-commerce orders | Nested | 33% | −33.3% | +5.3% |
| Event logs | Semi-uniform | 50% | −15.0% | +19.9% |
| Nested config | Deep | 0% | −31.9% | +11.1% |
Accuracy Comparison
LLM retrieval accuracy across 209 questions on 4 models: TOON achieves slightly better accuracy (76.4% vs 75.0%) while using 39.9% fewer tokens than JSON.
Quick Reference
Use TOON When...
- Uniform arrays of objects (80%+ tabular eligibility)
- Sending structured data to LLMs
- Token budget is a concern
- Need validation guardrails ([N] lengths, {fields} headers)
- Want lossless JSON compatibility with better efficiency
Use JSON Compact When...
- Deeply nested structures (0-40% tabular eligibility)
- Non-uniform data with variable fields
- Already have JSON pipelines
- Latency benchmarks favor JSON in your environment
Use CSV When...
- Pure flat tables with no nesting
- Token budget is extremely tight
- Don’t need structural validation
- LLMs already understand your CSV schema
Benchmark When...
- Latency is critical
- Tabular eligibility is 40-60% (marginal case)
- Deploying to local/quantized models
- Unsure which format fits your use case
Try It Yourself
Playground
Convert your JSON to TOON and compare token counts
Benchmarks
See detailed comparisons across data structures
Quick Start
Install the library and test with your data
