What is TOON?

Token-Oriented Object Notation (TOON) is a compact, human-readable encoding of the JSON data model that minimizes tokens and makes structure easy for models to follow. It’s intended for LLM input as a drop-in, lossless representation of your existing JSON.

The Problem: JSON is Token-Expensive

AI is becoming cheaper and more accessible, but LLM tokens still cost money. Standard JSON is verbose and repeats field names for every record:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

For a dataset with 100 users, you’d repeat "id":, "name":, and "role": 100 times. This redundancy adds up quickly in larger datasets.

The TOON Solution: Declare Once, Stream Compact

TOON eliminates this redundancy by declaring structure once and streaming data as rows:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Field Declaration

The {id,name,role} header declares field names once for the entire array

Length Validation

The [2] declares array length, helping LLMs detect truncation

Compact Rows

Each row is just comma-separated values—no repeated keys

CSV-like Efficiency

Achieves CSV-like compactness while remaining fully lossless JSON

How TOON Combines Formats

TOON strategically combines the best aspects of multiple formats:

YAML-Style Indentation for Objects

Nested objects use indentation instead of braces, eliminating { and } tokens:

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025

Keys are followed by : (colon-space), and values can be unquoted when safe.

CSV-Style Tables for Uniform Arrays

Arrays of objects with identical fields collapse into tabular format:

hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

This achieves CSV-like compactness while adding explicit structure that improves LLM reliability.

Inline Format for Primitive Arrays

Simple arrays of strings, numbers, or booleans render inline:

friends[3]: ana,luis,sam

The [3] length declaration provides validation, and values are comma-separated by default.

Real-World Example

Here’s how TOON handles mixed data structures:

{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}

55% token reduction: TOON uses 106 tokens vs JSON’s 235 tokens—a savings of 129 tokens (55%) for this example. The format automatically chooses the most efficient representation for each data structure.

Key Features

Token-Efficient & Accurate

TOON reaches 76.4% accuracy (vs JSON’s 75.0%) while using ~40% fewer tokens in mixed-structure benchmarks across 4 models

JSON Data Model

Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips

LLM-Friendly Guardrails

Explicit [N] lengths and {fields} headers give models a clear schema to follow, improving parsing reliability

Minimal Syntax

Uses indentation instead of braces and minimizes quoting, giving YAML-like readability with CSV-style compactness

Tabular Arrays

Uniform arrays of objects collapse into tables that declare fields once and stream row values line by line

Multi-Language Ecosystem

Spec-driven implementations in TypeScript, Python, Go, Rust, .NET, and other languages

Design Philosophy

TOON’s design prioritizes specific goals:

Make uniform arrays compact — Declare structure once, stream data efficiently
Stay fully lossless — Round-trips preserve all data: decode(encode(x)) === x
Keep parsing simple — Explicit structure markers help both LLMs and humans
Provide validation guardrails — Array lengths and field counts detect truncation and malformed output

Think of TOON as a translation layer: use JSON programmatically, and encode it as TOON for LLM input. The format is stable but evolving—help shape it by contributing to the spec.

How It Works

TOON automatically detects array structure and chooses the most efficient representation:

import { encode } from '@toon-format/toon'

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
}

console.log(encode(data))
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user

The encoder:

Detects that users is an array of objects
Verifies all objects have identical fields (id, name, role)
Confirms all values are primitives (no nested objects/arrays)
Chooses tabular format and generates the header
Streams each object as a comma-separated row

Lossless round-trips: Decoding TOON back to JSON recovers the exact original structure. See Format Overview for complete syntax details.

What Makes TOON Different?

vs JSON

~40% fewer tokens for mixed-structure data. TOON eliminates redundant field names in arrays while staying lossless.

vs YAML

More compact for arrays. YAML uses - key: value for each field. TOON uses tabular format to eliminate repetition.

vs CSV

Supports nested structures. CSV only handles flat tables. TOON adds minimal overhead (~5-10%) while supporting full JSON.

vs XML

Dramatically more compact. XML’s verbose tags make it the least token-efficient format for most data structures.

Media Type & File Extension

By convention, TOON files use:

File extension: .toon
Media type: text/toon
Encoding: UTF-8 (always)

The charset parameter may be specified (text/toon; charset=utf-8) but defaults to UTF-8 when omitted.

Next Steps

Format Overview

Learn the complete TOON syntax with detailed examples

When to Use TOON

Understand when TOON excels and when to use alternatives

Quick Start

Install the library and start encoding your first dataset

Benchmarks

See detailed performance comparisons across formats

Get Started

Core Concepts

Usage Guides

Advanced

What is TOON?

What is TOON?

The Problem: JSON is Token-Expensive

The TOON Solution: Declare Once, Stream Compact

Field Declaration

Length Validation

Compact Rows

CSV-like Efficiency

How TOON Combines Formats

Real-World Example

Key Features

Token-Efficient & Accurate

JSON Data Model

LLM-Friendly Guardrails

Minimal Syntax

Tabular Arrays

Multi-Language Ecosystem

Design Philosophy

How It Works

What Makes TOON Different?

vs JSON

vs YAML

vs CSV

vs XML

Media Type & File Extension

Next Steps

Format Overview

When to Use TOON

Quick Start

Benchmarks

Build docs developers (and LLMs) love

Get Started

Core Concepts

Usage Guides

Advanced

​What is TOON?

​The Problem: JSON is Token-Expensive

​The TOON Solution: Declare Once, Stream Compact

Field Declaration

Length Validation

Compact Rows

CSV-like Efficiency

​How TOON Combines Formats

​Real-World Example

​Key Features

Token-Efficient & Accurate

JSON Data Model

LLM-Friendly Guardrails

Minimal Syntax

Tabular Arrays

Multi-Language Ecosystem

​Design Philosophy

​How It Works

​What Makes TOON Different?

vs JSON

vs YAML

vs CSV

vs XML

​Media Type & File Extension

​Next Steps

Format Overview

When to Use TOON

Quick Start

Benchmarks

Build docs developers (and LLMs) love

What is TOON?

The Problem: JSON is Token-Expensive

The TOON Solution: Declare Once, Stream Compact

How TOON Combines Formats

Real-World Example

Key Features

Design Philosophy

How It Works

What Makes TOON Different?

Media Type & File Extension

Next Steps