Skip to main content

What is TOON?

Token-Oriented Object Notation (TOON) is a compact, human-readable encoding of the JSON data model that minimizes tokens and makes structure easy for models to follow. It’s intended for LLM input as a drop-in, lossless representation of your existing JSON.

The Problem: JSON is Token-Expensive

AI is becoming cheaper and more accessible, but LLM tokens still cost money. Standard JSON is verbose and repeats field names for every record:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
For a dataset with 100 users, you’d repeat "id":, "name":, and "role": 100 times. This redundancy adds up quickly in larger datasets.

The TOON Solution: Declare Once, Stream Compact

TOON eliminates this redundancy by declaring structure once and streaming data as rows:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Field Declaration

The {id,name,role} header declares field names once for the entire array

Length Validation

The [2] declares array length, helping LLMs detect truncation

Compact Rows

Each row is just comma-separated values—no repeated keys

CSV-like Efficiency

Achieves CSV-like compactness while remaining fully lossless JSON

How TOON Combines Formats

TOON strategically combines the best aspects of multiple formats:
Nested objects use indentation instead of braces, eliminating { and } tokens:
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
Keys are followed by : (colon-space), and values can be unquoted when safe.
Arrays of objects with identical fields collapse into tabular format:
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true
This achieves CSV-like compactness while adding explicit structure that improves LLM reliability.
Simple arrays of strings, numbers, or booleans render inline:
friends[3]: ana,luis,sam
The [3] length declaration provides validation, and values are comma-separated by default.

Real-World Example

Here’s how TOON handles mixed data structures:
{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}
55% token reduction: TOON uses 106 tokens vs JSON’s 235 tokens—a savings of 129 tokens (55%) for this example. The format automatically chooses the most efficient representation for each data structure.

Key Features

Token-Efficient & Accurate

TOON reaches 76.4% accuracy (vs JSON’s 75.0%) while using ~40% fewer tokens in mixed-structure benchmarks across 4 models

JSON Data Model

Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips

LLM-Friendly Guardrails

Explicit [N] lengths and {fields} headers give models a clear schema to follow, improving parsing reliability

Minimal Syntax

Uses indentation instead of braces and minimizes quoting, giving YAML-like readability with CSV-style compactness

Tabular Arrays

Uniform arrays of objects collapse into tables that declare fields once and stream row values line by line

Multi-Language Ecosystem

Spec-driven implementations in TypeScript, Python, Go, Rust, .NET, and other languages

Design Philosophy

TOON’s design prioritizes specific goals:
  1. Make uniform arrays compact — Declare structure once, stream data efficiently
  2. Stay fully lossless — Round-trips preserve all data: decode(encode(x)) === x
  3. Keep parsing simple — Explicit structure markers help both LLMs and humans
  4. Provide validation guardrails — Array lengths and field counts detect truncation and malformed output
Think of TOON as a translation layer: use JSON programmatically, and encode it as TOON for LLM input. The format is stable but evolving—help shape it by contributing to the spec.

How It Works

TOON automatically detects array structure and chooses the most efficient representation:
import { encode } from '@toon-format/toon'

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
}

console.log(encode(data))
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
The encoder:
  1. Detects that users is an array of objects
  2. Verifies all objects have identical fields (id, name, role)
  3. Confirms all values are primitives (no nested objects/arrays)
  4. Chooses tabular format and generates the header
  5. Streams each object as a comma-separated row
Lossless round-trips: Decoding TOON back to JSON recovers the exact original structure. See Format Overview for complete syntax details.

What Makes TOON Different?

vs JSON

~40% fewer tokens for mixed-structure data. TOON eliminates redundant field names in arrays while staying lossless.

vs YAML

More compact for arrays. YAML uses - key: value for each field. TOON uses tabular format to eliminate repetition.

vs CSV

Supports nested structures. CSV only handles flat tables. TOON adds minimal overhead (~5-10%) while supporting full JSON.

vs XML

Dramatically more compact. XML’s verbose tags make it the least token-efficient format for most data structures.

Media Type & File Extension

By convention, TOON files use:
  • File extension: .toon
  • Media type: text/toon
  • Encoding: UTF-8 (always)
The charset parameter may be specified (text/toon; charset=utf-8) but defaults to UTF-8 when omitted.

Next Steps

Format Overview

Learn the complete TOON syntax with detailed examples

When to Use TOON

Understand when TOON excels and when to use alternatives

Quick Start

Install the library and start encoding your first dataset

Benchmarks

See detailed performance comparisons across formats

Build docs developers (and LLMs) love