TOON is designed for token efficiency, but performance considerations extend beyond token count. This guide covers latency, throughput, memory usage, and optimization strategies for production use.

Performance Characteristics

Token Efficiency vs. Processing Speed

TOON reduces token count significantly (35-60% vs JSON for uniform arrays), but encoding/decoding has computational overhead:
  • Encoding: Analyzing array uniformity and building tabular structures
  • Decoding: Parsing headers, validating lengths, and reconstructing objects
For latency-critical applications, measure end-to-end response time on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite higher token count.

Performance Trade-offs

The optimal format depends on your bottleneck:
Bottleneck          Best Format       Reason
API costs           TOON              Minimizes tokens ($$)
Context limits      TOON              Fits more data in window
Network transfer    JSON compact      Smaller bytes over wire
Parsing speed       JSON              Simpler structure
Local inference     Benchmark both    Token/sec varies by model
For cloud LLM APIs (OpenAI, Anthropic, etc.), TOON almost always wins due to token-based pricing. For local models, test both formats with your specific model.
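When network transfer is the bottleneck, compare wire size in bytes rather than tokens. A minimal sketch in plain TypeScript (no TOON-specific API assumed; `byteSize` and `smallerOnWire` are illustrative helpers):

```typescript
// Wire size in bytes of a UTF-8 payload — the metric that matters for
// network transfer, as opposed to tokenizer-dependent token counts.
function byteSize(payload: string): number {
  return new TextEncoder().encode(payload).length
}

// Compare compact JSON against a TOON string before choosing a format.
function smallerOnWire(jsonCompact: string, toon: string): 'json' | 'toon' {
  return byteSize(jsonCompact) <= byteSize(toon) ? 'json' : 'toon'
}
```

Note that byte size and token count can disagree: a payload with fewer tokens is not necessarily smaller on the wire.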

Memory Usage

Encoding

TOON encoding builds the full output string in memory:
import { encode } from '@toon-format/toon'

// ⚠️ Loads entire output into memory
const toonString = encode(largeData)
For large datasets, use streaming encoding:
import { encodeLines } from '@toon-format/toon'

// ✅ Memory-efficient: yields lines one at a time
for (const line of encodeLines(largeData)) {
  process.stdout.write(`${line}\n`)
  // or: await writeToFile(line)
}

Decoding

TOON decoding builds the full object tree in memory:
import { decode } from '@toon-format/toon'

// ⚠️ Loads entire object tree into memory
const data = decode(toonString)
For large datasets, use streaming decoding:
import { decodeStream } from '@toon-format/toon'

// ✅ Memory-efficient: processes events one at a time
for await (const event of decodeStream(lines)) {
  if (event.type === 'primitive') {
    console.log(event.value)
  }
  // Custom processing without building full tree
}
The decodeStream API yields structured JSON events:
  • { type: 'startObject' } - Object begins
  • { type: 'endObject' } - Object ends
  • { type: 'startArray', length: number } - Array begins (length from [N])
  • { type: 'endArray' } - Array ends
  • { type: 'key', key: string, wasQuoted?: boolean } - Object key
  • { type: 'primitive', value: JsonPrimitive } - Primitive value
See the API Reference for details.
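As an example of consuming this event stream, the reducer below folds over events without ever materializing the decoded tree. The `ToonEvent` type mirrors the shapes listed above; `collectPrimitives` is an illustrative helper, not part of the library:

```typescript
type JsonPrimitive = string | number | boolean | null

// Event shapes as documented for decodeStream.
type ToonEvent =
  | { type: 'startObject' }
  | { type: 'endObject' }
  | { type: 'startArray'; length: number }
  | { type: 'endArray' }
  | { type: 'key'; key: string; wasQuoted?: boolean }
  | { type: 'primitive'; value: JsonPrimitive }

// Gather every primitive in stream order — constant memory apart from
// the result array, with no intermediate object tree.
function collectPrimitives(events: Iterable<ToonEvent>): JsonPrimitive[] {
  const out: JsonPrimitive[] = []
  for (const event of events) {
    if (event.type === 'primitive') out.push(event.value)
  }
  return out
}
```

The same pattern extends to aggregation, filtering, or forwarding rows to a database without holding the full dataset in memory.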

Optimization Strategies

1. Use Tab Delimiters for Maximum Efficiency

Comma is the default delimiter, but tab is more token-efficient:
import { encode, DELIMITERS } from '@toon-format/toon'

const data = { users: [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }] }

// Comma delimiter (default)
const commaEncoded = encode(data)
// users[2]{id,name}:
//   1,Alice
//   2,Bob

// Tab delimiter (more efficient)
const tabEncoded = encode(data, { delimiter: DELIMITERS.tab })
// users[2]{id\tname}:
//   1\tAlice
//   2\tBob
Tab saves ~5-10% tokens on tabular arrays. Use it for production LLM inputs where you control encoding and don’t need human readability.

2. Enable Key Folding for Nested Objects

Key folding collapses single-key wrapper chains into dotted paths:
import { encode } from '@toon-format/toon'

const data = {
  api: {
    config: {
      timeout: 5000
    }
  }
}

// Without key folding (default)
encode(data)
// api:
//   config:
//     timeout: 5000

// With key folding
encode(data, { keyFolding: 'safe' })
// api.config.timeout: 5000
Savings: ~30-40% tokens on deeply nested configuration objects.
keyFolding: 'safe' only folds when all segments are valid identifiers. Set expandPaths: 'safe' when decoding to reconstruct the original structure.
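To make the round trip concrete, here is a standalone sketch of what path expansion does to folded keys. This only illustrates the behavior; it is not the library's implementation:

```typescript
// Expand dotted paths back into nested objects — the effect that
// expandPaths: 'safe' has on folded keys during decoding.
function expandDottedPaths(flat: Record<string, unknown>): Record<string, unknown> {
  const root: Record<string, unknown> = {}
  for (const [path, value] of Object.entries(flat)) {
    const segments = path.split('.')
    let node = root
    // Walk/create intermediate objects for all but the last segment.
    for (const segment of segments.slice(0, -1)) {
      node = (node[segment] ??= {}) as Record<string, unknown>
    }
    node[segments[segments.length - 1]] = value
  }
  return root
}
```

For example, `expandDottedPaths({ 'api.config.timeout': 5000 })` reconstructs `{ api: { config: { timeout: 5000 } } }`, the original pre-folding structure.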

3. Use Streaming for Large Datasets

Streaming encoding/decoding prevents memory spikes:
import { encodeLines, decodeStream } from '@toon-format/toon'

// Streaming encode (thousands of records)
const largeData = await fetchThousandsOfRecords()
for (const line of encodeLines(largeData)) {
  await writeToLLMStream(line)
}

// Streaming decode (large LLM responses)
const lines = splitLLMResponseToLines(response)
for await (const event of decodeStream(lines)) {
  // Process events without building full tree
  if (event.type === 'primitive') {
    await processValue(event.value)
  }
}

4. Control Indentation for Compact Output

TOON defaults to 2-space indentation. Use 0 spaces for maximum compactness:
import { encode } from '@toon-format/toon'

const data = {
  users: [
    { id: 1, name: 'Alice' },
    { id: 2, name: 'Bob' }
  ]
}

// Default (2 spaces)
encode(data)
// users[2]{id,name}:
//   1,Alice
//   2,Bob

// Compact (0 spaces)
encode(data, { indent: 0 })
// users[2]{id,name}:
// 1,Alice
// 2,Bob
Savings: ~3-5% tokens on deeply nested structures.
Zero indentation reduces readability. Use it only when token count is critical and human inspection isn’t needed.

5. Filter Data with Replacer Function

Remove unnecessary fields before encoding to reduce token count:
import { encode } from '@toon-format/toon'

const users = [
  { id: 1, name: 'Alice', password: 'secret', internal_id: 'x123' },
  { id: 2, name: 'Bob', password: 'secret', internal_id: 'x456' }
]

// Remove sensitive/unnecessary fields
const safe = encode({ users }, {
  replacer: (key, value) => {
    if (key === 'password' || key === 'internal_id') {
      return undefined // Omit field
    }
    return value
  }
})
// users[2]{id,name}:
//   1,Alice
//   2,Bob
See Replacer Function for more examples.

Latency Considerations

Time to First Token (TTFT)

TOON’s structured headers may slightly increase TTFT:
  • JSON: Model starts generating immediately
  • TOON: Model must emit array header ([N]{fields}:) before first row
Impact is minimal (typically less than 50ms). The token savings (35-60%) far outweigh the header overhead for most use cases.
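A generic way to measure TTFT for any streaming response; the async-iterable shape is an assumption, so adapt it to your client SDK:

```typescript
// Milliseconds until the first chunk arrives from a token stream,
// or null if the stream ends without producing anything.
async function timeToFirstToken(
  stream: AsyncIterable<string>
): Promise<number | null> {
  const start = performance.now()
  for await (const _chunk of stream) {
    return performance.now() - start
  }
  return null
}
```

Run this once with a TOON-formatted prompt and once with a JSON-formatted prompt to compare TTFT side by side before committing to a format.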

Tokens per Second

Some models process simpler structures faster:
Scenario            Recommendation
Cloud APIs          Use TOON (token-based pricing)
Local models        Benchmark both formats
Streaming output    Test TTFT and tokens/sec
# Benchmark example (pseudo-code)
time curl -X POST api/generate -d '{"prompt": "...", "format": "toon"}'
time curl -X POST api/generate -d '{"prompt": "...", "format": "json"}'

Production Best Practices

1. Cache Encoded Outputs

If encoding the same data repeatedly, cache the TOON output:
const cache = new Map<string, string>()

function getEncodedData(key: string, data: unknown): string {
  if (!cache.has(key)) {
    cache.set(key, encode(data, { delimiter: DELIMITERS.tab }))
  }
  return cache.get(key)!
}
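The caller-supplied string key above assumes the data behind a key never changes. Hashing the data itself gives a safer cache key; a sketch using Node's built-in crypto module (note that `JSON.stringify` is key-order sensitive, so normalize key order first if your objects vary):

```typescript
import { createHash } from 'node:crypto'

// Content-addressed cache key: identical data always maps to the same
// key, so changed data can never be served a stale encoding.
function contentKey(data: unknown): string {
  return createHash('sha256')
    .update(JSON.stringify(data))
    .digest('hex')
}
```

Use `contentKey(data)` in place of a fixed key when the same logical dataset can change between requests.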

2. Validate Input Data

TOON encoding assumes input that fits the JSON data model (plain objects, arrays, and primitives). Validate inputs to prevent encoding errors:
import { encode } from '@toon-format/toon'
import { z } from 'zod'

const UserSchema = z.object({
  id: z.number(),
  name: z.string(),
  email: z.string().email()
})

function encodeUsers(users: unknown[]): string {
  const validated = users.map(u => UserSchema.parse(u))
  return encode({ users: validated })
}

3. Monitor Token Usage

Track token savings in production:
import { encode as encodeTOON } from '@toon-format/toon'
import { encode as countTokens } from 'gpt-tokenizer'

function logTokenSavings(data: unknown) {
  const jsonStr = JSON.stringify(data, null, 2)
  const toonStr = encodeTOON(data)
  
  const jsonTokens = countTokens(jsonStr).length
  const toonTokens = countTokens(toonStr).length
  const savings = ((jsonTokens - toonTokens) / jsonTokens * 100).toFixed(1)
  
  console.log(`Token savings: ${savings}% (${jsonTokens} → ${toonTokens})`)
}

4. Use Strict Mode for Production

Strict mode validates array lengths and tabular row counts during decoding:
import { decode } from '@toon-format/toon'

// Strict mode (default) - validates [N] lengths
const data = decode(toonString, { strict: true })

// Non-strict mode - skips validation
const lenient = decode(toonString, { strict: false })
Use strict: true (default) in production to catch corrupted or truncated data. Disable only for testing or when processing known-good inputs.

5. Handle Decode Errors Gracefully

Wrap decode calls in try-catch for production:
import { decode } from '@toon-format/toon'

function safeDecode(toonString: string): unknown {
  try {
    return decode(toonString, { strict: true })
  } catch (error) {
    console.error('TOON decode failed:', error)
    // Fallback to JSON or return null
    return null
  }
}

Performance Checklist

  • Use tab delimiter for production (delimiter: DELIMITERS.tab)
  • Enable key folding for nested configs (keyFolding: 'safe')
  • Use streaming for large datasets (encodeLines())
  • Filter unnecessary fields with replacer function
  • Cache encoded outputs when possible
  • Reduce indentation if tokens are critical (indent: 0)
  • Use streaming for large inputs (decodeStream())
  • Enable strict mode in production (strict: true)
  • Handle decode errors gracefully (try-catch)
  • Enable path expansion if using key folding (expandPaths: 'safe')
  • Track token savings (TOON vs JSON)
  • Monitor encoding/decoding latency
  • Measure end-to-end response time
  • Validate memory usage with large datasets
  • Test with your specific model/tokenizer

Benchmarking Your Use Case

Every use case is different. Benchmark with your actual data:
import { encode as encodeTOON, decode as decodeTOON, DELIMITERS } from '@toon-format/toon'
import { encode as countTokens } from 'gpt-tokenizer'

function benchmark(data: unknown) {
  // Encoding performance
  const encodeStart = performance.now()
  const toonStr = encodeTOON(data, { delimiter: DELIMITERS.tab })
  const encodeTime = performance.now() - encodeStart
  
  // Token count
  const jsonTokens = countTokens(JSON.stringify(data, null, 2)).length
  const toonTokens = countTokens(toonStr).length
  
  // Decoding performance
  const decodeStart = performance.now()
  const decoded = decodeTOON(toonStr)
  const decodeTime = performance.now() - decodeStart
  
  console.log({
    encodeTime: `${encodeTime.toFixed(2)}ms`,
    decodeTime: `${decodeTime.toFixed(2)}ms`,
    jsonTokens,
    toonTokens,
    savings: `${((jsonTokens - toonTokens) / jsonTokens * 100).toFixed(1)}%`
  })
}

benchmark(yourData)
Use the TOON Playground to quickly compare token counts for your data without writing code.
