Performance characteristics, optimization strategies, and best practices for using TOON efficiently in production
TOON is designed for token efficiency, but performance considerations extend beyond token count. This guide covers latency, throughput, memory usage, and optimization strategies for production use.
TOON reduces token count significantly (35-60% vs JSON for uniform arrays), but encoding/decoding has computational overhead:
Encoding: Analyzing array uniformity and building tabular structures
Decoding: Parsing headers, validating lengths, and reconstructing objects
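To quantify this overhead on your own payloads, a simple micro-benchmark is enough. The sketch below is a generic timing harness; `JSON.stringify` stands in for the encoder so the snippet runs standalone — swap in `encode` from `@toon-format/toon` to measure the real cost. The payload shape is a made-up uniform array, the case TOON optimizes for.

```typescript
// Micro-benchmark sketch: average wall-clock time of any encoder over N runs.
// JSON.stringify is a stand-in; replace with the TOON encoder to compare.
function benchmark(label: string, fn: () => unknown, iterations = 100): number {
  fn() // warm-up pass so JIT compilation doesn't skew the first run
  const start = performance.now()
  for (let i = 0; i < iterations; i++) fn()
  const msPerOp = (performance.now() - start) / iterations
  console.log(`${label}: ${msPerOp.toFixed(3)} ms/op`)
  return msPerOp
}

// Hypothetical uniform array — the shape TOON encodes as a table
const rows = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  name: `user${i}`,
  active: i % 2 === 0,
}))

benchmark('JSON encode', () => JSON.stringify(rows))
```

Run the same harness over `encode(rows)` and `decode(toonString)` to get per-operation numbers for your data shapes before committing to a format.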
For latency-critical applications, measure end-to-end response time on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite higher token count.
For cloud LLM APIs (OpenAI, Anthropic, etc.), TOON almost always wins due to token-based pricing. For local models, test both formats with your specific model.
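The reason token pricing dominates is simple arithmetic: under per-token billing, a 40% token reduction is a 40% input-cost reduction, independent of the rate. The back-of-envelope below illustrates this; the price and token counts are placeholders, not real API rates.

```typescript
// Back-of-envelope input-cost comparison (all numbers are placeholders).
function inputCost(tokens: number, pricePerMillionTokens: number): number {
  return (tokens / 1_000_000) * pricePerMillionTokens
}

const pricePerM = 3.0        // hypothetical $/1M input tokens
const jsonTokens = 50_000    // hypothetical prompt size as JSON
const toonTokens = 30_000    // same data as TOON (~40% fewer tokens)

const saving = inputCost(jsonTokens, pricePerM) - inputCost(toonTokens, pricePerM)
console.log(`Saving per request: $${saving.toFixed(3)}`) // → $0.060
```

At scale this compounds: the same 40% reduction applies to every request, which is why cloud API deployments almost always benefit even if encoding adds a little CPU time.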
TOON encoding builds the full output string in memory:
```ts
import { encode } from '@toon-format/toon'

// ⚠️ Loads entire output into memory
const toonString = encode(largeData)
```
For large datasets, use streaming encoding:
```ts
import { encodeLines } from '@toon-format/toon'

// ✅ Memory-efficient: yields lines one at a time
for (const line of encodeLines(largeData)) {
  process.stdout.write(`${line}\n`)
  // or: await writeToFile(line)
}
```
TOON decoding builds the full object tree in memory:
```ts
import { decode } from '@toon-format/toon'

// ⚠️ Loads entire object tree into memory
const data = decode(toonString)
```
For large datasets, use streaming decoding:
```ts
import { decodeStream } from '@toon-format/toon'

// ✅ Memory-efficient: processes events one at a time
for await (const event of decodeStream(lines)) {
  if (event.type === 'primitive') {
    console.log(event.value)
  }
  // Custom processing without building full tree
}
```
Streaming Decode Event Types
The decodeStream API yields a stream of structured events:
{ type: 'startObject' } - Object begins
{ type: 'endObject' } - Object ends
{ type: 'startArray', length: number } - Array begins (length from [N])
{ type: 'endArray' } - Array ends
{ type: 'primitive', value: ... } - Primitive value (string, number, boolean, or null)
```ts
import { encodeLines, decodeStream } from '@toon-format/toon'

// Streaming encode (thousands of records)
const largeData = await fetchThousandsOfRecords()
for (const line of encodeLines(largeData)) {
  await writeToLLMStream(line)
}

// Streaming decode (large LLM responses)
const lines = splitLLMResponseToLines(response)
for await (const event of decodeStream(lines)) {
  // Process events without building full tree
  if (event.type === 'primitive') {
    await processValue(event.value)
  }
}
```