
GLYPH Benchmark Results

Comprehensive comparison of GLYPH against JSON, ZON, and TOON serialization formats for LLM applications.

Executive Summary

Token Savings

48% token reduction with GLYPH+Pool; 5% with standard GLYPH

Size Reduction

60% smaller with GLYPH+Pool; 45% smaller with standard GLYPH

Retrieval Accuracy

100% accuracy on large models; matches JSON performance

Streaming Support

33% faster validation; early tool detection at 50% of the response

Codec Comparison Matrix

| Codec | Description | Primary Use Case |
| --- | --- | --- |
| JSON | Standard interchange format | Baseline, LLM generation |
| GLYPH | Key=value compact format | LLM context, tool calls |
| ZON | Zig-inspired minimal syntax | Maximum compression |
| TOON | YAML-like indented format | Human readability |

Size & Token Comparison

Agent Trace (50 steps)

| Codec | Bytes | vs JSON | Tokens | vs JSON |
| --- | --- | --- | --- | --- |
| JSON | 66,103 | baseline | 15,510 | baseline |
| GLYPH | 57,485 | -13% | 14,656 | -5% |
| GLYPH+Pool | 26,167 | -60% | 8,090 | -48% |
| ZON | 18,367 | -72% | 5,982 | -61% |
| TOON | 73,739 | +12% | 18,116 | +17% |
GLYPH+Pool provides the best balance of compression and reliability, achieving 60% size reduction while maintaining perfect round-trip safety.
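The pooling idea behind GLYPH+Pool can be sketched as follows. This is an illustrative sketch only: the `poolEncode` name and the index=value notation are invented here and are not GLYPH+Pool's actual wire format. The point is that repeated keys and string values across the 50 steps are stored once and referenced by index, which is why traces with repeated schemas compress so well.

```javascript
// Sketch of the string-pooling idea: repeated keys and string values
// are stored once in a pool and referenced by index. The `poolEncode`
// name and the index=value notation are invented for illustration and
// are not GLYPH+Pool's actual wire format.
function poolEncode(records) {
  const pool = [];
  const index = new Map();
  const ref = (str) => {
    if (!index.has(str)) {
      index.set(str, pool.length);
      pool.push(str);
    }
    return index.get(str);
  };
  const rows = records.map((rec) =>
    Object.entries(rec)
      .map(([k, v]) => `${ref(k)}=${typeof v === "string" ? ref(v) : v}`)
      .join(" ")
  );
  return { pool, rows };
}

// 50 steps sharing one schema: each key appears once in the pool,
// no matter how many rows reference it.
const trace = Array.from({ length: 50 }, (_, i) => ({ step: i, tool: "search", status: "ok" }));
const { pool, rows } = poolEncode(trace);
console.log(pool); // ["step", "tool", "search", "status", "ok"]
console.log(rows[0]); // "0=0 1=2 3=4"
```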

Simple Object

JSON (104 bytes)
{"name":"Alice","age":28,"active":true,"score":94.5}
GLYPH (70 bytes, -33%)
{name="Alice" age=28 active=t score=94.5}
| Codec | Bytes | vs JSON |
| --- | --- | --- |
| JSON | 104 | baseline |
| GLYPH | 70 | -33% |
| ZON | 64 | -38% |
| TOON | 72 | -31% |
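Judging from the sample above, a toy encoder for flat objects might look like the following. This is a sketch inferred from the example output: `encodeFlat` is not the official GLYPH API, and the real grammar presumably also covers nesting, escaping, and null.

```javascript
// Toy encoder for flat objects, inferred from the sample above.
// `encodeFlat` is not the official GLYPH API; the real grammar
// presumably also covers nesting, escaping, and null.
function encodeFlat(obj) {
  const field = ([key, value]) => {
    if (typeof value === "boolean") return `${key}=${value ? "t" : "f"}`; // t/f booleans
    if (typeof value === "string") return `${key}="${value}"`;            // quoted strings
    return `${key}=${value}`;                                             // bare numbers
  };
  return `{${Object.entries(obj).map(field).join(" ")}}`;
}

const encoded = encodeFlat({ name: "Alice", age: 28, active: true, score: 94.5 });
console.log(encoded); // {name="Alice" age=28 active=t score=94.5}
```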

Nested Object

| Codec | Bytes | vs JSON |
| --- | --- | --- |
| JSON | 320 | baseline |
| GLYPH | 180 | -44% |
| ZON | 166 | -48% |
| TOON | 216 | -33% |

Tabular Data (5 employees)

| Codec | Bytes | vs JSON |
| --- | --- | --- |
| JSON | 697 | baseline |
| GLYPH | 254 | -64% |
| GLYPH+Pool | 250 | -64% |
| ZON | 209 | -70% |
| TOON | 236 | -66% |

Token Efficiency Analysis

For LLM applications, token count directly impacts:
  • API costs (charged per token)
  • Context window usage
  • Processing latency
  • Model accuracy (fewer tokens = clearer signal)
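To make the cost impact concrete, here is a back-of-the-envelope calculation using the agent-trace token counts from the table above. The $3 per million input tokens price is an assumed figure for illustration, not any provider's actual rate.

```javascript
// Back-of-the-envelope API cost savings from the agent-trace token
// counts above. The $3 per million input tokens price is an assumed
// illustrative figure, not any provider's actual rate.
const PRICE_PER_MTOK = 3.0;
const cost = (tokens) => (tokens / 1e6) * PRICE_PER_MTOK;

const jsonTokens = 15510;  // agent trace as JSON
const pooledTokens = 8090; // agent trace as GLYPH+Pool
const savedPerCall = cost(jsonTokens) - cost(pooledTokens);

console.log(`$${savedPerCall.toFixed(5)} saved per request`);
console.log(`$${(savedPerCall * 1e6).toFixed(0)} saved per million requests`);
```

Per-request savings look negligible, but they compound linearly with call volume, which is why token counts dominate cost planning for agent workloads.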

Token Savings Across Data Shapes

Token Reduction vs JSON:

Agent Trace (50 steps):
  ZON:        ████████████████████████████████████████ -61%
  GLYPH+Pool: ████████████████████████████████ -48%
  GLYPH:      ██ -5%
  TOON:       (overhead) +17%

Tabular Data:
  ZON:        ████████████████████████████ -70%
  TOON:       ██████████████████████████ -66%
  GLYPH:      █████████████████████████ -64%

Nested Object:
  ZON:        ████████████████████████ -48%
  GLYPH:      ██████████████████████ -44%
  TOON:       ████████████████ -33%

Streaming Validation Performance

GLYPH supports real-time validation during LLM streaming, enabling early error detection and generation cancellation.
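The early-rejection behavior can be sketched as a validator that buffers streamed chunks and flags a cancellation the moment a complete, unknown tool name appears. The chunking and the regex-based field matching below are simplifying assumptions for illustration, not GLYPH's actual streaming parser.

```javascript
// Sketch of streaming validation with early rejection. The chunking
// and the regex-based field matching are simplifications; GLYPH's
// real streaming parser is token-aware.
const ALLOWED_TOOLS = new Set(["search", "calculate", "browse", "execute"]);

function makeValidator() {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    const m = buffer.match(/tool="([a-z_]+)"/); // complete tool field yet?
    if (!m) return "pending";
    return ALLOWED_TOOLS.has(m[1]) ? "ok" : "reject";
  };
}

// Simulated stream: the unknown tool name is complete at the third
// chunk, so generation can be cancelled well before the response ends.
const check = makeValidator();
let verdict = "pending";
for (const chunk of ['{tool=', '"dele', 'te_all"', ' args={q="x"}}']) {
  verdict = check(chunk);
  if (verdict === "reject") break; // cancel generation here
}
console.log(verdict); // "reject"
```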

Early Tool Detection

| Test | Tool Detected At | Total Tokens | Detection Point |
| --- | --- | --- | --- |
| search | Token 6 | 12 | 50% |
| calculate | Token 6 | 12 | 50% |
| browse | Token 6 | 12 | 50% |
| execute | Token 6 | 11 | 55% |

Tool identity is known halfway through the response.

Early Rejection

| Unknown Tool | Stopped At | Total Would Be | Tokens Saved |
| --- | --- | --- | --- |
| delete_all | Token 7 | 10 | +30% |
| rm_rf | Token 7 | 10 | +30% |
| hack_server | Token 6 | 10 | +40% |

Latency Savings

Stopped at token 10
Time: 139ms
Savings:
  • Tokens saved: 19/29 (66%)
  • Time saved: 67ms (33%)

Gzip Compression

When using gzip, format differences diminish significantly. All codecs compress to ~3KB for the agent trace dataset.
| Codec | Raw Bytes | Gzipped | Compression Ratio |
| --- | --- | --- | --- |
| JSON | 66,103 | 3,361 | 95% |
| GLYPH | 57,485 | 3,368 | 94% |
| GLYPH+Pool | 26,167 | 2,894 | 89% |
| ZON | 18,367 | 2,849 | 84% |
| TOON | 73,739 | 3,689 | 95% |
Key Insight: Gzip eliminates most format differences, making raw size less important for storage but still critical for LLM context windows (which don’t use compression).

Feature Comparison

| Feature | JSON | GLYPH | ZON | TOON |
| --- | --- | --- | --- | --- |
| Size Efficiency | ★★☆ | ★★★ | ★★★★ | ★☆☆ |
| Token Efficiency | ★★☆ | ★★★ | ★★★★ | ★☆☆ |
| LLM Retrieval | ★★★★ | ★★★★ | ★★★☆ | ★★★★ |
| LLM Generation | ★★★★ | ★★☆ | ★☆☆ | ★★☆ |
| Human Readability | ★★★☆ | ★★★★ | ★★☆ | ★★★★ |
| Round-trip Safety | ★★★★ | ★★★★ | ★★☆ | ★★★★ |
| Streaming Validation | ★★★★ | ★★★★ | N/A | N/A |
| Tool Call Support | ★★★★ | ★★★★ | ★★☆ | ★★★☆ |
| Parser Availability | ★★★★ | ★★★☆ | ★★☆ | ★★★☆ |

Recommendations by Use Case

Use GLYPH when:
  • Token budget is constrained
  • The LLM reads but doesn’t generate the payload
  • Encoding tool calls and function arguments
  • Human-readable logs/traces are needed
  • Streaming validation is required
  • Context window optimization is critical

Use GLYPH+Pool when:
  • Agent traces have repeated schemas
  • Storage efficiency is paramount
  • Output is processed by tools, not LLMs
  • Maximum compression is needed with round-trip safety

Use JSON when:
  • The LLM needs to generate structured output
  • Maximum compatibility is required
  • Interoperating with external systems
  • Parser reliability is critical

Use ZON when:
  • Maximum compression is needed
  • There is no round-trip requirement
  • The environment is controlled (known parser)
  • The context is experimental/research

Avoid TOON when:
  • Token efficiency matters (TOON is larger than JSON here)
  • Precise indentation is difficult to maintain
  • Deep nesting is present (up to 350% overhead)

Test Methodology

Environment

  • Date: December 25, 2024
  • Models: Ollama (llama3.2:3b, qwen3:8b, mistral-small:24b)
  • Tokenizer: tiktoken (cl100k_base, o200k_base)
  • Iterations: 20 per test (5 warmup)

Test Datasets

  1. simple - Flat object with 5 fields
  2. nested - 3-level deep nested object
  3. tabular - Array of 5 employee records
  4. complex - Nested arrays with departments/projects
  5. agent_trace - 50-step agent execution trace

Reproduction

# Size comparison
cd sjson/benchmark/comparison/js
node codec_substrate_bench.mjs

# LLM accuracy test
node codec_llm_accuracy_bench.mjs --model=llama3.2:3b

# Streaming validation test
node streaming_validation_test.mjs --model=llama3.2:3b

# Tool call test
node codec_toolcall_bench.mjs --model=llama3.2:3b

Corpus Results (55 Test Cases)

GLYPH-Loose vs ZON vs TOON vs JSON-minified
| Codec | Bytes | cl100k tokens | o200k tokens |
| --- | --- | --- | --- |
| ZON | 3,878 | 1,915 | 1,909 |
| GLYPH | 4,224 | 1,895 | 1,873 |
| JSON-min | 5,109 | 2,117 | 2,113 |
| TOON | 5,440 | 2,466 | 2,445 |

GLYPH wins on token efficiency, the metric that matters for LLM API costs.

Related Reports

  • LLM Accuracy Report - How LLMs handle GLYPH vs other formats
  • Performance Report - Parser speed and optimization details