GLYPH Performance Report
Comprehensive analysis of GLYPH parser performance, canonicalization optimization, and memory usage.

Executive Summary
- Speedup: 7.1x faster on base64-heavy workloads
- Memory: 85% reduction on deeply nested structures
- Compatibility: 1881+ tests, all passing, zero regressions
Optimization Results
The GLYPH-Loose canonicalizer was optimized using buffer-based writers, sync.Pool for resource reuse, and stdlib base64 encoding.

Key Improvements
| Metric | Achievement | Impact |
|---|---|---|
| Base64 encoding | 7.1x speedup | Stdlib assembly optimization |
| Memory allocations | 40-85% reduction | sync.Pool for slices |
| Nested structures | 1.7x speedup | Buffer-based writing |
| Backward compatibility | 100% maintained | All tests passing |
Benchmark Results
Synthetic Benchmarks
| Benchmark | Before (ns/op) | After (ns/op) | Speedup |
|---|---|---|---|
| Bytes_Large (1KB) | 6,130 | 862 | 7.1x |
| Nested_VeryDeep (50 levels) | 6,234 | 3,726 | 1.7x |
| Nested_Deep (10 levels) | 970 | 705 | 1.4x |
| Nested_Wide (10×20) | 12,321 | 10,007 | 1.2x |
| NoTabular_100Rows | 32,623 | 23,044 | 1.4x |
| Map_Medium (50 entries) | 3,962 | 3,741 | 1.1x |
| Map_Large (200 entries) | 20,430 | 19,240 | 1.1x |
| MixedTypes | 422 | 355 | 1.2x |
85% memory reduction on deeply nested structures (50 levels)
Realistic Workloads
| Benchmark | Before (ns/op) | After (ns/op) | Speedup | Memory Savings |
|---|---|---|---|---|
| LLMToolCall | 1,242 | 1,015 | 1.2x | 43% |
| AgentTrace | 6,011 | 5,483 | 1.1x | 39% |
| Corpus_AllCases | 34,372 | 30,103 | 1.1x | 51% |
| SchemaOpts | 13,103 | 10,422 | 1.3x | 45% |
| VectorDBResult | 11,780 | 12,180 | 1.0x | 7% |
| APIResponse | 13,674 | 15,596 | 0.9x | 8% |
LLM tool calls show a 1.2x speedup and 43% memory savings, which is critical for agent applications.
Optimization Techniques
1. Buffer-Based Writer Pattern
Problem: String allocation overhead
Before: Every recursive call allocated an intermediate string, forcing the Go runtime to allocate, copy, and garbage collect.
Solution: Shared buffer
After: All recursive calls append to a single shared buffer, eliminating the intermediate string allocations.
2. sync.Pool for Resource Reuse
- String Builders
- Sortable Map Entries
- Sortable Columns
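The sync.Pool pattern for the string-builder case can be sketched like this. Names (builderPool, canonicalize) are illustrative, not the GLYPH API; the point is the Get/Reset/Put cycle that reuses one strings.Builder across calls instead of allocating a fresh one each time.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// builderPool hands out reusable strings.Builder instances.
var builderPool = sync.Pool{
	New: func() any { return new(strings.Builder) },
}

func canonicalize(parts []string) string {
	b := builderPool.Get().(*strings.Builder)
	b.Reset()                // clear state left by the previous user
	defer builderPool.Put(b) // return the builder for reuse
	for i, p := range parts {
		if i > 0 {
			b.WriteByte(';')
		}
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(canonicalize([]string{"a", "b", "c"})) // a;b;c
}
```

The same Get/Reset/Put shape applies to the pooled sortable map-entry and column slices.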
3. Stdlib Base64 Replacement
7.1x speedup on base64-heavy workloads by delegating to the assembly-optimized stdlib implementation.
Memory Profile Analysis
Before Optimization
After Optimization
The allocation distribution shifted from intermediate strings to the final output buffer, which is the expected optimal behavior.
Performance by Data Shape
Deep Nesting (Best Case)
Wide Maps (Good Case)
Base64 Data (Best Case)
Base64 optimization uses stdlib assembly implementation, providing massive speedup for binary data.
Scalar Values (Small Regression)
| Benchmark | Before (ns/op) | After (ns/op) | Reason |
|---|---|---|---|
| Null | 6 | 26 | Pool overhead |
| Bool | 6 | 26 | Pool overhead |
| String_Bare | 16 | 39 | Pool overhead |
This regression is acceptable because:
- These cases are already extremely fast (< 40ns)
- Real-world payloads are almost always nested structures
- Pool overhead pays off massively in nested cases
- Overall GC pressure reduced
Cross-Codec Performance
Encoding Speed Comparison
| Codec | LLM Tool Call | API Response | Vector Search | Tabular |
|---|---|---|---|---|
| JSON | fastest | fastest | fastest | fastest |
| GLYPH | 1.2x slower | 1.3x slower | 1.2x slower | 1.1x slower |
| ZON | 2.5x slower | 3.1x slower | 2.8x slower | 2.2x slower |
| TOON | 1.8x slower | 2.0x slower | 2.1x slower | 1.9x slower |
JSON is fastest because it has heavily optimized native implementations in the JavaScript and Go standard libraries. GLYPH's 1.2x overhead is acceptable given the 48% token savings.
Parsing Speed Comparison
| Codec | Simple Object | Nested Object | Tabular Data |
|---|---|---|---|
| JSON | fastest | fastest | fastest |
| GLYPH | 1.1x slower | 1.2x slower | 0.9x (faster) |
| ZON | 1.5x slower | 1.8x slower | 1.3x slower |
| TOON | 2.1x slower | 2.5x slower | 2.0x slower |
GLYPH parses tabular data faster than JSON because the @tab format avoids repeated key parsing.

Tabular Mode Performance
Trade-off: Correctness Over Speed
- Correctness requirement (proper \| escaping)
- Still faster overall than JSON for large tables
- Token savings (64%) outweigh performance cost
- Real-world data rarely has pipe characters
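The escaping requirement above can be sketched as a simple pre-pass on cell values before they are written into a @tab row. escapeCell is an illustrative name, not the actual GLYPH API, and the exact escape rules here are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCell makes a cell value safe for a pipe-delimited @tab row.
func escapeCell(s string) string {
	s = strings.ReplaceAll(s, "\\", "\\\\") // escape the escape character first
	return strings.ReplaceAll(s, "|", "\\|") // then the column separator
}

func main() {
	fmt.Println(escapeCell("a|b")) // a\|b
}
```

Since real-world data rarely contains pipes, the ReplaceAll calls usually scan the string once and return it unchanged.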
Tabular Auto-Detection Cost
| Operation | Cost (ns) | Impact |
|---|---|---|
| Column detection | ~500 | Per array |
| Homogeneity check | ~200 | Per array |
| Header generation | ~150 | Per table |
| Total overhead | ~850 | One-time per table |
Auto-detection overhead is amortized across all rows, making it negligible for arrays with 10+ items.
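The homogeneity check can be sketched as follows. This is a hypothetical illustration of the idea (isHomogeneous and the minimum-row threshold are assumptions, not GLYPH internals): an array of maps qualifies for tabular form only when every row carries the same key set.

```go
package main

import "fmt"

// isHomogeneous reports whether every row has exactly the same keys,
// which is the precondition for emitting a shared @tab header.
func isHomogeneous(rows []map[string]any) bool {
	if len(rows) < 2 {
		return false // too small to benefit from a table header
	}
	first := rows[0]
	for _, row := range rows[1:] {
		if len(row) != len(first) {
			return false
		}
		for k := range first {
			if _, ok := row[k]; !ok {
				return false
			}
		}
	}
	return true
}

func main() {
	rows := []map[string]any{
		{"id": 1, "name": "a"},
		{"id": 2, "name": "b"},
	}
	fmt.Println(isHomogeneous(rows)) // true
}
```

The check walks each row's keys once, which matches the per-array costs in the table above: it runs one time per candidate array, then every row reuses the single generated header.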
Future Optimization Opportunities
Pre-size builders
Idea: Estimate output size from the input structure to avoid buffer growth.
Potential gain: 10-15% speedup on large objects
Complexity: Medium (requires a size-estimation pass)
Custom sort for small slices
Idea: Avoid reflectlite.Swapper overhead for maps with < 12 entries.
Potential gain: 5-10% speedup on small maps
Complexity: Low (insertion sort for n < 12)
Inline scalar formatting
Idea: Avoid function call overhead for int/float formatting.
Potential gain: 3-5% speedup on numeric-heavy data
Complexity: Low (inline strconv calls)
Arena allocator
Idea: Use an arena allocator for extremely large structures (1000+ nodes).
Potential gain: 20-30% speedup on massive documents
Complexity: High (requires careful lifetime management)
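The small-slice sort idea above can be sketched with a plain insertion sort over key/value entries, avoiding sort.Slice's reflection-based swapper. The entry type and threshold framing are illustrative, not GLYPH internals:

```go
package main

import "fmt"

// entry is a sortable map key/value pair.
type entry struct {
	key string
	val any
}

// insertionSort orders entries by key without any reflection overhead;
// for n < 12 this typically beats sort.Slice's reflectlite.Swapper path.
func insertionSort(es []entry) {
	for i := 1; i < len(es); i++ {
		for j := i; j > 0 && es[j].key < es[j-1].key; j-- {
			es[j], es[j-1] = es[j-1], es[j]
		}
	}
}

func main() {
	es := []entry{{"b", 2}, {"a", 1}, {"c", 3}}
	insertionSort(es)
	fmt.Println(es[0].key, es[1].key, es[2].key) // a b c
}
```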
Running Benchmarks
Full Benchmark Suite
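A typical invocation (the package path is illustrative; these are standard Go toolchain flags, not commands taken from the GLYPH repo):

```shell
# Run every benchmark in the module with allocation stats
go test -bench=. -benchmem ./...
```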
Memory Profiling
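Assuming the benchmarks live in the current package, a memory profile can be captured and inspected with the standard toolchain:

```shell
# Capture an allocation profile while benchmarking
go test -bench=. -benchmem -memprofile=mem.prof .
# Show the top allocation sites
go tool pprof -top mem.prof
```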
CPU Profiling
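The CPU-side equivalent, again using standard toolchain flags with an illustrative package path:

```shell
# Capture a CPU profile while benchmarking
go test -bench=. -cpuprofile=cpu.prof .
# Show where CPU time is spent
go tool pprof -top cpu.prof
```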
Compare Before/After
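Before/after numbers like those in the tables above are conventionally compared with benchstat from golang.org/x/perf (the workflow below is the standard one, not GLYPH-specific):

```shell
go install golang.org/x/perf/cmd/benchstat@latest
# Collect repeated runs for statistical significance
go test -bench=. -count=10 . > old.txt
# ...switch to the optimized version...
go test -bench=. -count=10 . > new.txt
benchstat old.txt new.txt
```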
Test Coverage
Test Suite
1881+ test cases covering:
- Scalar canonicalization
- Container canonicalization
- Auto-tabular detection
- Cross-implementation parity
- Schema headers
- Edge cases
Results
- ✅ All tests passing
- ✅ Zero regressions
- ✅ Full backward compatibility
- ✅ Cross-platform (Go, JS, Python)
Recommendations
When Performance Matters
Profile first
Use go test -cpuprofile and go test -memprofile to identify bottlenecks before optimizing.

Optimize hot paths
Focus on:
- Deeply nested structures
- Large maps (100+ entries)
- Base64-heavy data
- Repeated canonicalization
When to Use GLYPH
GLYPH is worth the 1.2x encoding overhead when:
- Token savings (48%) matter for LLM costs
- Context window space is limited
- Human-readable logs/traces needed
- Streaming validation required
- Storage efficiency important
When to Use JSON
Key Takeaways
Optimization Impact
- 7.1x speedup on base64 data
- 85% memory reduction on deep nesting
- 43% memory savings on LLM tool calls
- Zero regressions across 1881+ tests
Production Readiness
- Full backward compatibility
- Comprehensive benchmark suite
- Cross-platform validation
- Real-world workload testing
Related Documentation
Benchmark Results
Full codec comparison across all metrics
LLM Accuracy Report
How LLMs handle GLYPH vs other formats