
GLYPH Performance Report

Comprehensive analysis of GLYPH parser performance, canonicalization optimization, and memory usage.

Executive Summary

  • Speedup: 7.1x faster on base64-heavy workloads
  • Memory: 85% reduction on deeply nested structures
  • Compatibility: 1881+ tests, all passing, zero regressions

Optimization Results

The GLYPH-Loose canonicalizer was optimized using buffer-based writers, sync.Pool for resource reuse, and stdlib base64 encoding.

Key Improvements

Metric                  Achievement        Impact
─────────────────────────────────────────────────────────────
Base64 encoding         7.1x speedup       Stdlib assembly optimization
Memory allocations      40-85% reduction   sync.Pool for slices
Nested structures       1.7x speedup       Buffer-based writing
Backward compatibility  100% maintained    All tests passing

Benchmark Results

Synthetic Benchmarks

Benchmark                    Before (ns/op)  After (ns/op)  Speedup
────────────────────────────────────────────────────────────────────
Bytes_Large (1KB)                     6,130            862     7.1x
Nested_VeryDeep (50 levels)           6,234          3,726     1.7x
Nested_Deep (10 levels)                 970            705     1.4x
Nested_Wide (10×20)                  12,321         10,007     1.2x
NoTabular_100Rows                    32,623         23,044     1.4x
Map_Medium (50 entries)               3,962          3,741     1.1x
Map_Large (200 entries)              20,430         19,240     1.1x
MixedTypes                              422            355     1.2x
85% memory reduction on deeply nested structures (50 levels)

Realistic Workloads

Benchmark        Before (ns/op)  After (ns/op)  Speedup  Memory Savings
────────────────────────────────────────────────────────────────────────
LLMToolCall               1,242          1,015     1.2x             43%
AgentTrace                6,011          5,483     1.1x             39%
Corpus_AllCases          34,372         30,103     1.1x             51%
SchemaOpts               13,103         10,422     1.3x             45%
VectorDBResult           11,780         12,180     1.0x              7%
APIResponse              13,674         15,596     0.9x              8%
LLM tool calls show a 1.2x speedup and 43% memory savings, which is critical for agent applications.

Optimization Techniques

1. Buffer-Based Writer Pattern

Before:
func canonLooseWithOpts(v *GValue, opts LooseCanonOpts) string {
    switch v.typ {
    case TypeList:
        return canonListLooseWithOpts(v.listVal, opts)  // Allocates string
    case TypeMap:
        return canonMapLooseWithOpts(v.mapVal, opts)    // Allocates string
    }
    // ... scalar cases elided
}
Every recursive call allocated an intermediate string, forcing the Go runtime to allocate, copy, and garbage collect.
After:
func canonLooseWithOpts(v *GValue, opts LooseCanonOpts) string {
    b := getPooledBuilder()
    writeCanonLoose(b, v, opts)  // Writes to shared buffer
    result := b.String()
    putPooledBuilder(b)
    return result
}

func writeCanonLoose(b *strings.Builder, v *GValue, opts LooseCanonOpts) {
    switch v.typ {
    case TypeList:
        writeListLoose(b, v.listVal, opts)  // No intermediate allocation
    case TypeMap:
        writeMapLoose(b, v.mapVal, opts)    // No intermediate allocation
    }
    // ... scalar cases elided
}
Writes directly to a single buffer, eliminating intermediate allocations.

2. sync.Pool for Resource Reuse

var stringBuilderPool = sync.Pool{
    New: func() interface{} { return &strings.Builder{} },
}

func getPooledBuilder() *strings.Builder {
    b := stringBuilderPool.Get().(*strings.Builder)
    b.Reset()
    return b
}

func putPooledBuilder(b *strings.Builder) {
    stringBuilderPool.Put(b)
}
Benefit: Reuses builders across canonicalization calls, reducing GC pressure.
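The two techniques combine naturally. A self-contained sketch of the same pattern, using a simplified toy value type (not the real GValue) so it can run standalone:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Val is a toy stand-in for GLYPH's internal value type:
// a node is either a number or a list of children.
type Val struct {
	Num  int
	List []Val // non-nil means this is a list node
}

var builderPool = sync.Pool{
	New: func() interface{} { return &strings.Builder{} },
}

// canon writes v into a pooled builder and returns the final string.
// Only one string is allocated: the result itself.
func canon(v Val) string {
	b := builderPool.Get().(*strings.Builder)
	b.Reset()
	write(b, v)
	s := b.String()
	builderPool.Put(b)
	return s
}

// write recurses without allocating intermediate strings:
// every level appends to the same shared builder.
func write(b *strings.Builder, v Val) {
	if v.List == nil {
		b.WriteString(strconv.Itoa(v.Num))
		return
	}
	b.WriteByte('[')
	for i, c := range v.List {
		if i > 0 {
			b.WriteByte(',')
		}
		write(b, c)
	}
	b.WriteByte(']')
}

func main() {
	v := Val{List: []Val{{Num: 1}, {List: []Val{{Num: 2}, {Num: 3}}}}}
	fmt.Println(canon(v)) // [1,[2,3]]
}
```

The same goroutine can call canon repeatedly and keep hitting the warm pooled builder, which is where the GC savings come from.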

3. Stdlib Base64 Replacement

Before (hand-rolled encoder):
func base64Encode(data []byte) string {
    const encodeStd = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    result := make([]byte, ((len(data)+2)/3)*4)
    // Manual encoding loop...
    // 40 lines of bit manipulation
}
7.1x speedup on base64-heavy workloads after switching to the assembly-optimized stdlib encoder.
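The replacement is presumably the one-line delegation to the standard library; a sketch:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Encode delegates to encoding/base64, whose inner loop is
// assembly-optimized on common architectures.
func base64Encode(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

func main() {
	fmt.Println(base64Encode([]byte("GLYPH"))) // R0xZUEg=
}
```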

Memory Profile Analysis

Before Optimization

Function                        Memory    % of Total
─────────────────────────────────────────────────────
strings.Builder.WriteString     451 MB    40.0%
canonMapLooseWithOpts           209 MB    18.6%
reflectlite.Swapper             188 MB    16.7%
detectTabular                   102 MB     9.1%
quoteString (Builder.Grow)       89 MB     7.9%
strconv.FormatInt                28 MB     2.5%
40% of allocations came from intermediate string building in recursive calls.

After Optimization

Function                        Memory    % of Total
─────────────────────────────────────────────────────
strings.Builder.WriteString     489 MB    42.1%  (expected: final output)
strings.Builder.WriteByte       202 MB    17.4%  (small writes)
reflectlite.Swapper             183 MB    15.8%  (unchanged: sort overhead)
strings.Builder.WriteRune        96 MB     8.3%  (unicode handling)
getObjectKeys                    92 MB     8.0%  (tabular detection)
strconv.formatBits               30 MB     2.6%  (unchanged)
Allocation distribution shifted from intermediate strings to the final output buffer, which is the expected optimal behavior.

Performance by Data Shape

Deep Nesting (Best Memory Case)

Before: 6,234 ns/op, 14,952 B/op
After:  3,726 ns/op,  2,221 B/op

Speedup: 1.7x
Memory Savings: 85%

Wide Maps (Good Case)

200-entry map
Before: 20,430 ns/op, 14,008 B/op
After:  19,240 ns/op,  5,824 B/op

Speedup: 1.1x
Memory Savings: 58%

Base64 Data (Best Speed Case)

1KB binary data (base64 encoded)
Before: 6,130 ns/op, 5,632 B/op
After:    862 ns/op, 4,235 B/op

Speedup: 7.1x
Memory Savings: 25%
Base64 optimization uses stdlib assembly implementation, providing massive speedup for binary data.

Scalar Values (Small Regression)

Trade-off: Slight overhead for scalar-only cases due to pool get/put operations.
Benchmark     Before (ns/op)  After (ns/op)  Reason
────────────────────────────────────────────────────
Null                       6             26  Pool overhead
Bool                       6             26  Pool overhead
String_Bare               16             39  Pool overhead
Why this is acceptable:
  1. These cases are already extremely fast (< 40 ns)
  2. Real-world payloads are almost always nested structures
  3. Pool overhead pays off massively in nested cases
  4. Overall GC pressure is reduced

Cross-Codec Performance

Encoding Speed Comparison

Codec   LLM Tool Call  API Response  Vector Search  Tabular
────────────────────────────────────────────────────────────
JSON    fastest        fastest       fastest        fastest
GLYPH   1.2x slower    1.3x slower   1.2x slower    1.1x slower
ZON     2.5x slower    3.1x slower   2.8x slower    2.2x slower
TOON    1.8x slower    2.0x slower   2.1x slower    1.9x slower
JSON is fastest because it is handled natively by the JavaScript runtime and the Go standard library. GLYPH's 1.2x encoding overhead is acceptable given the 48% token savings.

Parsing Speed Comparison

Codec   Simple Object  Nested Object  Tabular Data
───────────────────────────────────────────────────
JSON    fastest        fastest        fastest
GLYPH   1.1x slower    1.2x slower    0.9x (faster)
ZON     1.5x slower    1.8x slower    1.3x slower
TOON    2.1x slower    2.5x slower    2.0x slower
GLYPH parses tabular data faster than JSON due to the @tab format avoiding repeated key parsing.

Tabular Mode Performance

Trade-off: Correctness Over Speed

100-row tabular dataset
Before: 205 allocs
After:  415 allocs

Reason: each cell value is written to a temporary builder for escaping
Tabular mode shows increased allocations because each cell is escaped individually to handle pipe characters (|).
Why this is acceptable:
  1. Correctness requirement (proper \| escaping)
  2. Still faster overall than JSON for large tables
  3. Token savings (64%) outweigh performance cost
  4. Real-world data rarely has pipe characters
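The per-cell escaping that drives the extra allocations can be sketched as follows. The helper name and the backslash handling are illustrative assumptions; only the `\|` escape itself is stated by the report:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCell escapes pipe characters so a cell value cannot be
// mistaken for a column separator in tabular (@tab) rows.
// Escaping backslashes first is an assumption, made here so that
// literal `\|` sequences in input round-trip unambiguously.
func escapeCell(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return strings.ReplaceAll(s, "|", `\|`)
}

func main() {
	fmt.Println(escapeCell("a|b")) // a\|b
}
```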

Tabular Auto-Detection Cost

Operation          Cost (ns)  Impact
──────────────────────────────────────
Column detection        ~500  Per array
Homogeneity check       ~200  Per array
Header generation       ~150  Per table
Total overhead          ~850  One-time per table
Auto-detection overhead is amortized across all rows: at 100 rows, the ~850 ns one-time cost works out to under 9 ns per row, negligible for arrays with 10+ items.

Future Optimization Opportunities

  1. Pre-sized buffers. Estimate output size from the input structure to avoid buffer growth. Potential gain: 10-15% speedup on large objects. Complexity: medium (requires a size-estimation pass).
  2. Small-map sort fast path. Avoid reflectlite.Swapper overhead for maps with < 12 entries. Potential gain: 5-10% speedup on small maps. Complexity: low (insertion sort for n < 12).
  3. Inlined number formatting. Avoid function-call overhead for int/float formatting. Potential gain: 3-5% speedup on numeric-heavy data. Complexity: low (inline strconv calls).
  4. Arena allocation. Use an arena allocator for extremely large structures (1000+ nodes). Potential gain: 20-30% speedup on massive documents. Complexity: high (requires careful lifetime management).
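The small-map idea is the simplest of these to sketch. The cutoff of 12 comes from the report's suggestion and should be re-benchmarked; the assumption here is that the current hot path sorts keys via a reflect-based swapper (e.g. sort.Slice), which a direct insertion sort avoids:

```go
package main

import (
	"fmt"
	"sort"
)

// smallMapCutoff is the report's suggested threshold; benchmark
// before adopting a specific value.
const smallMapCutoff = 12

// sortKeys orders map keys in place, using insertion sort for small
// inputs to skip the interface/Swapper overhead of the general path.
func sortKeys(keys []string) {
	if len(keys) < smallMapCutoff {
		for i := 1; i < len(keys); i++ {
			for j := i; j > 0 && keys[j] < keys[j-1]; j-- {
				keys[j], keys[j-1] = keys[j-1], keys[j]
			}
		}
		return
	}
	sort.Strings(keys)
}

func main() {
	k := []string{"b", "c", "a"}
	sortKeys(k)
	fmt.Println(k) // [a b c]
}
```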

Running Benchmarks

Full Benchmark Suite

# Run all benchmarks with memory stats
go test -bench=BenchmarkCanonicalizeLoose -benchmem -count=3 ./sjson/glyph/

Memory Profiling

# Generate memory profile
go test -bench=BenchmarkCanonicalizeLoose_Allocs_Large -memprofile=mem.out ./sjson/glyph/
go tool pprof -top mem.out

# Interactive analysis
go tool pprof -http=:8080 mem.out

CPU Profiling

# Generate CPU profile
go test -bench=BenchmarkCanonicalizeLoose -cpuprofile=cpu.out ./sjson/glyph/
go tool pprof -top cpu.out

# Flame graph
go tool pprof -http=:8080 cpu.out

Compare Before/After

# Requires benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Save baseline
go test -bench=. -benchmem -count=10 > baseline.txt

# Make changes, then compare
go test -bench=. -benchmem -count=10 > optimized.txt
benchstat baseline.txt optimized.txt

Test Coverage

Test Suite

1881+ test cases covering:
  • Scalar canonicalization
  • Container canonicalization
  • Auto-tabular detection
  • Cross-implementation parity
  • Schema headers
  • Edge cases

Results

  ✅ All tests passing
  ✅ Zero regressions
  ✅ Full backward compatibility
  ✅ Cross-platform (Go, JS, Python)
# Run full test suite
go test ./sjson/glyph/...
# ok      agentscope/sjson/glyph          10.025s
# ok      agentscope/sjson/glyph/stream   0.003s

Recommendations

When Performance Matters

  1. Profile first. Use go test -cpuprofile and go test -memprofile to identify bottlenecks before optimizing.
  2. Optimize hot paths. Focus on:
     • Deeply nested structures
     • Large maps (100+ entries)
     • Base64-heavy data
     • Repeated canonicalization
  3. Batch operations. Canonicalize multiple objects in the same goroutine to benefit from pool reuse.
  4. Monitor memory. Watch for allocation patterns that indicate pool exhaustion or GC thrashing.

When to Use GLYPH

GLYPH is worth the 1.2x encoding overhead when:
  • Token savings (48%) matter for LLM costs
  • Context window space is limited
  • Human-readable logs/traces needed
  • Streaming validation required
  • Storage efficiency important

When to Use JSON

Stick with JSON when:
  • Pure encoding speed is critical (real-time systems)
  • LLM needs to generate output (100% reliability)
  • External system compatibility required
  • Zero optimization budget available

Key Takeaways

Optimization Impact

  • 7.1x speedup on base64 data
  • 85% memory reduction on deep nesting
  • 43% memory savings on LLM tool calls
  • Zero regressions across 1881+ tests

Production Readiness

  • Full backward compatibility
  • Comprehensive benchmark suite
  • Cross-platform validation
  • Real-world workload testing
