
GLYPH Performance Report

Comprehensive analysis of GLYPH parser performance, canonicalization optimization, and memory usage.

Executive Summary

  • Speedup: 7.1x faster on base64-heavy workloads
  • Memory: 85% reduction on deeply nested structures
  • Compatibility: 1881+ tests, all passing, zero regressions

Optimization Results

The GLYPH-Loose canonicalizer was optimized using buffer-based writers, sync.Pool for resource reuse, and stdlib base64 encoding.

Key Improvements

Metric                  Achievement        Impact
─────────────────────────────────────────────────────────────
Base64 encoding         7.1x speedup       Stdlib assembly optimization
Memory allocations      40-85% reduction   sync.Pool for slices
Nested structures       1.7x speedup       Buffer-based writing
Backward compatibility  100% maintained    All tests passing

Benchmark Results

Synthetic Benchmarks

Benchmark                    Before (ns/op)  After (ns/op)  Speedup
────────────────────────────────────────────────────────────────────
Bytes_Large (1KB)                     6,130            862     7.1x
Nested_VeryDeep (50 levels)           6,234          3,726     1.7x
Nested_Deep (10 levels)                 970            705     1.4x
Nested_Wide (10×20)                  12,321         10,007     1.2x
NoTabular_100Rows                    32,623         23,044     1.4x
Map_Medium (50 entries)               3,962          3,741     1.1x
Map_Large (200 entries)              20,430         19,240     1.1x
MixedTypes                              422            355     1.2x
85% memory reduction on deeply nested structures (50 levels)

Realistic Workloads

Benchmark        Before (ns/op)  After (ns/op)  Speedup  Memory Savings
────────────────────────────────────────────────────────────────────────
LLMToolCall               1,242          1,015     1.2x             43%
AgentTrace                6,011          5,483     1.1x             39%
Corpus_AllCases          34,372         30,103     1.1x             51%
SchemaOpts               13,103         10,422     1.3x             45%
VectorDBResult           11,780         12,180     1.0x              7%
APIResponse              13,674         15,596     0.9x              8%
LLM tool calls show a 1.2x speedup and 43% memory savings, which is critical for agent applications.

Optimization Techniques

1. Buffer-Based Writer Pattern

Before:
func canonLooseWithOpts(v *GValue, opts LooseCanonOpts) string {
    switch v.typ {
    case TypeList:
        return canonListLooseWithOpts(v.listVal, opts)  // Allocates string
    case TypeMap:
        return canonMapLooseWithOpts(v.mapVal, opts)    // Allocates string
    }
    // ... scalar cases elided
}
Every recursive call allocated an intermediate string, forcing the Go runtime to allocate, copy, and garbage collect.
After:
func canonLooseWithOpts(v *GValue, opts LooseCanonOpts) string {
    b := getPooledBuilder()
    writeCanonLoose(b, v, opts)  // Writes to shared buffer
    result := b.String()
    putPooledBuilder(b)
    return result
}

func writeCanonLoose(b *strings.Builder, v *GValue, opts LooseCanonOpts) {
    switch v.typ {
    case TypeList:
        writeListLoose(b, v.listVal, opts)  // No intermediate allocation
    case TypeMap:
        writeMapLoose(b, v.mapVal, opts)    // No intermediate allocation
    }
    // ... scalar cases elided
}
Writes directly to a single buffer, eliminating intermediate allocations.

2. sync.Pool for Resource Reuse

var stringBuilderPool = sync.Pool{
    New: func() interface{} { return &strings.Builder{} },
}

func getPooledBuilder() *strings.Builder {
    b := stringBuilderPool.Get().(*strings.Builder)
    b.Reset()
    return b
}

func putPooledBuilder(b *strings.Builder) {
    stringBuilderPool.Put(b)
}
Benefit: Reuses builders across canonicalization calls, reducing GC pressure.
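The two techniques combine naturally. A self-contained sketch of the same pattern, using a simplified toy value type (not the real GValue) so it can run standalone:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Val is a toy stand-in for GLYPH's internal value type:
// a node is either a number or a list of children.
type Val struct {
	Num  int
	List []Val // non-nil means this is a list node
}

var builderPool = sync.Pool{
	New: func() interface{} { return &strings.Builder{} },
}

// canon writes v into a pooled builder and returns the final string.
// Only one string is allocated: the result itself.
func canon(v Val) string {
	b := builderPool.Get().(*strings.Builder)
	b.Reset()
	write(b, v)
	s := b.String()
	builderPool.Put(b)
	return s
}

// write recurses without allocating intermediate strings:
// every level appends to the same shared builder.
func write(b *strings.Builder, v Val) {
	if v.List == nil {
		b.WriteString(strconv.Itoa(v.Num))
		return
	}
	b.WriteByte('[')
	for i, c := range v.List {
		if i > 0 {
			b.WriteByte(',')
		}
		write(b, c)
	}
	b.WriteByte(']')
}

func main() {
	v := Val{List: []Val{{Num: 1}, {List: []Val{{Num: 2}, {Num: 3}}}}}
	fmt.Println(canon(v)) // [1,[2,3]]
}
```

The same goroutine can call canon repeatedly and keep hitting the warm pooled builder, which is where the GC savings come from.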

3. Stdlib Base64 Replacement

Before (hand-rolled encoder):
func base64Encode(data []byte) string {
    const encodeStd = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    result := make([]byte, ((len(data)+2)/3)*4)
    // Manual encoding loop...
    // 40 lines of bit manipulation
}
7.1x speedup on base64-heavy workloads after switching to the assembly-optimized stdlib encoder.
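The replacement is presumably the one-line delegation to the standard library; a sketch:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Encode delegates to encoding/base64, whose inner loop is
// assembly-optimized on common architectures.
func base64Encode(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

func main() {
	fmt.Println(base64Encode([]byte("GLYPH"))) // R0xZUEg=
}
```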

Memory Profile Analysis

Before Optimization

Function                        Memory    % of Total
─────────────────────────────────────────────────────
strings.Builder.WriteString     451 MB    40.0%
canonMapLooseWithOpts           209 MB    18.6%
reflectlite.Swapper             188 MB    16.7%
detectTabular                   102 MB     9.1%
quoteString (Builder.Grow)       89 MB     7.9%
strconv.FormatInt                28 MB     2.5%
40% of allocations came from intermediate string building in recursive calls.

After Optimization

Function                        Memory    % of Total
─────────────────────────────────────────────────────
strings.Builder.WriteString     489 MB    42.1%  (expected: final output)
strings.Builder.WriteByte       202 MB    17.4%  (small writes)
reflectlite.Swapper             183 MB    15.8%  (unchanged: sort overhead)
strings.Builder.WriteRune        96 MB     8.3%  (unicode handling)
getObjectKeys                    92 MB     8.0%  (tabular detection)
strconv.formatBits               30 MB     2.6%  (unchanged)
Allocation distribution shifted from intermediate strings to the final output buffer, which is the expected optimal behavior.

Performance by Data Shape

Deep Nesting (Best Memory Case)

Before: 6,234 ns/op, 14,952 B/op
After:  3,726 ns/op,  2,221 B/op

Speedup: 1.7x
Memory Savings: 85%

Wide Maps (Good Case)

200-entry map
Before: 20,430 ns/op, 14,008 B/op
After:  19,240 ns/op,  5,824 B/op

Speedup: 1.1x
Memory Savings: 58%

Base64 Data (Best Speed Case)

1KB binary data (base64 encoded)
Before: 6,130 ns/op, 5,632 B/op
After:    862 ns/op, 4,235 B/op

Speedup: 7.1x
Memory Savings: 25%
Base64 optimization uses stdlib assembly implementation, providing massive speedup for binary data.

Scalar Values (Small Regression)

Trade-off: Slight overhead for scalar-only cases due to pool get/put operations.
Benchmark     Before (ns/op)  After (ns/op)  Reason
────────────────────────────────────────────────────
Null                       6             26  Pool overhead
Bool                       6             26  Pool overhead
String_Bare               16             39  Pool overhead
Why this is acceptable:
  1. These cases are already extremely fast (< 40 ns)
  2. Real-world payloads are almost always nested structures
  3. Pool overhead pays off massively in nested cases
  4. Overall GC pressure is reduced

Cross-Codec Performance

Encoding Speed Comparison

Codec   LLM Tool Call  API Response  Vector Search  Tabular
────────────────────────────────────────────────────────────
JSON    fastest        fastest       fastest        fastest
GLYPH   1.2x slower    1.3x slower   1.2x slower    1.1x slower
ZON     2.5x slower    3.1x slower   2.8x slower    2.2x slower
TOON    1.8x slower    2.0x slower   2.1x slower    1.9x slower
JSON is fastest because it is handled natively by the JavaScript runtime and the Go standard library. GLYPH's 1.2x encoding overhead is acceptable given the 48% token savings.

Parsing Speed Comparison

Codec   Simple Object  Nested Object  Tabular Data
───────────────────────────────────────────────────
JSON    fastest        fastest        fastest
GLYPH   1.1x slower    1.2x slower    0.9x (faster)
ZON     1.5x slower    1.8x slower    1.3x slower
TOON    2.1x slower    2.5x slower    2.0x slower
GLYPH parses tabular data faster than JSON due to the @tab format avoiding repeated key parsing.

Tabular Mode Performance

Trade-off: Correctness Over Speed

100-row tabular dataset
Before: 205 allocs
After:  415 allocs

Reason: each cell value is written to a temporary builder for escaping
Tabular mode shows increased allocations because each cell is escaped individually to handle pipe characters (|).
Why this is acceptable:
  1. Correctness requirement (proper \| escaping)
  2. Still faster overall than JSON for large tables
  3. Token savings (64%) outweigh performance cost
  4. Real-world data rarely has pipe characters
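The per-cell escaping that drives the extra allocations can be sketched as follows. The helper name and the backslash handling are illustrative assumptions; only the `\|` escape itself is stated by the report:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCell escapes pipe characters so a cell value cannot be
// mistaken for a column separator in tabular (@tab) rows.
// Escaping backslashes first is an assumption, made here so that
// literal `\|` sequences in input round-trip unambiguously.
func escapeCell(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return strings.ReplaceAll(s, "|", `\|`)
}

func main() {
	fmt.Println(escapeCell("a|b")) // a\|b
}
```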

Tabular Auto-Detection Cost

Operation          Cost (ns)  Impact
──────────────────────────────────────
Column detection        ~500  Per array
Homogeneity check       ~200  Per array
Header generation       ~150  Per table
Total overhead          ~850  One-time per table
Auto-detection overhead is amortized across all rows: at 100 rows, the ~850 ns one-time cost works out to under 9 ns per row, negligible for arrays with 10+ items.

Future Optimization Opportunities

  1. Pre-sized buffers. Estimate output size from the input structure to avoid buffer growth. Potential gain: 10-15% speedup on large objects. Complexity: medium (requires a size-estimation pass).
  2. Small-map sort fast path. Avoid reflectlite.Swapper overhead for maps with < 12 entries. Potential gain: 5-10% speedup on small maps. Complexity: low (insertion sort for n < 12).
  3. Inlined number formatting. Avoid function-call overhead for int/float formatting. Potential gain: 3-5% speedup on numeric-heavy data. Complexity: low (inline strconv calls).
  4. Arena allocation. Use an arena allocator for extremely large structures (1000+ nodes). Potential gain: 20-30% speedup on massive documents. Complexity: high (requires careful lifetime management).
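The small-map idea is the simplest of these to sketch. The cutoff of 12 comes from the report's suggestion and should be re-benchmarked; the assumption here is that the current hot path sorts keys via a reflect-based swapper (e.g. sort.Slice), which a direct insertion sort avoids:

```go
package main

import (
	"fmt"
	"sort"
)

// smallMapCutoff is the report's suggested threshold; benchmark
// before adopting a specific value.
const smallMapCutoff = 12

// sortKeys orders map keys in place, using insertion sort for small
// inputs to skip the interface/Swapper overhead of the general path.
func sortKeys(keys []string) {
	if len(keys) < smallMapCutoff {
		for i := 1; i < len(keys); i++ {
			for j := i; j > 0 && keys[j] < keys[j-1]; j-- {
				keys[j], keys[j-1] = keys[j-1], keys[j]
			}
		}
		return
	}
	sort.Strings(keys)
}

func main() {
	k := []string{"b", "c", "a"}
	sortKeys(k)
	fmt.Println(k) // [a b c]
}
```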

Running Benchmarks

Full Benchmark Suite

# Run all benchmarks with memory stats
go test -bench=BenchmarkCanonicalizeLoose -benchmem -count=3 ./sjson/glyph/

Memory Profiling

# Generate memory profile
go test -bench=BenchmarkCanonicalizeLoose_Allocs_Large -memprofile=mem.out ./sjson/glyph/
go tool pprof -top mem.out

# Interactive analysis
go tool pprof -http=:8080 mem.out

CPU Profiling

# Generate CPU profile
go test -bench=BenchmarkCanonicalizeLoose -cpuprofile=cpu.out ./sjson/glyph/
go tool pprof -top cpu.out

# Flame graph
go tool pprof -http=:8080 cpu.out

Compare Before/After

# Requires benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Save baseline
go test -bench=. -benchmem -count=10 > baseline.txt

# Make changes, then compare
go test -bench=. -benchmem -count=10 > optimized.txt
benchstat baseline.txt optimized.txt

Test Coverage

Test Suite

1881+ test cases covering:
  • Scalar canonicalization
  • Container canonicalization
  • Auto-tabular detection
  • Cross-implementation parity
  • Schema headers
  • Edge cases

Results

  ✅ All tests passing
  ✅ Zero regressions
  ✅ Full backward compatibility
  ✅ Cross-platform (Go, JS, Python)
# Run full test suite
go test ./sjson/glyph/...
# ok      agentscope/sjson/glyph          10.025s
# ok      agentscope/sjson/glyph/stream   0.003s

Recommendations

When Performance Matters

  1. Profile first. Use go test -cpuprofile and go test -memprofile to identify bottlenecks before optimizing.
  2. Optimize hot paths. Focus on:
     • Deeply nested structures
     • Large maps (100+ entries)
     • Base64-heavy data
     • Repeated canonicalization
  3. Batch operations. Canonicalize multiple objects in the same goroutine to benefit from pool reuse.
  4. Monitor memory. Watch for allocation patterns that indicate pool exhaustion or GC thrashing.

When to Use GLYPH

GLYPH is worth the 1.2x encoding overhead when:
  • Token savings (48%) matter for LLM costs
  • Context window space is limited
  • Human-readable logs/traces needed
  • Streaming validation required
  • Storage efficiency important

When to Use JSON

Stick with JSON when:
  • Pure encoding speed is critical (real-time systems)
  • LLM needs to generate output (100% reliability)
  • External system compatibility required
  • Zero optimization budget available

Key Takeaways

Optimization Impact

  • 7.1x speedup on base64 data
  • 85% memory reduction on deep nesting
  • 43% memory savings on LLM tool calls
  • Zero regressions across 1881+ tests

Production Readiness

  • Full backward compatibility
  • Comprehensive benchmark suite
  • Cross-platform validation
  • Real-world workload testing
