Repeated keys waste tokens. Auto-tabular mode compresses arrays of objects by 50-70%.
When you have lists of similar objects (search results, database rows, logs), GLYPH automatically detects homogeneity and emits them as compact tables.

The Problem

Homogeneous arrays repeat keys for every element:
JSON - Keys repeated 3 times
[
  {"id": "doc1", "score": 0.95, "title": "GLYPH Guide"},
  {"id": "doc2", "score": 0.89, "title": "API Docs"},
  {"id": "doc3", "score": 0.84, "title": "Tutorial"}
]
120 tokens - id, score, and title appear 3 times each
With 50 results, each key appears 50 times; that repetition is pure overhead.

The Solution

Tabular mode lists keys once, then rows:
GLYPH tabular - Keys listed once
@tab _ [id score title]
|doc1|0.95|GLYPH Guide|
|doc2|0.89|API Docs|
|doc3|0.84|Tutorial|
@end
45 tokens - Keys appear once in header
62% reduction - Savings scale linearly with row count

How It Works

Auto-Detection

GLYPH automatically detects when a list qualifies for tabular emission:
1. Check eligibility
   • Contains ≥ 3 elements (configurable)
   • All elements are maps or structs
   • Total column count ≤ 20 (configurable)
   • All rows have identical keys (or AllowMissing=true)

2. Extract schema
   Collect all unique keys across all rows.

3. Sort columns
   Order columns by UTF-8 byte order (same as map keys).

4. Emit table
   Header with columns, pipe-delimited rows, footer marker.
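The detection pass above can be sketched in plain Python. This is an illustration of the documented rules, not the library's internals; `is_tabular_eligible` and `table_schema` are hypothetical helper names:

```python
def is_tabular_eligible(rows, min_rows=3, max_cols=20, allow_missing=True):
    """Check whether a list qualifies for tabular emission (steps 1-2)."""
    # Step 1: enough elements, and every element is a map
    if len(rows) < min_rows:
        return False
    if not all(isinstance(r, dict) for r in rows):
        return False
    # Step 2: collect all unique keys across all rows
    columns = set()
    for r in rows:
        columns.update(r.keys())
    if len(columns) > max_cols:
        return False
    # Rows must share identical keys unless missing values are allowed
    if not allow_missing and any(set(r.keys()) != columns for r in rows):
        return False
    return True

def table_schema(rows):
    """Step 3: column order is UTF-8 byte order (code-point sort for str)."""
    return sorted({k for r in rows for k in r})
```

With the three-row example from the Missing Values section below, `is_tabular_eligible` passes by default but fails once `allow_missing=False` is set, because one row lacks the `name` key.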

Syntax

@tab _ [name age city]
|Alice|28|NYC|
|Bob|32|SF|
|Carol|25|Austin|
@end

Token Savings

By Row Count

From the Codec Benchmark Report:
| Rows | JSON Tokens | GLYPH Loose | GLYPH Tabular | Savings vs JSON |
|------|-------------|-------------|---------------|-----------------|
| 3    | 120         | 85          | 45            | 62%             |
| 5    | 200         | 140         | 70            | 65%             |
| 10   | 400         | 270         | 130           | 68%             |
| 25   | 1,000       | 670         | 310           | 69%             |
| 50   | 2,000       | 1,340       | 620           | 69%             |

Savings scale linearly: more rows = more savings.

Calculation

Formula: Token savings = (repeated_keys × rows) - header_overhead
Example: 50 search results, 4 fields each

JSON:
  - Per row: 4 keys × 4 tokens/key = 16 tokens
  - Total keys: 16 × 50 = 800 tokens
  - Values: ~600 tokens
  - Total: 1,400 tokens

GLYPH Tabular:
  - Header: 4 keys × 4 tokens = 16 tokens
  - Rows: ~600 tokens (values only)
  - Total: 616 tokens

Savings: 784 tokens (56%)
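The arithmetic above can be reproduced with a small helper. This is a back-of-envelope estimator using the formula from this section, not a tokenizer; `estimate_savings` is a hypothetical name:

```python
def estimate_savings(rows, fields, tokens_per_key=4, value_tokens=600):
    """Rough token comparison: JSON (keys repeated per row) vs tabular."""
    # JSON repeats every key in every row
    json_total = fields * tokens_per_key * rows + value_tokens
    # Tabular lists the keys once, in the header
    tabular_total = fields * tokens_per_key + value_tokens
    saved = json_total - tabular_total
    return saved, round(100 * saved / json_total)

# 50 search results, 4 fields each
saved, pct = estimate_savings(rows=50, fields=4)
print(saved, pct)  # 784 56
```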

Usage

Enabling Tabular Mode

import glyph

data = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 87},
    {"id": 3, "name": "Carol", "score": 92}
]

# Auto-detect (enabled by default in some modes)
result = glyph.json_to_glyph(data)

# Explicit tabular mode
result = glyph.canonicalize_loose_tabular(data)

# Custom options
result = glyph.canonicalize_loose_with_opts(data, {
    'auto_tabular': True,
    'min_rows': 5,     # Only tables with 5+ rows
    'max_cols': 10,    # Max 10 columns
})

Parsing Tables

import glyph

table_str = """
@tab _ [id name score]
|1|Alice|95|
|2|Bob|87|
|3|Carol|92|
@end
"""

result = glyph.parse_tabular_loose(table_str)
# result.columns: ['id', 'name', 'score']
# result.rows: [
#   {'id': 1, 'name': 'Alice', 'score': 95},
#   {'id': 2, 'name': 'Bob', 'score': 87},
#   {'id': 3, 'name': 'Carol', 'score': 92}
# ]

Column Handling

Column Ordering

Columns are sorted by UTF-8 byte order (same as map keys):
Input:  [{"score": 0.95, "id": "doc1", "title": "Guide"}]
Output:
@tab _ [id score title]  // Sorted: i < s < t
|doc1|0.95|Guide|
@end
Consistent ordering ensures deterministic output for fingerprinting.
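Since Python compares strings by Unicode code point, and UTF-8 encoding preserves code-point order, a plain `sorted()` reproduces this column ordering:

```python
keys = ["score", "id", "title"]
# Code-point sort matches UTF-8 byte order for valid Unicode strings
print(sorted(keys))  # ['id', 'score', 'title']
```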

Missing Values

When AllowMissing=true, rows with missing keys emit _ (null) in the empty cells:
[
  {"id": 1, "name": "Alice"},
  {"id": 2},
  {"id": 3, "name": "Carol"}
]
becomes:
@tab _ [id name]
|1|Alice|
|2|_|
|3|Carol|
@end

Nested Values

Complex values are emitted inline:
[
  {"id": 1, "meta": {"x": 10, "y": 20}},
  {"id": 2, "meta": {"x": 30, "y": 40}},
  {"id": 3, "meta": {"x": 50, "y": 60}}
]

Configuration Options

From the LOOSE_MODE_SPEC:
| Option         | Type     | Default    | Description                          |
|----------------|----------|------------|--------------------------------------|
| AutoTabular    | bool     | varies     | Enable auto-tabular detection        |
| MinRows        | int      | 3          | Minimum rows to trigger tabular      |
| MaxCols        | int      | 20         | Maximum columns allowed              |
| AllowMissing   | bool     | true       | Allow rows with missing keys         |
| NullStyle      | enum     | underscore | symbol for ∅, underscore for _       |
| SchemaRef      | string   | ""         | Schema hash/id for @schema header    |
| KeyDict        | []string | nil        | Key dictionary for compact keys      |
| UseCompactKeys | bool     | false      | Emit #N instead of field names       |

Examples

opts = {
    'auto_tabular': True,
    'allow_missing': False,  # All rows must have same keys
}
result = glyph.canonicalize_loose_with_opts(data, opts)
# Raises error if any row has different keys

Use Cases

RAG Search Results

Retrieve 25 documents from vector database with scores and metadata.

Database Query Results

Export 100 user records for agent processing.

Log Aggregation

Send 50 log entries to LLM for analysis.

Metadata for Streaming

Optional row/column counts enable verification:
@tab _ rows=120 cols=6 [id name score status created updated]
|1|Alice|0.95|active|2025-01-01|2025-06-15|
...
@end

Benefits

Truncation Detection

Verify all rows received: count rows and compare to metadata

Progress Tracking

Show progress: “Received 50/120 rows…”

Stream Resync

Resume after interruption by skipping to known row count

Validation

Ensure complete transmission before processing
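The truncation check can be sketched with a few lines of Python. This assumes the optional rows= metadata and pipe-delimited rows shown above; `check_row_count` is a hypothetical helper, not the library's API:

```python
import re

def check_row_count(table_str):
    """Compare the rows= header hint against the rows actually received."""
    lines = [ln.strip() for ln in table_str.strip().splitlines()]
    # The rows= hint is optional; fall back to "unknown" when absent
    m = re.search(r"\brows=(\d+)\b", lines[0])
    expected = int(m.group(1)) if m else None
    # Data rows are pipe-delimited; @end and metadata lines are not counted
    received = sum(1 for ln in lines[1:] if ln.startswith("|"))
    complete = expected is None or received == expected
    return expected, received, complete

truncated = """@tab _ rows=3 cols=2 [id name]
|1|Alice|
|2|Bob|
@end"""
print(check_row_count(truncated))  # (3, 2, False): one row missing
```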

When to Use Tabular Mode

✅ Ideal For:

Search results (typical savings: 50-60%)
  • Vector database results
  • Web search results
  • Document retrieval

Database exports (typical savings: 55-65%)
  • Query results
  • Batch data exports
  • CSV-like data

Logs (typical savings: 50-60%)
  • Application logs
  • Access logs
  • Audit trails

API responses (typical savings: 40-55%)
  • List operations
  • Bulk status checks
  • Multi-item fetches

⚠️ Not Ideal For:

Heterogeneous Data

Objects with different structures don’t benefit from shared schema.

Single Items

One-element arrays pay the header overhead and gain no savings.

Deeply Nested

Complex nesting in cells reduces readability.

Wide Schemas

Tables wider than about 20 columns become hard to read; the MaxCols option (default 20) blocks tabular emission beyond that.

Performance

From benchmarks:
| Format        | Size (bytes) | Tokens | Compression     |
|---------------|--------------|--------|-----------------|
| JSON-min      | 5,109        | 1,000  | 100% (baseline) |
| GLYPH-Loose   | 4,224        | 850    | 83%             |
| GLYPH-Tabular | ~3,900       | ~720   | 76%             |
Tabular mode achieves best token efficiency for homogeneous data

Summary

Token Savings

50-70% reduction for array data

Linear Scaling

More rows = proportionally more savings

Automatic

Auto-detects homogeneous lists

Human-Readable

Tabular format is easier to scan

Next Steps

Format Reference

Learn GLYPH syntax

Token Savings

Understand overall token efficiency

LOOSE MODE Spec

Complete technical specification

Benchmark Report

Full performance data
