Repeated keys waste tokens. Auto-tabular mode compresses arrays of objects by 50-70%.
When you have lists of similar objects (search results, database rows, logs), GLYPH automatically detects homogeneity and emits them as compact tables.

The Problem

Homogeneous arrays repeat keys for every element:
JSON - Keys repeated 3 times
[
  {"id": "doc1", "score": 0.95, "title": "GLYPH Guide"},
  {"id": "doc2", "score": 0.89, "title": "API Docs"},
  {"id": "doc3", "score": 0.84, "title": "Tutorial"}
]
120 tokens - id, score, and title appear 3 times each
With 50 results, each key appears 50 times; that repetition is pure overhead.

The Solution

Tabular mode lists keys once, then rows:
GLYPH tabular - Keys listed once
@tab _ [id score title]
|doc1|0.95|GLYPH Guide|
|doc2|0.89|API Docs|
|doc3|0.84|Tutorial|
@end
45 tokens - Keys appear once in header
62% reduction - Savings scale linearly with row count

How It Works

Auto-Detection

GLYPH automatically detects when a list qualifies for tabular emission:
1. Check eligibility
   • Contains ≥ 3 elements (configurable)
   • All elements are maps or structs
   • Total column count ≤ 20 (configurable)
   • All rows have identical keys (or AllowMissing=true)

2. Extract schema
   Collect all unique keys across all rows.

3. Sort columns
   Order columns by UTF-8 byte order (same as map keys).

4. Emit table
   Header with columns, pipe-delimited rows, footer marker.
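The detection pass above can be sketched in plain Python. This is an illustration of the documented rules, not the library's internals; `is_tabular_eligible` and `table_schema` are hypothetical helper names:

```python
def is_tabular_eligible(rows, min_rows=3, max_cols=20, allow_missing=True):
    """Check whether a list qualifies for tabular emission (steps 1-2)."""
    # Step 1: enough elements, and every element is a map
    if len(rows) < min_rows:
        return False
    if not all(isinstance(r, dict) for r in rows):
        return False
    # Step 2: collect all unique keys across all rows
    columns = set()
    for r in rows:
        columns.update(r.keys())
    if len(columns) > max_cols:
        return False
    # Rows must share identical keys unless missing values are allowed
    if not allow_missing and any(set(r.keys()) != columns for r in rows):
        return False
    return True

def table_schema(rows):
    """Step 3: column order is UTF-8 byte order (code-point sort for str)."""
    return sorted({k for r in rows for k in r})
```

With the three-row example from the Missing Values section below, `is_tabular_eligible` passes by default but fails once `allow_missing=False` is set, because one row lacks the `name` key.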

Syntax

@tab _ [name age city]
|Alice|28|NYC|
|Bob|32|SF|
|Carol|25|Austin|
@end

Token Savings

By Row Count

From the Codec Benchmark Report:
| Rows | JSON Tokens | GLYPH Loose | GLYPH Tabular | Savings vs JSON |
|------|-------------|-------------|---------------|-----------------|
| 3    | 120         | 85          | 45            | 62%             |
| 5    | 200         | 140         | 70            | 65%             |
| 10   | 400         | 270         | 130           | 68%             |
| 25   | 1,000       | 670         | 310           | 69%             |
| 50   | 2,000       | 1,340       | 620           | 69%             |

Savings scale linearly: more rows = more savings.

Calculation

Formula: Token savings = (repeated_keys × rows) - header_overhead
Example: 50 search results, 4 fields each

JSON:
  - Per row: 4 keys × 4 tokens/key = 16 tokens
  - Total keys: 16 × 50 = 800 tokens
  - Values: ~600 tokens
  - Total: 1,400 tokens

GLYPH Tabular:
  - Header: 4 keys × 4 tokens = 16 tokens
  - Rows: ~600 tokens (values only)
  - Total: 616 tokens

Savings: 784 tokens (56%)
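The arithmetic above can be reproduced with a small helper. This is a back-of-envelope estimator using the formula from this section, not a tokenizer; `estimate_savings` is a hypothetical name:

```python
def estimate_savings(rows, fields, tokens_per_key=4, value_tokens=600):
    """Rough token comparison: JSON (keys repeated per row) vs tabular."""
    # JSON repeats every key in every row
    json_total = fields * tokens_per_key * rows + value_tokens
    # Tabular lists the keys once, in the header
    tabular_total = fields * tokens_per_key + value_tokens
    saved = json_total - tabular_total
    return saved, round(100 * saved / json_total)

# 50 search results, 4 fields each
saved, pct = estimate_savings(rows=50, fields=4)
print(saved, pct)  # 784 56
```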

Usage

Enabling Tabular Mode

import glyph

data = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 87},
    {"id": 3, "name": "Carol", "score": 92}
]

# Auto-detect (enabled by default in some modes)
result = glyph.json_to_glyph(data)

# Explicit tabular mode
result = glyph.canonicalize_loose_tabular(data)

# Custom options
result = glyph.canonicalize_loose_with_opts(data, {
    'auto_tabular': True,
    'min_rows': 5,     # Only tables with 5+ rows
    'max_cols': 10,    # Max 10 columns
})

Parsing Tables

import glyph

table_str = """
@tab _ [id name score]
|1|Alice|95|
|2|Bob|87|
|3|Carol|92|
@end
"""

result = glyph.parse_tabular_loose(table_str)
# result.columns: ['id', 'name', 'score']
# result.rows: [
#   {'id': 1, 'name': 'Alice', 'score': 95},
#   {'id': 2, 'name': 'Bob', 'score': 87},
#   {'id': 3, 'name': 'Carol', 'score': 92}
# ]

Column Handling

Column Ordering

Columns are sorted by UTF-8 byte order (same as map keys):
Input:  [{"score": 0.95, "id": "doc1", "title": "Guide"}]
Output:
@tab _ [id score title]  // Sorted: i < s < t
|doc1|0.95|Guide|
@end
Consistent ordering ensures deterministic output for fingerprinting.
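Since Python compares strings by Unicode code point, and UTF-8 encoding preserves code-point order, a plain `sorted()` reproduces this column ordering:

```python
keys = ["score", "id", "title"]
# Code-point sort matches UTF-8 byte order for valid Unicode strings
print(sorted(keys))  # ['id', 'score', 'title']
```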

Missing Values

When AllowMissing=true, rows with missing keys emit _ (null) in the empty cells:
[
  {"id": 1, "name": "Alice"},
  {"id": 2},
  {"id": 3, "name": "Carol"}
]
becomes:
@tab _ [id name]
|1|Alice|
|2|_|
|3|Carol|
@end

Nested Values

Complex values are emitted inline:
[
  {"id": 1, "meta": {"x": 10, "y": 20}},
  {"id": 2, "meta": {"x": 30, "y": 40}},
  {"id": 3, "meta": {"x": 50, "y": 60}}
]

Configuration Options

From the LOOSE_MODE_SPEC:
| Option         | Type     | Default    | Description                          |
|----------------|----------|------------|--------------------------------------|
| AutoTabular    | bool     | varies     | Enable auto-tabular detection        |
| MinRows        | int      | 3          | Minimum rows to trigger tabular      |
| MaxCols        | int      | 20         | Maximum columns allowed              |
| AllowMissing   | bool     | true       | Allow rows with missing keys         |
| NullStyle      | enum     | underscore | symbol for ∅, underscore for _       |
| SchemaRef      | string   | ""         | Schema hash/id for @schema header    |
| KeyDict        | []string | nil        | Key dictionary for compact keys      |
| UseCompactKeys | bool     | false      | Emit #N instead of field names       |

Examples

opts = {
    'auto_tabular': True,
    'allow_missing': False,  # All rows must have same keys
}
result = glyph.canonicalize_loose_with_opts(data, opts)
# Raises error if any row has different keys

Use Cases

RAG Search Results

Retrieve 25 documents from vector database with scores and metadata.

Database Query Results

Export 100 user records for agent processing.

Log Aggregation

Send 50 log entries to LLM for analysis.

Metadata for Streaming

Optional row/column counts enable verification:
@tab _ rows=120 cols=6 [id name score status created updated]
|1|Alice|0.95|active|2025-01-01|2025-06-15|
...
@end

Benefits

Truncation Detection

Verify all rows received: count rows and compare to metadata

Progress Tracking

Show progress: “Received 50/120 rows…”

Stream Resync

Resume after interruption by skipping to known row count

Validation

Ensure complete transmission before processing
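The truncation check can be sketched with a few lines of Python. This assumes the optional rows= metadata and pipe-delimited rows shown above; `check_row_count` is a hypothetical helper, not the library's API:

```python
import re

def check_row_count(table_str):
    """Compare the rows= header hint against the rows actually received."""
    lines = [ln.strip() for ln in table_str.strip().splitlines()]
    # The rows= hint is optional; fall back to "unknown" when absent
    m = re.search(r"\brows=(\d+)\b", lines[0])
    expected = int(m.group(1)) if m else None
    # Data rows are pipe-delimited; @end and metadata lines are not counted
    received = sum(1 for ln in lines[1:] if ln.startswith("|"))
    complete = expected is None or received == expected
    return expected, received, complete

truncated = """@tab _ rows=3 cols=2 [id name]
|1|Alice|
|2|Bob|
@end"""
print(check_row_count(truncated))  # (3, 2, False): one row missing
```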

When to Use Tabular Mode

✅ Ideal For:

Search results (typical savings: 50-60%)
  • Vector database results
  • Web search results
  • Document retrieval

Database exports (typical savings: 55-65%)
  • Query results
  • Batch data exports
  • CSV-like data

Logs (typical savings: 50-60%)
  • Application logs
  • Access logs
  • Audit trails

API responses (typical savings: 40-55%)
  • List operations
  • Bulk status checks
  • Multi-item fetches

⚠️ Not Ideal For:

Heterogeneous Data

Objects with different structures don’t benefit from shared schema.

Single Items

One-element arrays pay the header overhead and gain no savings.

Deeply Nested

Complex nesting in cells reduces readability.

Wide Schemas

Tables wider than about 20 columns become hard to read; the MaxCols option (default 20) blocks tabular emission beyond that.

Performance

From benchmarks:
| Format        | Size (bytes) | Tokens | Compression     |
|---------------|--------------|--------|-----------------|
| JSON-min      | 5,109        | 1,000  | 100% (baseline) |
| GLYPH-Loose   | 4,224        | 850    | 83%             |
| GLYPH-Tabular | ~3,900       | ~720   | 76%             |
Tabular mode achieves best token efficiency for homogeneous data

Summary

Token Savings

50-70% reduction for array data

Linear Scaling

More rows = proportionally more savings

Automatic

Auto-detects homogeneous lists

Human-Readable

Tabular format is easier to scan

Next Steps

Format Reference

Learn GLYPH syntax

Token Savings

Understand overall token efficiency

LOOSE MODE Spec

Complete technical specification

Benchmark Report

Full performance data
