## Overview

Fingerprinting is the process of computing a deterministic SHA-256 hash of a GLYPH value's canonical form. The resulting hash serves as a cryptographic fingerprint of the state.
## What is a Fingerprint?

```text
fingerprint = sha256( canonicalize(value) )
```
Properties:

- **Deterministic**: Same data → same hash (across all languages)
- **Collision-resistant**: Different data → different hash (with overwhelming probability)
- **Compact**: 64 hex characters (256 bits)
- **Cross-language**: Go, Python, JS, and Rust produce identical hashes
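The determinism property can be illustrated without the library itself. The sketch below uses sorted-key JSON as a stand-in canonicalization (GLYPH's actual canonical form differs, so the hashes won't match `glyph.fingerprint_loose`), but the behavior is the same: logically equal data hashes identically, and any change produces a different fingerprint.

```python
import hashlib
import json

def fingerprint_json(value) -> str:
    """Stand-in fingerprint: canonical JSON (sorted keys, no whitespace) + SHA-256."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not affect the fingerprint
a = fingerprint_json({"user": "alice", "count": 42})
b = fingerprint_json({"count": 42, "user": "alice"})
assert a == b

# Any change to the data changes the fingerprint
c = fingerprint_json({"user": "alice", "count": 43})
assert a != c
```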
## Why Fingerprinting?

- **State verification**: Detect state divergence and corruption
- **Optimistic concurrency**: Prevent lost updates in distributed systems
- **Cache keys**: Stable keys for LLM response caching
- **Deduplication**: Identify duplicate documents or messages
## Computing Fingerprints

### Basic Usage

```python
import glyph

data = {"user": "alice", "count": 42}
hash = glyph.fingerprint_loose(data)
print(hash)
# sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890

# Short form (first 16 hex chars)
short_hash = hash[7:23]  # Skip the 'sha256:' prefix
print(short_hash)
# a1b2c3d4e5f67890
```
### How It Works

1. **Canonicalize** the value using Loose mode rules:
   - Sort map keys bytewise
   - Use deterministic float formatting
   - Apply bare-string rules
   - Normalize whitespace
2. **Hash** the canonical UTF-8 bytes using SHA-256
3. **Format** as `sha256:<64 hex chars>`
Example:

```text
Input:     {"b":1,"a":2}
Canonical: {a=2 b=1}
SHA-256:   a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
Result:    sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
```
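Once the value is canonicalized (step 1), steps 2 and 3 are plain SHA-256 over UTF-8 bytes. A minimal sketch with Python's `hashlib`, using the canonical string from the example above (note the hash values shown in this document are illustrative placeholders, so the real digest will differ):

```python
import hashlib

# Canonical Loose-mode form of {"b": 1, "a": 2}, from the example above
canonical = "{a=2 b=1}"

# Step 2: hash the canonical UTF-8 bytes; step 3: add the algorithm prefix
digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
fingerprint = f"sha256:{digest}"

print(fingerprint)  # sha256:<64 hex chars>
```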
## Use Cases

### State Verification

Detect when state has changed unexpectedly.

```python
import time

import glyph

# Save checkpoint
state = {"count": 5, "status": "active"}
checkpoint = {
    "state": state,
    "hash": glyph.fingerprint_loose(state),
    "timestamp": time.time(),
}
save_to_disk(checkpoint)

# Load and verify
loaded = load_from_disk()
expected_hash = loaded["hash"]
actual_hash = glyph.fingerprint_loose(loaded["state"])

if actual_hash == expected_hash:
    print("✓ Checkpoint integrity verified")
    resume(loaded["state"])
else:
    print("✗ Checkpoint corrupted!")
    raise CorruptionError()
```
### Optimistic Concurrency Control

Prevent lost updates when multiple agents modify shared state.

```python
import glyph

# Agent A reads state
state = {"count": 5}
base_hash = glyph.fingerprint_loose(state)
# base_hash: sha256:abc123...

# Agent A creates an update
patch = glyph.patch([("~", "count", 1)])

# Agent A sends the patch with its base hash
writer.write_frame(
    sid=1,
    seq=5,
    kind="patch",
    payload=patch,
    base=base_hash[:16],  # First 16 chars of the full hash string
)

# Server receives the patch
@handler.on_patch
def handle_patch(sid, seq, payload, state, base):
    # Verify the base hash
    current_hash = glyph.fingerprint_loose(state.value)
    if not current_hash.startswith(base):
        # State changed since Agent A read it
        raise BaseMismatchError("State diverged")

    # Safe to apply
    patch = glyph.parse_patch(payload)
    new_state = glyph.apply_patch(state.value, patch)
    return new_state
```
**Why this prevents lost updates:**

```text
Scenario: Two agents update the same state concurrently

1. Agent A reads: {count=5} hash=abc123...
2. Agent B reads: {count=5} hash=abc123...
3. Agent A sends: base=abc123... ✓ Applied → {count=6} hash=def456...
4. Agent B sends: base=abc123... ✗ Rejected (base != def456...)
5. Agent B retries: reads {count=6} hash=def456...
6. Agent B sends: base=def456... ✓ Applied → {count=7}
```
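This compare-and-swap flow can be simulated end to end without a server. The sketch below uses an in-memory store and a stand-in fingerprint (canonical JSON + SHA-256, playing the role of `glyph.fingerprint_loose`); the `Store` class and its `apply` method are hypothetical names for illustration, not part of the GLYPH API.

```python
import hashlib
import json

def fp(value) -> str:
    """Stand-in fingerprint (canonical JSON + SHA-256)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class Store:
    """Minimal shared state with compare-and-swap on fingerprints."""

    def __init__(self, value):
        self.value = value

    def apply(self, base: str, new_value):
        """Apply an update only if `base` matches the current fingerprint."""
        if fp(self.value) != base:
            raise RuntimeError("base mismatch: state diverged")
        self.value = new_value

store = Store({"count": 5})

# Both agents read the same state
base_a = fp(store.value)
base_b = fp(store.value)

store.apply(base_a, {"count": 6})          # Agent A: accepted
try:
    store.apply(base_b, {"count": 6})      # Agent B: rejected (stale base)
except RuntimeError:
    base_b = fp(store.value)               # Agent B re-reads the new state...
    store.apply(base_b, {"count": 7})      # ...and retries successfully

assert store.value == {"count": 7}
```

Agent B's first write is rejected because the fingerprint it read no longer matches; the retry against the fresh fingerprint succeeds, so neither update is silently lost.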
### Cache Keys for LLM Responses

Generate stable cache keys for LLM prompts and responses.

```python
import glyph
import redis

redis_client = redis.Redis()

def cached_llm_call(prompt: dict, model: str) -> str:
    """Call the LLM with caching."""
    # Create a cache key from the prompt fingerprint
    cache_key_data = {
        "model": model,
        "prompt": prompt,
        "version": "v2",
    }
    cache_key = glyph.fingerprint_loose(cache_key_data)

    # Check the cache
    cached = redis_client.get(cache_key)
    if cached:
        print("Cache hit!")
        return cached.decode()

    # Call the LLM
    print("Cache miss, calling LLM...")
    response = llm.generate(prompt)

    # Store in the cache (24h TTL)
    redis_client.setex(cache_key, 86400, response)
    return response

# Usage
prompt = {
    "system": "You are a helpful assistant.",
    "user": "What is the capital of France?",
}

response = cached_llm_call(prompt, model="gpt-4")
# First call: Cache miss, calling LLM...

response = cached_llm_call(prompt, model="gpt-4")
# Second call: Cache hit!
```
### Document Deduplication

Identify duplicate documents in a corpus.

```python
from collections import defaultdict

import glyph

def find_duplicates(documents: list[dict]) -> dict[str, list[int]]:
    """Find duplicate documents by fingerprint."""
    fingerprints = defaultdict(list)
    for i, doc in enumerate(documents):
        fp = glyph.fingerprint_loose(doc)
        fingerprints[fp].append(i)

    # Return only duplicates (fingerprints with 2+ docs)
    return {fp: indices for fp, indices in fingerprints.items() if len(indices) > 1}

# Usage
docs = [
    {"title": "Intro", "content": "Hello world"},
    {"title": "Guide", "content": "How to use"},
    {"title": "Intro", "content": "Hello world"},  # Duplicate of doc 0
    {"title": "FAQ", "content": "Questions"},
]

dupes = find_duplicates(docs)
print(f"Found {len(dupes)} set(s) of duplicates")
for fp, indices in dupes.items():
    print(f"  Documents {indices} are identical (hash: {fp[:16]}...)")
# Output:
# Found 1 set(s) of duplicates
#   Documents [0, 2] are identical (hash: sha256:a1b2c3d4...)
```
### Agent State Sync

Sync state across distributed agent processes.

```python
import glyph

class AgentStateSyncer:
    def __init__(self, writer, reader):
        self.writer = writer
        self.reader = reader
        self.local_state = {}
        self.local_hash = None

    def update_local(self, changes: dict):
        """Update local state and broadcast a patch."""
        # Compute the patch
        patch_ops = [("=", k, v) for k, v in changes.items()]
        patch = glyph.patch(patch_ops)

        # Send it with the current state hash
        self.writer.write_frame(
            sid=1,
            seq=self.next_seq(),
            kind="patch",
            payload=patch,
            base=self.local_hash[:16] if self.local_hash else None,
        )

        # Apply locally
        self.local_state.update(changes)
        self.local_hash = glyph.fingerprint_loose(self.local_state)

    def sync_from_remote(self, frame):
        """Sync state from a remote agent."""
        if frame.kind == "patch":
            # Verify the base hash
            if frame.base and self.local_hash:
                if not self.local_hash.startswith(frame.base):
                    # State diverged; request a full sync
                    self.request_full_state()
                    return

            # Apply the patch
            patch = glyph.parse_patch(frame.payload)
            self.local_state = glyph.apply_patch(self.local_state, patch)
            self.local_hash = glyph.fingerprint_loose(self.local_state)

        elif frame.kind == "doc":
            # Full state sync
            self.local_state = glyph.parse(frame.payload)
            self.local_hash = glyph.fingerprint_loose(self.local_state)
```
## Implementation Details

### Canonicalization Mode

Sender and receiver MUST agree on the canonicalization mode (Strict vs. Loose):

- **Loose mode** (most common): schema-optional, JSON-compatible
- **Strict mode**: schema-required, packed encoding

Mixing modes produces different hashes for the same logical data.
### Short Hashes

For space efficiency, use the first 16 hex characters (64 bits) of the hash:

```python
full_hash = "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890"
short_hash = full_hash[7:23]  # "a1b2c3d4e5f67890"
```

Collision probability with a 64-bit prefix (by the birthday bound, p ≈ n²/2⁶⁵):

- 1 million hashes: ~0.000003% chance of collision
- 1 billion hashes: ~2.7% chance of collision
Use full 256-bit hash for high-security applications.
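The figures above follow from the standard birthday-bound approximation, which is easy to check directly (a sketch; `collision_probability` is an illustrative helper, not part of the GLYPH API):

```python
import math

def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / 2^(bits+1))."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 ** (bits + 1))

print(f"{collision_probability(10**6):.2e}")  # ~2.7e-08 for 1 million 64-bit prefixes
print(f"{collision_probability(10**9):.3f}")  # ~0.027 for 1 billion 64-bit prefixes
```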
### Cross-Language Consistency

GLYPH guarantees byte-identical canonical forms across Go, Python, JavaScript, and Rust.

Test case:

```python
# Python
data = {"user": "alice", "count": 42}
hash_py = glyph.fingerprint_loose(data)
# sha256:a1b2c3d4...
```

```go
// Go
data := map[string]interface{}{"user": "alice", "count": 42}
hashGo := glyph.FingerprintLoose(glyph.FromJSONLoose(data))
// sha256:a1b2c3d4...
```

```typescript
// TypeScript
const data = { user: 'alice', count: 42 };
const hashTS = fingerprintLoose(data);
// sha256:a1b2c3d4...
```

Result: `hash_py == hashGo == hashTS` ✓
## Best Practices

### Always Verify on Critical Paths

For checkpoints, distributed state, or financial data, always verify fingerprints before applying changes.

### Use Short Hashes for Efficiency

For logs, debug output, or low-risk scenarios, use 16-char short hashes to save space.

### Version Your Cache Keys

Include a version field in cache key data to invalidate caches after schema changes:

```python
cache_key_data = {
    "prompt": prompt,
    "model": model,
    "version": "v2",  # Bump to invalidate old caches
}
```

### Log Hash Transitions

Include state hashes in logs to trace state evolution:

```python
logger.info(f"State updated: {short_hash} -> {new_short_hash}")
```
## Next Steps

- **Patches**: Use fingerprints for safe patch application
- **GS1 Streaming**: Leverage base hashes in the streaming protocol
- **Loose Mode**: Understand canonical form rules
- **Agent Patterns**: Apply fingerprinting in agent systems