What are Merkle Trees?
Merkle trees are cryptographic data structures that enable efficient verification of large datasets. Light Protocol uses specialized Merkle trees to store compressed account hashes on-chain while keeping the full data off-chain.Tree Properties
Binary Structure
Each node has exactly two children:- Parent hash =
H(left_child || right_child) - Root hash = Unique fingerprint of entire tree
- Leaf hash = Compressed account hash
Sparse Trees
Light Protocol uses sparse Merkle trees:- Most leaves are zero (uninitialized)
- Only non-zero leaves are stored
- Tree can be very large (2^26 capacity = 67M leaves)
- Storage only grows with used leaves
Poseidon Hash Function
All hashing uses Poseidon:- ZK-friendly (efficient in circuits)
- Field arithmetic in BN254 curve
- Faster than SHA-256 for ZK proofs
- Resistant to collision attacks
Poseidon is specifically designed for zero-knowledge applications, making proof generation 10-100x faster than using SHA-256.
Tree Types in Light Protocol
Light Protocol uses three types of Merkle trees:State Merkle Trees
Store compressed account hashes for state data. Characteristics:- Binary concurrent Merkle trees
- Height: 26 (default) = 67,108,864 capacity
- Each leaf contains a compressed account hash
- Supports nullification (marking leaves as spent)
- Associated with output queue for new accounts
- Compressed token accounts
- Compressed PDA accounts
- Any mutable state
Address Merkle Trees
Manage unique addresses across a 254-bit address space. Characteristics:- Indexed Merkle trees (sorted by value)
- Store address ranges as linked lists
- Enable non-inclusion proofs (address doesn’t exist)
- Height: 40 (default) = huge capacity
- Start with 1 initialized element (not 0)
- Compressed PDA derivation
- Token mint addresses
- Unique identifier management
Output State Trees
(Less commonly referenced - generally referred to as output queues paired with state trees)Batched Merkle Tree Account
Light Protocol implements batched updates for efficiency:Root History
Trees maintain a circular buffer of historical roots: Purpose:- Allow transactions to reference recent roots
- Enable concurrent transactions
- Provide grace period for proof generation
- Typically 100-1000 roots
- Configurable at tree creation
- Older roots overwritten cyclically
Transactions must use a root that’s still in the history. If a root is too old (overwritten), the transaction will fail.
Tree Operations
Initialization
- State trees: All leaves are zero
- Address trees: Contains one initialized element
- Root history: Filled with initial root
- Next index: 0 (state) or 1 (address)
Appending (New Leaves)
Add new compressed account hashes to the tree:- Client creates account → Hash inserted into output queue
- Queue batch fills → Forester requests ZK proof
- Prover generates proof → Proves correct hash chain
- Forester submits batch → Updates tree with new root
- Old root is valid (in root history)
- Hash chain correctly computed
- New leaves appended at correct positions
- New root correctly computed
Nullifying (Update Leaves)
Mark existing leaves as spent:- Client updates account → Old hash inserted into nullifier queue
- Queue batch fills → Forester requests ZK proof
- Prover generates proof → Proves correct nullification
- Forester submits batch → Overwrites leaves with nullifiers
Why Include Transaction Hash in Nullifier?
Why Include Transaction Hash in Nullifier?
The transaction hash makes each nullifier unique:
- Same account can be nullified multiple times (different txs)
- Prevents replays of nullification transactions
- Bloom filter stores account hash, not nullifier (for double-spend prevention)
- Tree stores nullifier to overwrite the original hash
Reading Roots
Merkle Proofs
To prove a leaf is in the tree, provide its Merkle path:Queue System
Trees have associated queues for batching:Input Queue (Nullifier Queue)
- Bloom filters for fast non-inclusion checks
- Hash chains for batch proof generation
- Double buffering (fill one while updating from other)
Output Queue
- Stores full account hashes
- Enables proof-by-index before tree update
- Hash chains for batch proof generation
Batch Lifecycle
Bloom Filters
Used in input queues to prevent double-spending: Structure:- Prevent double-spending within batch
- Fast O(1) checks
- Probabilistic (false positives possible, no false negatives)
- Must be zeroed when batch is cleared
Bloom filters provide an efficient first line of defense against double-spending. The actual nullification in the tree provides definitive proof.
Hash Chains
Used to commit to a batch of values: Construction:- Single hash commits to entire batch
- Used as public input to ZK circuit
- Prover must know all values to compute chain
- Verifier only needs final hash
- Typical: 100-500 values per hash chain
- Multiple hash chains per batch (batch_size / zkp_batch_size)
- Each hash chain gets its own ZK proof
Tree Capacity and Rollover
Capacity
Rollover Process
When a tree reaches capacity:- New tree created with same configuration
- Linked via metadata (next_merkle_tree field)
- New accounts directed to new tree
- Old tree remains readable forever
- Rollover fee amortized across all accounts
Indexed Merkle Trees (Address Trees)
Special tree type for managing unique addresses:Structure
Properties
- Sorted by value: Enables range proofs
- Linked list: Each leaf points to next
- Non-inclusion proofs: Prove address not in tree
- Efficient updates: Only touch neighboring leaves
Non-Inclusion Proof
To prove an address doesn’t exist:- Find leaf with value just below target
- Check that leaf’s next_value is above target
- Verify both leaves are in tree (Merkle proofs)
- Result: Target address is in the gap
Tree Account Size
Calculating on-chain storage requirements:Best Practices
Choosing Tree Height
Root History Size
Batch Sizes
Next Steps
State Trees
Learn about state tree management
Compressed Accounts
Understand compressed account model
Forester Guide
Run a forester to maintain trees
Tree Configuration
Configure trees for your application