Overview
RotorTree uses a chunked storage system with structural sharing to enable efficient snapshots and memory management. This design allows lock-free concurrent reads while minimizing memory-allocation overhead.
The Problem
Naive Merkle tree implementations face several challenges:
- Snapshots are expensive: Copying the entire tree for each snapshot is prohibitive
- Memory overhead: Each tree node requires a separate allocation
- Cache efficiency: Random access patterns hurt CPU cache performance
- Concurrent access: Shared mutable state requires locking
Three-Tier Architecture
Tier 1: Tail Buffer
A fixed-size mutable array for recently inserted hashes. Source: src/tree.rs:170
- Capacity: CHUNK_SIZE (default: 128 hashes = 4 KB)
- Behavior: Fills from left to right
- Promotion: When full, promoted to a Chunk in the pending buffer
Tier 2: Pending Chunks
A growable vector of committed chunks not yet frozen into a segment. Source: src/tree.rs:167
- Capacity: Up to CHUNKS_PER_SEGMENT - 1 (default: 255 chunks)
- Behavior: Grows as tail buffers are promoted
- Freezing: When full, all chunks are frozen into an immutable segment
Tier 3: Immutable Segments
Arc-shared arrays of frozen chunks, never modified after creation. Source: src/tree.rs:165
- Size: Each segment contains exactly CHUNKS_PER_SEGMENT (default: 256) chunks
- Sharing: Multiple snapshots can share the same segments via Arc
- Immutability: Once created, never modified (copy-on-write if needed)
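Putting the three tiers together, the level's shape can be sketched roughly as follows. This is illustrative only: the field and type names are assumptions, not the actual definitions in src/tree.rs.

```rust
use std::sync::Arc;

type Hash = [u8; 32];
const CHUNK_SIZE: usize = 128;
const CHUNKS_PER_SEGMENT: usize = 256;

// Illustrative shape of a chunked level; not the real src/tree.rs types.
struct ChunkedLevel {
    segments: Vec<Arc<Vec<Arc<Vec<Hash>>>>>, // Tier 3: frozen, Arc-shared
    pending: Vec<Arc<Vec<Hash>>>,            // Tier 2: awaiting freezing
    tail: Vec<Hash>,                         // Tier 1: mutable fill buffer
}

impl ChunkedLevel {
    fn new() -> Self {
        ChunkedLevel { segments: Vec::new(), pending: Vec::new(), tail: Vec::new() }
    }

    /// Total number of hashes across all three tiers.
    fn len(&self) -> usize {
        self.segments.len() * CHUNKS_PER_SEGMENT * CHUNK_SIZE
            + self.pending.len() * CHUNK_SIZE
            + self.tail.len()
    }
}
```

Note how only the tail is directly mutable; the other two tiers hold Arc-wrapped data that snapshots can share.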
Chunks: The Core Unit
A Chunk is an Arc-wrapped array of hashes, enabling structural sharing.
Without Storage Feature
Source: src/tree.rs:42-66
- Memory footprint: 128 hashes × 32 bytes = 4,096 bytes per chunk
- Arc overhead: 8 bytes (pointer) per reference
- Copy-on-write: Arc::make_mut() clones only if shared
With Storage Feature
Source: src/tree.rs:68-132
With the storage feature, chunks can be backed by memory-mapped files:
- Memory chunks: Allocated in RAM, same as without storage
- Mapped chunks: Read directly from mmap’d data files
- Transparent access: Same as_slice() interface for both
- COW on write: Mapped chunks copy to memory if modified
Constants
From src/tree.rs:21-28:
| Constant | Value | Rationale |
|---|---|---|
| CHUNK_SIZE | 128 | 128 × 32 bytes = 4 KB (typical page size) |
| CHUNKS_PER_SEGMENT | 256 | 256 × 4 KB = 1 MB segment granularity |
These defaults balance:
- Cache efficiency: 4 KB chunks fit in L1/L2 cache
- Arc granularity: 1 MB segments reduce Arc overhead
- Snapshot cost: Small enough for cheap clones
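The table's arithmetic can be restated as compile-time constants; HASH_BYTES = 32 is taken from the 32-byte hashes described above, and the names here are illustrative:

```rust
const CHUNK_SIZE: usize = 128;
const CHUNKS_PER_SEGMENT: usize = 256;
const HASH_BYTES: usize = 32; // 32-byte hashes, per the chunk math above

// 128 hashes × 32 bytes = 4 KB, one typical OS page.
const CHUNK_BYTES: usize = CHUNK_SIZE * HASH_BYTES;
// 256 chunks × 4 KB = 1 MB segment granularity.
const SEGMENT_BYTES: usize = CHUNKS_PER_SEGMENT * CHUNK_BYTES;
```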
ChunkedLevel Operations
Reading: get() and get_group()
Source: src/tree.rs:214-249
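A read first resolves which tier holds the index; for the frozen-segment region the math is two divisions and two remainders. A sketch under the default constants, not the actual get() implementation:

```rust
const CHUNK_SIZE: usize = 128;
const CHUNKS_PER_SEGMENT: usize = 256;

/// Resolve a global hash index to (segment, chunk-within-segment, offset),
/// assuming the index falls inside the frozen-segment region.
fn locate(index: usize) -> (usize, usize, usize) {
    let chunk = index / CHUNK_SIZE;
    (
        chunk / CHUNKS_PER_SEGMENT,
        chunk % CHUNKS_PER_SEGMENT,
        index % CHUNK_SIZE,
    )
}
```

For example, index 50,000 lands in segment 1, chunk 134, offset 80.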
Writing: set() and extend()
Source: src/tree.rs:252-343
extend() bypasses the tail buffer for full chunks, creating them directly.
Promotion and Freezing
Source: src/tree.rs:382-407
- Tail fills up → promote to pending
- Pending fills up → freeze to segment
- Segment created → immutable forever
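The promote/freeze cascade can be sketched with the constants shrunk so the whole lifecycle fits in a short example. This is illustrative code, not the src/tree.rs implementation:

```rust
use std::sync::Arc;

const CHUNK_SIZE: usize = 4;         // shrunk from 128 for the example
const CHUNKS_PER_SEGMENT: usize = 2; // shrunk from 256

type Hash = [u8; 32];
type Chunk = Arc<Vec<Hash>>;

struct Level {
    segments: Vec<Arc<Vec<Chunk>>>,
    pending: Vec<Chunk>,
    tail: Vec<Hash>,
}

impl Level {
    fn new() -> Self {
        Level { segments: Vec::new(), pending: Vec::new(), tail: Vec::new() }
    }

    fn push(&mut self, h: Hash) {
        self.tail.push(h);
        if self.tail.len() == CHUNK_SIZE {
            // Step 1: the full tail is promoted to a pending chunk.
            self.pending.push(Arc::new(std::mem::take(&mut self.tail)));
            if self.pending.len() == CHUNKS_PER_SEGMENT {
                // Step 2: the full pending buffer freezes into a segment,
                // which is never modified afterwards.
                self.segments.push(Arc::new(std::mem::take(&mut self.pending)));
            }
        }
    }
}
```

With these shrunk constants, pushing 8 hashes produces one frozen segment; the 9th starts a fresh tail.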
Structural Sharing
Snapshots
Source: src/tree.rs:674-686
- Segments: Only the Arc is cloned (cheap pointer bump)
- Pending chunks: Each Chunk is cloned (Arc pointer bump per chunk)
- Tail buffer: Copied entirely (128 hashes = 4 KB)
For example, snapshotting a tree with roughly 1 million leaves costs approximately:
- Level 0: ~7,813 chunks
- Segments: ~30 segments × 8 bytes = 240 bytes
- Pending: ~0-255 chunks × 8 bytes ≈ 2 KB
- Tail: 4 KB
- Levels 1-9: Similar but smaller
Copy-on-Write Semantics
Source: src/tree.rs:53-55 and src/tree.rs:96-105
- If chunk is in a frozen segment (Arc refcount > 1), Arc::make_mut() clones it
- If chunk is memory-mapped, always copies to RAM
- Otherwise, returns mutable reference (no copy)
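The first and third cases fall out of Arc::make_mut directly (the memory-mapped case needs the storage feature). A small in-memory sketch:

```rust
use std::sync::Arc;

/// Returns (cloned_when_shared, in_place_when_unique).
fn cow_cases() -> (bool, bool) {
    let mut chunk: Arc<Vec<u8>> = Arc::new(vec![0u8; 4]);
    let _snapshot = Arc::clone(&chunk); // a snapshot keeps the chunk shared
    let p0 = Arc::as_ptr(&chunk);
    Arc::make_mut(&mut chunk)[0] = 1;   // refcount > 1: clones before mutating
    let p1 = Arc::as_ptr(&chunk);
    Arc::make_mut(&mut chunk)[1] = 2;   // now unique: mutates in place
    let p2 = Arc::as_ptr(&chunk);
    (p0 != p1, p1 == p2)
}
```

The first write pays for a 4 KB chunk copy; every subsequent write to the now-unique chunk is free.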
Memory Layout Example
For a tree with 100,000 leaves at level 0:
- Full chunks: 100,000 / 128 = 781 chunks, with 32 hashes left over
- Frozen segments: 781 / 256 = 3 segments (768 chunks)
- Pending: 781 - 768 = 13 chunks
- Tail: 32 hashes
Chunk Access Patterns
Sequential Scan
Group Access (Common in Tree Operations)
Random Access
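A sketch of the three patterns over a level represented as a plain slice of chunks; get, group_spans_one_chunk, and sequential_first_bytes are illustrative helpers, not the real API:

```rust
const CHUNK_SIZE: usize = 128;

type Hash = [u8; 32];

/// Random access: two divisions per lookup; may touch a cold chunk.
fn get(chunks: &[Vec<Hash>], i: usize) -> Hash {
    chunks[i / CHUNK_SIZE][i % CHUNK_SIZE]
}

/// Group access: a group of n >= 1 sibling hashes hits a single chunk
/// whenever it does not straddle a CHUNK_SIZE boundary, which is why
/// tree operations reading fixed-size groups are chunk-friendly.
fn group_spans_one_chunk(start: usize, n: usize) -> bool {
    start / CHUNK_SIZE == (start + n - 1) / CHUNK_SIZE
}

/// Sequential scan: walk chunk by chunk so each 4 KB chunk is loaded
/// once and fully consumed while it is cache-resident.
fn sequential_first_bytes(chunks: &[Vec<Hash>]) -> Vec<u8> {
    chunks.iter().flat_map(|c| c.iter()).map(|h| h[0]).collect()
}
```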
Trade-offs
Chunk Size
Smaller chunks (e.g., 64):
- ✅ Cheaper snapshots (less tail to copy)
- ✅ Finer-grained Arc sharing
- ❌ More Arc overhead
- ❌ More segment allocations
Larger chunks:
- ✅ Less Arc overhead
- ✅ Better sequential scan performance
- ❌ More expensive snapshots
- ❌ Wasted space for small trees
Segment Size
Smaller segments (e.g., 128 chunks):
- ✅ Faster freezing
- ❌ More Arc overhead per snapshot
Larger segments:
- ✅ Fewer Arc bumps per snapshot
- ❌ Longer pending phase
- ❌ Larger allocation when freezing
Copy-on-Write Behavior
Aggressive COW (always clone on write to shared chunk):
- ✅ Protects all snapshots
- ❌ High memory usage with many snapshots
Lazy COW (clone only when still shared):
- ✅ Lower memory usage
- ❌ Complexity in tracking mutations
Arc::make_mut() provides automatic lazy COW.
Testing Structural Sharing
From src/tree.rs:1122-1136:
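The cited test is not reproduced here, but the property it verifies can be sketched with the same Arc machinery (a hypothetical assertion, not the actual test body):

```rust
use std::sync::Arc;

/// Hypothetical structural-sharing check: a snapshot shares the frozen
/// data (same allocation), and a COW write leaves the snapshot intact.
fn snapshot_is_isolated() -> (bool, u8, u8) {
    let mut chunk: Arc<Vec<u8>> = Arc::new(vec![7u8; 128]);
    let snapshot = Arc::clone(&chunk);           // snapshot: pointer bump only
    let shared = Arc::ptr_eq(&chunk, &snapshot); // same allocation before write
    Arc::make_mut(&mut chunk)[0] = 9;            // write forces a private copy
    (shared, chunk[0], snapshot[0])
}
```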
Next Steps
- Storage Modes: Learn about persistent storage with WAL and checkpointing
- Memory Management: Optimize memory usage for your workload
