Persistent Storage

The RotorTree type adds write-ahead logging (WAL), checkpointing, and optional memory-mapped storage tiers to LeanIMT. This enables crash recovery while maintaining high throughput.

Enabling Storage

Add the storage feature to your Cargo.toml:
[dependencies]
rotortree = { version = "0.15", features = ["storage", "blake3"] }
The storage feature automatically includes parking_lot, arc-swap, wincode, crc-fast, fs4, and memmap2 dependencies.

Opening a Tree

Create a RotorTree with configuration:
use rotortree::{
    Blake3Hasher, RotorTree, RotorTreeConfig,
    FlushPolicy, CheckpointPolicy, TieringConfig,
};
use std::path::PathBuf;

let config = RotorTreeConfig {
    path: PathBuf::from("/tmp/my-tree"),
    flush_policy: FlushPolicy::default(), // fsync every 10ms
    checkpoint_policy: CheckpointPolicy::default(), // manual
    tiering: TieringConfig::default(), // all in memory
    verify_checkpoint: true, // recompute root on recovery
};

// Opens existing WAL or creates a new one
let tree = RotorTree::<Blake3Hasher, 4, 20>::open(Blake3Hasher, config).unwrap();

Configuration Options

Flush Policy

Controls when buffered WAL entries are fsynced to disk:
// Fsync on a background thread every 10ms
FlushPolicy::Interval(std::time::Duration::from_millis(10))
With FlushPolicy::Manual, you must call tree.flush() periodically or data will remain in memory. Unflushed data is lost on crash.

Checkpoint Policy

Checkpoints materialize WAL state to data files, allowing WAL truncation:
// Caller calls tree.checkpoint() explicitly
CheckpointPolicy::Manual

Tiering Configuration

Controls which tree levels are kept in memory vs memory-mapped:
use rotortree::TieringConfig;

// Default: mmap everything after checkpoint
let tiering = TieringConfig::default();
assert_eq!(tiering.pin_above_level, usize::MAX);

// Keep everything in memory (no mmap)
let tiering = TieringConfig { pin_above_level: 0 };

// Mmap levels 0-9, keep level 10+ in memory
let tiering = TieringConfig { pin_above_level: 10 };
Levels below pin_above_level are mmap’d after checkpoint. Higher levels (closer to root) remain in memory for fast access.
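The tiering rule above can be expressed as a small decision function. This is an illustrative sketch, not part of the rotortree API: `Tier` and `tier_for_level` are hypothetical names that just encode the "levels below `pin_above_level` are mmap'd" rule.

```rust
// Hypothetical sketch of the tiering rule; `Tier` and `tier_for_level`
// are illustrative names, not part of the rotortree API.
#[derive(Debug, PartialEq)]
enum Tier {
    Mmap,   // backed by memory-mapped shard files after checkpoint
    Memory, // pinned in RAM
}

fn tier_for_level(level: usize, pin_above_level: usize) -> Tier {
    if level < pin_above_level {
        Tier::Mmap
    } else {
        Tier::Memory
    }
}

fn main() {
    // pin_above_level = 10: levels 0-9 are mmap'd, levels 10+ stay in memory
    assert_eq!(tier_for_level(0, 10), Tier::Mmap);
    assert_eq!(tier_for_level(9, 10), Tier::Mmap);
    assert_eq!(tier_for_level(10, 10), Tier::Memory);
    // pin_above_level = 0 keeps every level in memory
    assert_eq!(tier_for_level(0, 0), Tier::Memory);
    // pin_above_level = usize::MAX (the default) mmaps every level
    assert_eq!(tier_for_level(19, usize::MAX), Tier::Mmap);
    println!("ok");
}
```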

Inserting with Durability

RotorTree provides two insertion modes: single and batch. A single insert returns immediately with a durability token:
let (root, token) = tree.insert([42u8; 32]).unwrap();

// Do other work...

// Wait for fsync to complete
token.wait();
The background flush thread will fsync according to FlushPolicy. The token allows you to wait for durability when needed.
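The token mechanism can be modeled with standard-library primitives. This is a minimal sketch of the idea, not the crate's implementation: each insert is assigned a WAL sequence number, and the background flusher publishes the highest durable sequence after each fsync. `FlushState` and `Token` here are illustrative names.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Illustrative model of a durability token (not the crate's implementation):
// each insert gets a sequence number, and a background flusher advances
// `durable_seq` after each fsync completes.
struct FlushState {
    durable_seq: Mutex<u64>,
    cond: Condvar,
}

struct Token {
    seq: u64,
    state: Arc<FlushState>,
}

impl Token {
    // Block until the flusher has made this insert's sequence durable.
    fn wait(&self) {
        let mut durable = self.state.durable_seq.lock().unwrap();
        while *durable < self.seq {
            durable = self.state.cond.wait(durable).unwrap();
        }
    }
}

fn main() {
    let state = Arc::new(FlushState {
        durable_seq: Mutex::new(0),
        cond: Condvar::new(),
    });

    // An "insert" that was assigned WAL sequence number 3.
    let token = Token { seq: 3, state: Arc::clone(&state) };

    // Background flusher: pretend to fsync, then publish the durable seq.
    let flusher_state = Arc::clone(&state);
    let flusher = thread::spawn(move || {
        thread::sleep(Duration::from_millis(10)); // stand-in for fsync latency
        *flusher_state.durable_seq.lock().unwrap() = 3;
        flusher_state.cond.notify_all();
    });

    token.wait(); // returns once seq 3 is durable
    flusher.join().unwrap();
    println!("durable");
}
```

The key property this models: `wait()` blocks only callers whose sequence has not yet been flushed, so inserts that happen to land before an fsync boundary return from `wait()` immediately.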

Batch Inserts

Batch operations work the same way:
let leaves = vec![[1u8; 32]; 500];
let (root, token) = tree.insert_many(&leaves).unwrap();

// Option 1: Wait for durability
token.wait();

// Option 2: Continue and check later
tree.insert([2u8; 32]).unwrap();
token.wait(); // Both insertions are now durable

Snapshots and Proofs

Snapshots work identically to in-memory trees:
use rotortree::TreeHasher;

// Lock-free snapshot
let snap = tree.snapshot();

// Generate proof (same as LeanIMT)
let proof = snap.generate_proof(0).unwrap();
let th = TreeHasher::new(Blake3Hasher);
assert!(proof.verify(&th).unwrap());
Snapshots are copy-on-write and don’t block insertions. Proof generation from snapshots is always lock-free.
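The copy-on-write pattern behind these snapshots can be sketched with `std::sync` (the crate itself depends on arc-swap for the lock-free pointer swap). `TreeState` below is a stand-in for the real node storage; the point is that a snapshot clones an `Arc`, not the data, so later inserts never touch it.

```rust
use std::sync::{Arc, Mutex};

// Illustrative copy-on-write snapshot pattern; `TreeState` is a stand-in
// for the real node storage, and the crate uses arc-swap rather than a
// Mutex for the pointer swap.
#[derive(Clone)]
struct TreeState {
    leaves: Vec<[u8; 32]>,
}

struct Tree {
    current: Mutex<Arc<TreeState>>,
}

impl Tree {
    // Snapshot: clone the Arc, not the data. The snapshot keeps the old
    // state alive for as long as the caller holds it.
    fn snapshot(&self) -> Arc<TreeState> {
        Arc::clone(&self.current.lock().unwrap())
    }

    // Insert: build the new state, then swap the pointer. Existing
    // snapshots are unaffected.
    fn insert(&self, leaf: [u8; 32]) {
        let mut current = self.current.lock().unwrap();
        let mut next = (**current).clone();
        next.leaves.push(leaf);
        *current = Arc::new(next);
    }
}

fn main() {
    let tree = Tree {
        current: Mutex::new(Arc::new(TreeState { leaves: vec![] })),
    };
    tree.insert([1u8; 32]);
    let snap = tree.snapshot();
    tree.insert([2u8; 32]); // does not block on, or mutate, `snap`
    assert_eq!(snap.leaves.len(), 1);
    assert_eq!(tree.snapshot().leaves.len(), 2);
    println!("ok");
}
```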

Checkpointing

Checkpoints write tree state to data files:
// Explicit checkpoint
tree.checkpoint().unwrap();

// Or let the policy handle it automatically
let config = RotorTreeConfig {
    path: PathBuf::from("/tmp/my-tree"),
    flush_policy: FlushPolicy::Manual,
    checkpoint_policy: CheckpointPolicy::MemoryThreshold(256 * 1024 * 1024),
    tiering: TieringConfig::default(),
    verify_checkpoint: false,
};

Checkpoint Structure

Checkpoints create a directory structure:
/tmp/my-tree/
  wal.log           # Write-ahead log
  wal.lock          # File lock
  data/
    header.bin      # Tree metadata (N, MAX_DEPTH, CHUNK_SIZE)
    checkpoint.meta # Latest checkpoint metadata
    level_0/        # Leaf level
      shard_0000.bin
      shard_0001.bin
    level_1/        # First parent level
      shard_0000.bin
    ...
Each shard file contains up to 65,536 chunks. Each chunk is 128 hashes × 32 bytes = 4 KB.
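The sizes above pin down the full shard layout by arithmetic alone; a quick sanity check of the derived numbers:

```rust
// Sanity-check the on-disk layout numbers quoted above.
const HASH_BYTES: usize = 32;
const HASHES_PER_CHUNK: usize = 128;
const CHUNKS_PER_SHARD: usize = 65_536;

fn main() {
    // 128 hashes x 32 bytes = 4 KB per chunk.
    let chunk_bytes = HASHES_PER_CHUNK * HASH_BYTES;
    assert_eq!(chunk_bytes, 4 * 1024);

    // A full shard therefore holds 65,536 x 4 KB = 256 MiB of hashes...
    let shard_bytes = CHUNKS_PER_SHARD * chunk_bytes;
    assert_eq!(shard_bytes, 256 * 1024 * 1024);

    // ...covering 8,388,608 hashes per shard (leaves, at level 0).
    assert_eq!(CHUNKS_PER_SHARD * HASHES_PER_CHUNK, 8_388_608);
    println!("ok");
}
```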

Closing the Tree

// Explicit flush and close
tree.flush().unwrap();
tree.close().unwrap();

// Or just drop (also flushes + releases lock)
drop(tree);
Always use graceful shutdown in production. While Drop flushes the WAL, checkpoint-on-close only runs if you call close() explicitly.

Recovery

RotorTree automatically recovers on open():
let tree = RotorTree::<Blake3Hasher, 4, 20>::open(Blake3Hasher, config).unwrap();
// If WAL exists, it's replayed automatically

Recovery Process

1. Load Checkpoint — If data/checkpoint.meta exists, load the last checkpointed state from shard files.
2. Replay WAL — Replay all WAL entries since the last checkpoint sequence number.
3. Verify Root (Optional) — If verify_checkpoint: true, recompute the Merkle root to detect corruption beyond CRC.
4. Truncate WAL — Truncate the WAL to remove any partial entries from a crash.
Root Verification: Setting verify_checkpoint: true adds recovery time (must hash all leaves) but provides strong corruption detection. Disable for faster recovery if you trust CRC protection.
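Steps 2 and 4 can be sketched as a toy replay loop. This is a simplified model, not the crate's WAL format: the real crate uses crc-fast, so a naive 32-bit checksum stands in here, and the `[len][payload][checksum]` framing is assumed for illustration.

```rust
// Toy model of WAL replay (steps 2 and 4 above). The real crate uses
// crc-fast; a naive 32-bit checksum stands in, and the entry framing
// [len: u32][payload][checksum: u32] is assumed for illustration.
fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

fn append_entry(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(payload);
    log.extend_from_slice(&checksum(payload).to_le_bytes());
}

// Replay entries until the log ends or an entry is truncated/corrupt;
// return the payloads plus the offset where the log should be truncated.
fn replay(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos + 4 <= log.len() {
        let len = u32::from_le_bytes(log[pos..pos + 4].try_into().unwrap()) as usize;
        let end = pos + 4 + len + 4;
        if end > log.len() {
            break; // partial entry from a crash mid-write
        }
        let payload = &log[pos + 4..pos + 4 + len];
        let stored = u32::from_le_bytes(log[pos + 4 + len..end].try_into().unwrap());
        if checksum(payload) != stored {
            break; // corrupt tail
        }
        entries.push(payload.to_vec());
        pos = end;
    }
    (entries, pos)
}

fn main() {
    let mut log = Vec::new();
    append_entry(&mut log, b"insert leaf 0");
    append_entry(&mut log, b"insert leaf 1");
    let good_len = log.len();
    log.extend_from_slice(&[9, 0, 0, 0, 42]); // partial entry: crash mid-write

    let (entries, truncate_at) = replay(&log);
    assert_eq!(entries.len(), 2); // both complete entries are replayed
    assert_eq!(truncate_at, good_len); // step 4: drop the partial tail
    println!("replayed {} entries", entries.len());
}
```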

Complete Example: Bulk Load

Here’s a realistic example from the repository:
use rotortree::{
    Blake3Hasher, RotorTree, RotorTreeConfig, TreeHasher,
    FlushPolicy, CheckpointPolicy, TieringConfig,
};
use std::{
    path::PathBuf,
    time::{Duration, Instant},
};

const N: usize = 4;
const MAX_DEPTH: usize = 14;

fn main() {
    let config = RotorTreeConfig {
        path: PathBuf::from(".db"),
        flush_policy: FlushPolicy::Interval(Duration::from_millis(10)),
        checkpoint_policy: CheckpointPolicy::MemoryThreshold(256 * 1024 * 1024),
        tiering: TieringConfig::default(),
        verify_checkpoint: false,
    };
    
    let tree = RotorTree::<Blake3Hasher, N, MAX_DEPTH>::open(
        Blake3Hasher,
        config
    ).unwrap();
    
    // Insert 1M leaves in chunks
    let mut inserted = 0u64;
    let chunk_size = 10_000;
    
    while inserted < 1_000_000 {
        let leaves: Vec<[u8; 32]> = (inserted..inserted + chunk_size)
            .map(|i| {
                let mut h = [0u8; 32];
                h[..8].copy_from_slice(&i.to_le_bytes());
                h
            })
            .collect();
        
        let start = Instant::now();
        let (_root, token) = tree.insert_many(&leaves).unwrap();
        let insert_time = start.elapsed();
        
        token.wait();
        let durable_time = start.elapsed();
        
        inserted += chunk_size;
        
        println!(
            "{} leaves: insert {:.2}ms, durable {:.2}ms",
            inserted,
            insert_time.as_secs_f64() * 1000.0,
            durable_time.as_secs_f64() * 1000.0
        );
    }
    
    // Generate proof
    let snap = tree.snapshot();
    let proof = snap.generate_proof(500_000).unwrap();
    assert!(proof.verify(&TreeHasher::new(Blake3Hasher)).unwrap());
    
    tree.close().unwrap();
    std::fs::remove_dir_all(".db").unwrap();
}

Performance Characteristics

From benchmarks on M4 Pro (14 cores, 48 GB):
  • 1B leaves inserted in 1M chunks
  • Manual checkpoint after each chunk
  • Proof generation remains constant (~50μs)
  • Verification is always ~10μs
Proof generation time depends on snapshot acquisition cost. With mmap, cold levels are read from disk, adding latency. Pin hot levels in memory for consistent proof generation performance.

Next Steps

Proof Generation

Learn about inclusion proofs, consistency proofs, and proof updates

Performance Tuning

Optimize checkpoint policies, tiering, and parallelism for your workload

In-Memory Trees

Compare with the faster in-memory-only LeanIMT

Feature Flags

Explore additional features like parallel insertion and wincode serialization
