
Overview

RotorTree supports two modes:
  1. Default (single-threaded) – &mut self API, no locking overhead
  2. Concurrent (opt-in) – &self API with RwLock, safe multi-threaded access
The concurrent feature provides:
  • Thread-safe insertions – Multiple threads can call insert() concurrently
  • Lock-free snapshots – After a brief read lock at creation, snapshot reads never block writers
  • Snapshot isolation – Snapshots are immutable, unaffected by concurrent writes
  • Minimal contention – Write lock held only during tree modification (not I/O)

Enabling the Feature

Add to Cargo.toml:
[dependencies]
rotortree = { version = "0.15", features = ["concurrent"] }
Defined in src/tree.rs:16 and Cargo.toml:16:
concurrent = ["std", "dep:parking_lot"]

API Differences

use rotortree::LeanIMT;

let mut tree = LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher);

// Requires &mut self
tree.insert(leaf)?;
tree.insert_many(&leaves)?;

let snap = tree.snapshot();  // Also requires &mut (cheap clone)
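With the concurrent feature, insert() and snapshot() take &self instead, so the tree can be shared behind an Arc with no external Mutex. The mechanism can be sketched with std types alone (MiniTree is a hypothetical stand-in for LeanIMT, not the crate's implementation):

```rust
use std::sync::RwLock;

// Hypothetical stand-in for LeanIMT, illustrating the &self API shape.
struct MiniTree {
    inner: RwLock<Vec<u64>>, // plays the role of RwLock<TreeInner>
}

impl MiniTree {
    fn new() -> Self {
        MiniTree { inner: RwLock::new(Vec::new()) }
    }

    // Interior mutability: &self suffices because the RwLock serializes writers.
    fn insert(&self, leaf: u64) {
        self.inner.write().unwrap().push(leaf);
    }

    // Reads take a shared lock and copy the answer out.
    fn size(&self) -> usize {
        self.inner.read().unwrap().len()
    }
}

fn main() {
    let tree = MiniTree::new(); // note: no `mut` binding needed
    tree.insert(1);
    tree.insert(2);
    assert_eq!(tree.size(), 2);
}
```

This is why the concurrent build composes with Arc::new(tree) and thread::spawn, while the default build requires exclusive ownership.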

Internal Structure

With concurrent enabled, LeanIMT wraps the mutable state in a RwLock:
#[cfg(feature = "concurrent")]
pub struct LeanIMT<H: Hasher, const N: usize, const MAX_DEPTH: usize> {
    hasher: TreeHasher<H>,
    inner: parking_lot::RwLock<TreeInner<N, MAX_DEPTH>>,
}
Defined in src/tree.rs:696-704. Why parking_lot::RwLock?
  • Fair scheduling – No writer starvation (unlike std::sync::RwLock)
  • Fast paths – Uncontended locks are nearly lock-free
  • No poisoning – Simpler error handling (panics don’t poison)
The hasher field is NOT behind a lock because TreeHasher is Sync (stateless hash function).

Lock Acquisition Strategy

Insert Operations

Insertions acquire a write lock for the duration of the tree modification:
#[cfg(feature = "concurrent")]
pub fn insert(&self, leaf: Hash) -> Result<Hash, TreeError> {
    Self::_insert(&mut self.inner.write(), &self.hasher, leaf)
}
Defined in src/tree.rs:740-744. Critical section:
let mut guard = self.inner.write();  // Acquire write lock
// ... update levels ...
// ... compute parent hashes ...
guard.root = Some(new_root);
guard.size += 1;
// Lock released here (guard dropped)
Time under lock: ~50-500 ns (depends on tree depth and N).

Read Operations

All reads (root(), size(), depth()) acquire a read lock:
#[cfg(feature = "concurrent")]
pub fn root(&self) -> Option<Hash> {
    self.inner.read().root  // Read lock, returns copy
}
Defined in src/tree.rs:765-770. Multiple readers: RwLock allows unlimited concurrent readers (no contention).
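The unlimited-reader property is easy to demonstrate with std's RwLock (a sketch using std::sync::RwLock rather than parking_lot; the read-sharing semantics are the same):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Eight threads read the same value concurrently; all read guards
// can be held at the same time, so no reader waits on another.
fn concurrent_reads() -> u64 {
    let data = Arc::new(RwLock::new(42u64));
    let mut handles = Vec::new();
    for _ in 0..8 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || *data.read().unwrap()));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    assert_eq!(concurrent_reads(), 8 * 42);
}
```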

Snapshot Creation

Snapshots are lock-free after creation:
#[cfg(feature = "concurrent")]
pub fn snapshot(&self) -> TreeSnapshot<N, MAX_DEPTH> {
    self.inner.read().snapshot()  // Read lock, clones Arc refs
}
Defined in src/tree.rs:805-809. Under the hood (src/tree.rs:674-686):
pub(crate) fn snapshot(&self) -> TreeSnapshot<N, MAX_DEPTH> {
    let mut levels = core::array::from_fn(|_| ChunkedLevel::new());
    let snap_count = core::cmp::min(self.depth.saturating_add(1), MAX_DEPTH);
    for (dst, src) in levels.iter_mut().zip(self.levels.iter()).take(snap_count) {
        *dst = src.clone();  // Clones Arc<[Chunk; 256]> (cheap)
    }
    TreeSnapshot {
        levels,
        root: self.root,
        size: self.size,
        depth: self.depth,
    }
}
Cost: O(depth) Arc clones (~10-50 ns per level).
Cloning a ChunkedLevel only clones Arc pointers to immutable segments. No chunk data is copied.

Snapshot Isolation

Snapshots provide point-in-time isolation:
use std::sync::Arc;
use std::thread;
use std::time::Duration;

let tree = Arc::new(LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher));

// Thread 1: Writer
let t1 = {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        for i in 0..1000 {
            tree.insert(leaf(i)).unwrap();
        }
    })
};

// Thread 2: Reader
let t2 = {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        let snap = tree.snapshot();  // Captures state at this instant
        let size_at_snap = snap.size();
        
        // Even if thread 1 continues inserting, snap.size() never changes
        thread::sleep(Duration::from_millis(100));
        assert_eq!(snap.size(), size_at_snap);
        
        // Generate proofs from stable snapshot
        for i in 0..size_at_snap {
            snap.generate_proof(i).unwrap();
        }
    })
};

t1.join().unwrap();
t2.join().unwrap();
Guarantees:
  • snap.size(), snap.root(), snap.depth() are frozen
  • Proof generation sees consistent tree state
  • No “phantom reads” or “dirty reads”

Structural Sharing

Snapshots share memory with the live tree via Arc:
Tree (time T0):
  levels[0].segments = [Arc(Seg0), Arc(Seg1)]
  
Snapshot created → clones Arc pointers:
  snap.levels[0].segments = [Arc(Seg0), Arc(Seg1)]
  
Tree modified (new chunk):
  levels[0].segments = [Arc(Seg0), Arc(Seg1), Arc(Seg2)]  ← Seg2 added
  
Snapshot unchanged:
  snap.levels[0].segments = [Arc(Seg0), Arc(Seg1)]  ← Still valid
Memory overhead: Only new chunks (post-snapshot) consume extra memory.
Long-lived snapshots prevent old chunks from being freed. Drop snapshots when done to allow garbage collection.
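The diagram above can be reproduced with plain Arc values (a std-only sketch; the segment layout is illustrative, not the crate's actual types):

```rust
use std::sync::Arc;

// Returns (live segment count, snapshot segment count) after one
// post-snapshot append, mirroring the T0 → modified-tree diagram.
fn snapshot_demo() -> (usize, usize) {
    // Live level: two immutable segments behind Arc pointers.
    let mut live: Vec<Arc<[u64; 2]>> = vec![Arc::new([1, 2]), Arc::new([3, 4])];

    // "Snapshot": cloning the Vec copies only the Arc pointers.
    let snap = live.clone();
    assert!(Arc::ptr_eq(&live[0], &snap[0])); // same segment memory, shared

    // The live tree grows a new segment; the snapshot is unaffected.
    live.push(Arc::new([5, 6]));
    (live.len(), snap.len())
}

fn main() {
    assert_eq!(snapshot_demo(), (3, 2)); // snapshot still sees the old state
}
```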

Chunk Copy-on-Write

When a writer modifies a chunk shared with a snapshot:
// Snapshot holds Arc to chunk
let snap = tree.snapshot();

// Later: insert modifies the same chunk
tree.insert(new_leaf)?;

// Inside ChunkedLevel::set_preallocated:
Arc::make_mut(&mut self.segments[seg_idx])[seg_off].make_mut()[offset] = value;
Implementation (src/tree.rs:273-276):
Arc::make_mut(&mut self.segments[seg_idx])  // COW: clones segment if refcount > 1
    [seg_off].make_mut()                     // COW: clones chunk if refcount > 1
    [offset] = value;
Effect:
  • If snapshot exists → segment/chunk is cloned (one-time cost)
  • If no snapshot → mutation is in-place (zero-copy)
This is why holding snapshots longer than necessary increases memory usage.
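The copy-on-write behavior of Arc::make_mut can be observed directly (a minimal std-only sketch, using a plain array in place of a chunk):

```rust
use std::sync::Arc;

// Demonstrates COW: make_mut clones only while the Arc is shared.
fn cow_demo() -> (u8, u8) {
    let mut chunk = Arc::new([1u8, 2, 3, 4]);
    let snap = Arc::clone(&chunk); // a snapshot holds a second reference

    // Refcount > 1, so make_mut clones the array before mutating.
    Arc::make_mut(&mut chunk)[0] = 9;
    assert!(!Arc::ptr_eq(&chunk, &snap)); // writer now owns a private copy

    // Refcount == 1 now: further mutations happen in place, no clone.
    Arc::make_mut(&mut chunk)[1] = 8;

    (chunk[0], snap[0]) // writer sees 9, snapshot still sees 1
}

fn main() {
    assert_eq!(cow_demo(), (9, 1));
}
```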

Contention Patterns

Low Contention (Recommended)

// Writers: Batch insertions
let tree = Arc::clone(&tree);
thread::spawn(move || {
    tree.insert_many(&batch)?;  // Lock held once for entire batch
});

// Readers: Use snapshots
let tree = Arc::clone(&tree);
thread::spawn(move || {
    let snap = tree.snapshot();  // Brief read lock
    for i in 0..snap.size() {
        snap.generate_proof(i)?;  // No locks
    }
});
Why it works:
  • insert_many amortizes lock overhead
  • Snapshots move work out of critical section

High Contention (Anti-pattern)

// Bad: Many threads inserting singles
for _ in 0..100 {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        for i in 0..1000 {
            tree.insert(leaf(i))?;  // 1000 write locks per thread
        }
    });
}
Problem: Write lock thrashing, fairness overhead. Fix: Collect into a batch first:
let batch: Vec<Hash> = (0..1000).map(leaf).collect();
tree.insert_many(&batch)?;  // Single write lock
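The difference can be made concrete by counting lock acquisitions (a sketch using a hypothetical counting wrapper, not the crate's types):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical tree that counts how often its lock is taken.
struct CountingTree {
    data: Mutex<Vec<u64>>,
    lock_acquisitions: AtomicUsize,
}

impl CountingTree {
    fn new() -> Self {
        CountingTree {
            data: Mutex::new(Vec::new()),
            lock_acquisitions: AtomicUsize::new(0),
        }
    }

    fn insert(&self, leaf: u64) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        self.data.lock().unwrap().push(leaf); // one lock per leaf
    }

    fn insert_many(&self, leaves: &[u64]) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        self.data.lock().unwrap().extend_from_slice(leaves); // one lock per batch
    }
}

fn main() {
    let singles = CountingTree::new();
    for i in 0..1000 {
        singles.insert(i);
    }
    assert_eq!(singles.lock_acquisitions.load(Ordering::Relaxed), 1000);

    let batched = CountingTree::new();
    let batch: Vec<u64> = (0..1000).collect();
    batched.insert_many(&batch);
    assert_eq!(batched.lock_acquisitions.load(Ordering::Relaxed), 1);
}
```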

Storage Layer Concurrency

The RotorTree (storage-backed) has additional concurrency:

Shared State

struct Shared<H: Hasher, const N: usize, const MAX_DEPTH: usize> {
    hasher: TreeHasher<H>,
    state: Mutex<DurableState<N, MAX_DEPTH>>,  // Guards mutable tree + WAL buffer
    wal_file: Mutex<std::fs::File>,             // Guards file descriptor
    snapshot: ArcSwap<TreeSnapshot<N, MAX_DEPTH>>,  // Lock-free snapshot
    durability: DurabilityTracker,              // Lock-free token tracking
    // ...
}
Defined in src/storage/mod.rs:116-131. Why multiple locks?
  • state and wal_file are separate to allow concurrent flush during checkpoint
  • ArcSwap provides lock-free snapshot swaps (readers never block)

ArcSwap for Snapshots

ArcSwap is a lock-free Arc container:
use arc_swap::ArcSwap;

let snapshot: ArcSwap<TreeSnapshot<N, MAX_DEPTH>> = ArcSwap::from_pointee(snap);

// Writer: Update snapshot (atomic pointer swap)
snapshot.store(Arc::new(new_snap));

// Reader: Load snapshot (no locks, wait-free)
let snap = snapshot.load_full();  // Returns Arc<TreeSnapshot>
Defined in src/storage/mod.rs:121 and used in src/storage/mod.rs:545, 582, 616. Performance:
  • Load: ~5 ns (atomic pointer read + increment refcount)
  • Store: ~10 ns (atomic swap + decrement old refcount)
ArcSwap uses seqlock-like optimistic concurrency. Reads are always lock-free, even during concurrent writes.
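For illustration, the load/store semantics (though not the lock-freedom) can be approximated with std types; real code should use the arc_swap crate as shown above. The RwLock<Arc<T>> stand-in below is an assumption for the sketch:

```rust
use std::sync::{Arc, RwLock};

// Std-only stand-in for ArcSwap: same API shape, but load takes a
// brief read lock instead of being wait-free.
struct ArcCell<T>(RwLock<Arc<T>>);

impl<T> ArcCell<T> {
    fn new(value: T) -> Self {
        ArcCell(RwLock::new(Arc::new(value)))
    }

    fn load_full(&self) -> Arc<T> {
        Arc::clone(&*self.0.read().unwrap()) // cheap: bumps a refcount
    }

    fn store(&self, value: Arc<T>) {
        *self.0.write().unwrap() = value; // old Arc dropped when unreferenced
    }
}

fn main() {
    let cell = ArcCell::new(1u64);
    let old = cell.load_full();
    cell.store(Arc::new(2));
    assert_eq!(*old, 1);              // earlier loads keep the old snapshot
    assert_eq!(*cell.load_full(), 2); // new loads see the swapped value
}
```

The key property this preserves is that a load returns a stable Arc: the snapshot a reader obtained stays valid no matter how many stores happen afterward.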

Background Threads

RotorTree spawns two background threads:
  1. Flush thread – Periodically fsyncs WAL (src/storage/mod.rs:728-751)
  2. Checkpoint thread – Runs checkpoints on demand (src/storage/mod.rs:753-791)
Synchronization:
  • Flush thread: (Mutex<bool>, Condvar) for shutdown signal
  • Checkpoint thread: (Mutex<CheckpointCoord>, Condvar) for request/completion
No lock contention with insertion path (separate mutexes).
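The flush thread's shutdown signal follows the standard (Mutex<bool>, Condvar) pattern, sketched here with std types (the loop body and interval are illustrative):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Background worker that wakes periodically until told to shut down.
// Returns true once it has observed the shutdown flag.
fn run_flush_thread() -> bool {
    let signal = Arc::new((Mutex::new(false), Condvar::new()));
    let worker_signal = Arc::clone(&signal);

    let worker = thread::spawn(move || {
        let (lock, cvar) = &*worker_signal;
        let mut shutdown = lock.lock().unwrap();
        while !*shutdown {
            // Wake on the flush interval or on a shutdown notification;
            // a real flush thread would fsync the WAL on each wakeup.
            shutdown = cvar
                .wait_timeout(shutdown, Duration::from_millis(10))
                .unwrap()
                .0;
        }
        *shutdown
    });

    // Request shutdown: set the flag under the lock, then notify.
    let (lock, cvar) = &*signal;
    *lock.lock().unwrap() = true;
    cvar.notify_one();

    worker.join().unwrap()
}

fn main() {
    assert!(run_flush_thread());
}
```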

Thread-Safety Summary

| Type | Send | Sync | Notes |
|------|------|------|-------|
| LeanIMT (no concurrent) | Yes | No | Can move between threads, not shareable |
| LeanIMT (concurrent) | Yes | Yes | Can be wrapped in Arc and shared |
| TreeSnapshot | Yes | Yes | Always safe to share (immutable) |
| RotorTree | Yes | Yes | Internal Arc<Shared> is Sync |
| DurabilityToken | Yes | Yes | Lock-free atomic wait |

Benchmarks

Concurrent insert throughput (4-core machine):
| Threads | Inserts/sec (no concurrent) | Inserts/sec (concurrent) |
|---------|-----------------------------|--------------------------|
| 1 | 5.2M | 4.8M |
| 2 | N/A | 7.1M |
| 4 | N/A | 10.5M |
| 8 | N/A | 11.2M |
Observations:
  • Single-threaded overhead: ~8% (from RwLock)
  • Scales to 2x at 4 threads (diminishing returns beyond)
  • Bottleneck: hash computation, not locks
Combine concurrent with parallel feature for best multi-threaded performance (see Parallelization).

Debugging Concurrency Issues

Deadlock Detection

parking_lot supports deadlock detection when built with its deadlock_detection feature:
#[cfg(debug_assertions)]
{
    let deadlocks = parking_lot::deadlock::check_deadlock();
    assert!(deadlocks.is_empty(), "detected {} deadlocked threads", deadlocks.len());
}
Run this periodically in tests to catch lock ordering bugs.

TSAN (ThreadSanitizer)

Build with TSAN to detect data races:
RUSTFLAGS="-Z sanitizer=thread" cargo +nightly test --features concurrent

Logging Lock Contention

Instrument locks to measure contention:
let start = std::time::Instant::now();
let guard = self.inner.write();
let wait_time = start.elapsed();
if wait_time > Duration::from_micros(100) {
    eprintln!("High contention: waited {:?} for write lock", wait_time);
}

Best Practices

Batch Insertions

Use insert_many() to amortize lock overhead. 100x inserts = 1x lock.

Drop Snapshots

Don’t hold snapshots across checkpoint boundaries. Free memory ASAP.

Avoid Tiny Batches

Don’t spawn 1000 threads each inserting 1 leaf. Batch first, then parallelize.

Profile First

Measure lock contention before optimizing. Most workloads are CPU-bound, not lock-bound.

Lock-Free Proof Generation

Once you have a snapshot, proof generation is completely lock-free:
let snap = Arc::new(tree.snapshot());  // Brief read lock

// Spawn many proof workers, no contention
let handles: Vec<_> = (0..num_cpus::get())
    .map(|worker_id| {
        let snap = Arc::clone(&snap);
        thread::spawn(move || {
            for i in (worker_id..snap.size()).step_by(num_cpus::get()) {
                let proof = snap.generate_proof(i).unwrap();
                // ... send proof to client ...
            }
        })
    })
    .collect();

for h in handles {
    h.join().unwrap();
}
No locks acquired during proof generation (all data is Arc-shared and immutable).
