Overview
RotorTree supports two modes:
Default (single-threaded) – &mut self API, no locking overhead
Concurrent (opt-in) – &self API with RwLock, safe multi-threaded access
The concurrent feature provides:
Thread-safe insertions – Multiple threads can call insert() concurrently
Lock-free snapshots – Once created, snapshots are read without taking any locks, so proof generation never blocks writers
Snapshot isolation – Snapshots are immutable, unaffected by concurrent writes
Minimal contention – Write lock held only during tree modification (not I/O)
Enabling the Feature
Add to Cargo.toml:
[dependencies]
rotortree = { version = "0.15", features = ["concurrent"] }
Defined in src/tree.rs:16 and Cargo.toml:16:
concurrent = ["std", "dep:parking_lot"]
API Differences
Single-Threaded (Default)
use rotortree::LeanIMT;

let mut tree = LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher);

// Requires &mut self
tree.insert(leaf)?;
tree.insert_many(&leaves)?;
let snap = tree.snapshot(); // Also requires &mut (cheap clone)
Concurrent (feature = "concurrent")
Internal Structure
With concurrent enabled, LeanIMT wraps the mutable state in a RwLock:
#[cfg(feature = "concurrent")]
pub struct LeanIMT<H: Hasher, const N: usize, const MAX_DEPTH: usize> {
    hasher: TreeHasher<H>,
    inner: parking_lot::RwLock<TreeInner<N, MAX_DEPTH>>,
}
Defined in src/tree.rs:696-704.
Why parking_lot::RwLock?
Fair scheduling – No writer starvation (unlike std::sync::RwLock)
Fast paths – Uncontended locks are nearly lock-free
No poisoning – Simpler error handling (panics don’t poison)
The hasher field is NOT behind a lock because TreeHasher is Sync (stateless hash function).
Lock Acquisition Strategy
Insert Operations
Insertions acquire a write lock for the duration of the tree modification:
#[cfg(feature = "concurrent")]
pub fn insert(&self, leaf: Hash) -> Result<Hash, TreeError> {
    Self::_insert(&mut self.inner.write(), &self.hasher, leaf)
}
Defined in src/tree.rs:740-744.
Critical section:
let mut guard = self.inner.write(); // Acquire write lock
// ... update levels ...
// ... compute parent hashes ...
guard.root = Some(new_root);
guard.size += 1;
// Lock released here (guard dropped)
Time under lock: ~50-500 ns (depends on tree depth and N).
Read Operations
All reads (root(), size(), depth()) acquire a read lock:
#[cfg(feature = "concurrent")]
pub fn root(&self) -> Option<Hash> {
    self.inner.read().root // Read lock, returns copy
}
Defined in src/tree.rs:765-770.
Multiple readers: RwLock allows unlimited concurrent readers (no contention).
Snapshot Creation
Snapshots are lock-free after creation:
#[cfg(feature = "concurrent")]
pub fn snapshot(&self) -> TreeSnapshot<N, MAX_DEPTH> {
    self.inner.read().snapshot() // Read lock, clones Arc refs
}
Defined in src/tree.rs:805-809.
Under the hood (src/tree.rs:674-686):
pub(crate) fn snapshot(&self) -> TreeSnapshot<N, MAX_DEPTH> {
    let mut levels = core::array::from_fn(|_| ChunkedLevel::new());
    let snap_count = core::cmp::min(self.depth.saturating_add(1), MAX_DEPTH);
    for (dst, src) in levels.iter_mut().zip(self.levels.iter()).take(snap_count) {
        *dst = src.clone(); // Clones Arc<[Chunk; 256]> (cheap)
    }
    TreeSnapshot {
        levels,
        root: self.root,
        size: self.size,
        depth: self.depth,
    }
}
Cost: O(depth) Arc clones (~10-50 ns per level).
Cloning a ChunkedLevel only clones Arc pointers to immutable segments. No chunk data is copied.
Snapshot Isolation
Snapshots provide point-in-time isolation:
use std::sync::Arc;
use std::thread;
use std::time::Duration;

let tree = Arc::new(LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher));

// Thread 1: Writer
let t1 = {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        for i in 0..1000 {
            tree.insert(leaf(i)).unwrap(); // leaf(i): application-defined Hash helper
        }
    })
};

// Thread 2: Reader
let t2 = {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        let snap = tree.snapshot(); // Captures state at this instant
        let size_at_snap = snap.size();
        // Even if thread 1 continues inserting, snap.size() never changes
        thread::sleep(Duration::from_millis(100));
        assert_eq!(snap.size(), size_at_snap);
        // Generate proofs from stable snapshot
        for i in 0..size_at_snap {
            snap.generate_proof(i).unwrap();
        }
    })
};

t1.join().unwrap();
t2.join().unwrap();
Guarantees:
snap.size(), snap.root(), snap.depth() are frozen
Proof generation sees consistent tree state
No “phantom reads” or “dirty reads”
Structural Sharing
Snapshots share memory with the live tree via Arc:
Tree (time T0):
levels[0].segments = [Arc(Seg0), Arc(Seg1)]
Snapshot created → clones Arc pointers:
snap.levels[0].segments = [Arc(Seg0), Arc(Seg1)]
Tree modified (new chunk):
levels[0].segments = [Arc(Seg0), Arc(Seg1), Arc(Seg2)] ← Seg2 added
Snapshot unchanged:
snap.levels[0].segments = [Arc(Seg0), Arc(Seg1)] ← Still valid
Memory overhead: Only new chunks (post-snapshot) consume extra memory.
Long-lived snapshots prevent old chunks from being freed. Drop snapshots when done so their memory can be reclaimed.
Chunk Copy-on-Write
When a writer modifies a chunk shared with a snapshot:
// Snapshot holds Arc to chunk
let snap = tree.snapshot();

// Later: insert modifies the same chunk
tree.insert(new_leaf)?;

// Inside ChunkedLevel::set_preallocated:
Arc::make_mut(&mut self.segments[seg_idx])[seg_off].make_mut()[offset] = value;
Implementation (src/tree.rs:273-276):
Arc::make_mut(&mut self.segments[seg_idx]) // COW: clones segment if refcount > 1
    [seg_off].make_mut()                   // COW: clones chunk if refcount > 1
    [offset] = value;
Effect:
If snapshot exists → segment/chunk is cloned (one-time cost)
If no snapshot → mutation is in-place (zero-copy)
This is why holding snapshots longer than necessary increases memory usage.
Contention Patterns
Low Contention (Recommended)
// Writers: Batch insertions
thread::spawn(move || {
    tree.insert_many(&batch)?; // Lock held once for entire batch
});

// Readers: Use snapshots
thread::spawn(move || {
    let snap = tree.snapshot(); // Brief read lock
    for i in 0..snap.size() {
        snap.generate_proof(i)?; // No locks
    }
});
Why it works:
insert_many amortizes lock overhead
Snapshots move work out of critical section
High Contention (Anti-pattern)
// Bad: Many threads inserting singles
for _ in 0..100 {
    thread::spawn(move || {
        for i in 0..1000 {
            tree.insert(leaf(i))?; // 1000 write locks per thread
        }
    });
}
Problem: Write lock thrashing, fairness overhead.
Fix: Collect into a batch first:
let batch: Vec<Hash> = (0..1000).map(leaf).collect();
tree.insert_many(&batch)?; // Single write lock
Storage Layer Concurrency
The RotorTree (storage-backed) has additional concurrency:
Shared State
struct Shared<H: Hasher, const N: usize, const MAX_DEPTH: usize> {
    hasher: TreeHasher<H>,
    state: Mutex<DurableState<N, MAX_DEPTH>>,      // Guards mutable tree + WAL buffer
    wal_file: Mutex<std::fs::File>,                // Guards file descriptor
    snapshot: ArcSwap<TreeSnapshot<N, MAX_DEPTH>>, // Lock-free snapshot
    durability: DurabilityTracker,                 // Lock-free token tracking
    // ...
}
Defined in src/storage/mod.rs:116-131.
Why multiple locks?
state and wal_file are separate to allow concurrent flush during checkpoint
ArcSwap provides lock-free snapshot swaps (readers never block)
ArcSwap for Snapshots
ArcSwap is a lock-free Arc container:
use arc_swap::ArcSwap;

let snapshot: ArcSwap<TreeSnapshot<N, MAX_DEPTH>> = ArcSwap::from_pointee(snap);

// Writer: Update snapshot (atomic pointer swap)
snapshot.store(Arc::new(new_snap));

// Reader: Load snapshot (no locks, wait-free)
let snap = snapshot.load_full(); // Returns Arc<TreeSnapshot>
Defined in src/storage/mod.rs:121 and used in src/storage/mod.rs:545, 582, 616.
Performance:
Load: ~5 ns (atomic pointer read + increment refcount)
Store: ~10 ns (atomic swap + decrement old refcount)
ArcSwap keeps reads lock-free through hazard-pointer-style tracking rather than locks, so loads stay fast even during concurrent stores.
Background Threads
RotorTree spawns two background threads:
Flush thread – Periodically fsyncs WAL (src/storage/mod.rs:728-751)
Checkpoint thread – Runs checkpoints on demand (src/storage/mod.rs:753-791)
Synchronization:
Flush thread: (Mutex<bool>, Condvar) for shutdown signal
Checkpoint thread: (Mutex<CheckpointCoord>, Condvar) for request/completion
No lock contention with insertion path (separate mutexes).
Thread-Safety Summary
| Type | Send | Sync | Notes |
|------|------|------|-------|
| LeanIMT (no concurrent) | ✅ | ❌ | Can move between threads, not shareable |
| LeanIMT (concurrent) | ✅ | ✅ | Can be wrapped in Arc and shared |
| TreeSnapshot | ✅ | ✅ | Always safe to share (immutable) |
| RotorTree | ✅ | ✅ | Internal Arc<Shared> is Sync |
| DurabilityToken | ✅ | ✅ | Lock-free atomic wait |
Benchmarks
Concurrent insert throughput (4-core machine):
| Threads | Inserts/sec (no concurrent) | Inserts/sec (concurrent) |
|---------|-----------------------------|--------------------------|
| 1 | 5.2M | 4.8M |
| 2 | N/A | 7.1M |
| 4 | N/A | 10.5M |
| 8 | N/A | 11.2M |
Observations:
Single-threaded overhead: ~8% (from RwLock)
Scales to 2x at 4 threads (diminishing returns beyond)
Bottleneck: hash computation, not locks
Combine concurrent with parallel feature for best multi-threaded performance (see Parallelization ).
Debugging Concurrency Issues
Deadlock Detection
parking_lot can detect deadlocks at runtime when its deadlock_detection cargo feature is enabled:
// Requires parking_lot's `deadlock_detection` feature
let deadlocks = parking_lot::deadlock::check_deadlock();
if !deadlocks.is_empty() {
    panic!("{} deadlock(s) detected", deadlocks.len());
}
Run this periodically in tests to catch lock ordering bugs.
TSAN (ThreadSanitizer)
Build with TSAN to detect data races:
RUSTFLAGS="-Z sanitizer=thread" cargo +nightly test --features concurrent
Logging Lock Contention
Instrument locks to measure contention:
let start = std::time::Instant::now();
let guard = self.inner.write();
let wait_time = start.elapsed();
if wait_time > Duration::from_micros(100) {
    eprintln!("High contention: waited {:?} for write lock", wait_time);
}
Best Practices
Batch Insertions Use insert_many() to amortize lock overhead. 100x inserts = 1x lock.
Drop Snapshots Don’t hold snapshots across checkpoint boundaries. Free memory ASAP.
Avoid Tiny Batches Don’t spawn 1000 threads each inserting 1 leaf. Batch first, then parallelize.
Profile First Measure lock contention before optimizing. Most workloads are CPU-bound, not lock-bound.
Lock-Free Proof Generation
Once you have a snapshot, proof generation is completely lock-free:
let snap = Arc::new(tree.snapshot()); // Brief read lock

// Spawn many proof workers, no contention
let handles: Vec<_> = (0..num_cpus::get())
    .map(|worker_id| {
        let snap = Arc::clone(&snap);
        thread::spawn(move || {
            for i in (worker_id..snap.size()).step_by(num_cpus::get()) {
                let proof = snap.generate_proof(i).unwrap();
                // ... send proof to client ...
            }
        })
    })
    .collect();

for h in handles {
    h.join().unwrap();
}
No locks acquired during proof generation (all data is Arc-shared and immutable).