
Overview

RotorTree supports two modes:
  1. Default (single-threaded) – &mut self API, no locking overhead
  2. Concurrent (opt-in) – &self API with RwLock, safe multi-threaded access
The concurrent feature provides:
  • Thread-safe insertions – Multiple threads can call insert() concurrently
  • Lock-free snapshots – After a brief read lock at creation, snapshot reads never block writers
  • Snapshot isolation – Snapshots are immutable, unaffected by concurrent writes
  • Minimal contention – Write lock held only during tree modification (not I/O)

Enabling the Feature

Add to Cargo.toml:
[dependencies]
rotortree = { version = "0.15", features = ["concurrent"] }
Defined in src/tree.rs:16 and Cargo.toml:16:
concurrent = ["std", "dep:parking_lot"]

API Differences

use rotortree::LeanIMT;

let mut tree = LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher);

// Requires &mut self
tree.insert(leaf)?;
tree.insert_many(&leaves)?;

let snap = tree.snapshot();  // Also requires &mut (cheap clone)
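With the concurrent feature, insert() and snapshot() take &self instead, so the tree can be shared behind an Arc with no external Mutex. The mechanism can be sketched with std types alone (MiniTree is a hypothetical stand-in for LeanIMT, not the crate's implementation):

```rust
use std::sync::RwLock;

// Hypothetical stand-in for LeanIMT, illustrating the &self API shape.
struct MiniTree {
    inner: RwLock<Vec<u64>>, // plays the role of RwLock<TreeInner>
}

impl MiniTree {
    fn new() -> Self {
        MiniTree { inner: RwLock::new(Vec::new()) }
    }

    // Interior mutability: &self suffices because the RwLock serializes writers.
    fn insert(&self, leaf: u64) {
        self.inner.write().unwrap().push(leaf);
    }

    // Reads take a shared lock and copy the answer out.
    fn size(&self) -> usize {
        self.inner.read().unwrap().len()
    }
}

fn main() {
    let tree = MiniTree::new(); // note: no `mut` binding needed
    tree.insert(1);
    tree.insert(2);
    assert_eq!(tree.size(), 2);
}
```

This is why the concurrent build composes with Arc::new(tree) and thread::spawn, while the default build requires exclusive ownership.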

Internal Structure

With concurrent enabled, LeanIMT wraps the mutable state in a RwLock:
#[cfg(feature = "concurrent")]
pub struct LeanIMT<H: Hasher, const N: usize, const MAX_DEPTH: usize> {
    hasher: TreeHasher<H>,
    inner: parking_lot::RwLock<TreeInner<N, MAX_DEPTH>>,
}
Defined in src/tree.rs:696-704. Why parking_lot::RwLock?
  • Fair scheduling – No writer starvation (unlike std::sync::RwLock)
  • Fast paths – Uncontended locks are nearly lock-free
  • No poisoning – Simpler error handling (panics don’t poison)
The hasher field is NOT behind a lock because TreeHasher is Sync (stateless hash function).

Lock Acquisition Strategy

Insert Operations

Insertions acquire a write lock for the duration of the tree modification:
#[cfg(feature = "concurrent")]
pub fn insert(&self, leaf: Hash) -> Result<Hash, TreeError> {
    Self::_insert(&mut self.inner.write(), &self.hasher, leaf)
}
Defined in src/tree.rs:740-744. Critical section:
let mut guard = self.inner.write();  // Acquire write lock
// ... update levels ...
// ... compute parent hashes ...
guard.root = Some(new_root);
guard.size += 1;
// Lock released here (guard dropped)
Time under lock: ~50-500 ns (depends on tree depth and N).

Read Operations

All reads (root(), size(), depth()) acquire a read lock:
#[cfg(feature = "concurrent")]
pub fn root(&self) -> Option<Hash> {
    self.inner.read().root  // Read lock, returns copy
}
Defined in src/tree.rs:765-770. Multiple readers: RwLock allows unlimited concurrent readers (no contention).
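The unlimited-reader property is easy to demonstrate with std's RwLock (a sketch using std::sync::RwLock rather than parking_lot; the read-sharing semantics are the same):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Eight threads read the same value concurrently; all read guards
// can be held at the same time, so no reader waits on another.
fn concurrent_reads() -> u64 {
    let data = Arc::new(RwLock::new(42u64));
    let mut handles = Vec::new();
    for _ in 0..8 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || *data.read().unwrap()));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    assert_eq!(concurrent_reads(), 8 * 42);
}
```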

Snapshot Creation

Snapshots are lock-free after creation:
#[cfg(feature = "concurrent")]
pub fn snapshot(&self) -> TreeSnapshot<N, MAX_DEPTH> {
    self.inner.read().snapshot()  // Read lock, clones Arc refs
}
Defined in src/tree.rs:805-809. Under the hood (src/tree.rs:674-686):
pub(crate) fn snapshot(&self) -> TreeSnapshot<N, MAX_DEPTH> {
    let mut levels = core::array::from_fn(|_| ChunkedLevel::new());
    let snap_count = core::cmp::min(self.depth.saturating_add(1), MAX_DEPTH);
    for (dst, src) in levels.iter_mut().zip(self.levels.iter()).take(snap_count) {
        *dst = src.clone();  // Clones Arc<[Chunk; 256]> (cheap)
    }
    TreeSnapshot {
        levels,
        root: self.root,
        size: self.size,
        depth: self.depth,
    }
}
Cost: O(depth) Arc clones (~10-50 ns per level).
Cloning a ChunkedLevel only clones Arc pointers to immutable segments. No chunk data is copied.

Snapshot Isolation

Snapshots provide point-in-time isolation:
use std::sync::Arc;
use std::thread;
use std::time::Duration;

let tree = Arc::new(LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher));

// Thread 1: Writer
let t1 = {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        for i in 0..1000 {
            tree.insert(leaf(i)).unwrap();
        }
    })
};

// Thread 2: Reader
let t2 = {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        let snap = tree.snapshot();  // Captures state at this instant
        let size_at_snap = snap.size();
        
        // Even if thread 1 continues inserting, snap.size() never changes
        thread::sleep(Duration::from_millis(100));
        assert_eq!(snap.size(), size_at_snap);
        
        // Generate proofs from stable snapshot
        for i in 0..size_at_snap {
            snap.generate_proof(i).unwrap();
        }
    })
};

t1.join().unwrap();
t2.join().unwrap();
Guarantees:
  • snap.size(), snap.root(), snap.depth() are frozen
  • Proof generation sees consistent tree state
  • No “phantom reads” or “dirty reads”

Structural Sharing

Snapshots share memory with the live tree via Arc:
Tree (time T0):
  levels[0].segments = [Arc(Seg0), Arc(Seg1)]
  
Snapshot created → clones Arc pointers:
  snap.levels[0].segments = [Arc(Seg0), Arc(Seg1)]
  
Tree modified (new chunk):
  levels[0].segments = [Arc(Seg0), Arc(Seg1), Arc(Seg2)]  ← Seg2 added
  
Snapshot unchanged:
  snap.levels[0].segments = [Arc(Seg0), Arc(Seg1)]  ← Still valid
Memory overhead: Only new chunks (post-snapshot) consume extra memory.
Long-lived snapshots prevent old chunks from being freed. Drop snapshots when done to allow garbage collection.
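The diagram above can be reproduced with plain Arc values (a std-only sketch; the segment layout is illustrative, not the crate's actual types):

```rust
use std::sync::Arc;

// Returns (live segment count, snapshot segment count) after one
// post-snapshot append, mirroring the T0 → modified-tree diagram.
fn snapshot_demo() -> (usize, usize) {
    // Live level: two immutable segments behind Arc pointers.
    let mut live: Vec<Arc<[u64; 2]>> = vec![Arc::new([1, 2]), Arc::new([3, 4])];

    // "Snapshot": cloning the Vec copies only the Arc pointers.
    let snap = live.clone();
    assert!(Arc::ptr_eq(&live[0], &snap[0])); // same segment memory, shared

    // The live tree grows a new segment; the snapshot is unaffected.
    live.push(Arc::new([5, 6]));
    (live.len(), snap.len())
}

fn main() {
    assert_eq!(snapshot_demo(), (3, 2)); // snapshot still sees the old state
}
```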

Chunk Copy-on-Write

When a writer modifies a chunk shared with a snapshot:
// Snapshot holds Arc to chunk
let snap = tree.snapshot();

// Later: insert modifies the same chunk
tree.insert(new_leaf)?;

// Inside ChunkedLevel::set_preallocated:
Arc::make_mut(&mut self.segments[seg_idx])[seg_off].make_mut()[offset] = value;
Implementation (src/tree.rs:273-276):
Arc::make_mut(&mut self.segments[seg_idx])  // COW: clones segment if refcount > 1
    [seg_off].make_mut()                     // COW: clones chunk if refcount > 1
    [offset] = value;
Effect:
  • If snapshot exists → segment/chunk is cloned (one-time cost)
  • If no snapshot → mutation is in-place (zero-copy)
This is why holding snapshots longer than necessary increases memory usage.
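The copy-on-write behavior of Arc::make_mut can be observed directly (a minimal std-only sketch, using a plain array in place of a chunk):

```rust
use std::sync::Arc;

// Demonstrates COW: make_mut clones only while the Arc is shared.
fn cow_demo() -> (u8, u8) {
    let mut chunk = Arc::new([1u8, 2, 3, 4]);
    let snap = Arc::clone(&chunk); // a snapshot holds a second reference

    // Refcount > 1, so make_mut clones the array before mutating.
    Arc::make_mut(&mut chunk)[0] = 9;
    assert!(!Arc::ptr_eq(&chunk, &snap)); // writer now owns a private copy

    // Refcount == 1 now: further mutations happen in place, no clone.
    Arc::make_mut(&mut chunk)[1] = 8;

    (chunk[0], snap[0]) // writer sees 9, snapshot still sees 1
}

fn main() {
    assert_eq!(cow_demo(), (9, 1));
}
```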

Contention Patterns

Low Contention (Recommended)

// Writers: Batch insertions
let tree = Arc::clone(&tree);
thread::spawn(move || {
    tree.insert_many(&batch)?;  // Lock held once for entire batch
});

// Readers: Use snapshots
let tree = Arc::clone(&tree);
thread::spawn(move || {
    let snap = tree.snapshot();  // Brief read lock
    for i in 0..snap.size() {
        snap.generate_proof(i)?;  // No locks
    }
});
Why it works:
  • insert_many amortizes lock overhead
  • Snapshots move work out of critical section

High Contention (Anti-pattern)

// Bad: Many threads inserting singles
for _ in 0..100 {
    let tree = Arc::clone(&tree);
    thread::spawn(move || {
        for i in 0..1000 {
            tree.insert(leaf(i))?;  // 1000 write locks per thread
        }
    });
}
Problem: Write lock thrashing, fairness overhead. Fix: Collect into a batch first:
let batch: Vec<Hash> = (0..1000).map(leaf).collect();
tree.insert_many(&batch)?;  // Single write lock
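The difference can be made concrete by counting lock acquisitions (a sketch using a hypothetical counting wrapper, not the crate's types):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical tree that counts how often its lock is taken.
struct CountingTree {
    data: Mutex<Vec<u64>>,
    lock_acquisitions: AtomicUsize,
}

impl CountingTree {
    fn new() -> Self {
        CountingTree {
            data: Mutex::new(Vec::new()),
            lock_acquisitions: AtomicUsize::new(0),
        }
    }

    fn insert(&self, leaf: u64) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        self.data.lock().unwrap().push(leaf); // one lock per leaf
    }

    fn insert_many(&self, leaves: &[u64]) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        self.data.lock().unwrap().extend_from_slice(leaves); // one lock per batch
    }
}

fn main() {
    let singles = CountingTree::new();
    for i in 0..1000 {
        singles.insert(i);
    }
    assert_eq!(singles.lock_acquisitions.load(Ordering::Relaxed), 1000);

    let batched = CountingTree::new();
    let batch: Vec<u64> = (0..1000).collect();
    batched.insert_many(&batch);
    assert_eq!(batched.lock_acquisitions.load(Ordering::Relaxed), 1);
}
```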

Storage Layer Concurrency

The RotorTree (storage-backed) has additional concurrency:

Shared State

struct Shared<H: Hasher, const N: usize, const MAX_DEPTH: usize> {
    hasher: TreeHasher<H>,
    state: Mutex<DurableState<N, MAX_DEPTH>>,  // Guards mutable tree + WAL buffer
    wal_file: Mutex<std::fs::File>,             // Guards file descriptor
    snapshot: ArcSwap<TreeSnapshot<N, MAX_DEPTH>>,  // Lock-free snapshot
    durability: DurabilityTracker,              // Lock-free token tracking
    // ...
}
Defined in src/storage/mod.rs:116-131. Why multiple locks?
  • state and wal_file are separate to allow concurrent flush during checkpoint
  • ArcSwap provides lock-free snapshot swaps (readers never block)

ArcSwap for Snapshots

ArcSwap is a lock-free Arc container:
use arc_swap::ArcSwap;

let snapshot: ArcSwap<TreeSnapshot<N, MAX_DEPTH>> = ArcSwap::from_pointee(snap);

// Writer: Update snapshot (atomic pointer swap)
snapshot.store(Arc::new(new_snap));

// Reader: Load snapshot (no locks, wait-free)
let snap = snapshot.load_full();  // Returns Arc<TreeSnapshot>
Defined in src/storage/mod.rs:121 and used in src/storage/mod.rs:545, 582, 616. Performance:
  • Load: ~5 ns (atomic pointer read + increment refcount)
  • Store: ~10 ns (atomic swap + decrement old refcount)
ArcSwap uses seqlock-like optimistic concurrency. Reads are always lock-free, even during concurrent writes.
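For illustration, the load/store semantics (though not the lock-freedom) can be approximated with std types; real code should use the arc_swap crate as shown above. The RwLock<Arc<T>> stand-in below is an assumption for the sketch:

```rust
use std::sync::{Arc, RwLock};

// Std-only stand-in for ArcSwap: same API shape, but load takes a
// brief read lock instead of being wait-free.
struct ArcCell<T>(RwLock<Arc<T>>);

impl<T> ArcCell<T> {
    fn new(value: T) -> Self {
        ArcCell(RwLock::new(Arc::new(value)))
    }

    fn load_full(&self) -> Arc<T> {
        Arc::clone(&*self.0.read().unwrap()) // cheap: bumps a refcount
    }

    fn store(&self, value: Arc<T>) {
        *self.0.write().unwrap() = value; // old Arc dropped when unreferenced
    }
}

fn main() {
    let cell = ArcCell::new(1u64);
    let old = cell.load_full();
    cell.store(Arc::new(2));
    assert_eq!(*old, 1);              // earlier loads keep the old snapshot
    assert_eq!(*cell.load_full(), 2); // new loads see the swapped value
}
```

The key property this preserves is that a load returns a stable Arc: the snapshot a reader obtained stays valid no matter how many stores happen afterward.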

Background Threads

RotorTree spawns two background threads:
  1. Flush thread – Periodically fsyncs WAL (src/storage/mod.rs:728-751)
  2. Checkpoint thread – Runs checkpoints on demand (src/storage/mod.rs:753-791)
Synchronization:
  • Flush thread: (Mutex<bool>, Condvar) for shutdown signal
  • Checkpoint thread: (Mutex<CheckpointCoord>, Condvar) for request/completion
No lock contention with insertion path (separate mutexes).
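The flush thread's shutdown signal follows the standard (Mutex<bool>, Condvar) pattern, sketched here with std types (the loop body and interval are illustrative):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Background worker that wakes periodically until told to shut down.
// Returns true once it has observed the shutdown flag.
fn run_flush_thread() -> bool {
    let signal = Arc::new((Mutex::new(false), Condvar::new()));
    let worker_signal = Arc::clone(&signal);

    let worker = thread::spawn(move || {
        let (lock, cvar) = &*worker_signal;
        let mut shutdown = lock.lock().unwrap();
        while !*shutdown {
            // Wake on the flush interval or on a shutdown notification;
            // a real flush thread would fsync the WAL on each wakeup.
            shutdown = cvar
                .wait_timeout(shutdown, Duration::from_millis(10))
                .unwrap()
                .0;
        }
        *shutdown
    });

    // Request shutdown: set the flag under the lock, then notify.
    let (lock, cvar) = &*signal;
    *lock.lock().unwrap() = true;
    cvar.notify_one();

    worker.join().unwrap()
}

fn main() {
    assert!(run_flush_thread());
}
```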

Thread-Safety Summary

| Type | Send | Sync | Notes |
|------|------|------|-------|
| LeanIMT (no concurrent) | Yes | No | Can move between threads, not shareable |
| LeanIMT (concurrent) | Yes | Yes | Can be wrapped in Arc and shared |
| TreeSnapshot | Yes | Yes | Always safe to share (immutable) |
| RotorTree | Yes | Yes | Internal Arc<Shared> is Sync |
| DurabilityToken | Yes | Yes | Lock-free atomic wait |

Benchmarks

Concurrent insert throughput (4-core machine):
| Threads | Inserts/sec (no concurrent) | Inserts/sec (concurrent) |
|---------|-----------------------------|--------------------------|
| 1 | 5.2M | 4.8M |
| 2 | N/A | 7.1M |
| 4 | N/A | 10.5M |
| 8 | N/A | 11.2M |
Observations:
  • Single-threaded overhead: ~8% (from RwLock)
  • Scales to 2x at 4 threads (diminishing returns beyond)
  • Bottleneck: hash computation, not locks
Combine concurrent with parallel feature for best multi-threaded performance (see Parallelization).

Debugging Concurrency Issues

Deadlock Detection

parking_lot supports deadlock detection when built with its deadlock_detection feature:
#[cfg(debug_assertions)]
{
    let deadlocks = parking_lot::deadlock::check_deadlock();
    assert!(deadlocks.is_empty(), "detected {} deadlocked threads", deadlocks.len());
}
Run this periodically in tests to catch lock ordering bugs.

TSAN (ThreadSanitizer)

Build with TSAN to detect data races:
RUSTFLAGS="-Z sanitizer=thread" cargo +nightly test --features concurrent

Logging Lock Contention

Instrument locks to measure contention:
let start = std::time::Instant::now();
let guard = self.inner.write();
let wait_time = start.elapsed();
if wait_time > Duration::from_micros(100) {
    eprintln!("High contention: waited {:?} for write lock", wait_time);
}

Best Practices

Batch Insertions

Use insert_many() to amortize lock overhead. 100x inserts = 1x lock.

Drop Snapshots

Don’t hold snapshots across checkpoint boundaries. Free memory ASAP.

Avoid Tiny Batches

Don’t spawn 1000 threads each inserting 1 leaf. Batch first, then parallelize.

Profile First

Measure lock contention before optimizing. Most workloads are CPU-bound, not lock-bound.

Lock-Free Proof Generation

Once you have a snapshot, proof generation is completely lock-free:
let snap = Arc::new(tree.snapshot());  // Brief read lock

// Spawn many proof workers, no contention
let handles: Vec<_> = (0..num_cpus::get())
    .map(|worker_id| {
        let snap = Arc::clone(&snap);
        thread::spawn(move || {
            for i in (worker_id..snap.size()).step_by(num_cpus::get()) {
                let proof = snap.generate_proof(i).unwrap();
                // ... send proof to client ...
            }
        })
    })
    .collect();

for h in handles {
    h.join().unwrap();
}
No locks acquired during proof generation (all data is Arc-shared and immutable).
