If num_parents >= parallel_threshold() → parallelize
Otherwise → sequential fallback
Tuning:
```bash
# Lower threshold for more parallelism (on high-core machines)
export ROTORTREE_PARALLEL_THRESHOLD=512

# Higher threshold to reduce overhead (on laptops)
export ROTORTREE_PARALLEL_THRESHOLD=4096
```
The default threshold (1024) is tuned for 4-8 core machines. Adjust based on core count and batch size.
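A minimal sketch of how such a threshold lookup might behave, assuming the value is read once from `ROTORTREE_PARALLEL_THRESHOLD` and defaults to 1024 (the caching strategy here is illustrative, not necessarily the crate's actual implementation):

```rust
use std::sync::OnceLock;

// Hypothetical sketch: read the threshold from the environment once,
// cache it, and fall back to the documented default of 1024.
fn parallel_threshold() -> usize {
    static THRESHOLD: OnceLock<usize> = OnceLock::new();
    *THRESHOLD.get_or_init(|| {
        std::env::var("ROTORTREE_PARALLEL_THRESHOLD")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(1024)
    })
}
```

Caching in a `OnceLock` means the environment is consulted only on the first call, so the threshold check inside the per-level loop stays cheap.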
Without parallel, insert_many computes parent hashes sequentially:
```rust
for level in 0..depth {
    let num_parents = level_len.div_ceil(N);
    for parent_idx in 0..num_parents {
        let parent = compute_parent(child_level, parent_idx)?;
        levels[level + 1].set(parent_idx, parent);
    }
}
```
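To make the shape of this loop concrete, here is a self-contained toy version where "hashes" are `u64` sums and `N = 2`; `compute_parent` and the level layout are simplified stand-ins, not the crate's real types:

```rust
const N: usize = 2; // arity of the toy tree

// Toy stand-in for compute_parent: a parent is the sum of its (up to N) children.
fn compute_parent(children: &[u64], parent_idx: usize) -> u64 {
    children[parent_idx * N..].iter().take(N).sum()
}

// Build every level sequentially, bottom-up, mirroring the loop above.
fn build_levels(mut level: Vec<u64>) -> Vec<Vec<u64>> {
    let mut levels = vec![level.clone()];
    while level.len() > 1 {
        let num_parents = level.len().div_ceil(N);
        let next: Vec<u64> = (0..num_parents)
            .map(|parent_idx| compute_parent(&level, parent_idx))
            .collect();
        levels.push(next.clone());
        level = next;
    }
    levels
}
```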
With parallel, parent computation is split across threads:
```rust
for level in 0..depth {
    let num_parents = level_len.div_ceil(N);
    if num_parents >= parallel_threshold() {
        // Parallel path
        par_buf.par_chunks_mut(PAR_CHUNK_SIZE)
            .enumerate()
            .for_each(|(ci, chunk)| {
                let base = start_parent + ci * PAR_CHUNK_SIZE;
                for (i, slot) in chunk.iter_mut().enumerate() {
                    slot.write(compute_parent(child_level, base + i, level_len, hasher));
                }
            });
        // Write results to next level
    } else {
        // Sequential fallback
    }
}
```
Defined in src/tree.rs:960-1003.

Key insight: Parent nodes at the same level are independent → perfect parallelism.
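Because the parents are independent, the chunked split can be illustrated with `std::thread::scope` alone (Rayon's `par_chunks_mut` performs the same disjoint partitioning, plus work-stealing); the `u64`-sum parents below are toy stand-ins for real hashes:

```rust
use std::thread;

// Each spawned thread fills one disjoint chunk of the parent buffer.
// Parents are toy u64 sums of two children, standing in for real hashes.
fn parallel_parents(children: &[u64], chunk_size: usize) -> Vec<u64> {
    let num_parents = children.len().div_ceil(2);
    let mut out = vec![0u64; num_parents];
    thread::scope(|s| {
        for (ci, chunk) in out.chunks_mut(chunk_size).enumerate() {
            let base = ci * chunk_size; // first parent index in this chunk
            s.spawn(move || {
                for (i, slot) in chunk.iter_mut().enumerate() {
                    let parent_idx = base + i;
                    *slot = children[parent_idx * 2..].iter().take(2).sum();
                }
            });
        }
    });
    out
}
```

`chunks_mut` hands each thread a non-overlapping `&mut` slice, so no locking is needed: this is the same disjointness argument the Rayon path relies on.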
Parallel results are collected into a pre-allocated buffer:
```rust
let mut par_buf: Vec<Hash> = Vec::new();
for level in 0..depth {
    let work = num_parents - start_parent;
    if work >= par_threshold {
        par_buf.clear();
        par_buf.reserve(work);
        let spare = &mut par_buf.spare_capacity_mut()[..work];
        spare.par_chunks_mut(PAR_CHUNK_SIZE)
            .enumerate()
            .for_each(|(ci, chunk)| {
                for (i, slot) in chunk.iter_mut().enumerate() {
                    slot.write(compute_parent(...)); // Initialize MaybeUninit
                }
            });
        // SAFETY: All elements initialized by the loop above
        unsafe { par_buf.set_len(work) };
        // Write par_buf to tree
    }
}
```
Defined in src/tree.rs:949-1003.

Safety invariant: `MaybeUninit::write` is called for every element before `set_len`.
This is unsafe code. The parallel loop MUST initialize every slot, or you get undefined behavior. Rayon’s par_chunks_mut guarantees every chunk is visited.
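The initialize-then-`set_len` pattern can be demonstrated in isolation, sequentially and without Rayon; the writer here fills every slot, which is exactly the invariant the parallel loop must uphold:

```rust
// Fill a Vec through spare_capacity_mut(): every MaybeUninit slot must be
// written before set_len(), or reading the Vec is undefined behavior.
fn fill_doubled(work: usize) -> Vec<u64> {
    let mut buf: Vec<u64> = Vec::new();
    buf.reserve(work);
    let spare = &mut buf.spare_capacity_mut()[..work];
    for (i, slot) in spare.iter_mut().enumerate() {
        slot.write(i as u64 * 2); // initialize every element
    }
    // SAFETY: all `work` slots were initialized by the loop above.
    unsafe { buf.set_len(work) };
    buf
}
```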
You can enable both features for multi-threaded insertions + parallel hashing:
```toml
[dependencies]
rotortree = { version = "0.15", features = ["concurrent", "parallel"] }
```
Effect:
Multiple threads can call insert_many() concurrently (via RwLock)
Each insert_many() parallelizes its hash computation (via Rayon)
Example:
```rust
use rayon::prelude::*;
use std::sync::Arc;

let tree = Arc::new(LeanIMT::<Blake3Hasher, 4, 32>::new(Blake3Hasher));
let batches: Vec<Vec<Hash>> = /* split 10M leaves into 10 batches of 1M */;

batches.par_iter().for_each(|batch| {
    tree.insert_many(batch).unwrap(); // Each thread parallelizes internally
});
```
Throughput: ~50M inserts/sec on a 16-core machine (Blake3).
This is nested parallelism: Rayon (outer) spawns threads for batches, each batch uses Rayon (inner) for hashing. Rayon’s work-stealing handles this gracefully.
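A hypothetical helper for the batch split elided in the example above (the name and signature are illustrative, not part of the crate's API):

```rust
// Split leaves into at most num_batches near-equal batches, suitable for
// feeding to par_iter(). The last batch may be shorter than the others.
fn split_batches<T: Clone>(leaves: &[T], num_batches: usize) -> Vec<Vec<T>> {
    let batch_size = leaves.len().div_ceil(num_batches);
    leaves.chunks(batch_size).map(|c| c.to_vec()).collect()
}
```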
The buffer itself is allocated once and reused across levels:

```rust
let mut par_buf: Vec<Hash> = Vec::new();
for level in 0..depth {
    let work = num_parents - start_parent;
    par_buf.clear();
    par_buf.reserve(work); // Allocate space for `work` hashes
    // ...
}
```
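Reusing one buffer pays off because `clear()` keeps the allocation, so a later `reserve()` only allocates when a level needs more capacity than any earlier one; a small demonstration of that `Vec` behavior:

```rust
// clear() drops elements but keeps capacity; reserve() then does nothing
// when the requested space already fits in the existing allocation.
fn capacity_reuse() -> (usize, usize) {
    let mut buf: Vec<u64> = Vec::new();
    buf.reserve(100);
    let first = buf.capacity();
    buf.clear();     // keeps the allocation
    buf.reserve(50); // fits in existing capacity: no new allocation
    (first, buf.capacity())
}
```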