Overview
Concurrent benchmarks measure the performance of LeanIMT with the `concurrent` feature enabled, which switches the API to `&self` methods backed by an internal `RwLock` (via parking_lot) for thread-safe access.
These benchmarks simulate realistic concurrent workloads where multiple threads generate proofs while a writer thread inserts new leaves.
Configuration
The `concurrent` feature changes the tree API:
- Without `concurrent`: `&mut self` methods (single-threaded only)
- With `concurrent`: `&self` methods with an internal `RwLock<State>`
  - Multiple concurrent readers (proof generation, snapshots)
  - Single writer at a time (insertions)
  - Lock-free snapshots (readers never block writers)
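Schematically, the feature flag changes only the method receiver; a minimal sketch with placeholder types (using std's `RwLock` in place of parking_lot's, not the crate's actual definitions):

```rust
use std::sync::RwLock;

// Without `concurrent`: exclusive access is enforced at compile time.
pub struct Tree {
    leaves: Vec<u64>,
}

impl Tree {
    pub fn new() -> Self { Self { leaves: Vec::new() } }
    pub fn insert(&mut self, leaf: u64) { self.leaves.push(leaf); }
    pub fn num_leaves(&self) -> usize { self.leaves.len() }
}

// With `concurrent`: &self methods; exclusivity moves into a runtime lock.
pub struct ConcurrentTree {
    leaves: RwLock<Vec<u64>>,
}

impl ConcurrentTree {
    pub fn new() -> Self { Self { leaves: RwLock::new(Vec::new()) } }
    pub fn insert(&self, leaf: u64) { self.leaves.write().unwrap().push(leaf); }
    pub fn num_leaves(&self) -> usize { self.leaves.read().unwrap().len() }
}
```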
Benchmark Implementation
All concurrent benchmarks in `tree_bench_concurrent.rs` use the same underlying tree (`LeanIMT`), but with shared ownership via `Arc`.
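The sharing pattern the benchmarks rely on can be sketched as follows, with a hypothetical `ConcurrentTree` stand-in rather than the real `LeanIMT` type: the tree is wrapped in an `Arc`, each thread gets a clone, and every clone can call the `&self` methods.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical stand-in for a LeanIMT built with the `concurrent` feature:
// every method takes &self and synchronizes through an internal RwLock.
pub struct ConcurrentTree {
    state: RwLock<Vec<u64>>,
}

impl ConcurrentTree {
    pub fn new() -> Self { Self { state: RwLock::new(Vec::new()) } }
    pub fn insert(&self, leaf: u64) { self.state.write().unwrap().push(leaf); }
    pub fn num_leaves(&self) -> usize { self.state.read().unwrap().len() }
}

// One tree, shared across threads: wrap it in an Arc and hand each
// thread a clone. Arc clones are cheap reference-count bumps.
pub fn shared_insert(threads: u64, per_thread: u64) -> usize {
    let tree = Arc::new(ConcurrentTree::new());
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let tree = Arc::clone(&tree);
            thread::spawn(move || {
                for i in 0..per_thread {
                    tree.insert(t * per_thread + i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    tree.num_leaves()
}
```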
Insertion Benchmarks
Single Insert (Concurrent)
Inserts leaves one-by-one using the `concurrent` API:
- Counts: 1K, 10K, 100K leaves
- Branching factors: N=2, 4, 8, 16
- What it measures: Lock acquisition overhead per insertion
Batch Insert (Concurrent)
Inserts all leaves in a single `insert_many` call:
- Counts: 1K, 10K, 100K, 1M leaves
- Branching factors: N=2, 4, 8, 16
- What it measures: Lock hold time for large batch insertions
Incremental Insert (Concurrent)
Pre-populates the tree with half the leaves, then inserts the remaining half:
- Counts: 10K, 100K, 1M leaves (total)
- Branching factors: N=2, 4, 8, 16
- What it measures: Lock contention on large existing trees
Proof Operations
Generate Proof (Concurrent)
Generates a proof from a snapshot (lock-free read):
- Counts: 1K, 10K, 100K, 1M leaves
- Branching factors: N=2, 4, 8, 16
- What it measures: Snapshot acquisition cost with concurrent access
Verify Proof (Concurrent)
Verifies a proof (no lock required):
- Counts: 1K, 10K, 100K, 1M leaves
- Branching factors: N=2, 4, 8, 16
- What it measures: Same as single-threaded (verification is lock-free)
Snapshot (Concurrent)
Measures snapshot acquisition cost:
- Counts: 1K, 10K, 100K, 1M leaves
- Branching factors: N=2, 4, 8, 16
- What it measures: Read lock + Arc clone overhead
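One design that yields exactly this cost profile, read lock plus `Arc` clone, keeps the state behind `RwLock<Arc<State>>`; this is an assumed sketch of the pattern, not the crate's actual code:

```rust
use std::sync::{Arc, RwLock};

// Assumed snapshot design: the immutable state sits behind
// RwLock<Arc<State>>, so taking a snapshot is a short read lock plus an
// Arc clone. Proof generation then runs against the clone with no lock
// held at all, which is why readers never block writers for long.
pub struct State {
    pub leaves: Vec<u64>,
}

pub struct Tree {
    state: RwLock<Arc<State>>,
}

impl Tree {
    pub fn new(leaves: Vec<u64>) -> Self {
        Self { state: RwLock::new(Arc::new(State { leaves })) }
    }

    // The read lock is held only for the duration of the Arc clone.
    pub fn snapshot(&self) -> Arc<State> {
        Arc::clone(&self.state.read().unwrap())
    }
}
```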
Concurrent Contention
The most realistic benchmark: 4 reader threads continuously generate proofs while 1 writer thread inserts all leaves.
- Counts: 10K, 100K leaves
- Branching factors: N=2, 4, 8, 16
- Reader behavior: Spin-loop generating proofs for middle leaf until tree is full
- Writer behavior: Insert all leaves via `insert_many`
- What it measures: Real-world concurrent read/write contention
This benchmark simulates a blockchain indexer that:
- Continuously serves proof requests from multiple API workers
- Processes new blocks/leaves from a single sync thread
Lock Contention Analysis
Read Lock (Snapshot)
- Cost: Minimal — parking_lot’s RwLock is fast for uncontended reads
- Contention: Only with concurrent writers (rare)
- Optimization: Snapshots are cheap; cache them if generating multiple proofs
Write Lock (Insert)
- Cost: Higher — must wait for all readers to finish
- Contention: Increases with the number of concurrent readers
- Optimization: Use `insert_many` to amortize lock overhead across many leaves
Insert operations block all concurrent readers. For high-concurrency scenarios, batch your writes as much as possible to minimize lock hold time.
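The amortization argument in concrete terms, again with a stand-in type: one lock acquisition per batch instead of one per leaf (`insert_many` here mirrors the shape of the crate's batch entry point, not its implementation).

```rust
use std::sync::RwLock;

pub struct Tree {
    leaves: RwLock<Vec<u64>>,
}

impl Tree {
    pub fn new() -> Self { Self { leaves: RwLock::new(Vec::new()) } }

    // One write-lock round trip per leaf: N acquisitions for N leaves,
    // so readers can be blocked N separate times.
    pub fn insert(&self, leaf: u64) {
        self.leaves.write().unwrap().push(leaf);
    }

    // One write-lock round trip for the whole batch: the lock cost is
    // amortized across every leaf, and readers are blocked only once.
    pub fn insert_many(&self, batch: impl IntoIterator<Item = u64>) {
        self.leaves.write().unwrap().extend(batch);
    }

    pub fn num_leaves(&self) -> usize { self.leaves.read().unwrap().len() }
}
```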
Comparison: Concurrent vs Single-Threaded
| Workload | Single-Threaded | Concurrent |
|---|---|---|
| Overhead | None (direct access) | RwLock acquisition |
| Throughput | Higher for single writer | Lower due to locking |
| Variance | Lower (predictable) | Higher (depends on contention) |
| Use Case | Single-threaded apps | Multi-threaded services |
Parallel + Concurrent
You can combine both features:
- Concurrent access: Multiple threads can read/write
- Parallel hashing: `insert_many` uses rayon for hash computation

Combining `concurrent` + `parallel` gives the best performance for multi-threaded workloads with large batch insertions: the writer holds the lock while rayon parallelizes the hash computation.

Running the Benchmarks
When to Use Concurrent Mode
Use `concurrent` when:
- Multiple threads need to generate proofs simultaneously
- A background thread inserts leaves while API threads serve requests
- You’re building a multi-threaded service (e.g., gRPC server with worker pool)
Avoid `concurrent` when:
- Your application is single-threaded
- You can coordinate access at a higher level (e.g., message passing)
- You need absolute maximum throughput (lock overhead matters)
