The State Manager is responsible for maintaining, persisting, and synchronizing the replicated state across all nodes in a subnet. It provides critical functionality for state certification, checkpointing, and state synchronization.
## Overview
The State Manager component handles:

- **State Persistence**: Writing replicated state to disk as checkpoints
- **State Certification**: Creating certified snapshots for query responses
- **State Synchronization**: Catching up nodes by transferring state
- **Checkpoint Management**: Managing checkpoint lifecycle and storage

The State Manager is implemented in `rs/state_manager/` and interfaces with the execution, consensus, and networking components.
## Architecture
```
State Manager (rs/state_manager/src/lib.rs)
├── Checkpointing (checkpoint.rs)
│   ├── Create checkpoints from replicated state
│   ├── Load checkpoints from disk
│   └── Manage checkpoint lifecycle
├── State Sync (state_sync.rs)
│   ├── Fetch state from peers
│   ├── Validate received chunks
│   └── Reconstruct complete state
├── Certification (lib.rs)
│   ├── Compute state hash trees
│   ├── Generate certification metadata
│   └── Produce witnesses for queries
├── Manifest (manifest.rs)
│   ├── Content-addressable chunks
│   ├── File metadata and hashing
│   └── Deduplication and compression
└── Tip Management (tip.rs)
    ├── Mutable in-memory state
    ├── Page map management
    └── Background persistence
```
## Replicated State
The replicated state represents the complete state of the subnet:
```rust
// From rs/replicated_state/
struct ReplicatedState {
    // Canister states
    canister_states: BTreeMap<CanisterId, CanisterState>,
    // System metadata
    system_metadata: SystemMetadata,
    // Network topology
    network_topology: NetworkTopology,
    // Subnet-level message queues
    subnet_queues: CanisterQueues,
    // Bitcoin state (if enabled)
    bitcoin_state: Option<BitcoinState>,
}
```
### Canister State

Each canister maintains:

- **Wasm module**: Compiled code
- **Wasm memory**: Heap state
- **Stable memory**: Persistent storage
- **Message queues**: Input and output queues
- **System state**: Cycles, controllers, etc.
- **Execution state**: Call contexts

### System Metadata

Subnet-level information:

- Batch numbers and timestamps
- Generated IDs (canister, message)
- Streams to/from other subnets
- Ingress history

### Network Topology

IC network structure:

- Subnet membership
- Routing tables
- Node public keys
- Registry version
## Checkpointing
Checkpoints are persistent snapshots of replicated state at specific heights.
### Checkpoint Creation
1. **State Preparation**: Prepare the state for persistence.

   ```rust
   // From rs/state_manager/src/checkpoint.rs
   // - Flush page map deltas to disk
   // - Strip in-memory overlays
   // - Filter canister snapshots
   ```

2. **Tip-to-Checkpoint**: Convert the mutable tip into an immutable checkpoint.

   ```rust
   // Move files from tip/ to checkpoints/<height>/
   // Create hard links for unchanged files
   // Write new/modified files
   ```

3. **Serialization**: Serialize state components.

   ```rust
   // Canister metadata → canister.pbuf
   // System metadata → system_metadata.pbuf
   // Message queues → queues.pbuf
   // Wasm state → page files (memory, stable memory)
   ```

4. **Verification**: Mark the checkpoint as complete and verified.

   ```rust
   // Compute state root hash
   // Write completion marker
   // Add to available checkpoints
   ```
### Checkpoint Interval
Checkpoints are created periodically:

```rust
// From rs/state_manager/src/lib.rs
// Default: every 500 rounds (~100 seconds at 5 rounds/sec)
const NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY: u64 = 50;

// Checkpoint threads for parallel I/O
pub const NUMBER_OF_CHECKPOINT_THREADS: u32 = 16;
```
The checkpoint interval balances persistence overhead against state recovery time: more frequent checkpoints mean faster recovery but higher I/O load.
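This cadence can be sketched as follows. The constant names mirror those above, but the helper functions and the exact trigger conditions are illustrative assumptions, not the actual API:

```rust
// Sketch of the checkpointing cadence (illustrative helpers, not the real API).
const CHECKPOINT_INTERVAL: u64 = 500;
const NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY: u64 = 50;

/// A full checkpoint is written at every multiple of the interval.
fn is_checkpoint_height(height: u64) -> bool {
    height > 0 && height % CHECKPOINT_INTERVAL == 0
}

/// Shortly before a checkpoint, page-map deltas are written as overlay files
/// so the checkpoint itself has less I/O left to do.
fn should_write_overlay(height: u64) -> bool {
    let rounds_until_checkpoint = CHECKPOINT_INTERVAL - height % CHECKPOINT_INTERVAL;
    rounds_until_checkpoint <= NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY
}

fn main() {
    assert!(is_checkpoint_height(1000));
    assert!(!is_checkpoint_height(999));
    assert!(should_write_overlay(960));  // 40 rounds before the next checkpoint
    assert!(!should_write_overlay(900)); // 100 rounds before
}
```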
### Page Maps
Canister memory is managed using page maps:
```
Page Map (4 KB pages)
├── Base Layer (from checkpoint)
│   └── Immutable page files
└── Delta Layers (overlays)
    ├── Recent modifications
    └── Copy-on-write semantics
```
Benefits:

- **Efficient storage**: Only store changed pages
- **Fast checkpointing**: Hard-link unchanged pages
- **Copy-on-write**: Preserve checkpoint immutability
- **Deduplication**: Share identical pages across canisters
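A minimal sketch of the base-plus-delta idea, assuming a simple in-memory map per layer (the real `PageMap` in `rs/replicated_state/` is file-backed and far more elaborate):

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: usize = 4096;

/// Toy page map: an immutable base layer (as loaded from a checkpoint)
/// plus a mutable delta overlay with copy-on-write semantics.
struct PageMap {
    base: BTreeMap<u64, [u8; PAGE_SIZE]>,  // pages from the last checkpoint
    delta: BTreeMap<u64, [u8; PAGE_SIZE]>, // pages modified since then
}

impl PageMap {
    fn new() -> Self {
        PageMap { base: BTreeMap::new(), delta: BTreeMap::new() }
    }

    /// Reads fall through the delta to the base; untouched pages read as zero.
    fn get_page(&self, index: u64) -> [u8; PAGE_SIZE] {
        self.delta.get(&index)
            .or_else(|| self.base.get(&index))
            .copied()
            .unwrap_or([0; PAGE_SIZE])
    }

    /// Writes only ever touch the delta, leaving the checkpoint base intact.
    fn set_page(&mut self, index: u64, page: [u8; PAGE_SIZE]) {
        self.delta.insert(index, page);
    }

    /// Checkpointing merges the delta into a new base; unchanged pages are
    /// carried over (hard-linked on disk in the real system).
    fn commit(self) -> PageMap {
        let mut base = self.base;
        base.extend(self.delta);
        PageMap { base, delta: BTreeMap::new() }
    }
}

fn main() {
    let mut map = PageMap::new();
    let mut page = [0u8; PAGE_SIZE];
    page[0] = 42;
    map.set_page(3, page);
    assert_eq!(map.get_page(3)[0], 42);
    assert_eq!(map.get_page(0)[0], 0); // untouched pages are zeroed
    let map = map.commit();
    assert_eq!(map.get_page(3)[0], 42);
}
```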
## State Certification
Certification enables clients to verify query responses:
### Hash Tree Construction
1. **Build Labeled Tree**: Create a hierarchical tree from the state.

   ```rust
   // From rs/state_manager/src/lib.rs
   // Labeled tree structure:
   //   /canister/<id>/certified_data
   //   /canister/<id>/metadata/<name>
   //   /time
   //   /subnet/<id>/metrics
   ```

2. **Compute Hash Tree**: Hash the tree as a Merkle tree.

   ```rust
   // Each inner node: hash = H(label || hash(children))
   // Leaves: hash = H(value)
   // Root hash represents the entire state
   ```

3. **Threshold Sign**: The subnet collectively signs the root hash.

   ```rust
   // Consensus delivers certification
   // Threshold BLS signature
   // Single signature from the subnet
   ```

4. **Generate Witnesses**: Create Merkle proofs for specific paths.

   ```rust
   // Witness includes:
   // - Path to queried data
   // - Sibling hashes for verification
   // - Root hash (signed by subnet)
   ```
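The tree-hashing steps above can be sketched with a toy 64-bit digest in place of the SHA-256 used on the IC (the type and function names here are illustrative, not the actual crypto component's API):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy 64-bit digest standing in for SHA-256.
fn digest(parts: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    for p in parts {
        p.hash(&mut h);
    }
    h.finish()
}

enum Tree {
    Leaf(Vec<u8>),
    Labeled(BTreeMap<Vec<u8>, Tree>),
}

/// Every node's digest commits to its children and their labels, so the
/// root hash commits to the entire tree and changes if any leaf changes.
fn root_hash(tree: &Tree) -> u64 {
    match tree {
        Tree::Leaf(value) => digest(&[b"leaf".as_slice(), value.as_slice()]),
        Tree::Labeled(children) => {
            let mut acc = Vec::new();
            for (label, child) in children {
                acc.extend_from_slice(label);
                acc.extend_from_slice(&root_hash(child).to_be_bytes());
            }
            digest(&[b"node".as_slice(), acc.as_slice()])
        }
    }
}

/// Tiny state with a single /time leaf, mimicking the labeled-tree layout.
fn state(time: &[u8]) -> Tree {
    let mut top = BTreeMap::new();
    top.insert(b"time".to_vec(), Tree::Leaf(time.to_vec()));
    Tree::Labeled(top)
}

fn main() {
    // Identical states hash identically; changing any leaf changes the
    // root hash the subnet would threshold-sign.
    assert_eq!(root_hash(&state(b"100")), root_hash(&state(b"100")));
    assert_ne!(root_hash(&state(b"100")), root_hash(&state(b"101")));
}
```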
### Certified Queries
Queries return certified responses:
```
Client → Query Call → Replica
                         ↓
                   Execute query
                         ↓
                  Generate witness
                         ↓
       Response + Certificate + Witness
                         ↓
                 Client verifies:
                 1. Witness reconstructs root hash
                 2. Signature valid on root hash
                 3. Data matches witness
```
**Server Side (Replica):**

```rust
// From rs/state_manager/src/lib.rs
// Generate certified response:
// 1. Execute query
// 2. Get witness for queried path
// 3. Attach certificate (threshold signature)
// 4. Return to client
```

**Client Side:**

```rust
// Verify certified response:
// 1. Reconstruct root hash from witness
// 2. Check threshold signature on root hash
// 3. Check data matches witness
```
### Certification Scope
Different scopes of certification:

| Scope | What’s Certified | Use Case |
|-------|------------------|----------|
| Full | Complete state hash tree | Regular state certification |
| Metadata | Only certification metadata | During catch-up (optimization) |
## State Synchronization
State sync enables nodes to catch up by fetching state from peers.
### When State Sync Occurs
- **Node Startup**: New or restarted nodes need to catch up
- **Subnet Catch-Up**: Lagging nodes fetch recent state
- **Subnet Join**: New nodes joining an existing subnet
- **State Divergence**: Recovery from corrupted state (rare)
### State Sync Protocol
1. **Initiate Sync**: Determine the need for state sync.

   ```rust
   // From rs/state_manager/src/state_sync.rs
   // Trigger conditions:
   // - Missing checkpoint for required height
   // - State hash mismatch
   // - Consensus requests catch-up
   ```

2. **Fetch Manifest**: Request the state manifest from peers.

   ```rust
   // Manifest describes state structure:
   // - File table (all files in checkpoint)
   // - Chunk table (content-addressed chunks)
   // - Root hash for verification
   ```

3. **Download Chunks**: Fetch chunks in parallel.

   ```rust
   // From rs/state_manager/src/state_sync/chunkable/
   // - Request chunks from multiple peers
   // - Verify chunk hashes
   // - Detect and handle corrupted chunks
   // - Track progress and remaining chunks
   ```

4. **Reassemble State**: Reconstruct the checkpoint from chunks.

   ```rust
   // - Preallocate files
   // - Write chunks to correct positions
   // - Verify file hashes
   // - Hard-link files from existing checkpoints (optimization)
   ```

5. **Validate and Load**: Verify and activate the synced state.

   ```rust
   // - Load checkpoint into memory
   // - Verify state root hash
   // - Mark checkpoint as verified
   // - Resume normal operation
   ```
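The download-and-reassembly steps above might look roughly like this, with a toy 64-bit hash in place of SHA-256 and a closure standing in for the peer-to-peer transport (all names are illustrative; the real logic lives under `rs/state_manager/src/state_sync/`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Manifest entry for one chunk, with a toy 64-bit hash instead of [u8; 32].
struct ChunkInfo {
    file_index: u32,
    offset: u64,
    size: u32,
    hash: u64,
}

fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Fetch every chunk, verify it against the manifest entry, and copy it into
/// the right position of the preallocated file buffers.
fn sync_chunks(
    chunks: &[ChunkInfo],
    files: &mut [Vec<u8>],
    fetch: impl Fn(usize) -> Vec<u8>,
) -> Result<(), String> {
    for (i, info) in chunks.iter().enumerate() {
        let data = fetch(i);
        // A corrupted chunk would be re-requested from another peer.
        if data.len() != info.size as usize || toy_hash(&data) != info.hash {
            return Err(format!("corrupted chunk {i}"));
        }
        let start = info.offset as usize;
        files[info.file_index as usize][start..start + data.len()].copy_from_slice(&data);
    }
    Ok(())
}

fn main() {
    let original = b"abcdefgh";
    let chunks = vec![
        ChunkInfo { file_index: 0, offset: 0, size: 4, hash: toy_hash(&original[..4]) },
        ChunkInfo { file_index: 0, offset: 4, size: 4, hash: toy_hash(&original[4..]) },
    ];
    let mut files = vec![vec![0u8; 8]]; // preallocated to the manifest's file size
    sync_chunks(&chunks, &mut files, |i| original[i * 4..(i + 1) * 4].to_vec()).unwrap();
    assert_eq!(files[0], original.to_vec());
}
```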
### Manifest and Chunks
**Manifest Structure:**
```rust
// From rs/state_manager/src/state_sync/types.rs
struct Manifest {
    // Version for compatibility
    version: u32,
    // Files in checkpoint
    file_table: Vec<FileInfo>,
    // Content-addressed chunks
    chunk_table: Vec<ChunkInfo>,
}

struct FileInfo {
    relative_path: PathBuf,
    size_bytes: u64,
    hash: [u8; 32],
}

struct ChunkInfo {
    file_index: u32,
    offset: u64,
    size: u32,
    hash: [u8; 32],
}
```
**Chunk Properties:**

- Fixed size (typically 1 MB)
- Content-addressed by hash
- Can be deduplicated across files
- Fetched independently and in parallel
Chunking enables efficient parallel downloads and deduplication of identical data across the checkpoint.
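Chunking and deduplication can be sketched as follows; `chunk_file` is a hypothetical helper using a toy 64-bit hash rather than the real SHA-256-based manifest code in `rs/state_manager/src/manifest.rs`:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Split a file into fixed-size chunks, record their content hashes, and
/// count how many are duplicates of chunks already seen (in any file).
fn chunk_file(
    data: &[u8],
    chunk_size: usize,
    seen: &mut HashMap<u64, usize>,
) -> (Vec<u64>, usize) {
    let mut hashes = Vec::new();
    let mut duplicates = 0;
    for chunk in data.chunks(chunk_size) {
        let h = toy_hash(chunk);
        if seen.insert(h, hashes.len()).is_some() {
            duplicates += 1; // identical content: fetch once, copy everywhere
        }
        hashes.push(h);
    }
    (hashes, duplicates)
}

fn main() {
    let mut seen = HashMap::new();
    // Three 4-byte chunks; the first and last have identical content.
    let (hashes, duplicates) = chunk_file(b"aaaabbbbaaaa", 4, &mut seen);
    assert_eq!(hashes.len(), 3);
    assert_eq!(duplicates, 1);
    assert_eq!(hashes[0], hashes[2]);
}
```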
### State Sync Optimizations
**Reuse files from existing checkpoints:**

```rust
// If a file with the same hash exists locally:
// - Create a hard link instead of downloading
// - Saves bandwidth and disk space
// - Speeds up state sync significantly
```

**Avoid downloading duplicate chunks:**

```rust
// Same chunk appearing in multiple files:
// - Download once
// - Copy to all required positions
// - Common for canister binaries
```

**Download from multiple peers:**

```rust
// From rs/state_manager/src/state_sync.rs
// - Request different chunks from different nodes
// - Load balance across the subnet
// - Maximize bandwidth utilization
```

**Skip validation for recent checkpoints:**

```rust
// From rs/state_manager/src/state_sync.rs
const MAX_HEIGHT_DIFFERENCE_WITHOUT_VALIDATION: u64 = 10_000;
// If syncing to a recent state:
// - Skip full validation
// - Reduces sync time
// - Still verify hashes
```
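The hard-link decision can be sketched as a pure planning step, assuming local checkpoint files are indexed by content hash (`plan_sync` is a hypothetical helper, not the actual API):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified manifest entry (the real FileInfo also carries size_bytes).
struct FileInfo {
    relative_path: PathBuf,
    hash: [u8; 32],
}

/// Partition the manifest into (local source, destination) pairs we can
/// hard-link and paths we still have to download from peers.
fn plan_sync(
    manifest: &[FileInfo],
    local_files_by_hash: &HashMap<[u8; 32], PathBuf>,
) -> (Vec<(PathBuf, PathBuf)>, Vec<PathBuf>) {
    let mut link = Vec::new();
    let mut download = Vec::new();
    for file in manifest {
        match local_files_by_hash.get(&file.hash) {
            Some(local) => link.push((local.clone(), file.relative_path.clone())),
            None => download.push(file.relative_path.clone()),
        }
    }
    (link, download)
}

fn main() {
    let manifest = vec![
        FileInfo { relative_path: PathBuf::from("canister.pbuf"), hash: [1; 32] },
        FileInfo { relative_path: PathBuf::from("software.wasm"), hash: [2; 32] },
    ];
    // One file already exists locally with a matching hash.
    let mut local = HashMap::new();
    local.insert([2; 32], PathBuf::from("checkpoints/100/software.wasm"));
    let (link, download) = plan_sync(&manifest, &local);
    assert_eq!(link.len(), 1);
    assert_eq!(download, vec![PathBuf::from("canister.pbuf")]);
}
```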
### State Sync Metrics
The State Manager tracks various metrics:
```rust
// From rs/state_manager/src/lib.rs
struct StateSyncMetrics {
    size: IntCounterVec,             // Bytes fetched
    duration: HistogramVec,          // Sync time
    remaining: IntGauge,             // Chunks left
    corrupted_chunks: IntCounterVec, // Invalid chunks
}
```
## Storage Layout
State is organized on disk in a structured layout:
```
state/
├── checkpoints/
│   ├── 000000000000100/              (height 100)
│   │   ├── system_metadata.pbuf
│   │   ├── subnet_queues.pbuf
│   │   └── canister_states/
│   │       └── 000000000000000001/   (canister ID)
│   │           ├── canister.pbuf
│   │           ├── queues.pbuf
│   │           ├── software.wasm
│   │           └── vmemory_0.bin     (page file)
│   └── 000000000000200/              (height 200)
├── tip/
│   └── (mutable in-memory state files)
├── backups/
│   └── (backup checkpoints)
├── diverged_checkpoints/
│   └── (checkpoints that diverged)
└── states_metadata.pbuf              (manifest cache and metadata)
```
Checkpoint directories are immutable once created; never modify files within verified checkpoints.
## Tip Management
The “tip” is the current mutable state:
```rust
// From rs/state_manager/src/tip.rs
// Tip contains:
// - Latest replicated state
// - Page map overlays (deltas)
// - Pending state changes

// Background thread:
// - Periodically flushes tip to checkpoint
// - Merges page map deltas
// - Manages memory usage
```
**Tip Operations:**

- **State Updates**: Execution modifies the tip.

  ```rust
  // After each batch execution:
  // 1. Update tip with new state
  // 2. Record changes in page map overlays
  // 3. Advance tip height
  ```

- **Page Map Merging**: Consolidate delta layers.

  ```rust
  // Periodically merge overlays:
  // - Reduces memory overhead
  // - Improves read performance
  // - Prepares for checkpointing
  ```

- **Checkpoint Flushing**: Persist the tip to disk.

  ```rust
  // At checkpoint interval:
  // 1. Freeze current tip
  // 2. Create new checkpoint
  // 3. Create new tip from checkpoint
  ```
## Performance

- **Parallel I/O**: Uses 16 threads for parallel file operations during checkpointing
- **Hard Links**: Reuses unchanged files via hard links, avoiding copies
- **Incremental Writes**: Only writes modified pages through overlays
- **Background Processing**: Checkpoint creation happens asynchronously in the background
**Typical State Sync Times:**

- Small state (< 1 GB): ~30 seconds
- Medium state (10 GB): 2-5 minutes
- Large state (100+ GB): 10-30 minutes
**Performance Factors:**

- Network bandwidth between nodes
- Disk I/O performance
- Amount of existing local state (hard-linking)
- Number of peers serving chunks
State sync performance improves significantly if the node has recent checkpoints that share files with the target state.
## Best Practices
**Ensure sufficient disk space for checkpoints:**

```shell
# The State Manager automatically manages old checkpoints,
# but monitor for:
# - Disk usage trends
# - Large canister states
# - Checkpoint accumulation
```

**Account for state sync times:**

- New nodes need time to sync
- Budget for the initial sync duration
- Consider subnet size and state size
- Plan node additions accordingly

**Keep canister state efficient:**

```rust
// - Use stable memory for large data
// - Implement garbage collection
// - Avoid unnecessary state growth
// - Compress data when appropriate
```

**Design for certified queries:**

```rust
// - Store queryable data in certified_data
// - Keep the certification tree shallow
// - Minimize data in certified responses
```
## Source Code Reference
- **State Manager Core**: `rs/state_manager/src/lib.rs`, the main `StateManagerImpl` with checkpoint management, certification logic, and the API implementation
- **Checkpointing**: `rs/state_manager/src/checkpoint.rs`
- **State Sync**: `rs/state_manager/src/state_sync.rs`
- **Manifests**: `rs/state_manager/src/manifest.rs`
**Key Directories:**

- `rs/state_manager/src/`: Main implementation
- `rs/state_layout/`: Disk layout structures
- `rs/replicated_state/`: State data structures
- `rs/interfaces/state_manager/`: Public interfaces
**Related Components:**

- **Canisters**: Understand canister state structure
- **Cryptography**: Learn about hash trees and certification
- **Consensus**: How consensus uses certified state
- **Execution**: How execution modifies replicated state