The State Manager is responsible for maintaining, persisting, and synchronizing the replicated state across all nodes in a subnet. It provides critical functionality for state certification, checkpointing, and state synchronization.

Overview

The State Manager component handles:

  • State Persistence: Writing replicated state to disk as checkpoints
  • State Certification: Creating certified snapshots for query responses
  • State Synchronization: Catching up nodes by transferring state
  • Checkpoint Management: Managing checkpoint lifecycle and storage

The State Manager is implemented in rs/state_manager/ and interfaces with execution, consensus, and networking components.

Architecture

State Manager (rs/state_manager/src/lib.rs)
├── Checkpointing (checkpoint.rs)
│   ├── Create checkpoints from replicated state
│   ├── Load checkpoints from disk
│   └── Manage checkpoint lifecycle
├── State Sync (state_sync.rs)
│   ├── Fetch state from peers
│   ├── Validate received chunks
│   └── Reconstruct complete state
├── Certification (lib.rs)
│   ├── Compute state hash trees
│   ├── Generate certification metadata
│   └── Produce witnesses for queries
├── Manifest (manifest.rs)
│   ├── Content-addressable chunks
│   ├── File metadata and hashing
│   └── Deduplication and compression
└── Tip Management (tip.rs)
    ├── Mutable in-memory state
    ├── Page map management
    └── Background persistence

Replicated State

The replicated state represents the complete state of the subnet:
// From rs/replicated_state/
struct ReplicatedState {
    // Canister states
    canister_states: BTreeMap<CanisterId, CanisterState>,
    
    // System metadata
    system_metadata: SystemMetadata,
    
    // Network topology
    network_topology: NetworkTopology,
    
    // Subnet-level message queues
    subnet_queues: CanisterQueues,
    
    // Bitcoin state (if enabled)
    bitcoin_state: Option<BitcoinState>,
}
Each canister maintains:
  • Wasm module: Compiled code
  • Wasm memory: Heap state
  • Stable memory: Persistent storage
  • Message queues: Input and output queues
  • System state: Cycles, controllers, etc.
  • Execution state: Call contexts

Checkpointing

Checkpoints are persistent snapshots of replicated state at specific heights.

Checkpoint Creation

1. State Preparation

Prepare the state for persistence:
// From rs/state_manager/src/checkpoint.rs
// - Flush page map deltas to disk
// - Strip in-memory overlays
// - Filter canister snapshots

2. Tip-to-Checkpoint

Convert mutable tip to immutable checkpoint:
// Move files from tip/ to checkpoints/<height>/
// Create hard links for unchanged files
// Write new/modified files

3. Serialization

Serialize state components:
// Canister metadata → canister.pbuf
// System metadata → system_metadata.pbuf
// Message queues → queues.pbuf
// Wasm state → page files (memory, stable memory)

4. Verification

Mark checkpoint as complete and verified:
// Compute state root hash
// Write completion marker
// Add to available checkpoints
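The tip-to-checkpoint move in step 2 can be sketched as below. `promote_tip` and its `unchanged` predicate are illustrative names, not the actual API: the real implementation decides reuse from manifest file hashes and handles nested directories.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sketch of promoting tip/ to checkpoints/<height>/: files unchanged since
/// the last checkpoint are hard-linked (no data copied; the checkpoint shares
/// the file's blocks), while modified files are written out fresh.
/// `unchanged` is a hypothetical predicate; the real code consults hashes.
fn promote_tip(
    tip: &Path,
    checkpoint: &Path,
    unchanged: impl Fn(&Path) -> bool,
) -> io::Result<()> {
    fs::create_dir_all(checkpoint)?;
    for entry in fs::read_dir(tip)? {
        let entry = entry?;
        let src = entry.path();
        let dst = checkpoint.join(entry.file_name());
        if unchanged(&src) {
            // Hard link: same inode, zero bytes copied.
            fs::hard_link(&src, &dst)?;
        } else {
            // Modified since the last checkpoint: write a new copy.
            fs::copy(&src, &dst)?;
        }
    }
    Ok(())
}
```

Hard-linking is what keeps checkpoint creation cheap even for large states: only the files that actually changed cost I/O.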

Checkpoint Interval

Checkpoints are created periodically:
// From rs/state_manager/src/lib.rs
// Checkpoints are taken roughly every 500 rounds (~100 seconds at 5 rounds/sec).
// In the rounds leading up to a checkpoint, page deltas are already flushed
// to disk as overlay files:
const NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY: u64 = 50;

// Checkpoint threads for parallel I/O
pub const NUMBER_OF_CHECKPOINT_THREADS: u32 = 16;
Checkpoint interval balances persistence overhead with state recovery time. More frequent checkpoints mean faster recovery but higher I/O load.

Page Maps

Canister memory is managed using page maps:
Page Map (4KB pages)
├── Base Layer (from checkpoint)
│   └── Immutable page files
└── Delta Layers (overlays)
    ├── Recent modifications
    └── Copy-on-write semantics
Benefits:
  • Efficient storage: Only store changed pages
  • Fast checkpointing: Hard-link unchanged pages
  • Copy-on-write: Preserve checkpoint immutability
  • Deduplication: Share identical pages across canisters
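A minimal in-memory model of this layering is sketched below; the names are illustrative, and the real page map in rs/replicated_state/ is file-backed and considerably more involved.

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: usize = 4096;
type PageIndex = u64;
type Page = [u8; PAGE_SIZE];

/// Toy layered page map: an immutable base layer from the last checkpoint
/// plus a mutable delta overlay for pages modified since then.
struct PageMap {
    base: BTreeMap<PageIndex, Page>,  // checkpoint pages (never mutated)
    delta: BTreeMap<PageIndex, Page>, // copy-on-write modifications
}

impl PageMap {
    fn new(base: BTreeMap<PageIndex, Page>) -> Self {
        PageMap { base, delta: BTreeMap::new() }
    }

    /// Reads consult the delta first, then fall back to the base layer;
    /// untouched pages read as zeros.
    fn get_page(&self, index: PageIndex) -> Page {
        self.delta
            .get(&index)
            .or_else(|| self.base.get(&index))
            .copied()
            .unwrap_or([0u8; PAGE_SIZE])
    }

    /// Writes go copy-on-write into the delta, preserving checkpoint
    /// immutability.
    fn update_page(&mut self, index: PageIndex, page: Page) {
        self.delta.insert(index, page);
    }

    /// At checkpoint time only delta pages need writing; base pages can be
    /// hard-linked from the previous checkpoint.
    fn dirty_page_count(&self) -> usize {
        self.delta.len()
    }
}
```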

State Certification

Certification enables clients to verify query responses:

Hash Tree Construction

1. Build Labeled Tree

Create hierarchical tree from state:
// From rs/state_manager/src/lib.rs
// Labeled tree structure:
// /canister/<id>/certified_data
// /canister/<id>/metadata/<name>
// /time
// /subnet/<id>/metrics

2. Compute Hash Tree

Compute the Merkle hash of the tree:
// Each node: hash = H(label || hash(children))
// Leaves: hash = H(value)
// Root hash represents entire state

3. Threshold Sign

Subnet collectively signs the root hash:
// Consensus delivers certification
// Threshold BLS signature
// Single signature from subnet

4. Generate Witnesses

Create Merkle proofs for specific paths:
// Witness includes:
// - Path to queried data
// - Sibling hashes for verification
// - Root hash (signed by subnet)
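The recursive hashing in steps 1–2 can be sketched with a toy hash function. The real implementation uses SHA-256 with domain separators over a precisely specified encoding, so this only illustrates the structure: leaves hash their value, inner nodes hash label-plus-child-hash pairs, and any leaf change propagates to the root.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy stand-in for the hash function; the IC uses SHA-256, not this.
fn h(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// A labeled tree node: either a leaf value or labeled children.
enum Tree {
    Leaf(Vec<u8>),
    Fork(Vec<(Vec<u8>, Tree)>), // (label, subtree) pairs
}

/// Leaves hash their value; inner nodes hash the concatenation of
/// label || child-hash for each child. The root hash commits to the
/// entire state.
fn root_hash(t: &Tree) -> u64 {
    match t {
        Tree::Leaf(v) => h(v),
        Tree::Fork(children) => {
            let mut buf = Vec::new();
            for (label, child) in children {
                buf.extend_from_slice(label);
                buf.extend_from_slice(&root_hash(child).to_be_bytes());
            }
            h(&buf)
        }
    }
}
```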

Certified Queries

Queries return certified responses:
Client → Query Call → Replica
          ├─ Execute query
          ├─ Generate witness
          └─ Response + Certificate + Witness → Client

Client verifies:
  1. The witness reconstructs the root hash
  2. The threshold signature on the root hash is valid
  3. The returned data matches the witness
// From rs/state_manager/src/lib.rs
// Generate certified response:
// 1. Execute query
// 2. Get witness for queried path
// 3. Attach certificate (threshold signature)
// 4. Return to client

Certification Scope

Different scopes of certification:
Scope     | What's Certified            | Use Case
Full      | Complete state hash tree    | Regular state certification
Metadata  | Only certification metadata | During catch-up (optimization)

State Synchronization

State sync enables nodes to catch up by fetching state from peers.

When State Sync Occurs

  • Node Startup: New or restarted nodes need to catch up
  • Subnet Catch-Up: Lagging nodes fetch recent state
  • Subnet Join: New nodes joining an existing subnet
  • State Divergence: Recovery from corrupted state (rare)

State Sync Protocol

1. Initiate Sync

Determine need for state sync:
// From rs/state_manager/src/state_sync.rs
// Trigger conditions:
// - Missing checkpoint for required height
// - State hash mismatch
// - Consensus requests catch-up

2. Fetch Manifest

Request state manifest from peers:
// Manifest describes state structure:
// - File table (all files in checkpoint)
// - Chunk table (content-addressed chunks)
// - Root hash for verification

3. Download Chunks

Fetch chunks in parallel:
// From rs/state_manager/src/state_sync/chunkable/
// - Request chunks from multiple peers
// - Verify chunk hashes
// - Detect and handle corrupted chunks
// - Track progress and remaining chunks

4. Reassemble State

Reconstruct checkpoint from chunks:
// - Preallocate files
// - Write chunks to correct positions
// - Verify file hashes
// - Hard-link files from existing checkpoints (optimization)

5. Validate and Load

Verify and activate the synced state:
// - Load checkpoint into memory
// - Verify state root hash
// - Mark checkpoint as verified
// - Resume normal operation
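Steps 3–5 amount to a verify-and-track loop. A minimal sketch, with a toy 64-bit hash standing in for the manifest's SHA-256 chunk hashes and illustrative type names:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::Hasher;

/// Toy stand-in for the chunk hash; the manifest actually stores SHA-256.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Tracks an in-flight state sync: chunks are verified on arrival, and
/// corrupted chunks remain outstanding so they can be re-requested,
/// possibly from a different peer.
struct SyncProgress {
    expected: Vec<u64>,        // per-chunk hashes from the manifest
    remaining: HashSet<usize>, // chunk indices not yet received intact
}

impl SyncProgress {
    fn new(expected: Vec<u64>) -> Self {
        let remaining = (0..expected.len()).collect();
        SyncProgress { expected, remaining }
    }

    /// Accepts a chunk only if its hash matches the manifest entry.
    fn on_chunk(&mut self, index: usize, data: &[u8]) -> bool {
        if toy_hash(data) == self.expected[index] {
            self.remaining.remove(&index);
            true
        } else {
            false // corrupted or mismatched: leave it outstanding
        }
    }

    fn is_complete(&self) -> bool {
        self.remaining.is_empty()
    }
}
```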

Manifest and Chunks

Manifest Structure:
// From rs/state_manager/src/state_sync/types.rs
struct Manifest {
    // Version for compatibility
    version: u32,
    
    // Files in checkpoint
    file_table: Vec<FileInfo>,
    
    // Content-addressed chunks
    chunk_table: Vec<ChunkInfo>,
}

struct FileInfo {
    relative_path: PathBuf,
    size_bytes: u64,
    hash: [u8; 32],
}

struct ChunkInfo {
    file_index: u32,
    offset: u64,
    size: u32,
    hash: [u8; 32],
}
Chunk Properties:
  • Fixed size (typically 1 MB)
  • Content-addressed by hash
  • Can be deduplicated across files
  • Fetched independently and in parallel
Chunking enables efficient parallel downloads and deduplication of identical data across the checkpoint.
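Building a chunk table over a file's contents can be sketched as follows. The fixed 1 MiB chunk size matches the description above; a toy 64-bit hash stands in for SHA-256, and `ChunkEntry` mirrors but does not reproduce the real `ChunkInfo`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const CHUNK_SIZE: usize = 1 << 20; // 1 MiB

/// Simplified chunk record; the real ChunkInfo carries a 32-byte SHA-256.
struct ChunkEntry {
    offset: u64,
    size: u32,
    hash: u64,
}

fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Split file contents into fixed-size chunks, hashing each one.
/// Identical chunks hash identically, which is what enables
/// content-addressing and deduplication across files.
fn build_chunk_table(data: &[u8]) -> Vec<ChunkEntry> {
    data.chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(i, chunk)| ChunkEntry {
            offset: (i * CHUNK_SIZE) as u64,
            size: chunk.len() as u32,
            hash: toy_hash(chunk),
        })
        .collect()
}
```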

State Sync Optimizations

Reuse files from existing checkpoints:
// If file with same hash exists locally:
// - Create hard link instead of downloading
// - Saves bandwidth and disk space
// - Speeds up state sync significantly
Avoid downloading duplicate chunks:
// Same chunk appearing in multiple files:
// - Download once
// - Copy to all required positions
// - Common for canister binaries
Download from multiple peers:
// From rs/state_manager/src/state_sync.rs
// - Request different chunks from different nodes
// - Load balance across subnet
// - Maximize bandwidth utilization
Skip validation for recent checkpoints:
// From rs/state_manager/src/state_sync.rs
const MAX_HEIGHT_DIFFERENCE_WITHOUT_VALIDATION: u64 = 10_000;

// If syncing to recent state:
// - Skip full validation
// - Reduces sync time
// - Still verify hashes

State Sync Metrics

The State Manager tracks various metrics:
// From rs/state_manager/src/lib.rs
struct StateSyncMetrics {
    size: IntCounterVec,              // Bytes fetched
    duration: HistogramVec,           // Sync time
    remaining: IntGauge,              // Chunks left
    corrupted_chunks: IntCounterVec,  // Invalid chunks
}

Storage Layout

State is organized on disk in a structured layout:
state/
├── checkpoints/
│   ├── 000000000000100/  (height 100)
│   │   ├── system_metadata.pbuf
│   │   ├── subnet_queues.pbuf
│   │   └── canister_states/
│   │       └── 000000000000000001/  (canister ID)
│   │           ├── canister.pbuf
│   │           ├── queues.pbuf
│   │           ├── software.wasm
│   │           └── vmemory_0.bin (page file)
│   └── 000000000000200/  (height 200)
├── tip/
│   └── (mutable in-memory state files)
├── backups/
│   └── (backup checkpoints)
├── diverged_checkpoints/
│   └── (checkpoints that diverged)
└── states_metadata.pbuf
    (manifest cache and metadata)
Checkpoint directories are immutable once created. Never modify files within verified checkpoints.

Tip Management

The “tip” is the current mutable state:
// From rs/state_manager/src/tip.rs
// Tip contains:
// - Latest replicated state
// - Page map overlays (deltas)
// - Pending state changes

// Background thread:
// - Periodically flushes tip to checkpoint
// - Merges page map deltas
// - Manages memory usage
Tip Operations:
Execution modifies the tip:
// After each batch execution:
// 1. Update tip with new state
// 2. Record changes in page map overlays
// 3. Advance tip height

Performance Considerations

Checkpoint Performance

  • Parallel I/O: Uses 16 threads for parallel file operations during checkpointing
  • Hard Links: Reuses unchanged files via hard links, avoiding copies
  • Incremental Writes: Only writes modified pages through overlays
  • Background Processing: Checkpoint creation happens asynchronously in the background

State Sync Performance

Typical State Sync Times:
  • Small state (< 1 GB): ~30 seconds
  • Medium state (10 GB): 2-5 minutes
  • Large state (100+ GB): 10-30 minutes
Performance Factors:
  • Network bandwidth between nodes
  • Disk I/O performance
  • Amount of existing local state (hard-linking)
  • Number of peers serving chunks
State sync performance improves significantly if the node has recent checkpoints that share files with the target state.
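As a back-of-the-envelope check on the figures above: when the network is the bottleneck, sync time is roughly state size divided by effective throughput. The throughput number below is an assumption for illustration, not a measured value.

```rust
/// Rough lower bound on sync duration when bandwidth is the bottleneck;
/// real syncs also pay for disk I/O, hashing, and validation.
fn sync_seconds(state_bytes: u64, throughput_bytes_per_sec: u64) -> u64 {
    state_bytes / throughput_bytes_per_sec
}
```

At an assumed 50 MB/s of effective throughput, a 10 GB state takes about 200 seconds, consistent with the 2-5 minute range above; hard-linking files already present locally reduces this further.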

Best Practices

Ensure sufficient disk space for checkpoints:
# State Manager automatically manages old checkpoints
# but monitor for:
# - Disk usage trends
# - Large canister states
# - Checkpoint accumulation
Account for state sync times:
- New nodes need time to sync
- Budget for initial sync duration
- Consider subnet size and state size
- Plan node additions accordingly
Keep canister state efficient:
// - Use stable memory for large data
// - Implement garbage collection
// - Avoid unnecessary state growth
// - Compress data when appropriate
Design for certified queries:
// - Store queryable data in certified_data
// - Keep certification tree shallow
// - Minimize data in certified responses

Source Code Reference

// rs/state_manager/src/lib.rs
// Main StateManagerImpl with:
// - Checkpoint management
// - Certification logic
// - API implementation
Key Directories:
  • rs/state_manager/src/: Main implementation
  • rs/state_layout/: Disk layout structures
  • rs/replicated_state/: State data structures
  • rs/interfaces/state_manager/: Public interfaces

Related Topics

  • Canisters: Understand canister state structure
  • Cryptography: Learn about hash trees and certification
  • Consensus: How consensus uses certified state
  • Execution: How execution modifies replicated state
