The State Manager is responsible for maintaining, persisting, and synchronizing the replicated state across all nodes in a subnet. It provides critical functionality for state certification, checkpointing, and state synchronization.
## Overview
The State Manager component handles:

- **State Persistence**: Writing replicated state to disk as checkpoints
- **State Certification**: Creating certified snapshots for query responses
- **State Synchronization**: Catching up nodes by transferring state
- **Checkpoint Management**: Managing checkpoint lifecycle and storage

The State Manager is implemented in `rs/state_manager/` and interfaces with the execution, consensus, and networking components.
## Architecture
```
State Manager (rs/state_manager/src/lib.rs)
├── Checkpointing (checkpoint.rs)
│   ├── Create checkpoints from replicated state
│   ├── Load checkpoints from disk
│   └── Manage checkpoint lifecycle
├── State Sync (state_sync.rs)
│   ├── Fetch state from peers
│   ├── Validate received chunks
│   └── Reconstruct complete state
├── Certification (lib.rs)
│   ├── Compute state hash trees
│   ├── Generate certification metadata
│   └── Produce witnesses for queries
├── Manifest (manifest.rs)
│   ├── Content-addressable chunks
│   ├── File metadata and hashing
│   └── Deduplication and compression
└── Tip Management (tip.rs)
    ├── Mutable in-memory state
    ├── Page map management
    └── Background persistence
```
## Replicated State
The replicated state represents the complete state of the subnet:
```rust
// From rs/replicated_state/
struct ReplicatedState {
    // Canister states
    canister_states: BTreeMap<CanisterId, CanisterState>,
    // System metadata
    system_metadata: SystemMetadata,
    // Network topology
    network_topology: NetworkTopology,
    // Subnet-level message queues
    subnet_queues: CanisterQueues,
    // Bitcoin state (if enabled)
    bitcoin_state: Option<BitcoinState>,
}
```
### Canister State

Each canister maintains:

- **Wasm module**: Compiled code
- **Wasm memory**: Heap state
- **Stable memory**: Persistent storage
- **Message queues**: Input and output queues
- **System state**: Cycles, controllers, etc.
- **Execution state**: Call contexts

### System Metadata

Subnet-level information:

- Batch numbers and timestamps
- Generated IDs (canister, message)
- Streams to/from other subnets
- Ingress history

### Network Topology

IC network structure:

- Subnet membership
- Routing tables
- Node public keys
- Registry version
## Checkpointing
Checkpoints are persistent snapshots of replicated state at specific heights.
### Checkpoint Creation
1. **State Preparation**: Prepare the state for persistence.

   ```rust
   // From rs/state_manager/src/checkpoint.rs
   // - Flush page map deltas to disk
   // - Strip in-memory overlays
   // - Filter canister snapshots
   ```

2. **Tip-to-Checkpoint**: Convert the mutable tip into an immutable checkpoint.

   ```rust
   // Move files from tip/ to checkpoints/<height>/
   // Create hard links for unchanged files
   // Write new/modified files
   ```

3. **Serialization**: Serialize state components.

   ```rust
   // Canister metadata → canister.pbuf
   // System metadata → system_metadata.pbuf
   // Message queues → queues.pbuf
   // Wasm state → page files (memory, stable memory)
   ```

4. **Verification**: Mark the checkpoint as complete and verified.

   ```rust
   // Compute state root hash
   // Write completion marker
   // Add to available checkpoints
   ```
### Checkpoint Interval
Checkpoints are created periodically:

```rust
// From rs/state_manager/src/lib.rs
// Default: every 500 rounds (~100 seconds at 5 rounds/sec)
const NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY: u64 = 50;

// Checkpoint threads for parallel I/O
pub const NUMBER_OF_CHECKPOINT_THREADS: u32 = 16;
```
The checkpoint interval balances persistence overhead against state recovery time: more frequent checkpoints mean faster recovery but higher I/O load.
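This cadence can be sketched as follows. The constant names mirror those above, but the helper functions and the exact trigger conditions are illustrative assumptions, not the actual API:

```rust
// Sketch of the checkpointing cadence (illustrative helpers, not the real API).
const CHECKPOINT_INTERVAL: u64 = 500;
const NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY: u64 = 50;

/// A full checkpoint is written at every multiple of the interval.
fn is_checkpoint_height(height: u64) -> bool {
    height > 0 && height % CHECKPOINT_INTERVAL == 0
}

/// Shortly before a checkpoint, page-map deltas are written as overlay files
/// so the checkpoint itself has less I/O left to do.
fn should_write_overlay(height: u64) -> bool {
    let rounds_until_checkpoint = CHECKPOINT_INTERVAL - height % CHECKPOINT_INTERVAL;
    rounds_until_checkpoint <= NUM_ROUNDS_BEFORE_CHECKPOINT_TO_WRITE_OVERLAY
}

fn main() {
    assert!(is_checkpoint_height(1000));
    assert!(!is_checkpoint_height(999));
    assert!(should_write_overlay(960));  // 40 rounds before the next checkpoint
    assert!(!should_write_overlay(900)); // 100 rounds before
}
```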
### Page Maps
Canister memory is managed using page maps:
```
Page Map (4 KB pages)
├── Base Layer (from checkpoint)
│   └── Immutable page files
└── Delta Layers (overlays)
    ├── Recent modifications
    └── Copy-on-write semantics
```
Benefits:

- **Efficient storage**: Only store changed pages
- **Fast checkpointing**: Hard-link unchanged pages
- **Copy-on-write**: Preserve checkpoint immutability
- **Deduplication**: Share identical pages across canisters
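A minimal sketch of the base-plus-delta idea, assuming a simple in-memory map per layer (the real `PageMap` in `rs/replicated_state/` is file-backed and far more elaborate):

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: usize = 4096;

/// Toy page map: an immutable base layer (as loaded from a checkpoint)
/// plus a mutable delta overlay with copy-on-write semantics.
struct PageMap {
    base: BTreeMap<u64, [u8; PAGE_SIZE]>,  // pages from the last checkpoint
    delta: BTreeMap<u64, [u8; PAGE_SIZE]>, // pages modified since then
}

impl PageMap {
    fn new() -> Self {
        PageMap { base: BTreeMap::new(), delta: BTreeMap::new() }
    }

    /// Reads fall through the delta to the base; untouched pages read as zero.
    fn get_page(&self, index: u64) -> [u8; PAGE_SIZE] {
        self.delta.get(&index)
            .or_else(|| self.base.get(&index))
            .copied()
            .unwrap_or([0; PAGE_SIZE])
    }

    /// Writes only ever touch the delta, leaving the checkpoint base intact.
    fn set_page(&mut self, index: u64, page: [u8; PAGE_SIZE]) {
        self.delta.insert(index, page);
    }

    /// Checkpointing merges the delta into a new base; unchanged pages are
    /// carried over (hard-linked on disk in the real system).
    fn commit(self) -> PageMap {
        let mut base = self.base;
        base.extend(self.delta);
        PageMap { base, delta: BTreeMap::new() }
    }
}

fn main() {
    let mut map = PageMap::new();
    let mut page = [0u8; PAGE_SIZE];
    page[0] = 42;
    map.set_page(3, page);
    assert_eq!(map.get_page(3)[0], 42);
    assert_eq!(map.get_page(0)[0], 0); // untouched pages are zeroed
    let map = map.commit();
    assert_eq!(map.get_page(3)[0], 42);
}
```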
## State Certification
Certification enables clients to verify query responses:
### Hash Tree Construction
1. **Build Labeled Tree**: Create a hierarchical tree from the state.

   ```rust
   // From rs/state_manager/src/lib.rs
   // Labeled tree structure:
   //   /canister/<id>/certified_data
   //   /canister/<id>/metadata/<name>
   //   /time
   //   /subnet/<id>/metrics
   ```

2. **Compute Hash Tree**: Hash the tree as a Merkle tree.

   ```rust
   // Each inner node: hash = H(label || hash(children))
   // Leaves: hash = H(value)
   // Root hash represents the entire state
   ```

3. **Threshold Sign**: The subnet collectively signs the root hash.

   ```rust
   // Consensus delivers certification
   // Threshold BLS signature
   // Single signature from the subnet
   ```

4. **Generate Witnesses**: Create Merkle proofs for specific paths.

   ```rust
   // Witness includes:
   // - Path to queried data
   // - Sibling hashes for verification
   // - Root hash (signed by subnet)
   ```
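The tree-hashing steps above can be sketched with a toy 64-bit digest in place of the SHA-256 used on the IC (the type and function names here are illustrative, not the actual crypto component's API):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy 64-bit digest standing in for SHA-256.
fn digest(parts: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    for p in parts {
        p.hash(&mut h);
    }
    h.finish()
}

enum Tree {
    Leaf(Vec<u8>),
    Labeled(BTreeMap<Vec<u8>, Tree>),
}

/// Every node's digest commits to its children and their labels, so the
/// root hash commits to the entire tree and changes if any leaf changes.
fn root_hash(tree: &Tree) -> u64 {
    match tree {
        Tree::Leaf(value) => digest(&[b"leaf".as_slice(), value.as_slice()]),
        Tree::Labeled(children) => {
            let mut acc = Vec::new();
            for (label, child) in children {
                acc.extend_from_slice(label);
                acc.extend_from_slice(&root_hash(child).to_be_bytes());
            }
            digest(&[b"node".as_slice(), acc.as_slice()])
        }
    }
}

/// Tiny state with a single /time leaf, mimicking the labeled-tree layout.
fn state(time: &[u8]) -> Tree {
    let mut top = BTreeMap::new();
    top.insert(b"time".to_vec(), Tree::Leaf(time.to_vec()));
    Tree::Labeled(top)
}

fn main() {
    // Identical states hash identically; changing any leaf changes the
    // root hash the subnet would threshold-sign.
    assert_eq!(root_hash(&state(b"100")), root_hash(&state(b"100")));
    assert_ne!(root_hash(&state(b"100")), root_hash(&state(b"101")));
}
```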
### Certified Queries
Queries return certified responses:
```
Client → Query Call → Replica
                         ↓
                   Execute query
                         ↓
                  Generate witness
                         ↓
       Response + Certificate + Witness
                         ↓
                 Client verifies:
                 1. Witness reconstructs root hash
                 2. Signature valid on root hash
                 3. Data matches witness
```
**Server Side (Replica):**

```rust
// From rs/state_manager/src/lib.rs
// Generate certified response:
// 1. Execute query
// 2. Get witness for queried path
// 3. Attach certificate (threshold signature)
// 4. Return to client
```

**Client Side:**

```rust
// Verify certified response:
// 1. Reconstruct root hash from witness
// 2. Check threshold signature on root hash
// 3. Check data matches witness
```
### Certification Scope
Different scopes of certification:

| Scope | What’s Certified | Use Case |
|-------|------------------|----------|
| Full | Complete state hash tree | Regular state certification |
| Metadata | Only certification metadata | During catch-up (optimization) |
## State Synchronization
State sync enables nodes to catch up by fetching state from peers.
### When State Sync Occurs
- **Node Startup**: New or restarted nodes need to catch up
- **Subnet Catch-Up**: Lagging nodes fetch recent state
- **Subnet Join**: New nodes joining an existing subnet
- **State Divergence**: Recovery from corrupted state (rare)
### State Sync Protocol
1. **Initiate Sync**: Determine the need for state sync.

   ```rust
   // From rs/state_manager/src/state_sync.rs
   // Trigger conditions:
   // - Missing checkpoint for required height
   // - State hash mismatch
   // - Consensus requests catch-up
   ```

2. **Fetch Manifest**: Request the state manifest from peers.

   ```rust
   // Manifest describes state structure:
   // - File table (all files in checkpoint)
   // - Chunk table (content-addressed chunks)
   // - Root hash for verification
   ```

3. **Download Chunks**: Fetch chunks in parallel.

   ```rust
   // From rs/state_manager/src/state_sync/chunkable/
   // - Request chunks from multiple peers
   // - Verify chunk hashes
   // - Detect and handle corrupted chunks
   // - Track progress and remaining chunks
   ```

4. **Reassemble State**: Reconstruct the checkpoint from chunks.

   ```rust
   // - Preallocate files
   // - Write chunks to correct positions
   // - Verify file hashes
   // - Hard-link files from existing checkpoints (optimization)
   ```

5. **Validate and Load**: Verify and activate the synced state.

   ```rust
   // - Load checkpoint into memory
   // - Verify state root hash
   // - Mark checkpoint as verified
   // - Resume normal operation
   ```
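The download-and-reassembly steps above might look roughly like this, with a toy 64-bit hash in place of SHA-256 and a closure standing in for the peer-to-peer transport (all names are illustrative; the real logic lives under `rs/state_manager/src/state_sync/`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Manifest entry for one chunk, with a toy 64-bit hash instead of [u8; 32].
struct ChunkInfo {
    file_index: u32,
    offset: u64,
    size: u32,
    hash: u64,
}

fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Fetch every chunk, verify it against the manifest entry, and copy it into
/// the right position of the preallocated file buffers.
fn sync_chunks(
    chunks: &[ChunkInfo],
    files: &mut [Vec<u8>],
    fetch: impl Fn(usize) -> Vec<u8>,
) -> Result<(), String> {
    for (i, info) in chunks.iter().enumerate() {
        let data = fetch(i);
        // A corrupted chunk would be re-requested from another peer.
        if data.len() != info.size as usize || toy_hash(&data) != info.hash {
            return Err(format!("corrupted chunk {i}"));
        }
        let start = info.offset as usize;
        files[info.file_index as usize][start..start + data.len()].copy_from_slice(&data);
    }
    Ok(())
}

fn main() {
    let original = b"abcdefgh";
    let chunks = vec![
        ChunkInfo { file_index: 0, offset: 0, size: 4, hash: toy_hash(&original[..4]) },
        ChunkInfo { file_index: 0, offset: 4, size: 4, hash: toy_hash(&original[4..]) },
    ];
    let mut files = vec![vec![0u8; 8]]; // preallocated to the manifest's file size
    sync_chunks(&chunks, &mut files, |i| original[i * 4..(i + 1) * 4].to_vec()).unwrap();
    assert_eq!(files[0], original.to_vec());
}
```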
### Manifest and Chunks
**Manifest Structure:**
```rust
// From rs/state_manager/src/state_sync/types.rs
struct Manifest {
    // Version for compatibility
    version: u32,
    // Files in checkpoint
    file_table: Vec<FileInfo>,
    // Content-addressed chunks
    chunk_table: Vec<ChunkInfo>,
}

struct FileInfo {
    relative_path: PathBuf,
    size_bytes: u64,
    hash: [u8; 32],
}

struct ChunkInfo {
    file_index: u32,
    offset: u64,
    size: u32,
    hash: [u8; 32],
}
```
**Chunk Properties:**

- Fixed size (typically 1 MB)
- Content-addressed by hash
- Can be deduplicated across files
- Fetched independently and in parallel
Chunking enables efficient parallel downloads and deduplication of identical data across the checkpoint.
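Chunking and deduplication can be sketched as follows; `chunk_file` is a hypothetical helper using a toy 64-bit hash rather than the real SHA-256-based manifest code in `rs/state_manager/src/manifest.rs`:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Split a file into fixed-size chunks, record their content hashes, and
/// count how many are duplicates of chunks already seen (in any file).
fn chunk_file(
    data: &[u8],
    chunk_size: usize,
    seen: &mut HashMap<u64, usize>,
) -> (Vec<u64>, usize) {
    let mut hashes = Vec::new();
    let mut duplicates = 0;
    for chunk in data.chunks(chunk_size) {
        let h = toy_hash(chunk);
        if seen.insert(h, hashes.len()).is_some() {
            duplicates += 1; // identical content: fetch once, copy everywhere
        }
        hashes.push(h);
    }
    (hashes, duplicates)
}

fn main() {
    let mut seen = HashMap::new();
    // Three 4-byte chunks; the first and last have identical content.
    let (hashes, duplicates) = chunk_file(b"aaaabbbbaaaa", 4, &mut seen);
    assert_eq!(hashes.len(), 3);
    assert_eq!(duplicates, 1);
    assert_eq!(hashes[0], hashes[2]);
}
```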
### State Sync Optimizations
**Reuse files from existing checkpoints:**

```rust
// If a file with the same hash exists locally:
// - Create a hard link instead of downloading
// - Saves bandwidth and disk space
// - Speeds up state sync significantly
```

**Avoid downloading duplicate chunks:**

```rust
// Same chunk appearing in multiple files:
// - Download once
// - Copy to all required positions
// - Common for canister binaries
```

**Download from multiple peers:**

```rust
// From rs/state_manager/src/state_sync.rs
// - Request different chunks from different nodes
// - Load balance across the subnet
// - Maximize bandwidth utilization
```

**Skip validation for recent checkpoints:**

```rust
// From rs/state_manager/src/state_sync.rs
const MAX_HEIGHT_DIFFERENCE_WITHOUT_VALIDATION: u64 = 10_000;
// If syncing to a recent state:
// - Skip full validation
// - Reduces sync time
// - Still verify hashes
```
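The hard-link decision can be sketched as a pure planning step, assuming local checkpoint files are indexed by content hash (`plan_sync` is a hypothetical helper, not the actual API):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified manifest entry (the real FileInfo also carries size_bytes).
struct FileInfo {
    relative_path: PathBuf,
    hash: [u8; 32],
}

/// Partition the manifest into (local source, destination) pairs we can
/// hard-link and paths we still have to download from peers.
fn plan_sync(
    manifest: &[FileInfo],
    local_files_by_hash: &HashMap<[u8; 32], PathBuf>,
) -> (Vec<(PathBuf, PathBuf)>, Vec<PathBuf>) {
    let mut link = Vec::new();
    let mut download = Vec::new();
    for file in manifest {
        match local_files_by_hash.get(&file.hash) {
            Some(local) => link.push((local.clone(), file.relative_path.clone())),
            None => download.push(file.relative_path.clone()),
        }
    }
    (link, download)
}

fn main() {
    let manifest = vec![
        FileInfo { relative_path: PathBuf::from("canister.pbuf"), hash: [1; 32] },
        FileInfo { relative_path: PathBuf::from("software.wasm"), hash: [2; 32] },
    ];
    // One file already exists locally with a matching hash.
    let mut local = HashMap::new();
    local.insert([2; 32], PathBuf::from("checkpoints/100/software.wasm"));
    let (link, download) = plan_sync(&manifest, &local);
    assert_eq!(link.len(), 1);
    assert_eq!(download, vec![PathBuf::from("canister.pbuf")]);
}
```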
### State Sync Metrics
The State Manager tracks various metrics:
```rust
// From rs/state_manager/src/lib.rs
struct StateSyncMetrics {
    size: IntCounterVec,             // Bytes fetched
    duration: HistogramVec,          // Sync time
    remaining: IntGauge,             // Chunks left
    corrupted_chunks: IntCounterVec, // Invalid chunks
}
```
## Storage Layout
State is organized on disk in a structured layout:
```
state/
├── checkpoints/
│   ├── 000000000000100/              (height 100)
│   │   ├── system_metadata.pbuf
│   │   ├── subnet_queues.pbuf
│   │   └── canister_states/
│   │       └── 000000000000000001/   (canister ID)
│   │           ├── canister.pbuf
│   │           ├── queues.pbuf
│   │           ├── software.wasm
│   │           └── vmemory_0.bin     (page file)
│   └── 000000000000200/              (height 200)
├── tip/
│   └── (mutable in-memory state files)
├── backups/
│   └── (backup checkpoints)
├── diverged_checkpoints/
│   └── (checkpoints that diverged)
└── states_metadata.pbuf              (manifest cache and metadata)
```
Checkpoint directories are immutable once created; never modify files within verified checkpoints.
## Tip Management
The “tip” is the current mutable state:
```rust
// From rs/state_manager/src/tip.rs
// Tip contains:
// - Latest replicated state
// - Page map overlays (deltas)
// - Pending state changes

// Background thread:
// - Periodically flushes tip to checkpoint
// - Merges page map deltas
// - Manages memory usage
```
**Tip Operations:**

- **State Updates**: Execution modifies the tip.

  ```rust
  // After each batch execution:
  // 1. Update tip with new state
  // 2. Record changes in page map overlays
  // 3. Advance tip height
  ```

- **Page Map Merging**: Consolidate delta layers.

  ```rust
  // Periodically merge overlays:
  // - Reduces memory overhead
  // - Improves read performance
  // - Prepares for checkpointing
  ```

- **Checkpoint Flushing**: Persist the tip to disk.

  ```rust
  // At checkpoint interval:
  // 1. Freeze current tip
  // 2. Create new checkpoint
  // 3. Create new tip from checkpoint
  ```
## Performance

- **Parallel I/O**: Uses 16 threads for parallel file operations during checkpointing
- **Hard Links**: Reuses unchanged files via hard links, avoiding copies
- **Incremental Writes**: Only writes modified pages through overlays
- **Background Processing**: Checkpoint creation happens asynchronously in the background
**Typical State Sync Times:**

- Small state (< 1 GB): ~30 seconds
- Medium state (10 GB): 2-5 minutes
- Large state (100+ GB): 10-30 minutes
**Performance Factors:**

- Network bandwidth between nodes
- Disk I/O performance
- Amount of existing local state (hard-linking)
- Number of peers serving chunks
State sync performance improves significantly if the node has recent checkpoints that share files with the target state.
## Best Practices
**Ensure sufficient disk space for checkpoints:**

```shell
# The State Manager automatically manages old checkpoints,
# but monitor for:
# - Disk usage trends
# - Large canister states
# - Checkpoint accumulation
```

**Account for state sync times:**

- New nodes need time to sync
- Budget for the initial sync duration
- Consider subnet size and state size
- Plan node additions accordingly

**Keep canister state efficient:**

```rust
// - Use stable memory for large data
// - Implement garbage collection
// - Avoid unnecessary state growth
// - Compress data when appropriate
```

**Design for certified queries:**

```rust
// - Store queryable data in certified_data
// - Keep the certification tree shallow
// - Minimize data in certified responses
```
## Source Code Reference
- **State Manager Core**: `rs/state_manager/src/lib.rs`, the main `StateManagerImpl` with checkpoint management, certification logic, and the API implementation
- **Checkpointing**: `rs/state_manager/src/checkpoint.rs`
- **State Sync**: `rs/state_manager/src/state_sync.rs`
- **Manifests**: `rs/state_manager/src/manifest.rs`
**Key Directories:**

- `rs/state_manager/src/`: Main implementation
- `rs/state_layout/`: Disk layout structures
- `rs/replicated_state/`: State data structures
- `rs/interfaces/state_manager/`: Public interfaces
**Related Components:**

- **Canisters**: Understand canister state structure
- **Cryptography**: Learn about hash trees and certification
- **Consensus**: How consensus uses certified state
- **Execution**: How execution modifies replicated state