Indexing is the core process by which Graph Node extracts, transforms, and stores blockchain data according to subgraph definitions. This page explores the indexing pipeline from block ingestion to entity storage.

Indexing Overview

Graph Node continuously monitors blockchains and processes relevant data as new blocks are produced. The indexing process is:
  • Real-time: New blocks are processed as they’re produced
  • Deterministic: Same inputs always produce same outputs
  • Resumable: Can stop and resume from any block
  • Reorg-safe: Handles chain reorganizations automatically
Graph Node maintains block-level granularity for all entities, enabling time-travel queries and seamless chain reorganization handling.

Indexing Pipeline

The indexing pipeline consists of several stages that transform raw blockchain data into queryable entities:
┌─────────────────┐
│ Block Ingestion │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Block Stream   │
└────────┬────────┘
         ▼
┌─────────────────┐
│Trigger Matching │
└────────┬────────┘
         ▼
┌─────────────────┐
│  WASM Handler   │
│    Execution    │
└────────┬────────┘
         ▼
┌──────────────────┐
│Entity Persistence│
└────────┬─────────┘
         ▼
┌─────────────────┐
│ Cursor Update   │
└─────────────────┘

1. Block Ingestion

Block ingestion is the process of fetching new blocks from blockchain nodes and storing them in Graph Node’s block cache.

Ingestion Modes

Firehose

High-performance streaming protocol optimized for indexing. Recommended for production.

RPC Polling

Direct RPC calls to blockchain nodes. Simpler but less efficient.

Firehose Ingestion

Firehose is a gRPC-based streaming protocol that provides:
  • Linear streaming: Blocks arrive in order without gaps
  • Fork awareness: Proper handling of chain reorganizations
  • Efficient format: Protobuf encoding for minimal bandwidth
  • Cursor-based resumption: Resume from exact position after restart
// Firehose block ingestion
pub trait BlockIngestor: 'static + Send + Sync {
    async fn run(self: Box<Self>);
    fn network_name(&self) -> ChainName;
    fn kind(&self) -> BlockchainKind;
}
Location in codebase: graph/src/blockchain/firehose_block_ingestor.rs

Block Storage

Ingested blocks are stored in the ChainStore for later retrieval:
  • Block data: Full block information (transactions, logs, etc.)
  • Block metadata: Hash, number, parent hash, timestamp
  • Chain head tracking: Latest block for each chain
Location in codebase: store/postgres/src/chain_store.rs

2. Block Streaming

The block stream converts ingested blocks into a stream of blocks relevant to specific subgraphs.

Block Stream Creation

pub trait Blockchain: Sized + Send + Sync {
    async fn new_block_stream(
        &self,
        deployment: DeploymentLocator,
        store: impl DeploymentCursorTracker,
        start_blocks: Vec<BlockNumber>,
        source_subgraph_stores: Vec<Arc<dyn SourceableStore>>,
        filter: Arc<TriggerFilterWrapper<Self>>,
        unified_api_version: UnifiedMappingApiVersion,
    ) -> Result<Box<dyn BlockStream<Self>>, Error>;
}
Key parameters:
  • start_blocks: List of starting points for different data sources
  • filter: Trigger filter specifying what events/calls to extract
  • deployment: Identifies which subgraph is indexing
Location in codebase: graph/src/blockchain/block_stream.rs

Buffered Block Streaming

Block streams are buffered for performance:
pub const BUFFERED_BLOCK_STREAM_SIZE: usize = 100;
pub const FIREHOSE_BUFFER_STREAM_SIZE: usize = 1;
Buffering allows:
  • Parallel processing: Multiple blocks can be in flight concurrently
  • Backpressure handling: Fast producers can't overwhelm slower consumers
  • Smooth performance: Evens out variations in per-block processing time
Location in codebase: graph/src/blockchain/block_stream.rs:22-23
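The buffering behavior can be sketched with a bounded channel. This is an illustrative, std-only sketch, not Graph Node's actual implementation (which uses async streams); `Block` and `run_pipeline` are stand-in names:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Illustrative stand-in for a block; the real stream carries BlockWithTriggers.
struct Block {
    number: u64,
}

/// Stream `total` blocks through a bounded channel of size `buffer` and
/// count how many the consumer sees.
fn run_pipeline(total: u64, buffer: usize) -> u64 {
    // A bounded channel mirrors BUFFERED_BLOCK_STREAM_SIZE: once `buffer`
    // blocks are in flight, `send` blocks, so a fast ingestor cannot run
    // arbitrarily far ahead of a slow processor.
    let (tx, rx) = sync_channel::<Block>(buffer);

    let producer = thread::spawn(move || {
        for number in 0..total {
            tx.send(Block { number }).unwrap(); // blocks when the buffer is full
        }
        // Dropping `tx` here closes the channel and ends the consumer loop.
    });

    let mut processed = 0;
    for _block in rx {
        processed += 1;
    }
    producer.join().unwrap();
    processed
}

fn main() {
    assert_eq!(run_pipeline(1_000, 100), 1_000);
}
```

The fixed buffer size is the backpressure mechanism: the producer's `send` call parks until the consumer drains a slot.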

Block Stream Events

Block streams emit events as they process the chain:
pub enum BlockStreamEvent<C: Blockchain> {
    // New block with triggers to process
    ProcessBlock(BlockWithTriggers<C>, FirehoseCursor),
    
    // Chain reorganization detected
    Revert(BlockPtr, FirehoseCursor),
}
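A consumer of these events tracks the subgraph head roughly as follows. This is a simplified sketch with stand-in types (the real enum is generic over the chain and carries `BlockWithTriggers<C>` plus a `FirehoseCursor`):

```rust
/// Simplified stand-in for the real BlockPtr.
#[derive(Clone, Debug, PartialEq)]
struct BlockPtr {
    number: u64,
}

/// Simplified stand-in for BlockStreamEvent<C>.
enum BlockStreamEvent {
    ProcessBlock(BlockPtr),
    Revert(BlockPtr),
}

/// Apply a stream of events: advance the head on ProcessBlock,
/// rewind it on Revert. Revert carries the block to revert *to*.
fn apply(events: Vec<BlockStreamEvent>) -> Option<BlockPtr> {
    let mut head: Option<BlockPtr> = None;
    for event in events {
        match event {
            BlockStreamEvent::ProcessBlock(ptr) => head = Some(ptr),
            BlockStreamEvent::Revert(ptr) => head = Some(ptr),
        }
    }
    head
}

fn main() {
    use BlockStreamEvent::*;
    let head = apply(vec![
        ProcessBlock(BlockPtr { number: 100 }),
        ProcessBlock(BlockPtr { number: 101 }),
        Revert(BlockPtr { number: 100 }),       // reorg: rewind to block 100
        ProcessBlock(BlockPtr { number: 101 }), // new canonical block 101
    ]);
    assert_eq!(head, Some(BlockPtr { number: 101 }));
}
```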

3. Trigger Extraction and Matching

Triggers are blockchain events that activate subgraph handlers. The triggers adapter extracts relevant triggers from blocks.

Trigger Types

For Ethereum:
  • Event triggers: Smart contract event logs
  • Call triggers: Function calls to contracts (requires archive/trace)
  • Block triggers: Block data itself (filtered or all blocks)
For other chains, trigger types vary based on blockchain capabilities.

Trigger Adapter

pub trait TriggersAdapter<C: Blockchain>: Send + Sync {
    async fn triggers_in_block(
        &self,
        logger: &Logger,
        block: C::Block,
    ) -> Result<BlockWithTriggers<C>, Error>;
    
    fn is_on_main_chain(&self, ptr: BlockPtr) -> bool;
}
Implementation: Chain-specific adapters implement this trait to extract triggers from their block format.
Location in codebase: graph/src/blockchain/block_stream.rs

Trigger Filtering

Trigger filters specify what to extract from blocks:
pub trait TriggerFilter<C: Blockchain>: Default + Clone + Send + Sync {
    fn from_data_sources<'a>(
        data_sources: impl Iterator<Item = &'a C::DataSource> + Clone,
    ) -> Self;
    
    fn extend<'a>(&mut self, data_sources: impl Iterator<Item = &'a C::DataSource> + Clone);
    
    fn to_firehose_filter(self) -> Vec<prost_types::Any>;
}
For Ethereum: Filters include contract addresses and event signatures.
Location in codebase: graph/src/blockchain/mod.rs:282-298

Trigger Matching Process

  1. Filter application: Block stream applies filter to extract candidate triggers
  2. Data source matching: Each trigger is matched against subgraph data sources
  3. Handler identification: Matching determines which handler to invoke
  4. Decoding: Trigger data is decoded into handler-specific format
impl<C: Blockchain> DataSource<C> {
    fn match_and_decode(
        &self,
        trigger: &C::TriggerData,
        block: &Arc<C::Block>,
        logger: &Logger,
    ) -> Result<Option<TriggerWithHandler<C>>, Error>;
}
Performance note: This is a hot code path, called for every trigger against every data source, so an efficient implementation is critical.
Location in codebase: graph/src/blockchain/mod.rs:340-345
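The matching step can be sketched for the Ethereum event-trigger case. This is a hypothetical, simplified model (the real `match_and_decode` also decodes the log against the contract ABI); a data source claims a log when both the contract address and the event signature (topic0) line up:

```rust
/// Simplified stand-in for an Ethereum log trigger.
#[derive(Clone, Debug, PartialEq)]
struct EventTrigger {
    address: [u8; 20], // emitting contract
    topic0: [u8; 32],  // event signature hash
}

/// Simplified stand-in for a subgraph data source.
struct DataSource {
    address: [u8; 20],
    handlers: Vec<([u8; 32], &'static str)>, // (event signature, handler name)
}

impl DataSource {
    /// Mirrors the shape of match_and_decode: None means "not for this
    /// data source", Some(handler) names the mapping function to invoke.
    fn match_trigger(&self, trigger: &EventTrigger) -> Option<&'static str> {
        if self.address != trigger.address {
            return None; // cheap rejection first: wrong contract
        }
        self.handlers
            .iter()
            .find(|(sig, _)| *sig == trigger.topic0)
            .map(|(_, handler)| *handler)
    }
}

fn main() {
    let ds = DataSource {
        address: [0xaa; 20],
        handlers: vec![([0x01; 32], "handleTransfer")],
    };
    let hit = EventTrigger { address: [0xaa; 20], topic0: [0x01; 32] };
    let miss = EventTrigger { address: [0xbb; 20], topic0: [0x01; 32] };
    assert_eq!(ds.match_trigger(&hit), Some("handleTransfer"));
    assert_eq!(ds.match_trigger(&miss), None);
}
```

Note the ordering: the cheap address comparison runs before the signature scan, which matters in a path executed for every trigger against every data source.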

4. Runtime Execution

Once triggers are matched to handlers, the WASM runtime executes the mapping code.

Runtime Host

The RuntimeHost manages WASM module lifecycle:
pub struct RuntimeHostBuilder<C: Blockchain> {
    // WASM module from subgraph deployment
    module: Arc<ValidModule>,
    // Data source this runtime handles
    data_source: C::DataSource,
    // Host functions available to WASM
    host_fns: Vec<HostFn>,
    // Metrics and logging
    host_metrics: Arc<HostMetrics>,
}
Location in codebase: runtime/wasm/src/host.rs

Handler Invocation

For each trigger:
  1. Module instantiation: Create fresh WASM instance (or reuse pooled instance)
  2. Context setup: Prepare block, transaction, and event data
  3. Handler call: Invoke exported handler function
  4. Host function execution: Process calls to entity.save(), ethereum.call(), etc.
  5. Gas accounting: Track and limit computation (gas metering)

Gas Metering

Gas metering prevents infinite loops and resource exhaustion:
pub struct GasCounter {
    gas_used: AtomicU64,
    gas_limit: Option<u64>,
}
Each WASM instruction consumes gas; if the gas limit is exceeded, execution halts.
Configuration: Set via the GRAPH_MAX_GAS_PER_HANDLER environment variable
Location in codebase: graph/src/runtime/gas.rs
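A minimal sketch of the accounting shown above (illustrative only; the real counter in graph/src/runtime/gas.rs differs in structure, and `GasExhausted` is a stand-in error type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Minimal gas accounting sketch: a shared atomic counter with a hard limit.
struct GasCounter {
    gas_used: AtomicU64,
    gas_limit: u64,
}

#[derive(Debug, PartialEq)]
struct GasExhausted;

impl GasCounter {
    fn new(gas_limit: u64) -> Self {
        Self {
            gas_used: AtomicU64::new(0),
            gas_limit,
        }
    }

    /// Charge `amount` gas; return Err once the cumulative total
    /// passes the limit, which halts handler execution.
    fn consume(&self, amount: u64) -> Result<(), GasExhausted> {
        let used = self.gas_used.fetch_add(amount, Ordering::SeqCst) + amount;
        if used > self.gas_limit {
            Err(GasExhausted)
        } else {
            Ok(())
        }
    }
}

fn main() {
    let counter = GasCounter::new(100);
    assert!(counter.consume(60).is_ok());
    assert!(counter.consume(40).is_ok()); // exactly at the limit: still fine
    assert_eq!(counter.consume(1), Err(GasExhausted)); // over: halt
}
```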

Host Exports

Host exports are functions that WASM code can call:

Store access (implementation: runtime/wasm/src/host_exports.rs):
  • entity.save(): Persist entity to store
  • Entity.load(id): Load entity from store
  • store.remove(entity, id): Delete entity

Ethereum access (implementation: chain/ethereum/src/runtime.rs):
  • ethereum.call(): Make eth_call to a contract
  • Access to event.params, block.timestamp, transaction.hash, etc.

IPFS access (use case: index metadata stored on IPFS, e.g. NFT metadata):
  • ipfs.cat(hash): Fetch a file from IPFS
  • ipfs.map(hash, callback, flags): Process an IPFS file with a callback

Cryptographic functions (use case: generate deterministic IDs, verify signatures):
  • crypto.keccak256(input): Keccak-256 hash
  • crypto.sha256(input): SHA-256 hash

Dynamic Data Sources

Handlers can create new data sources at runtime:
import { PairCreated } from "../generated/Factory/Factory"
import { Pair as PairTemplate } from "../generated/templates"
import { Pair } from "../generated/schema"

export function handlePairCreated(event: PairCreated): void {
  // Create dynamic data source for new pair
  PairTemplate.create(event.params.pair)
  
  // Store pair entity
  let pair = new Pair(event.params.pair.toHex())
  pair.token0 = event.params.token0
  pair.token1 = event.params.token1
  pair.save()
}
Internal flow:
  1. DataSourceTemplate.create(address) calls host function
  2. New data source is instantiated from template
  3. Data source is added to active data sources
  4. Trigger filter is updated
  5. Block may be refetched if is_refetch_block_required() returns true
Location in codebase: core/src/subgraph/runner.rs handles dynamic data source lifecycle

5. Entity Persistence

Entities modified by handlers are persisted to PostgreSQL with block-level granularity.

Entity Storage Model

Each entity is stored with:
  • id: Primary key (user-defined)
  • vid: Version ID (auto-incrementing, unique per version)
  • block_range: Range of blocks where this version is valid
  • causality_region: For parallel processing safety
  • …fields: Entity-specific fields from schema
CREATE TABLE sgd1."user" (
    vid BIGSERIAL PRIMARY KEY,
    id VARCHAR(255) NOT NULL,
    address BYTEA NOT NULL,
    balance NUMERIC NOT NULL,
    block_range int4range NOT NULL,
    EXCLUDE USING gist (id WITH =, block_range WITH &&)
);
Key insight: The block_range exclusion constraint ensures that at most one version of an entity exists for any given block.
Location in codebase: store/postgres/src/relational.rs generates this schema

Write Operations

When mapping code calls entity.save():
  1. Buffering: Entity changes are buffered in memory
  2. Validation: Entity validates against schema
  3. Conflict detection: Check for concurrent modifications
  4. SQL generation: Generate INSERT or UPDATE statement
  5. Transaction: Changes are committed in block-level transaction
pub trait WritableStore: Send + Sync {
    fn transact_block_operations(
        &self,
        block_ptr_to: BlockPtr,
        mods: Vec<EntityModification>,
        stopwatch: StopwatchMetrics,
        deterministic_errors: Vec<SubgraphError>,
    ) -> Result<StoreEvent, StoreError>;
}
Location in codebase: store/postgres/src/writable.rs

Batch Processing

Entities are written in batches for efficiency:
pub const WRITE_BATCH_SIZE: usize = 1000;
Batch processing:
  • Reduces database round trips
  • Improves transaction throughput
  • Enables bulk optimizations in PostgreSQL
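The batching step can be sketched as plain slicing (a stand-in sketch; `u32` stands in for `EntityModification`, and the constant mirrors the one quoted above):

```rust
/// Buffered entity modifications are flushed in fixed-size batches, so one
/// block's worth of writes becomes a handful of bulk statements instead of
/// one round trip per entity.
const WRITE_BATCH_SIZE: usize = 1000;

/// Split a buffered list of modifications into write batches.
fn batches(mods: &[u32]) -> Vec<&[u32]> {
    mods.chunks(WRITE_BATCH_SIZE).collect()
}

fn main() {
    let mods: Vec<u32> = (0..2500).collect();
    let out = batches(&mods);
    assert_eq!(out.len(), 3); // 1000 + 1000 + 500
    assert_eq!(out[2].len(), 500);
}
```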

Block Range Updates

When an entity is updated:
  1. Close previous version: Set upper bound of previous block_range
  2. Insert new version: Create new row with updated data and new block_range
This maintains complete history for time-travel queries. Example:
-- Original entity (created at block 100, still current)
id='user1', balance=1000, block_range=[100, ∞)

-- Update at block 110
id='user1', balance=1000, block_range=[100, 110)  -- previous version, closed
id='user1', balance=1500, block_range=[110, ∞)    -- new version
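The close-then-insert pattern can be modeled with a simple in-memory version list (an illustrative sketch, not the store's actual representation; `Version`, `update`, and `at_block` are stand-in names):

```rust
/// One stored version of an entity: its data plus the block range
/// over which that data was current.
#[derive(Clone, Debug, PartialEq)]
struct Version {
    balance: u64,
    from: u64,       // inclusive lower bound of block_range
    to: Option<u64>, // exclusive upper bound; None = open-ended
}

/// An update closes the live version at `block` and appends a new
/// open-ended version, preserving full history.
fn update(history: &mut Vec<Version>, block: u64, balance: u64) {
    if let Some(open) = history.iter_mut().find(|v| v.to.is_none()) {
        open.to = Some(block);
    }
    history.push(Version { balance, from: block, to: None });
}

/// Time-travel lookup: the version whose range contains `block`.
fn at_block(history: &[Version], block: u64) -> Option<&Version> {
    history
        .iter()
        .find(|v| v.from <= block && v.to.map_or(true, |to| block < to))
}

fn main() {
    let mut history = vec![Version { balance: 1000, from: 100, to: None }];
    update(&mut history, 110, 1500);
    assert_eq!(at_block(&history, 105).unwrap().balance, 1000); // old version
    assert_eq!(at_block(&history, 110).unwrap().balance, 1500); // new version
}
```

Because ranges never overlap for the same id, any block number resolves to exactly one version, which is what makes time-travel queries possible.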

6. Cursor and Progress Tracking

Graph Node tracks indexing progress via cursors stored in the database.

Subgraph Cursor

The cursor represents the latest fully-processed block:
pub struct BlockPtr {
    pub hash: BlockHash,
    pub number: BlockNumber,
}
Cursor is updated after:
  • All triggers in block are processed
  • All entities are persisted
  • Transaction is committed
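The ordering above is what makes indexing resumable, and can be sketched as follows (a hypothetical, simplified `Store`; the real store commits entity changes and the cursor in one database transaction via transact_block_operations):

```rust
/// Simplified stand-in for the writable store.
#[derive(Default)]
struct Store {
    entities: Vec<(u64, String)>, // (block number, entity change)
    cursor: u64,                  // latest fully-processed block
}

impl Store {
    /// Apply a block's changes and advance the cursor together, so a
    /// crash never leaves a cursor pointing past unpersisted changes.
    fn transact_block(&mut self, block: u64, changes: Vec<String>) {
        self.entities
            .extend(changes.into_iter().map(|c| (block, c)));
        self.cursor = block; // only after all changes are in
    }
}

fn main() {
    let mut store = Store::default();
    store.transact_block(100, vec!["user1: balance=1000".into()]);
    store.transact_block(101, vec!["user1: balance=1500".into()]);
    assert_eq!(store.cursor, 101); // resume point after a restart
}
```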

Deployment Status

Subgraph deployments track multiple status fields:
CREATE TABLE subgraphs.subgraph_deployment (
    id INTEGER PRIMARY KEY,
    deployment VARCHAR NOT NULL,
    latest_ethereum_block_number NUMERIC,
    latest_ethereum_block_hash BYTEA,
    synced BOOLEAN NOT NULL DEFAULT FALSE,
    failed BOOLEAN NOT NULL DEFAULT FALSE,
    fatal_error TEXT,
    -- ... more fields
);
Status values:
  • synced: Subgraph has caught up to chain head
  • failed: Subgraph has halted due to an unrecoverable error
  • fatal_error: Error message if failed
Location in codebase: store/postgres/src/deployment_store.rs

Chain Reorganization Handling

Blockchain reorganizations (reorgs) occur when the canonical chain changes. Graph Node handles reorgs automatically.

Reorg Detection

Block streams detect reorgs by comparing:
  • Expected parent hash (from previous block)
  • Actual parent hash (from new block)
Mismatch indicates a reorg occurred.
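The comparison can be sketched directly (a stand-in sketch; `u64` substitutes for a real 32-byte block hash):

```rust
/// Simplified block header: just the two fields reorg detection needs.
#[derive(Clone, Debug, PartialEq)]
struct BlockHeader {
    hash: u64,        // stand-in for a 32-byte hash
    parent_hash: u64,
}

/// A new block whose parent_hash does not equal the hash of our current
/// head is not a linear extension: the chain has forked.
fn is_reorg(head: &BlockHeader, incoming: &BlockHeader) -> bool {
    incoming.parent_hash != head.hash
}

fn main() {
    let head = BlockHeader { hash: 0xa1, parent_hash: 0xa0 };
    let child = BlockHeader { hash: 0xa2, parent_hash: 0xa1 };
    let fork = BlockHeader { hash: 0xb2, parent_hash: 0xb1 };
    assert!(!is_reorg(&head, &child)); // linear extension
    assert!(is_reorg(&head, &fork));   // fork: trigger Revert handling
}
```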

Reorg Processing

  1. Revert event: Block stream emits BlockStreamEvent::Revert
  2. Entity rollback: Store reverts entity changes from reverted blocks
  3. Cursor update: Cursor moves back to reorg point
  4. Reprocessing: Blocks on new canonical chain are processed
pub enum BlockStreamEvent<C: Blockchain> {
    ProcessBlock(BlockWithTriggers<C>, FirehoseCursor),
    Revert(BlockPtr, FirehoseCursor),  // Revert to this block
}
Location in codebase: core/src/subgraph/runner.rs handles revert events

Entity Reversion

Entities are reverted using block ranges:
-- Delete entity versions created after reorg point
DELETE FROM sgd1."user" WHERE lower(block_range) > $reorg_block;

-- Reopen entity versions that were closed after reorg point
UPDATE sgd1."user"
SET block_range = int4range(lower(block_range), NULL)
WHERE upper(block_range) > $reorg_block AND upper(block_range) IS NOT NULL;
Determinism guarantee: Because mappings are deterministic, reprocessing blocks produces the same entities as the first time.
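The same two steps, delete-then-reopen, can be modeled over a simple in-memory version list (an illustrative sketch with stand-in names; lower/upper range bounds become `from`/`to` fields):

```rust
/// One stored entity version and the block range it was current for.
#[derive(Clone, Debug, PartialEq)]
struct Version {
    balance: u64,
    from: u64,       // inclusive lower bound of block_range
    to: Option<u64>, // exclusive upper bound; None = still current
}

/// Revert entity history to `reorg_block`.
fn revert(history: &mut Vec<Version>, reorg_block: u64) {
    // DELETE: drop versions created after the reorg point...
    history.retain(|v| v.from <= reorg_block);
    // ...UPDATE: and reopen versions that were closed after it.
    for v in history.iter_mut() {
        if v.to.map_or(false, |to| to > reorg_block) {
            v.to = None;
        }
    }
}

fn main() {
    let mut history = vec![
        Version { balance: 1000, from: 100, to: Some(110) },
        Version { balance: 1500, from: 110, to: None },
    ];
    revert(&mut history, 105); // reorg back to block 105
    // The block-110 version is gone, and the earlier version is current again.
    assert_eq!(history, vec![Version { balance: 1000, from: 100, to: None }]);
}
```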

Performance Optimizations

Parallel Block Processing

Graph Node can process multiple blocks in parallel when safe:
  • Independent blocks: Blocks without entity dependencies can be processed concurrently
  • Causality regions: Ensure conflicting updates are serialized
  • Configurable parallelism: Set via GRAPH_ETHEREUM_PARALLEL_BLOCK_RANGES

Declared Calls Optimization

Declared calls execute in parallel before handler invocation:
eventHandlers:
  - event: Swap(...)
    handler: handleSwap
    calls:
      reserves: Pair[event.address].getReserves()
      totalSupply: Pair[event.address].totalSupply()
Benefit: Multiple eth_call operations execute concurrently instead of sequentially.
Available from: specVersion 1.2.0

Block Cache

Recently processed blocks are cached in memory:
  • Avoids redundant fetches
  • Speeds up reorg handling
  • Reduces blockchain node load
Configuration: GRAPH_ETHEREUM_BLOCK_CACHE_SIZE
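A capacity-bounded cache of this kind can be sketched as follows (illustrative only; `BlockCache` and its FIFO eviction are stand-ins, and the real cache's eviction policy may differ):

```rust
use std::collections::{HashMap, VecDeque};

/// Capacity-bounded block cache with FIFO eviction.
struct BlockCache {
    capacity: usize,
    blocks: HashMap<u64, Vec<u8>>, // block number -> encoded block
    order: VecDeque<u64>,          // insertion order, for eviction
}

impl BlockCache {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            blocks: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Insert a block, evicting the oldest entries past capacity.
    fn insert(&mut self, number: u64, block: Vec<u8>) {
        if self.blocks.insert(number, block).is_none() {
            self.order.push_back(number);
        }
        while self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.blocks.remove(&evicted);
            }
        }
    }

    /// Cache hit avoids a round trip to the blockchain node.
    fn get(&self, number: u64) -> Option<&Vec<u8>> {
        self.blocks.get(&number)
    }
}

fn main() {
    let mut cache = BlockCache::new(2);
    cache.insert(100, vec![1]);
    cache.insert(101, vec![2]);
    cache.insert(102, vec![3]); // evicts block 100
    assert!(cache.get(100).is_none());
    assert!(cache.get(102).is_some());
}
```

Keeping recent blocks resident is what makes reorg handling cheap: the blocks just past the head, the ones a revert touches, are exactly the ones still in cache.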

Monitoring Indexing Progress

Metrics

Graph Node exposes Prometheus metrics at /metrics:
  • deployment_head: Current block number for each subgraph
  • ethereum_chain_head_number: Latest block on chain
  • deployment_sync_duration: Time to process each block
  • deployment_trigger_processing_duration: Handler execution time
Reference: docs/metrics.md

GraphQL Query

Query indexing status via meta field:
{
  _meta {
    block {
      number
      hash
      timestamp
    }
    deployment
    hasIndexingErrors
  }
}

Graphman CLI

Check subgraph status:
# Show deployment info
graphman info <deployment-id>

# Show indexing progress
graphman stats show <deployment-id>
Reference: docs/graphman.md

Error Handling

Deterministic Errors

Errors that always occur at the same block (e.g., divide by zero in mapping):
  • Behavior: Subgraph marks block as failed but continues
  • Feature flag: Requires nonFatalErrors in manifest features
  • Query impact: Queries succeed but may have incomplete data

Non-Deterministic Errors

Errors that may not reoccur (e.g., network timeout, database connection loss):
  • Behavior: Subgraph retries with exponential backoff
  • Failure threshold: After N retries, marks deployment as failed
  • Recovery: Requires manual restart or redeployment
Location in codebase: core/src/subgraph/runner.rs implements retry logic

Next Steps

Query Execution

Learn how indexed data is queried via GraphQL

Architecture

Understand Graph Node’s component architecture

Configuration

Optimize indexing performance with configuration

Monitoring

Set up metrics and monitoring for your node
