Indexing is the core process by which Graph Node extracts, transforms, and stores blockchain data according to subgraph definitions. This page explores the indexing pipeline from block ingestion to entity storage.

Indexing Overview

Graph Node continuously monitors blockchains and processes relevant data as new blocks are produced. The indexing process is:
  • Real-time: New blocks are processed as they’re produced
  • Deterministic: Same inputs always produce same outputs
  • Resumable: Can stop and resume from any block
  • Reorg-safe: Handles chain reorganizations automatically
Graph Node maintains block-level granularity for all entities, enabling time-travel queries and seamless chain reorganization handling.

Indexing Pipeline

The indexing pipeline consists of several stages that transform raw blockchain data into queryable entities:
┌─────────────────┐
│ Block Ingestion │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Block Stream   │
└────────┬────────┘
         ▼
┌─────────────────┐
│Trigger Matching │
└────────┬────────┘
         ▼
┌─────────────────┐
│  WASM Handler   │
│    Execution    │
└────────┬────────┘
         ▼
┌──────────────────┐
│Entity Persistence│
└────────┬─────────┘
         ▼
┌─────────────────┐
│ Cursor Update   │
└─────────────────┘

1. Block Ingestion

Block ingestion is the process of fetching new blocks from blockchain nodes and storing them in Graph Node’s block cache.

Ingestion Modes

Firehose

High-performance streaming protocol optimized for indexing. Recommended for production.

RPC Polling

Direct RPC calls to blockchain nodes. Simpler but less efficient.

Firehose Ingestion

Firehose is a gRPC-based streaming protocol that provides:
  • Linear streaming: Blocks arrive in order without gaps
  • Fork awareness: Proper handling of chain reorganizations
  • Efficient format: Protobuf encoding for minimal bandwidth
  • Cursor-based resumption: Resume from exact position after restart
// Firehose block ingestion
pub trait BlockIngestor: 'static + Send + Sync {
    async fn run(self: Box<Self>);
    fn network_name(&self) -> ChainName;
    fn kind(&self) -> BlockchainKind;
}
Location in codebase: graph/src/blockchain/firehose_block_ingestor.rs

Block Storage

Ingested blocks are stored in the ChainStore for later retrieval:
  • Block data: Full block information (transactions, logs, etc.)
  • Block metadata: Hash, number, parent hash, timestamp
  • Chain head tracking: Latest block for each chain
Location in codebase: store/postgres/src/chain_store.rs

2. Block Streaming

The block stream converts ingested blocks into a stream of blocks relevant to specific subgraphs.

Block Stream Creation

pub trait Blockchain: Sized + Send + Sync {
    async fn new_block_stream(
        &self,
        deployment: DeploymentLocator,
        store: impl DeploymentCursorTracker,
        start_blocks: Vec<BlockNumber>,
        source_subgraph_stores: Vec<Arc<dyn SourceableStore>>,
        filter: Arc<TriggerFilterWrapper<Self>>,
        unified_api_version: UnifiedMappingApiVersion,
    ) -> Result<Box<dyn BlockStream<Self>>, Error>;
}
Key parameters:
  • start_blocks: List of starting points for different data sources
  • filter: Trigger filter specifying what events/calls to extract
  • deployment: Identifies which subgraph is indexing
Location in codebase: graph/src/blockchain/block_stream.rs

Buffered Block Streaming

Block streams are buffered for performance:
pub const BUFFERED_BLOCK_STREAM_SIZE: usize = 100;
pub const FIREHOSE_BUFFER_STREAM_SIZE: usize = 1;
Buffering allows:
  • Parallel processing: Multiple blocks can be in flight concurrently
  • Backpressure handling: Fast producers can't overwhelm slower consumers
  • Smooth performance: Evens out variations in per-block processing time
Location in codebase: graph/src/blockchain/block_stream.rs:22-23
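The buffering behavior can be sketched with a bounded channel. This is an illustrative, std-only sketch, not Graph Node's actual implementation (which uses async streams); `Block` and `run_pipeline` are stand-in names:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Illustrative stand-in for a block; the real stream carries BlockWithTriggers.
struct Block {
    number: u64,
}

/// Stream `total` blocks through a bounded channel of size `buffer` and
/// count how many the consumer sees.
fn run_pipeline(total: u64, buffer: usize) -> u64 {
    // A bounded channel mirrors BUFFERED_BLOCK_STREAM_SIZE: once `buffer`
    // blocks are in flight, `send` blocks, so a fast ingestor cannot run
    // arbitrarily far ahead of a slow processor.
    let (tx, rx) = sync_channel::<Block>(buffer);

    let producer = thread::spawn(move || {
        for number in 0..total {
            tx.send(Block { number }).unwrap(); // blocks when the buffer is full
        }
        // Dropping `tx` here closes the channel and ends the consumer loop.
    });

    let mut processed = 0;
    for _block in rx {
        processed += 1;
    }
    producer.join().unwrap();
    processed
}

fn main() {
    assert_eq!(run_pipeline(1_000, 100), 1_000);
}
```

The fixed buffer size is the backpressure mechanism: the producer's `send` call parks until the consumer drains a slot.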

Block Stream Events

Block streams emit events as they process the chain:
pub enum BlockStreamEvent<C: Blockchain> {
    // New block with triggers to process
    ProcessBlock(BlockWithTriggers<C>, FirehoseCursor),
    
    // Chain reorganization detected
    Revert(BlockPtr, FirehoseCursor),
}
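A consumer of these events tracks the subgraph head roughly as follows. This is a simplified sketch with stand-in types (the real enum is generic over the chain and carries `BlockWithTriggers<C>` plus a `FirehoseCursor`):

```rust
/// Simplified stand-in for the real BlockPtr.
#[derive(Clone, Debug, PartialEq)]
struct BlockPtr {
    number: u64,
}

/// Simplified stand-in for BlockStreamEvent<C>.
enum BlockStreamEvent {
    ProcessBlock(BlockPtr),
    Revert(BlockPtr),
}

/// Apply a stream of events: advance the head on ProcessBlock,
/// rewind it on Revert. Revert carries the block to revert *to*.
fn apply(events: Vec<BlockStreamEvent>) -> Option<BlockPtr> {
    let mut head: Option<BlockPtr> = None;
    for event in events {
        match event {
            BlockStreamEvent::ProcessBlock(ptr) => head = Some(ptr),
            BlockStreamEvent::Revert(ptr) => head = Some(ptr),
        }
    }
    head
}

fn main() {
    use BlockStreamEvent::*;
    let head = apply(vec![
        ProcessBlock(BlockPtr { number: 100 }),
        ProcessBlock(BlockPtr { number: 101 }),
        Revert(BlockPtr { number: 100 }),       // reorg: rewind to block 100
        ProcessBlock(BlockPtr { number: 101 }), // new canonical block 101
    ]);
    assert_eq!(head, Some(BlockPtr { number: 101 }));
}
```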

3. Trigger Extraction and Matching

Triggers are blockchain events that activate subgraph handlers. The triggers adapter extracts relevant triggers from blocks.

Trigger Types

For Ethereum:
  • Event triggers: Smart contract event logs
  • Call triggers: Function calls to contracts (requires archive/trace)
  • Block triggers: Block data itself (filtered or all blocks)
For other chains, trigger types vary based on blockchain capabilities.

Trigger Adapter

pub trait TriggersAdapter<C: Blockchain>: Send + Sync {
    async fn triggers_in_block(
        &self,
        logger: &Logger,
        block: C::Block,
    ) -> Result<BlockWithTriggers<C>, Error>;
    
    fn is_on_main_chain(&self, ptr: BlockPtr) -> bool;
}
Implementation: Chain-specific adapters implement this trait to extract triggers from their block format.
Location in codebase: graph/src/blockchain/block_stream.rs

Trigger Filtering

Trigger filters specify what to extract from blocks:
pub trait TriggerFilter<C: Blockchain>: Default + Clone + Send + Sync {
    fn from_data_sources<'a>(
        data_sources: impl Iterator<Item = &'a C::DataSource> + Clone,
    ) -> Self;
    
    fn extend<'a>(&mut self, data_sources: impl Iterator<Item = &'a C::DataSource> + Clone);
    
    fn to_firehose_filter(self) -> Vec<prost_types::Any>;
}
For Ethereum: Filters include contract addresses and event signatures.
Location in codebase: graph/src/blockchain/mod.rs:282-298

Trigger Matching Process

  1. Filter application: Block stream applies filter to extract candidate triggers
  2. Data source matching: Each trigger is matched against subgraph data sources
  3. Handler identification: Matching determines which handler to invoke
  4. Decoding: Trigger data is decoded into handler-specific format
impl<C: Blockchain> DataSource<C> {
    fn match_and_decode(
        &self,
        trigger: &C::TriggerData,
        block: &Arc<C::Block>,
        logger: &Logger,
    ) -> Result<Option<TriggerWithHandler<C>>, Error>;
}
Performance note: This is a hot code path, called for every trigger against every data source, so an efficient implementation is critical.
Location in codebase: graph/src/blockchain/mod.rs:340-345
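The matching step can be sketched for the Ethereum event-trigger case. This is a hypothetical, simplified model (the real `match_and_decode` also decodes the log against the contract ABI); a data source claims a log when both the contract address and the event signature (topic0) line up:

```rust
/// Simplified stand-in for an Ethereum log trigger.
#[derive(Clone, Debug, PartialEq)]
struct EventTrigger {
    address: [u8; 20], // emitting contract
    topic0: [u8; 32],  // event signature hash
}

/// Simplified stand-in for a subgraph data source.
struct DataSource {
    address: [u8; 20],
    handlers: Vec<([u8; 32], &'static str)>, // (event signature, handler name)
}

impl DataSource {
    /// Mirrors the shape of match_and_decode: None means "not for this
    /// data source", Some(handler) names the mapping function to invoke.
    fn match_trigger(&self, trigger: &EventTrigger) -> Option<&'static str> {
        if self.address != trigger.address {
            return None; // cheap rejection first: wrong contract
        }
        self.handlers
            .iter()
            .find(|(sig, _)| *sig == trigger.topic0)
            .map(|(_, handler)| *handler)
    }
}

fn main() {
    let ds = DataSource {
        address: [0xaa; 20],
        handlers: vec![([0x01; 32], "handleTransfer")],
    };
    let hit = EventTrigger { address: [0xaa; 20], topic0: [0x01; 32] };
    let miss = EventTrigger { address: [0xbb; 20], topic0: [0x01; 32] };
    assert_eq!(ds.match_trigger(&hit), Some("handleTransfer"));
    assert_eq!(ds.match_trigger(&miss), None);
}
```

Note the ordering: the cheap address comparison runs before the signature scan, which matters in a path executed for every trigger against every data source.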

4. Runtime Execution

Once triggers are matched to handlers, the WASM runtime executes the mapping code.

Runtime Host

The RuntimeHost manages WASM module lifecycle:
pub struct RuntimeHostBuilder<C: Blockchain> {
    // WASM module from subgraph deployment
    module: Arc<ValidModule>,
    // Data source this runtime handles
    data_source: C::DataSource,
    // Host functions available to WASM
    host_fns: Vec<HostFn>,
    // Metrics and logging
    host_metrics: Arc<HostMetrics>,
}
Location in codebase: runtime/wasm/src/host.rs

Handler Invocation

For each trigger:
  1. Module instantiation: Create fresh WASM instance (or reuse pooled instance)
  2. Context setup: Prepare block, transaction, and event data
  3. Handler call: Invoke exported handler function
  4. Host function execution: Process calls to entity.save(), ethereum.call(), etc.
  5. Gas accounting: Track and limit computation (gas metering)

Gas Metering

Gas metering prevents infinite loops and resource exhaustion:
pub struct GasCounter {
    gas_used: AtomicU64,
    gas_limit: Option<u64>,
}
Each WASM instruction consumes gas; if the gas limit is exceeded, execution halts.
Configuration: Set via the GRAPH_MAX_GAS_PER_HANDLER environment variable
Location in codebase: graph/src/runtime/gas.rs
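A minimal sketch of the accounting shown above (illustrative only; the real counter in graph/src/runtime/gas.rs differs in structure, and `GasExhausted` is a stand-in error type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Minimal gas accounting sketch: a shared atomic counter with a hard limit.
struct GasCounter {
    gas_used: AtomicU64,
    gas_limit: u64,
}

#[derive(Debug, PartialEq)]
struct GasExhausted;

impl GasCounter {
    fn new(gas_limit: u64) -> Self {
        Self {
            gas_used: AtomicU64::new(0),
            gas_limit,
        }
    }

    /// Charge `amount` gas; return Err once the cumulative total
    /// passes the limit, which halts handler execution.
    fn consume(&self, amount: u64) -> Result<(), GasExhausted> {
        let used = self.gas_used.fetch_add(amount, Ordering::SeqCst) + amount;
        if used > self.gas_limit {
            Err(GasExhausted)
        } else {
            Ok(())
        }
    }
}

fn main() {
    let counter = GasCounter::new(100);
    assert!(counter.consume(60).is_ok());
    assert!(counter.consume(40).is_ok()); // exactly at the limit: still fine
    assert_eq!(counter.consume(1), Err(GasExhausted)); // over: halt
}
```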

Host Exports

Host exports are functions that WASM code can call:

Store access (implementation: runtime/wasm/src/host_exports.rs):
  • entity.save(): Persist entity to store
  • Entity.load(id): Load entity from store
  • store.remove(entity, id): Delete entity

Ethereum access (implementation: chain/ethereum/src/runtime.rs):
  • ethereum.call(): Make eth_call to a contract
  • Access to event.params, block.timestamp, transaction.hash, etc.

IPFS access (use case: index metadata stored on IPFS, e.g. NFT metadata):
  • ipfs.cat(hash): Fetch a file from IPFS
  • ipfs.map(hash, callback, flags): Process an IPFS file with a callback

Cryptographic functions (use case: generate deterministic IDs, verify signatures):
  • crypto.keccak256(input): Keccak-256 hash
  • crypto.sha256(input): SHA-256 hash

Dynamic Data Sources

Handlers can create new data sources at runtime:
import { PairCreated } from "../generated/Factory/Factory"
import { Pair as PairTemplate } from "../generated/templates"
import { Pair } from "../generated/schema"

export function handlePairCreated(event: PairCreated): void {
  // Create dynamic data source for new pair
  PairTemplate.create(event.params.pair)
  
  // Store pair entity
  let pair = new Pair(event.params.pair.toHex())
  pair.token0 = event.params.token0
  pair.token1 = event.params.token1
  pair.save()
}
Internal flow:
  1. DataSourceTemplate.create(address) calls host function
  2. New data source is instantiated from template
  3. Data source is added to active data sources
  4. Trigger filter is updated
  5. Block may be refetched if is_refetch_block_required() returns true
Location in codebase: core/src/subgraph/runner.rs handles dynamic data source lifecycle

5. Entity Persistence

Entities modified by handlers are persisted to PostgreSQL with block-level granularity.

Entity Storage Model

Each entity is stored with:
  • id: Primary key (user-defined)
  • vid: Version ID (auto-incrementing, unique per version)
  • block_range: Range of blocks where this version is valid
  • causality_region: For parallel processing safety
  • …fields: Entity-specific fields from schema
CREATE TABLE sgd1."user" (
    vid BIGSERIAL PRIMARY KEY,
    id VARCHAR(255) NOT NULL,
    address BYTEA NOT NULL,
    balance NUMERIC NOT NULL,
    block_range int4range NOT NULL,
    EXCLUDE USING gist (id WITH =, block_range WITH &&)
);
Key insight: The block_range exclusion constraint ensures that at most one version of an entity exists for any given block.
Location in codebase: store/postgres/src/relational.rs generates this schema

Write Operations

When mapping code calls entity.save():
  1. Buffering: Entity changes are buffered in memory
  2. Validation: Entity validates against schema
  3. Conflict detection: Check for concurrent modifications
  4. SQL generation: Generate INSERT or UPDATE statement
  5. Transaction: Changes are committed in block-level transaction
pub trait WritableStore: Send + Sync {
    fn transact_block_operations(
        &self,
        block_ptr_to: BlockPtr,
        mods: Vec<EntityModification>,
        stopwatch: StopwatchMetrics,
        deterministic_errors: Vec<SubgraphError>,
    ) -> Result<StoreEvent, StoreError>;
}
Location in codebase: store/postgres/src/writable.rs

Batch Processing

Entities are written in batches for efficiency:
pub const WRITE_BATCH_SIZE: usize = 1000;
Batch processing:
  • Reduces database round trips
  • Improves transaction throughput
  • Enables bulk optimizations in PostgreSQL
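The batching step can be sketched as plain slicing (a stand-in sketch; `u32` stands in for `EntityModification`, and the constant mirrors the one quoted above):

```rust
/// Buffered entity modifications are flushed in fixed-size batches, so one
/// block's worth of writes becomes a handful of bulk statements instead of
/// one round trip per entity.
const WRITE_BATCH_SIZE: usize = 1000;

/// Split a buffered list of modifications into write batches.
fn batches(mods: &[u32]) -> Vec<&[u32]> {
    mods.chunks(WRITE_BATCH_SIZE).collect()
}

fn main() {
    let mods: Vec<u32> = (0..2500).collect();
    let out = batches(&mods);
    assert_eq!(out.len(), 3); // 1000 + 1000 + 500
    assert_eq!(out[2].len(), 500);
}
```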

Block Range Updates

When an entity is updated:
  1. Close previous version: Set upper bound of previous block_range
  2. Insert new version: Create new row with updated data and new block_range
This maintains complete history for time-travel queries. Example:
-- Original entity (created at block 100, still current)
id='user1', balance=1000, block_range=[100, ∞)

-- Update at block 110
id='user1', balance=1000, block_range=[100, 110)  -- previous version, closed
id='user1', balance=1500, block_range=[110, ∞)    -- new version
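The close-then-insert pattern can be modeled with a simple in-memory version list (an illustrative sketch, not the store's actual representation; `Version`, `update`, and `at_block` are stand-in names):

```rust
/// One stored version of an entity: its data plus the block range
/// over which that data was current.
#[derive(Clone, Debug, PartialEq)]
struct Version {
    balance: u64,
    from: u64,       // inclusive lower bound of block_range
    to: Option<u64>, // exclusive upper bound; None = open-ended
}

/// An update closes the live version at `block` and appends a new
/// open-ended version, preserving full history.
fn update(history: &mut Vec<Version>, block: u64, balance: u64) {
    if let Some(open) = history.iter_mut().find(|v| v.to.is_none()) {
        open.to = Some(block);
    }
    history.push(Version { balance, from: block, to: None });
}

/// Time-travel lookup: the version whose range contains `block`.
fn at_block(history: &[Version], block: u64) -> Option<&Version> {
    history
        .iter()
        .find(|v| v.from <= block && v.to.map_or(true, |to| block < to))
}

fn main() {
    let mut history = vec![Version { balance: 1000, from: 100, to: None }];
    update(&mut history, 110, 1500);
    assert_eq!(at_block(&history, 105).unwrap().balance, 1000); // old version
    assert_eq!(at_block(&history, 110).unwrap().balance, 1500); // new version
}
```

Because ranges never overlap for the same id, any block number resolves to exactly one version, which is what makes time-travel queries possible.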

6. Cursor and Progress Tracking

Graph Node tracks indexing progress via cursors stored in the database.

Subgraph Cursor

The cursor represents the latest fully-processed block:
pub struct BlockPtr {
    pub hash: BlockHash,
    pub number: BlockNumber,
}
Cursor is updated after:
  • All triggers in block are processed
  • All entities are persisted
  • Transaction is committed
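The ordering above is what makes indexing resumable, and can be sketched as follows (a hypothetical, simplified `Store`; the real store commits entity changes and the cursor in one database transaction via transact_block_operations):

```rust
/// Simplified stand-in for the writable store.
#[derive(Default)]
struct Store {
    entities: Vec<(u64, String)>, // (block number, entity change)
    cursor: u64,                  // latest fully-processed block
}

impl Store {
    /// Apply a block's changes and advance the cursor together, so a
    /// crash never leaves a cursor pointing past unpersisted changes.
    fn transact_block(&mut self, block: u64, changes: Vec<String>) {
        self.entities
            .extend(changes.into_iter().map(|c| (block, c)));
        self.cursor = block; // only after all changes are in
    }
}

fn main() {
    let mut store = Store::default();
    store.transact_block(100, vec!["user1: balance=1000".into()]);
    store.transact_block(101, vec!["user1: balance=1500".into()]);
    assert_eq!(store.cursor, 101); // resume point after a restart
}
```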

Deployment Status

Subgraph deployments track multiple status fields:
CREATE TABLE subgraphs.subgraph_deployment (
    id INTEGER PRIMARY KEY,
    deployment VARCHAR NOT NULL,
    latest_ethereum_block_number NUMERIC,
    latest_ethereum_block_hash BYTEA,
    synced BOOLEAN NOT NULL DEFAULT FALSE,
    failed BOOLEAN NOT NULL DEFAULT FALSE,
    fatal_error TEXT,
    -- ... more fields
);
Status values:
  • synced: Subgraph has caught up to chain head
  • failed: Subgraph has halted due to an unrecoverable error
  • fatal_error: Error message if failed
Location in codebase: store/postgres/src/deployment_store.rs

Chain Reorganization Handling

Blockchain reorganizations (reorgs) occur when the canonical chain changes. Graph Node handles reorgs automatically.

Reorg Detection

Block streams detect reorgs by comparing:
  • Expected parent hash (from previous block)
  • Actual parent hash (from new block)
Mismatch indicates a reorg occurred.
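The comparison can be sketched directly (a stand-in sketch; `u64` substitutes for a real 32-byte block hash):

```rust
/// Simplified block header: just the two fields reorg detection needs.
#[derive(Clone, Debug, PartialEq)]
struct BlockHeader {
    hash: u64,        // stand-in for a 32-byte hash
    parent_hash: u64,
}

/// A new block whose parent_hash does not equal the hash of our current
/// head is not a linear extension: the chain has forked.
fn is_reorg(head: &BlockHeader, incoming: &BlockHeader) -> bool {
    incoming.parent_hash != head.hash
}

fn main() {
    let head = BlockHeader { hash: 0xa1, parent_hash: 0xa0 };
    let child = BlockHeader { hash: 0xa2, parent_hash: 0xa1 };
    let fork = BlockHeader { hash: 0xb2, parent_hash: 0xb1 };
    assert!(!is_reorg(&head, &child)); // linear extension
    assert!(is_reorg(&head, &fork));   // fork: trigger Revert handling
}
```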

Reorg Processing

  1. Revert event: Block stream emits BlockStreamEvent::Revert
  2. Entity rollback: Store reverts entity changes from reverted blocks
  3. Cursor update: Cursor moves back to reorg point
  4. Reprocessing: Blocks on new canonical chain are processed
pub enum BlockStreamEvent<C: Blockchain> {
    ProcessBlock(BlockWithTriggers<C>, FirehoseCursor),
    Revert(BlockPtr, FirehoseCursor),  // Revert to this block
}
Location in codebase: core/src/subgraph/runner.rs handles revert events

Entity Reversion

Entities are reverted using block ranges:
-- Delete entity versions created after reorg point
DELETE FROM sgd1."user" WHERE lower(block_range) > $reorg_block;

-- Reopen entity versions that were closed after reorg point
UPDATE sgd1."user"
SET block_range = int4range(lower(block_range), NULL)
WHERE upper(block_range) > $reorg_block AND upper(block_range) IS NOT NULL;
Determinism guarantee: Because mappings are deterministic, reprocessing blocks produces the same entities as the first time.
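The same two steps, delete-then-reopen, can be modeled over a simple in-memory version list (an illustrative sketch with stand-in names; lower/upper range bounds become `from`/`to` fields):

```rust
/// One stored entity version and the block range it was current for.
#[derive(Clone, Debug, PartialEq)]
struct Version {
    balance: u64,
    from: u64,       // inclusive lower bound of block_range
    to: Option<u64>, // exclusive upper bound; None = still current
}

/// Revert entity history to `reorg_block`.
fn revert(history: &mut Vec<Version>, reorg_block: u64) {
    // DELETE: drop versions created after the reorg point...
    history.retain(|v| v.from <= reorg_block);
    // ...UPDATE: and reopen versions that were closed after it.
    for v in history.iter_mut() {
        if v.to.map_or(false, |to| to > reorg_block) {
            v.to = None;
        }
    }
}

fn main() {
    let mut history = vec![
        Version { balance: 1000, from: 100, to: Some(110) },
        Version { balance: 1500, from: 110, to: None },
    ];
    revert(&mut history, 105); // reorg back to block 105
    // The block-110 version is gone, and the earlier version is current again.
    assert_eq!(history, vec![Version { balance: 1000, from: 100, to: None }]);
}
```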

Performance Optimizations

Parallel Block Processing

Graph Node can process multiple blocks in parallel when safe:
  • Independent blocks: Blocks without entity dependencies can be processed concurrently
  • Causality regions: Ensure conflicting updates are serialized
  • Configurable parallelism: Set via GRAPH_ETHEREUM_PARALLEL_BLOCK_RANGES

Declared Calls Optimization

Declared calls execute in parallel before handler invocation:
eventHandlers:
  - event: Swap(...)
    handler: handleSwap
    calls:
      reserves: Pair[event.address].getReserves()
      totalSupply: Pair[event.address].totalSupply()
Benefit: Multiple eth_call operations execute concurrently instead of sequentially.
Available from: specVersion 1.2.0

Block Cache

Recently processed blocks are cached in memory:
  • Avoids redundant fetches
  • Speeds up reorg handling
  • Reduces blockchain node load
Configuration: GRAPH_ETHEREUM_BLOCK_CACHE_SIZE
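A capacity-bounded cache of this kind can be sketched as follows (illustrative only; `BlockCache` and its FIFO eviction are stand-ins, and the real cache's eviction policy may differ):

```rust
use std::collections::{HashMap, VecDeque};

/// Capacity-bounded block cache with FIFO eviction.
struct BlockCache {
    capacity: usize,
    blocks: HashMap<u64, Vec<u8>>, // block number -> encoded block
    order: VecDeque<u64>,          // insertion order, for eviction
}

impl BlockCache {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            blocks: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Insert a block, evicting the oldest entries past capacity.
    fn insert(&mut self, number: u64, block: Vec<u8>) {
        if self.blocks.insert(number, block).is_none() {
            self.order.push_back(number);
        }
        while self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.blocks.remove(&evicted);
            }
        }
    }

    /// Cache hit avoids a round trip to the blockchain node.
    fn get(&self, number: u64) -> Option<&Vec<u8>> {
        self.blocks.get(&number)
    }
}

fn main() {
    let mut cache = BlockCache::new(2);
    cache.insert(100, vec![1]);
    cache.insert(101, vec![2]);
    cache.insert(102, vec![3]); // evicts block 100
    assert!(cache.get(100).is_none());
    assert!(cache.get(102).is_some());
}
```

Keeping recent blocks resident is what makes reorg handling cheap: the blocks just past the head, the ones a revert touches, are exactly the ones still in cache.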

Monitoring Indexing Progress

Metrics

Graph Node exposes Prometheus metrics at /metrics:
  • deployment_head: Current block number for each subgraph
  • ethereum_chain_head_number: Latest block on chain
  • deployment_sync_duration: Time to process each block
  • deployment_trigger_processing_duration: Handler execution time
Reference: docs/metrics.md

GraphQL Query

Query indexing status via meta field:
{
  _meta {
    block {
      number
      hash
      timestamp
    }
    deployment
    hasIndexingErrors
  }
}

Graphman CLI

Check subgraph status:
# Show deployment info
graphman info <deployment-id>

# Show indexing progress
graphman stats show <deployment-id>
Reference: docs/graphman.md

Error Handling

Deterministic Errors

Errors that always occur at the same block (e.g., divide by zero in mapping):
  • Behavior: Subgraph marks block as failed but continues
  • Feature flag: Requires nonFatalErrors in manifest features
  • Query impact: Queries succeed but may have incomplete data

Non-Deterministic Errors

Errors that may not reoccur (e.g., network timeout, database connection loss):
  • Behavior: Subgraph retries with exponential backoff
  • Failure threshold: After N retries, marks deployment as failed
  • Recovery: Requires manual restart or redeployment
Location in codebase: core/src/subgraph/runner.rs implements retry logic

Next Steps

Query Execution

Learn how indexed data is queried via GraphQL

Architecture

Understand Graph Node’s component architecture

Configuration

Optimize indexing performance with configuration

Monitoring

Set up metrics and monitoring for your node
