Indexing Overview
Graph Node continuously monitors blockchains and processes relevant data as new blocks are produced. The indexing process is:
- Real-time: New blocks are processed as they’re produced
- Deterministic: Same inputs always produce same outputs
- Resumable: Can stop and resume from any block
- Reorg-safe: Handles chain reorganizations automatically
Graph Node maintains block-level granularity for all entities, enabling time-travel queries and seamless chain reorganization handling.
Indexing Pipeline
The indexing pipeline consists of several stages that transform raw blockchain data into queryable entities.
1. Block Ingestion
Block ingestion is the process of fetching new blocks from blockchain nodes and storing them in Graph Node’s block cache.
Ingestion Modes
Firehose
High-performance streaming protocol optimized for indexing. Recommended for production.
RPC Polling
Direct RPC calls to blockchain nodes. Simpler but less efficient.
Firehose Ingestion
Firehose is a gRPC-based streaming protocol that provides:
- Linear streaming: Blocks arrive in order without gaps
- Fork awareness: Proper handling of chain reorganizations
- Efficient format: Protobuf encoding for minimal bandwidth
- Cursor-based resumption: Resume from exact position after restart
graph/src/blockchain/firehose_block_ingestor.rs
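Cursor-based resumption can be sketched as follows. This is a toy illustration, not the actual Firehose API: the `BlockMessage` and `stream_blocks` names are invented, and real cursors are opaque tokens rather than block numbers.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class BlockMessage:
    number: int
    cursor: str  # opaque position token emitted with every message

def stream_blocks(chain: list, start_cursor: Optional[str]) -> Iterator[BlockMessage]:
    """Yield blocks in order, resuming just past `start_cursor` if given."""
    start = 0
    if start_cursor is not None:
        # Resume immediately after the last acknowledged position.
        start = int(start_cursor) + 1
    for number in chain[start:]:
        yield BlockMessage(number=number, cursor=str(number))

# First session: consume two blocks, persist the cursor, then "crash".
chain = [0, 1, 2, 3, 4]
saved_cursor = None
for msg in stream_blocks(chain, None):
    saved_cursor = msg.cursor
    if msg.number == 1:
        break

# Second session: resume from the exact saved position, with no gaps or repeats.
resumed = [m.number for m in stream_blocks(chain, saved_cursor)]
print(resumed)  # [2, 3, 4]
```

Because the consumer persists the cursor only after fully processing a block, a restart never skips or double-processes a block.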
Block Storage
Ingested blocks are stored in the ChainStore for later retrieval:
- Block data: Full block information (transactions, logs, etc.)
- Block metadata: Hash, number, parent hash, timestamp
- Chain head tracking: Latest block for each chain
store/postgres/src/chain_store.rs
2. Block Streaming
The block stream converts ingested blocks into a stream of blocks relevant to specific subgraphs.
Block Stream Creation
Creating a block stream requires:
- start_blocks: List of starting points for different data sources
- filter: Trigger filter specifying what events/calls to extract
- deployment: Identifies which subgraph is indexing
graph/src/blockchain/block_stream.rs
Buffered Block Streaming
Block streams are buffered for performance:
- Parallel processing: Multiple blocks can be processed concurrently
- Backpressure handling: Fast producers don’t overwhelm slower consumers
- Smooth performance: Evens out processing time variations
graph/src/blockchain/block_stream.rs:22-23
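The buffering-with-backpressure idea can be sketched with a bounded queue (buffer size and the single producer/consumer pair here are illustrative, not graph-node’s actual configuration):

```python
import queue
import threading

# A bounded queue gives backpressure for free: when the consumer lags,
# put() blocks and the producer slows down instead of buffering unboundedly.
BUFFER_SIZE = 4
buffer: queue.Queue = queue.Queue(maxsize=BUFFER_SIZE)
processed = []

def produce(n_blocks: int):
    for number in range(n_blocks):
        buffer.put(number)   # blocks while the buffer is full
    buffer.put(None)         # sentinel: end of stream

def consume():
    while True:
        block = buffer.get()
        if block is None:
            break
        processed.append(block)  # stand-in for trigger processing

producer = threading.Thread(target=produce, args=(100,))
consumer = threading.Thread(target=consume)
producer.start()
consumer.start()
producer.join()
consumer.join()
print(len(processed))  # 100
```

The buffer evens out per-block processing-time variation while capping memory use.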
Block Stream Events
Block streams emit events as they process the chain.
3. Trigger Extraction and Matching
Triggers are blockchain events that activate subgraph handlers. The triggers adapter extracts relevant triggers from blocks.
Trigger Types
For Ethereum:
- Event triggers: Smart contract event logs
- Call triggers: Function calls to contracts (requires a node with archive/trace support)
- Block triggers: Block data itself (filtered or all blocks)
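Event-trigger filtering amounts to checking each log against a set of (contract address, event signature) pairs. A minimal sketch (the names and the truncated topic hash are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Log:
    address: str   # emitting contract
    topic0: str    # keccak-256 hash of the event signature

@dataclass
class EventFilter:
    """Simplified trigger filter: a set of (address, topic0) pairs."""
    entries: set

    def matches(self, log: Log) -> bool:
        return (log.address, log.topic0) in self.entries

# Hypothetical subgraph listening for Transfer events from one token contract.
TRANSFER_TOPIC = "0xddf252ad..."  # truncated for illustration
flt = EventFilter(entries={("0xToken", TRANSFER_TOPIC)})

block_logs = [
    Log("0xToken", TRANSFER_TOPIC),   # relevant: matches the filter
    Log("0xOther", TRANSFER_TOPIC),   # wrong contract: skipped
]
triggers = [log for log in block_logs if flt.matches(log)]
print(len(triggers))  # 1
```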
Trigger Adapter
graph/src/blockchain/block_stream.rs
Trigger Filtering
Trigger filters specify what to extract from blocks:
graph/src/blockchain/mod.rs:282-298
Trigger Matching Process
- Filter application: Block stream applies filter to extract candidate triggers
- Data source matching: Each trigger is matched against subgraph data sources
- Handler identification: Matching determines which handler to invoke
- Decoding: Trigger data is decoded into handler-specific format
graph/src/blockchain/mod.rs:340-345
4. Runtime Execution
Once triggers are matched to handlers, the WASM runtime executes the mapping code.
Runtime Host
The RuntimeHost manages the WASM module lifecycle:
runtime/wasm/src/host.rs
Handler Invocation
For each trigger:
- Module instantiation: Create a fresh WASM instance (or reuse a pooled instance)
- Context setup: Prepare block, transaction, and event data
- Handler call: Invoke the exported handler function
- Host function execution: Process calls to entity.save(), ethereum.call(), etc.
- Gas accounting: Track and limit computation (gas metering)
Gas Metering
Gas metering prevents infinite loops and resource exhaustion. The per-handler limit is configured via the GRAPH_MAX_GAS_PER_HANDLER environment variable.
Location in codebase: graph/src/runtime/gas.rs
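The mechanism can be sketched as a counter that every metered operation charges against; the class and costs here are invented for illustration, not graph-node’s actual gas schedule:

```python
class GasExhausted(Exception):
    pass

class GasMeter:
    """Toy gas meter: each operation charges a cost; exceeding the per-handler
    limit (cf. GRAPH_MAX_GAS_PER_HANDLER) aborts the handler deterministically."""
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def consume(self, cost: int):
        self.used += cost
        if self.used > self.limit:
            raise GasExhausted(f"gas limit {self.limit} exceeded")

meter = GasMeter(limit=1000)
halted = False
try:
    while True:           # an accidental infinite loop in a mapping...
        meter.consume(7)  # ...is cut off once its gas budget is spent
except GasExhausted:
    halted = True
print(halted)  # True
```

Because the limit depends only on the executed instructions, hitting it is itself deterministic: every indexer fails the same handler at the same block.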
Host Exports
Host exports are functions that WASM code can call:
Entity Operations
- entity.save(): Persist entity to store
- Entity.load(id): Load entity from store
- store.remove(entity, id): Delete entity
runtime/wasm/src/host_exports.rs
Ethereum Operations
- ethereum.call(): Make eth_call to a contract
- Access to event.params, block.timestamp, transaction.hash, etc.
chain/ethereum/src/runtime.rs
IPFS Operations
- ipfs.cat(hash): Fetch file from IPFS
- ipfs.map(hash, callback, flags): Process IPFS file with callback
Crypto Operations
Crypto Operations
- crypto.keccak256(input): Keccak-256 hash
- crypto.sha256(input): SHA-256 hash
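For reference, the SHA-256 export computes a standard SHA-256 digest, reproducible with any stdlib implementation. Note that Keccak-256 (used throughout Ethereum) is not the same as SHA-3-256 despite the similar names; the two use different padding:

```python
import hashlib

# Equivalent of crypto.sha256(input) for the bytes b"hello".
digest = hashlib.sha256(b"hello").hexdigest()
print(digest)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

# hashlib.sha3_256 is SHA-3, NOT Keccak-256, so it will NOT match
# crypto.keccak256; a Keccak implementation requires a third-party library.
```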
Dynamic Data Sources
Handlers can create new data sources at runtime:
- DataSourceTemplate.create(address) calls a host function
- New data source is instantiated from the template
- Data source is added to the active data sources
- Trigger filter is updated
- Block may be refetched if is_refetch_block_required() returns true
core/src/subgraph/runner.rs handles dynamic data source lifecycle
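The first four steps above can be sketched as follows (all class and method names are invented for illustration; they are not graph-node APIs):

```python
from dataclasses import dataclass, field

@dataclass
class TriggerFilter:
    addresses: set = field(default_factory=set)

@dataclass
class DataSourceTemplate:
    name: str

@dataclass
class Indexer:
    """Toy model of dynamic data source creation."""
    filter: TriggerFilter = field(default_factory=TriggerFilter)
    data_sources: list = field(default_factory=list)

    def create_from_template(self, template: DataSourceTemplate, address: str):
        # Instantiate a data source from the template at the given address...
        self.data_sources.append((template.name, address))
        # ...and widen the trigger filter so future blocks include its events.
        self.filter.addresses.add(address)

# e.g. a factory handler discovering a newly deployed pair contract:
indexer = Indexer()
pair_template = DataSourceTemplate(name="Pair")
indexer.create_from_template(pair_template, "0xNewPair")
print(indexer.filter.addresses)  # {'0xNewPair'}
```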
5. Entity Persistence
Entities modified by handlers are persisted to PostgreSQL with block-level granularity.
Entity Storage Model
Each entity is stored with:
- id: Primary key (user-defined)
- vid: Version ID (auto-incrementing, unique per version)
- block_range: Range of blocks where this version is valid
- causality_region: For parallel processing safety
- …fields: Entity-specific fields from schema
A block_range exclusion constraint ensures that only one version of an entity exists for any given block.
Location in codebase: store/postgres/src/relational.rs generates schema
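The versioning scheme can be sketched in memory: each version carries a half-open [lower, upper) block range, updates close the previous range, and a time-travel lookup finds the version whose range contains the queried block. The class names here are illustrative, not graph-node’s:

```python
from dataclasses import dataclass
from typing import Optional

UNBOUNDED = None  # open upper bound: the version is still live

@dataclass
class Version:
    vid: int
    value: dict
    lower: int                        # first block where this version is valid
    upper: Optional[int] = UNBOUNDED  # exclusive upper bound, None = live

class EntityHistory:
    """Toy model of one entity's versions, keyed by [lower, upper) block ranges."""
    def __init__(self):
        self.versions = []
        self.next_vid = 0

    def save(self, value: dict, block: int):
        # Close the live version, then insert the new one.
        if self.versions and self.versions[-1].upper is UNBOUNDED:
            self.versions[-1].upper = block
        self.versions.append(Version(self.next_vid, value, lower=block))
        self.next_vid += 1

    def at_block(self, block: int) -> Optional[dict]:
        # Time-travel query: return the version whose range contains `block`.
        for v in self.versions:
            if v.lower <= block and (v.upper is UNBOUNDED or block < v.upper):
                return v.value
        return None

h = EntityHistory()
h.save({"balance": 10}, block=100)
h.save({"balance": 25}, block=150)
print(h.at_block(120))  # {'balance': 10}
print(h.at_block(150))  # {'balance': 25}
```

Because ranges are half-open and non-overlapping, exactly one version (or none) matches any block, which is what the exclusion constraint enforces in PostgreSQL.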
Write Operations
When mapping code calls entity.save():
- Buffering: Entity changes are buffered in memory
- Validation: Entity validates against schema
- Conflict detection: Check for concurrent modifications
- SQL generation: Generate INSERT or UPDATE statement
- Transaction: Changes are committed in block-level transaction
store/postgres/src/writable.rs
Batch Processing
Entities are written in batches for efficiency:
- Reduces database round trips
- Improves transaction throughput
- Enables bulk optimizations in PostgreSQL
Block Range Updates
When an entity is updated:
- Close previous version: Set the upper bound of the previous block_range
- Insert new version: Create a new row with the updated data and a new block_range
6. Cursor and Progress Tracking
Graph Node tracks indexing progress via cursors stored in the database.
Subgraph Cursor
The cursor represents the latest fully-processed block:
- All triggers in the block are processed
- All entities are persisted
- Transaction is committed
Deployment Status
Subgraph deployments track multiple status fields:
- synced: Subgraph has caught up to the chain head
- failed: Subgraph halted due to a fatal error
- fatal_error: Error message if failed
store/postgres/src/deployment_store.rs
Chain Reorganization Handling
Blockchain reorganizations (reorgs) occur when the canonical chain changes. Graph Node handles reorgs automatically.
Reorg Detection
Block streams detect reorgs by comparing:
- Expected parent hash (the hash of the last processed block)
- Actual parent hash (from the new block)
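The comparison itself is a one-line check; a minimal sketch with invented names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockPtr:
    number: int
    hash: str
    parent_hash: str

def is_reorg(previous: BlockPtr, incoming: BlockPtr) -> bool:
    """A reorg is detected when the new block does not build on the block
    we processed last: its parent hash differs from our head's hash."""
    return incoming.parent_hash != previous.hash

head = BlockPtr(100, hash="0xaaa", parent_hash="0x999")
next_ok = BlockPtr(101, hash="0xbbb", parent_hash="0xaaa")
next_fork = BlockPtr(101, hash="0xccc", parent_hash="0xfff")

print(is_reorg(head, next_ok))    # False
print(is_reorg(head, next_fork))  # True
```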
Reorg Processing
- Revert event: Block stream emits BlockStreamEvent::Revert
- Entity rollback: Store reverts entity changes from the reverted blocks
- Cursor update: Cursor moves back to the reorg point
- Reprocessing: Blocks on the new canonical chain are processed
core/src/subgraph/runner.rs handles revert events
Entity Reversion
Entities are reverted using their block ranges.
Determinism guarantee: Because mappings are deterministic, reprocessing blocks produces the same entities as the first time.
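Block-range reversion can be sketched as two operations: delete versions written after the reorg point, and reopen versions that reverted blocks had closed. A toy model (row layout is illustrative, not the actual table schema):

```python
# Each row: (vid, value, lower, upper) with upper=None meaning "still live".
rows = [
    (0, {"balance": 10}, 100, 150),
    (1, {"balance": 25}, 150, None),
]

def revert_to(rows: list, block: int) -> list:
    """Undo all changes made after `block` (toy version of the store's revert):
    drop versions created after it, and reopen versions it closed."""
    kept = []
    for vid, value, lower, upper in rows:
        if lower > block:
            continue  # version written in a reverted block: delete it
        if upper is not None and upper > block:
            upper = None  # closed by a reverted block: make it live again
        kept.append((vid, value, lower, upper))
    return kept

# Reorg back to block 149: version 1 disappears, version 0 is live again.
reverted = revert_to(rows, 149)
print(reverted)  # [(0, {'balance': 10}, 100, None)]
```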
Performance Optimizations
Parallel Block Processing
Graph Node can process multiple blocks in parallel when safe:
- Independent blocks: Blocks without entity dependencies can be processed concurrently
- Causality regions: Ensure conflicting updates are serialized
- Configurable parallelism: Set via GRAPH_ETHEREUM_PARALLEL_BLOCK_RANGES
Declared Calls Optimization
Declared calls execute in parallel before handler invocation: eth_call operations run concurrently instead of sequentially.
Available from: specVersion 1.2.0
Block Cache
Recently processed blocks are cached in memory:
- Avoids redundant fetches
- Speeds up reorg handling
- Reduces blockchain node load
The cache size is configured via GRAPH_ETHEREUM_BLOCK_CACHE_SIZE.
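A block cache of this kind is typically an LRU keyed by block hash; a minimal sketch (the class and its eviction policy are illustrative, not graph-node’s exact implementation):

```python
from collections import OrderedDict

class BlockCache:
    """Minimal LRU cache for recent blocks, bounded by a fixed capacity."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: OrderedDict = OrderedDict()

    def get(self, block_hash: str):
        if block_hash not in self.blocks:
            return None
        self.blocks.move_to_end(block_hash)  # mark as recently used
        return self.blocks[block_hash]

    def put(self, block_hash: str, block):
        self.blocks[block_hash] = block
        self.blocks.move_to_end(block_hash)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

cache = BlockCache(capacity=2)
cache.put("0xa", {"number": 1})
cache.put("0xb", {"number": 2})
cache.get("0xa")                 # touch 0xa so 0xb becomes the eviction victim
cache.put("0xc", {"number": 3})
print(cache.get("0xb"))  # None (evicted)
```

Keying by hash rather than number matters during reorgs, when two blocks at the same height exist.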
Monitoring Indexing Progress
Metrics
Graph Node exposes Prometheus metrics at /metrics:
- deployment_head: Current block number for each subgraph
- ethereum_chain_head_number: Latest block on chain
- deployment_sync_duration: Time to process each block
- deployment_trigger_processing_duration: Handler execution time
docs/metrics.md
GraphQL Query
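Graph Node serves a built-in _meta field on every subgraph’s GraphQL API; a minimal status query looks like:

```graphql
{
  _meta {
    block {
      number
      hash
    }
    deployment
    hasIndexingErrors
  }
}
```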
Query indexing status via the subgraph’s _meta field.
Graphman CLI
Check subgraph status:
docs/graphman.md
Error Handling
Deterministic Errors
Errors that always occur at the same block (e.g., divide by zero in a mapping):
- Behavior: Subgraph marks the block as failed but continues
- Feature flag: Requires nonFatalErrors in the manifest features
- Query impact: Queries succeed but may have incomplete data
Non-Deterministic Errors
Errors that may not reoccur (e.g., network timeout, database connection loss):
- Behavior: Subgraph retries with exponential backoff
- Failure threshold: After N retries, marks deployment as failed
- Recovery: Requires manual restart or redeployment
core/src/subgraph/runner.rs implements retry logic
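The retry policy can be sketched as a capped exponential backoff schedule; the constants and function names here are illustrative, not graph-node’s actual values:

```python
def retry_schedule(base: float = 1.0, cap: float = 60.0, max_retries: int = 5):
    """Exponential backoff delays: base * 2^attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

def run_with_retries(handler, max_retries: int = 5):
    """Retry non-deterministic failures; after max_retries, mark as failed."""
    for attempt, delay in enumerate(retry_schedule(max_retries=max_retries)):
        try:
            return {"status": "ok", "result": handler(), "attempts": attempt + 1}
        except ConnectionError:
            # time.sleep(delay) would go here in real code
            continue
    return {"status": "failed", "attempts": max_retries}

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network timeout")  # transient failure
    return "block processed"

outcome = run_with_retries(flaky)
print(outcome)  # {'status': 'ok', 'result': 'block processed', 'attempts': 3}
```

Only non-deterministic errors are retried; a deterministic mapping error would recur on every attempt, so retrying it is pointless.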
Next Steps
- Query Execution: Learn how indexed data is queried via GraphQL
- Architecture: Understand Graph Node’s component architecture
- Configuration: Optimize indexing performance with configuration
- Monitoring: Set up metrics and monitoring for your node

