Graph Node exposes comprehensive metrics through a Prometheus endpoint, allowing you to monitor deployment health, indexing performance, query execution, and infrastructure utilization.

Metrics Endpoint

By default, Graph Node exposes metrics on port 8040:
http://localhost:8040/metrics
You can configure this port in your Graph Node configuration file.
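To work with scraped samples programmatically, here is a minimal Python sketch that parses the Prometheus plain-text exposition format; the metric values below are illustrative, and in practice the official prometheus_client library ships a full parser.

```python
import re

def parse_metric(exposition_text: str, metric_name: str):
    """Extract (labels, value) samples for one metric family from
    Prometheus text exposition format."""
    pattern = re.compile(
        rf'^{re.escape(metric_name)}(?:\{{(?P<labels>[^}}]*)\}})?\s+(?P<value>\S+)$'
    )
    samples = []
    for line in exposition_text.splitlines():
        m = pattern.match(line.strip())
        if not m:
            continue  # skip comments and other metrics
        labels = {}
        if m.group('labels'):
            for pair in m.group('labels').split(','):
                key, _, val = pair.partition('=')
                labels[key.strip()] = val.strip().strip('"')
        samples.append((labels, float(m.group('value'))))
    return samples

# Abridged example of what a scrape of /metrics returns:
sample = '''\
# TYPE deployment_head gauge
deployment_head{deployment="QmaeWF...",network="mumbai",shard="primary"} 19509077
ethereum_chain_head_number{network="mumbai"} 20045294
'''

print(parse_metric(sample, "deployment_head"))
# [({'deployment': 'QmaeWF...', 'network': 'mumbai', 'shard': 'primary'}, 19509077.0)]
```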

Deployment Metrics

These metrics track the health and performance of individual subgraph deployments.

Block Processing

deployment_block_processing_duration

Type: Histogram
Measures the duration of block processing for a subgraph deployment.
Labels:
  • deployment - The deployment hash (IPFS CID)
  • network - The blockchain network
  • shard - The database shard
Use cases:
  • Identify slow-processing deployments
  • Detect performance degradation over time
  • Compare processing speed across deployments
Example query:
rate(deployment_block_processing_duration_sum[5m]) / rate(deployment_block_processing_duration_count[5m])
Type: Histogram
Measures the duration of trigger processing (event handlers, call handlers, and block handlers) for a deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Identify bottlenecks in handler execution
  • Monitor handler performance after code changes
  • Optimize trigger processing logic

deployment_transact_block_operations_duration

Type: Histogram
Measures the duration of committing entity operations for a block and updating the subgraph pointer.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Monitor database write performance
  • Identify deployments with high entity churn
  • Detect database bottlenecks

Trigger and Handler Metrics

Type: Counter
Counts the number of triggers in each block for a subgraph deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Understand trigger density per block
  • Correlate trigger count with processing time
  • Identify blocks with unusual trigger activity

deployment_handler_execution_time

Type: Histogram
Measures the execution time of individual handlers.
Labels:
  • deployment
  • handler - The handler name
  • network
  • shard
Use cases:
  • Identify slow handlers
  • Optimize specific handler logic
  • Track performance impact of handler changes
Type: Histogram
Measures the execution time of host functions called by the WASM runtime.
Labels:
  • deployment
  • host_fn - The host function name
  • network
  • shard
Use cases:
  • Identify expensive host function calls
  • Monitor store operations from handlers
  • Detect excessive entity loads or stores

Deployment Health

deployment_head

Type: Gauge
Tracks the head block number for a deployment.
Example:
deployment_head{deployment="QmaeWFYbPwmXEk7UuACmkqgPq2Pba5t2RYdJtEyvAUmrxg",network="mumbai",shard="primary"} 19509077
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Monitor sync progress
  • Calculate blocks behind chain head
  • Alert on stalled indexing
Alert example:
(ethereum_chain_head_number{network="mainnet"} - deployment_head{network="mainnet"}) > 100

deployment_failed

Type: Gauge (Boolean)
Indicates whether a deployment has failed (1 = failed, 0 = healthy).
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Alert on deployment failures
  • Track deployment health over time
  • Trigger automatic remediation
Alert example:
deployment_failed{deployment="QmXYZ..."} == 1
Type: Counter
Tracks the last reverted block for a subgraph deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Monitor chain reorganizations
  • Identify frequently reorged deployments
  • Correlate reverts with network issues
Type: Counter
Total time spent syncing a deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Calculate average sync time
  • Monitor sync efficiency
  • Estimate time to full sync
Type: Gauge
Counts the number of deployments currently being indexed by the graph-node.
Use cases:
  • Monitor node capacity
  • Track deployment growth
  • Plan infrastructure scaling

Blockchain Metrics

Ethereum Chain Head

ethereum_chain_head_number

Type: Gauge
Block number of the most recent block synced from Ethereum.
Example:
ethereum_chain_head_number{network="mumbai"} 20045294
Labels:
  • network - The blockchain network name
Use cases:
  • Verify RPC connectivity
  • Calculate deployment lag
  • Monitor chain sync status
Example query (blocks behind):
ethereum_chain_head_number{network="mainnet"} - deployment_head{network="mainnet"}
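As a worked example of the lag calculation, the sketch below joins deployment_head samples to ethereum_chain_head_number by network in plain Python; the deployment IDs and block numbers are made up for illustration.

```python
# Hypothetical scrape results: chain head per network, head block per deployment.
chain_head = {"mainnet": 20_000_000, "mumbai": 20_045_294}
deployment_heads = [
    {"deployment": "QmaeWF...", "network": "mumbai", "head": 19_509_077},
    {"deployment": "QmXYZ...", "network": "mainnet", "head": 19_999_950},
]

def blocks_behind(deployments, heads):
    """Lag of each deployment relative to its network's chain head."""
    return {
        d["deployment"]: heads[d["network"]] - d["head"]
        for d in deployments
        if d["network"] in heads
    }

lag = blocks_behind(deployment_heads, chain_head)
print(lag)  # {'QmaeWF...': 536217, 'QmXYZ...': 50}

# Flag anything more than 100 blocks behind, as in the alert example earlier.
stalled = {dep: n for dep, n in lag.items() if n > 100}
print(stalled)  # {'QmaeWF...': 536217}
```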

RPC Performance

deployment_eth_rpc_request_duration

Type: Histogram
Measures Ethereum RPC request duration for a subgraph deployment.
Labels:
  • deployment
  • network
  • method - The RPC method called
Use cases:
  • Monitor RPC provider performance
  • Identify slow RPC methods
  • Detect RPC provider issues

deployment_eth_rpc_errors

Type: Counter
Counts Ethereum RPC request errors for a subgraph deployment.
Labels:
  • deployment
  • network
  • method
Use cases:
  • Alert on RPC failures
  • Monitor RPC provider reliability
  • Trigger provider failover
Type: Histogram
Global Ethereum RPC request duration across all deployments.
Labels:
  • network
  • method
Use cases:
  • Monitor overall RPC health
  • Compare RPC performance across networks
  • Benchmark RPC providers
Type: Counter
Global count of Ethereum RPC request errors.
Labels:
  • network
  • method
Use cases:
  • Track RPC reliability
  • Alert on widespread RPC issues
  • Monitor error rates by method

Query Metrics

These metrics help monitor GraphQL query performance and caching efficiency.

Query Execution

query_execution_time

Type: Histogram
Execution time for successful GraphQL queries.
Labels:
  • deployment
  • query_id - Hash of the query
Use cases:
  • Identify slow queries
  • Monitor query performance trends
  • Optimize query execution

query_effort_ms

Type: Gauge
Moving average of time spent running queries.
Use cases:
  • Monitor overall query load
  • Detect query performance degradation
  • Trigger scaling decisions

query_blocks_behind

Type: Histogram
Tracks how many blocks behind the subgraph head queries are being made.
Use cases:
  • Inform pruning decisions
  • Understand query patterns
  • Optimize history retention

Query Results

Type: Histogram
The size of successful GraphQL query results (in CacheWeight).
Labels:
  • deployment
Use cases:
  • Monitor result size distribution
  • Identify queries returning large datasets
  • Optimize pagination strategies

query_result_max

Type: Gauge
The maximum size of a query result (in CacheWeight).
Labels:
  • deployment
Use cases:
  • Track largest queries
  • Set result size limits
  • Prevent resource exhaustion

Caching

query_cache_status_count

Type: Counter
Counts top-level GraphQL fields executed and their cache status.
Labels:
  • deployment
  • field - The GraphQL field name
  • status - hit or miss
Use cases:
  • Monitor cache hit rates
  • Identify frequently queried fields
  • Optimize caching strategy
Cache hit rate:
rate(query_cache_status_count{status="hit"}[5m]) / rate(query_cache_status_count[5m])
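Because both series are counters, the hit rate above is a ratio of increases over the window. The same arithmetic on two consecutive scrapes, sketched in Python with made-up counter values:

```python
def cache_hit_rate(prev: dict, curr: dict) -> float:
    """Approximate the rate()-style cache hit ratio from two scrapes of
    query_cache_status_count, keyed by status ("hit" / "miss")."""
    d_hit = curr["hit"] - prev["hit"]
    d_total = (curr["hit"] + curr["miss"]) - (prev["hit"] + prev["miss"])
    return d_hit / d_total if d_total else 0.0

# Hypothetical counter values from two scrapes 5 minutes apart:
prev = {"hit": 800, "miss": 200}
curr = {"hit": 1700, "miss": 300}
print(cache_hit_rate(prev, curr))  # 0.9 — 900 of the 1000 new lookups were hits
```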

Load Management

query_kill_rate

Type: Gauge
The rate at which the load manager kills queries.
Use cases:
  • Monitor query overload
  • Adjust query timeout settings
  • Alert on excessive query cancellations

query_semaphore_wait_ms

Type: Gauge
Moving average of time spent waiting for the Postgres query semaphore.
Use cases:
  • Monitor database connection contention
  • Adjust connection pool size
  • Identify query queueing issues

Store Metrics

Metrics related to PostgreSQL database operations.

store_connection_checkout_count

Type: Gauge
The number of Postgres connections currently checked out.
Labels:
  • pool - The connection pool name
  • shard - The database shard
Use cases:
  • Monitor connection pool utilization
  • Detect connection leaks
  • Size connection pools appropriately

store_connection_wait_time_ms

Type: Histogram
Average connection wait time from the pool.
Labels:
  • pool
  • shard
Use cases:
  • Identify connection pool bottlenecks
  • Optimize pool configuration
  • Alert on connection exhaustion
Type: Counter
The number of Postgres connection errors.
Labels:
  • pool
  • shard
Use cases:
  • Monitor database connectivity
  • Alert on connection failures
  • Detect database issues

System Metrics

Type: Gauge
Tracks the number of registered metrics on the node.
Use cases:
  • Monitor metric system health
  • Detect metric registration leaks
  • Track metric growth over time
Type: Counter
Counts errors registering Prometheus metrics.
Use cases:
  • Detect metric registration issues
  • Alert on metrics system problems
Type: Counter
Counts errors unregistering Prometheus metrics.
Use cases:
  • Monitor metric cleanup issues
  • Detect metric lifecycle problems

Monitoring Setup

Prometheus Configuration

Add Graph Node to your prometheus.yml:
scrape_configs:
  - job_name: 'graph-node'
    static_configs:
      - targets: ['localhost:8040']
    scrape_interval: 15s
    scrape_timeout: 10s

Grafana Dashboard Setup

1. Add Prometheus data source
Configure Grafana to connect to your Prometheus instance.

2. Create deployment health panel
# Blocks behind chain head
ethereum_chain_head_number{network="mainnet"} - deployment_head{network="mainnet"}

3. Create indexing speed panel
# Blocks per minute (deriv handles gauges, including decreases on reverts)
deriv(deployment_head[5m]) * 60

4. Create query performance panel
# Average query time (ms)
rate(query_execution_time_sum[5m]) / rate(query_execution_time_count[5m]) * 1000
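The query performance panel divides the increase of a histogram's `_sum` series by the increase of its `_count` series. The equivalent arithmetic on two raw scrapes, with hypothetical values, looks like:

```python
def avg_from_histogram(prev_sum, prev_count, curr_sum, curr_count):
    """Average observed value over a window, computed from the _sum and
    _count series of a Prometheus histogram — the same quantity that
    rate(..._sum[5m]) / rate(..._count[5m]) reports."""
    d_count = curr_count - prev_count
    if d_count == 0:
        return None  # no observations in the window
    return (curr_sum - prev_sum) / d_count

# Two scrapes of query_execution_time, 5 minutes apart (made-up numbers):
avg_s = avg_from_histogram(prev_sum=120.0, prev_count=4000,
                           curr_sum=150.0, curr_count=4600)
print(avg_s * 1000)  # 50.0 — average query time in ms
```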

Alert Examples

Deployment Health Alerts

- alert: DeploymentBehind
  expr: (ethereum_chain_head_number - deployment_head) > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Deployment {{ $labels.deployment }} is falling behind"
    description: "Deployment is {{ $value }} blocks behind chain head"
- alert: DeploymentFailed
  expr: deployment_failed == 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Deployment {{ $labels.deployment }} has failed"
    description: "Deployment is in failed state on {{ $labels.shard }}"
- alert: HighRPCErrorRate
  expr: rate(deployment_eth_rpc_errors[5m]) > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High RPC error rate for {{ $labels.deployment }}"
    description: "RPC error rate is {{ $value }} errors/sec"

Query Performance Alerts

- alert: SlowQueries
  expr: (rate(query_execution_time_sum[5m]) / rate(query_execution_time_count[5m])) > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Slow queries on {{ $labels.deployment }}"
    description: "Average query time is {{ $value }}s"
- alert: HighQueryKillRate
  expr: query_kill_rate > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High query kill rate"
    description: "{{ $value }} queries/sec are being killed"

Database Alerts

- alert: ConnectionPoolExhaustion
  # assumes a pool size of 100 connections; substitute your configured pool size
  expr: store_connection_checkout_count / 100 > 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Connection pool nearly exhausted for {{ $labels.pool }}"
    description: "Pool utilization is {{ $value | humanizePercentage }}"
- alert: HighConnectionWait
  expr: store_connection_wait_time_ms > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High connection wait time for {{ $labels.pool }}"
    description: "Average wait time is {{ $value }}ms"

Best Practices

Always track ethereum_chain_head_number - deployment_head to ensure deployments stay synchronized with the blockchain.
Configure alerts for deployment failures, high error rates, and significant lag to catch issues early.
Pre-compute expensive queries like cache hit rates and average query times to reduce dashboard load.
Monitor query_cache_status_count and query_blocks_behind to optimize caching and pruning strategies.
When investigating issues, look at RPC metrics, handler metrics, and store metrics together to identify the root cause.
Track connection pool usage and query semaphore wait times to prevent resource exhaustion.
Record normal metric ranges for your deployments to quickly identify anomalies.

Troubleshooting with Metrics

Slow Indexing

1. Check block processing duration
rate(deployment_block_processing_duration_sum[5m]) / rate(deployment_block_processing_duration_count[5m])

2. Identify slow handlers
topk(5, rate(deployment_handler_execution_time_sum[5m]) / rate(deployment_handler_execution_time_count[5m]))

3. Check RPC performance
rate(deployment_eth_rpc_request_duration_sum[5m]) / rate(deployment_eth_rpc_request_duration_count[5m])

4. Monitor database operations
rate(deployment_transact_block_operations_duration_sum[5m]) / rate(deployment_transact_block_operations_duration_count[5m])

Poor Query Performance

1. Check cache hit rate
rate(query_cache_status_count{status="hit"}[5m]) / rate(query_cache_status_count[5m])

2. Identify large queries
query_result_max

3. Check connection contention
query_semaphore_wait_ms

4. Monitor query load
query_effort_ms
