Graph Node exposes comprehensive metrics through a Prometheus endpoint, allowing you to monitor deployment health, indexing performance, query execution, and infrastructure utilization.

Metrics Endpoint

By default, Graph Node exposes metrics on port 8040:
http://localhost:8040/metrics
You can configure this port in your Graph Node configuration file.
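To work with scraped samples programmatically, here is a minimal Python sketch that parses the Prometheus plain-text exposition format; the metric values below are illustrative, and in practice the official prometheus_client library ships a full parser.

```python
import re

def parse_metric(exposition_text: str, metric_name: str):
    """Extract (labels, value) samples for one metric family from
    Prometheus text exposition format."""
    pattern = re.compile(
        rf'^{re.escape(metric_name)}(?:\{{(?P<labels>[^}}]*)\}})?\s+(?P<value>\S+)$'
    )
    samples = []
    for line in exposition_text.splitlines():
        m = pattern.match(line.strip())
        if not m:
            continue  # skip comments and other metrics
        labels = {}
        if m.group('labels'):
            for pair in m.group('labels').split(','):
                key, _, val = pair.partition('=')
                labels[key.strip()] = val.strip().strip('"')
        samples.append((labels, float(m.group('value'))))
    return samples

# Abridged example of what a scrape of /metrics returns:
sample = '''\
# TYPE deployment_head gauge
deployment_head{deployment="QmaeWF...",network="mumbai",shard="primary"} 19509077
ethereum_chain_head_number{network="mumbai"} 20045294
'''

print(parse_metric(sample, "deployment_head"))
# [({'deployment': 'QmaeWF...', 'network': 'mumbai', 'shard': 'primary'}, 19509077.0)]
```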

Deployment Metrics

These metrics track the health and performance of individual subgraph deployments.

Block Processing

deployment_block_processing_duration

Type: Histogram
Measures the duration of block processing for a subgraph deployment.
Labels:
  • deployment - The deployment hash (IPFS CID)
  • network - The blockchain network
  • shard - The database shard
Use cases:
  • Identify slow-processing deployments
  • Detect performance degradation over time
  • Compare processing speed across deployments
Example query:
rate(deployment_block_processing_duration_sum[5m]) / rate(deployment_block_processing_duration_count[5m])
Type: Histogram
Measures the duration of trigger processing (event handlers, call handlers, and block handlers) for a deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Identify bottlenecks in handler execution
  • Monitor handler performance after code changes
  • Optimize trigger processing logic

deployment_transact_block_operations_duration

Type: Histogram
Measures the duration of committing entity operations for a block and updating the subgraph pointer.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Monitor database write performance
  • Identify deployments with high entity churn
  • Detect database bottlenecks

Trigger and Handler Metrics

Type: Counter
Counts the number of triggers in each block for a subgraph deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Understand trigger density per block
  • Correlate trigger count with processing time
  • Identify blocks with unusual trigger activity

deployment_handler_execution_time

Type: Histogram
Measures the execution time of individual handlers.
Labels:
  • deployment
  • handler - The handler name
  • network
  • shard
Use cases:
  • Identify slow handlers
  • Optimize specific handler logic
  • Track performance impact of handler changes
Type: Histogram
Measures the execution time of host functions called by the WASM runtime.
Labels:
  • deployment
  • host_fn - The host function name
  • network
  • shard
Use cases:
  • Identify expensive host function calls
  • Monitor store operations from handlers
  • Detect excessive entity loads or stores

Deployment Health

deployment_head

Type: Gauge
Tracks the head block number for a deployment.
Example:
deployment_head{deployment="QmaeWFYbPwmXEk7UuACmkqgPq2Pba5t2RYdJtEyvAUmrxg",network="mumbai",shard="primary"} 19509077
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Monitor sync progress
  • Calculate blocks behind chain head
  • Alert on stalled indexing
Alert example:
(ethereum_chain_head_number{network="mainnet"} - deployment_head{network="mainnet"}) > 100

deployment_failed

Type: Gauge (Boolean)
Indicates whether a deployment has failed (1 = failed, 0 = healthy).
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Alert on deployment failures
  • Track deployment health over time
  • Trigger automatic remediation
Alert example:
deployment_failed{deployment="QmXYZ..."} == 1
Type: Counter
Tracks the last reverted block for a subgraph deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Monitor chain reorganizations
  • Identify frequently reorged deployments
  • Correlate reverts with network issues
Type: Counter
Total time spent syncing a deployment.
Labels:
  • deployment
  • network
  • shard
Use cases:
  • Calculate average sync time
  • Monitor sync efficiency
  • Estimate time to full sync
Type: Gauge
Counts the number of deployments currently being indexed by the graph-node.
Use cases:
  • Monitor node capacity
  • Track deployment growth
  • Plan infrastructure scaling

Blockchain Metrics

Ethereum Chain Head

ethereum_chain_head_number

Type: Gauge
Block number of the most recent block synced from Ethereum.
Example:
ethereum_chain_head_number{network="mumbai"} 20045294
Labels:
  • network - The blockchain network name
Use cases:
  • Verify RPC connectivity
  • Calculate deployment lag
  • Monitor chain sync status
Example query (blocks behind):
ethereum_chain_head_number{network="mainnet"} - deployment_head{network="mainnet"}
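As a worked example of the lag calculation, the sketch below joins deployment_head samples to ethereum_chain_head_number by network in plain Python; the deployment IDs and block numbers are made up for illustration.

```python
# Hypothetical scrape results: chain head per network, head block per deployment.
chain_head = {"mainnet": 20_000_000, "mumbai": 20_045_294}
deployment_heads = [
    {"deployment": "QmaeWF...", "network": "mumbai", "head": 19_509_077},
    {"deployment": "QmXYZ...", "network": "mainnet", "head": 19_999_950},
]

def blocks_behind(deployments, heads):
    """Lag of each deployment relative to its network's chain head."""
    return {
        d["deployment"]: heads[d["network"]] - d["head"]
        for d in deployments
        if d["network"] in heads
    }

lag = blocks_behind(deployment_heads, chain_head)
print(lag)  # {'QmaeWF...': 536217, 'QmXYZ...': 50}

# Flag anything more than 100 blocks behind, as in the alert example earlier.
stalled = {dep: n for dep, n in lag.items() if n > 100}
print(stalled)  # {'QmaeWF...': 536217}
```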

RPC Performance

deployment_eth_rpc_request_duration

Type: Histogram
Measures Ethereum RPC request duration for a subgraph deployment.
Labels:
  • deployment
  • network
  • method - The RPC method called
Use cases:
  • Monitor RPC provider performance
  • Identify slow RPC methods
  • Detect RPC provider issues

deployment_eth_rpc_errors

Type: Counter
Counts Ethereum RPC request errors for a subgraph deployment.
Labels:
  • deployment
  • network
  • method
Use cases:
  • Alert on RPC failures
  • Monitor RPC provider reliability
  • Trigger provider failover
Type: Histogram
Global Ethereum RPC request duration across all deployments.
Labels:
  • network
  • method
Use cases:
  • Monitor overall RPC health
  • Compare RPC performance across networks
  • Benchmark RPC providers
Type: Counter
Global count of Ethereum RPC request errors.
Labels:
  • network
  • method
Use cases:
  • Track RPC reliability
  • Alert on widespread RPC issues
  • Monitor error rates by method

Query Metrics

These metrics help monitor GraphQL query performance and caching efficiency.

Query Execution

query_execution_time

Type: Histogram
Execution time for successful GraphQL queries.
Labels:
  • deployment
  • query_id - Hash of the query
Use cases:
  • Identify slow queries
  • Monitor query performance trends
  • Optimize query execution

query_effort_ms

Type: Gauge
Moving average of time spent running queries.
Use cases:
  • Monitor overall query load
  • Detect query performance degradation
  • Trigger scaling decisions

query_blocks_behind

Type: Histogram
Tracks how many blocks behind the subgraph head queries are being made.
Use cases:
  • Inform pruning decisions
  • Understand query patterns
  • Optimize history retention

Query Results

Type: Histogram
The size of successful GraphQL query results (in CacheWeight).
Labels:
  • deployment
Use cases:
  • Monitor result size distribution
  • Identify queries returning large datasets
  • Optimize pagination strategies

query_result_max

Type: Gauge
The maximum size of a query result (in CacheWeight).
Labels:
  • deployment
Use cases:
  • Track largest queries
  • Set result size limits
  • Prevent resource exhaustion

Caching

query_cache_status_count

Type: Counter
Counts top-level GraphQL fields executed and their cache status.
Labels:
  • deployment
  • field - The GraphQL field name
  • status - hit or miss
Use cases:
  • Monitor cache hit rates
  • Identify frequently queried fields
  • Optimize caching strategy
Cache hit rate:
rate(query_cache_status_count{status="hit"}[5m]) / rate(query_cache_status_count[5m])
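Because both series are counters, the hit rate above is a ratio of increases over the window. The same arithmetic on two consecutive scrapes, sketched in Python with made-up counter values:

```python
def cache_hit_rate(prev: dict, curr: dict) -> float:
    """Approximate the rate()-style cache hit ratio from two scrapes of
    query_cache_status_count, keyed by status ("hit" / "miss")."""
    d_hit = curr["hit"] - prev["hit"]
    d_total = (curr["hit"] + curr["miss"]) - (prev["hit"] + prev["miss"])
    return d_hit / d_total if d_total else 0.0

# Hypothetical counter values from two scrapes 5 minutes apart:
prev = {"hit": 800, "miss": 200}
curr = {"hit": 1700, "miss": 300}
print(cache_hit_rate(prev, curr))  # 0.9 — 900 of the 1000 new lookups were hits
```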

Load Management

query_kill_rate

Type: Gauge
The rate at which the load manager kills queries.
Use cases:
  • Monitor query overload
  • Adjust query timeout settings
  • Alert on excessive query cancellations

query_semaphore_wait_ms

Type: Gauge
Moving average of time spent waiting for the Postgres query semaphore.
Use cases:
  • Monitor database connection contention
  • Adjust connection pool size
  • Identify query queueing issues

Store Metrics

Metrics related to PostgreSQL database operations.

store_connection_checkout_count

Type: Gauge
The number of Postgres connections currently checked out.
Labels:
  • pool - The connection pool name
  • shard - The database shard
Use cases:
  • Monitor connection pool utilization
  • Detect connection leaks
  • Size connection pools appropriately

store_connection_wait_time_ms

Type: Histogram
Average connection wait time from the pool.
Labels:
  • pool
  • shard
Use cases:
  • Identify connection pool bottlenecks
  • Optimize pool configuration
  • Alert on connection exhaustion
Type: Counter
The number of Postgres connection errors.
Labels:
  • pool
  • shard
Use cases:
  • Monitor database connectivity
  • Alert on connection failures
  • Detect database issues

System Metrics

Type: Gauge
Tracks the number of registered metrics on the node.
Use cases:
  • Monitor metric system health
  • Detect metric registration leaks
  • Track metric growth over time
Type: Counter
Counts errors registering Prometheus metrics.
Use cases:
  • Detect metric registration issues
  • Alert on metrics system problems
Type: Counter
Counts errors unregistering Prometheus metrics.
Use cases:
  • Monitor metric cleanup issues
  • Detect metric lifecycle problems

Monitoring Setup

Prometheus Configuration

Add Graph Node to your prometheus.yml:
scrape_configs:
  - job_name: 'graph-node'
    static_configs:
      - targets: ['localhost:8040']
    scrape_interval: 15s
    scrape_timeout: 10s

Grafana Dashboard Setup

1. Add Prometheus data source
Configure Grafana to connect to your Prometheus instance.

2. Create deployment health panel
# Blocks behind chain head
ethereum_chain_head_number{network="mainnet"} - deployment_head{network="mainnet"}

3. Create indexing speed panel
# Blocks per minute (deriv handles gauges, including decreases on reverts)
deriv(deployment_head[5m]) * 60

4. Create query performance panel
# Average query time (ms)
rate(query_execution_time_sum[5m]) / rate(query_execution_time_count[5m]) * 1000
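The query performance panel divides the increase of a histogram's `_sum` series by the increase of its `_count` series. The equivalent arithmetic on two raw scrapes, with hypothetical values, looks like:

```python
def avg_from_histogram(prev_sum, prev_count, curr_sum, curr_count):
    """Average observed value over a window, computed from the _sum and
    _count series of a Prometheus histogram — the same quantity that
    rate(..._sum[5m]) / rate(..._count[5m]) reports."""
    d_count = curr_count - prev_count
    if d_count == 0:
        return None  # no observations in the window
    return (curr_sum - prev_sum) / d_count

# Two scrapes of query_execution_time, 5 minutes apart (made-up numbers):
avg_s = avg_from_histogram(prev_sum=120.0, prev_count=4000,
                           curr_sum=150.0, curr_count=4600)
print(avg_s * 1000)  # 50.0 — average query time in ms
```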

Alert Examples

Deployment Health Alerts

- alert: DeploymentBehind
  expr: (ethereum_chain_head_number - deployment_head) > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Deployment {{ $labels.deployment }} is falling behind"
    description: "Deployment is {{ $value }} blocks behind chain head"
- alert: DeploymentFailed
  expr: deployment_failed == 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Deployment {{ $labels.deployment }} has failed"
    description: "Deployment is in failed state on {{ $labels.shard }}"
- alert: HighRPCErrorRate
  expr: rate(deployment_eth_rpc_errors[5m]) > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High RPC error rate for {{ $labels.deployment }}"
    description: "RPC error rate is {{ $value }} errors/sec"

Query Performance Alerts

- alert: SlowQueries
  expr: (rate(query_execution_time_sum[5m]) / rate(query_execution_time_count[5m])) > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Slow queries on {{ $labels.deployment }}"
    description: "Average query time is {{ $value }}s"
- alert: HighQueryKillRate
  expr: query_kill_rate > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High query kill rate"
    description: "{{ $value }} queries/sec are being killed"

Database Alerts

- alert: ConnectionPoolExhaustion
  # assumes a pool size of 100 connections; substitute your configured pool size
  expr: store_connection_checkout_count / 100 > 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Connection pool nearly exhausted for {{ $labels.pool }}"
    description: "Pool utilization is {{ $value | humanizePercentage }}"
- alert: HighConnectionWait
  expr: store_connection_wait_time_ms > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High connection wait time for {{ $labels.pool }}"
    description: "Average wait time is {{ $value }}ms"

Best Practices

Always track ethereum_chain_head_number - deployment_head to ensure deployments stay synchronized with the blockchain.
Configure alerts for deployment failures, high error rates, and significant lag to catch issues early.
Pre-compute expensive queries like cache hit rates and average query times to reduce dashboard load.
Monitor query_cache_status_count and query_blocks_behind to optimize caching and pruning strategies.
When investigating issues, look at RPC metrics, handler metrics, and store metrics together to identify the root cause.
Track connection pool usage and query semaphore wait times to prevent resource exhaustion.
Record normal metric ranges for your deployments to quickly identify anomalies.

Troubleshooting with Metrics

Slow Indexing

1. Check block processing duration
rate(deployment_block_processing_duration_sum[5m]) / rate(deployment_block_processing_duration_count[5m])

2. Identify slow handlers
topk(5, rate(deployment_handler_execution_time_sum[5m]) / rate(deployment_handler_execution_time_count[5m]))

3. Check RPC performance
rate(deployment_eth_rpc_request_duration_sum[5m]) / rate(deployment_eth_rpc_request_duration_count[5m])

4. Monitor database operations
rate(deployment_transact_block_operations_duration_sum[5m]) / rate(deployment_transact_block_operations_duration_count[5m])

Poor Query Performance

1. Check cache hit rate
rate(query_cache_status_count{status="hit"}[5m]) / rate(query_cache_status_count[5m])

2. Identify large queries
query_result_max

3. Check connection contention
query_semaphore_wait_ms

4. Monitor query load
query_effort_ms
