Monitoring

Overview

Yellowstone gRPC exports Prometheus metrics for monitoring the health and performance of the gRPC plugin. The metrics endpoint exposes current data about connections, subscriptions, message queues, data flow, and block reconstruction.

Configuration

Prometheus Endpoint

Configure the Prometheus metrics endpoint in your config.json:

{
  "prometheus": {
    "address": "0.0.0.0:8999"
  }
}

The metrics are exposed at http://<address>/metrics in Prometheus text format.
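For quick inspection without a full Prometheus server, the text format can be read directly. The following is a minimal sketch, assuming the endpoint is reachable at `localhost:8999`; `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the parser is deliberately simplified (it ignores timestamps and escaped quotes inside label values).

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
# Simplified: trailing timestamps and escaped quotes in labels are not handled.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8999/metrics", timeout=5.0):
    """Fetch and parse the metrics page (the default URL is an assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running plugin would return one tuple per sample, which can then be filtered by metric name.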

Debug Endpoint

Enable the debug clients endpoint to view detailed information about active client connections:

{
  "debug_clients_http": true,
  "prometheus": {
    "address": "0.0.0.0:8999"
  }
}

Access client debug information at http://<address>/debug_clients.

Available Metrics

Version Information

version

counter

Plugin version information including build timestamp, git commit, package version, proto version, rustc version, and Solana version.

Slot Metrics

slot_status

gauge

Latest received slot from Geyser by commitment level (processed/confirmed/finalized).

slot_status_plugin

gauge

Latest slot that the plugin has processed and forwarded to client queues, by commitment level.

Connection Metrics

connections_total

gauge

Total number of active connections to the gRPC service.

subscriptions_total

gauge

Total number of active subscriptions by endpoint and subscription type (accounts, slots, transactions, blocks, etc.).

Current concurrent subscriptions per remote TCP peer socket address.

Message Queue Metrics

message_queue_size

gauge

Current size of the Geyser message queue. High values may indicate processing bottlenecks.

grpc_subscriber_queue_size

gauge

Current size of each subscriber’s channel queue by subscriber ID.

Data Flow Metrics

grpc_message_sent_count

counter

Number of messages sent over gRPC to downstream clients by subscriber ID.

grpc_bytes_sent

counter

Number of bytes sent over gRPC to downstream clients by subscriber ID.

grpc_subscriber_send_bandwidth_load

gauge

Current send bandwidth load on each subscriber channel, in bytes per second, by subscriber ID.

total_traffic_sent_bytes

counter

Total traffic sent to all subscribers.

traffic_sent_per_remote_ip_bytes

counter

Total traffic sent to subscribers by remote IP address.

grpc_service_outbound_bytes

gauge

Bytes currently emitted by tonic service response bodies, per active subscriber stream.

Block Reconstruction Metrics

invalid_full_blocks_total

gauge

Total number of failures when constructing full blocks, broken down by reason.

Monitor invalid_full_blocks_total closely. Because it is a running total, watch for increases rather than the absolute value: a rising count indicates problems with block reconstruction, which may result in incomplete data being sent to clients.

Client Metrics

grpc_client_disconnects_total

counter

Total client disconnections by subscriber ID and reason.

missed_status_message_total

counter

Number of missed status messages by commitment level (processed/confirmed/finalized).

Geyser Plugin Metrics

geyser_account_update_data_size_kib

histogram

Histogram of all account update data sizes (in KiB) received from the Geyser plugin.

yellowstone_geyser_batch_size

histogram

Size of processed message batches from Geyser.

Performance Metrics

yellowstone_grpc_pre_encoded_cache_hit

counter

Pre-encoded cache hits by message type.

yellowstone_grpc_pre_encoded_cache_miss

counter

Pre-encoded cache misses by message type.
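The two counters are most useful combined into a hit ratio. As a rough sketch, assuming you have already read both counter values for the same message type (for example by differencing two scrapes), the ratio is hits divided by total lookups; `cache_hit_ratio` is a hypothetical helper name, not part of the plugin.

```python
def cache_hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None  # no lookups in this window, so the ratio is undefined
    return hits / total


def windowed_hit_ratio(previous, current):
    """Hit ratio between two scrapes; each argument is a (hits, misses) pair."""
    hits = current[0] - previous[0]
    misses = current[1] - previous[1]
    if hits < 0 or misses < 0:
        return None  # a counter went backwards, e.g. after a plugin restart
    return cache_hit_ratio(hits, misses)
```

Using the difference between two scrapes rather than the raw totals keeps the ratio focused on recent behavior instead of the whole process lifetime.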

API Metrics

yellowstone_grpc_method_call_count

counter

Total number of calls to gRPC methods (GetVersion, GetLatestBlockhash, GetBlockHeight, GetSlot, IsBlockhashValid, Ping).

yellowstone_grpc_subscription_limit_exceeded_total

counter

Number of subscribe attempts that exceeded the per-subscriber limit by subscriber ID.

Monitoring Best Practices

Key Metrics to Alert On

Connection Saturation
- Monitor connections_total and set alerts for unusual spikes
- Track grpc_client_disconnects_total to identify stability issues

Queue Backlog
- Alert on high message_queue_size (>10,000 messages)
- Monitor grpc_subscriber_queue_size per client

Block Reconstruction Failures
- Alert on increasing invalid_full_blocks_total
- This indicates potential data integrity issues

Bandwidth Saturation
- Track grpc_subscriber_send_bandwidth_load per client
- Monitor total_traffic_sent_bytes growth rate

Slot Processing Lag
- Compare slot_status vs slot_status_plugin to detect processing delays
- Alert if the gap exceeds acceptable thresholds
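The slot lag comparison can be sketched in a few lines. This assumes you have already collected the latest slot_status and slot_status_plugin values per commitment level into two dictionaries; the function names and the `max_lag=10` threshold are hypothetical and should be tuned to your deployment.

```python
def slot_lag(geyser_slots, plugin_slots):
    """Return {commitment: lag} for levels present in both mappings."""
    return {
        level: geyser_slots[level] - plugin_slots[level]
        for level in geyser_slots
        if level in plugin_slots
    }


def lagging_levels(geyser_slots, plugin_slots, max_lag=10):
    """Return the commitment levels whose lag exceeds max_lag slots."""
    lags = slot_lag(geyser_slots, plugin_slots)
    return sorted(level for level, lag in lags.items() if lag > max_lag)


# Example with made-up slot numbers.
geyser = {"processed": 1000, "confirmed": 990, "finalized": 960}
plugin = {"processed": 998, "confirmed": 975, "finalized": 960}
print(lagging_levels(geyser, plugin))  # prints ['confirmed']
```

A small, steady lag is expected while messages pass through the queue; a lag that keeps growing is the signal worth alerting on.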

Example Prometheus Queries

# Connection count
connections_total

# Message queue backlog
message_queue_size

# Bandwidth per subscriber (bytes/sec)
rate(grpc_bytes_sent[5m])

# Disconnect rate by reason
rate(grpc_client_disconnects_total[5m])

# Slot processing lag
slot_status{status="confirmed"} - slot_status_plugin{status="confirmed"}

# Invalid blocks added over the last 5 minutes (gauge, so delta() rather than rate())
delta(invalid_full_blocks_total[5m])
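To make the rate() queries above more concrete, here is a simplified sketch of what a per-second rate over a counter means: the total increase across the window divided by the elapsed time, with a drop in value treated as a counter reset. Prometheus's actual rate() also extrapolates to the window boundaries, which this sketch does not attempt; `counter_rate` is a hypothetical name.

```python
def counter_rate(samples):
    """Per-second rate from (timestamp_seconds, value) pairs sorted by time.

    Returns None when fewer than two samples or no elapsed time is available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The counter dropped, so assume it restarted from zero
            # (for example after a plugin restart).
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Three scrapes of grpc_bytes_sent, 15 seconds apart (made-up values).
print(counter_rate([(0, 100), (15, 400), (30, 700)]))  # prints 20.0
```

The reset handling is why counters such as grpc_bytes_sent remain usable across restarts, while raw differences between two scrapes would briefly go negative.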

Grafana Dashboard

Create a Grafana dashboard with these panels:

Overview
- Total connections
- Total subscriptions
- Message queue size
- Current slot numbers

Performance
- Message throughput (messages/sec)
- Bandwidth usage (bytes/sec)
- Cache hit rate
- Batch sizes

Health
- Invalid block count
- Client disconnects
- Missed messages
- Queue sizes per subscriber

Resource Usage
- Bandwidth per IP
- Concurrent subscriptions per connection
- Queue backlogs

Health Check Endpoint

The Prometheus endpoint also serves as a basic health check:

curl http://localhost:8999/metrics

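If the endpoint responds with metrics, the plugin process is running and its HTTP server is reachable. This alone does not confirm that data is flowing, so pair it with the slot and queue metrics described earlier.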
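The same check can be scripted for use in a supervisor or readiness probe. This is a minimal sketch, assuming the default address from the configuration example; `is_healthy` is a hypothetical helper, and it only verifies that the endpoint answers with HTTP 200 and a non-empty body.

```python
import http.client
from urllib.request import urlopen


def is_healthy(url="http://localhost:8999/metrics", timeout=2.0):
    """Return True if the metrics endpoint answers 200 with a non-empty body."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read()
            return response.status == 200 and len(body) > 0
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```

A short timeout keeps the probe from hanging when the validator host is overloaded, at the cost of occasional false negatives.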

Integration with Monitoring Systems

Prometheus Scrape Configuration

scrape_configs:
  - job_name: 'yellowstone-grpc'
    static_configs:
      - targets: ['localhost:8999']
    scrape_interval: 15s

Alert Rules Example

groups:
  - name: yellowstone_grpc
    rules:
      - alert: HighMessageQueueSize
        expr: message_queue_size > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High message queue size detected"
          description: "Message queue has {{ $value }} messages"

      - alert: InvalidBlocksDetected
        expr: delta(invalid_full_blocks_total[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Invalid blocks detected"
          description: "Block reconstruction is failing"

      - alert: HighClientDisconnectRate
        expr: rate(grpc_client_disconnects_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High client disconnect rate"
          description: "Clients are disconnecting at {{ $value }}/sec"
