
Overview

Firedancer maintains many internal performance counters for use by developers and monitoring tools, and exposes them via a Prometheus HTTP endpoint.

Configuration

Configure the Prometheus metrics endpoint in your config.toml:
[tiles.metric]
    prometheus_listen_port = 7999

Accessing Metrics

Once configured, you can query the metrics endpoint using curl or any Prometheus-compatible scraper:
curl http://localhost:7999/metrics
Example Response:
# HELP tile_pid The process ID of the tile.
# TYPE tile_pid gauge
tile_pid{kind="net",kind_id="0"} 1527373
tile_pid{kind="quic",kind_id="0"} 1527370
tile_pid{kind="quic",kind_id="1"} 1527371
tile_pid{kind="verify",kind_id="0"} 1527369
tile_pid{kind="verify",kind_id="1"} 1527374
tile_pid{kind="dedup",kind_id="0"} 1527365
...
Metrics are currently only provided for developer and diagnostic use. The endpoint or data provided may break or change in incompatible ways at any time.
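Scripts that read the endpoint directly need to split each sample line into a name, labels, and a value. The sketch below shows one way to do that. The parse_metrics and fetch_metrics helpers are hypothetical. The regular expressions only handle simple lines like those in the example response above: no escaped quotes in label values and no timestamps. They do not cover the full Prometheus text format. For anything beyond quick diagnostics, the official prometheus_client Python package ships a complete text-format parser (prometheus_client.parser.text_string_to_metric_families).

```python
import re
import urllib.request

# Simple sample lines only, e.g.: tile_pid{kind="net",kind_id="0"} 1527373
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # not a simple sample line this sketch understands
        name, label_text, value = m.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples

def fetch_metrics(url="http://localhost:7999/metrics", timeout=5.0):
    """Download and parse the metrics page (requires a running validator)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))

sample = '''# HELP tile_pid The process ID of the tile.
# TYPE tile_pid gauge
tile_pid{kind="net",kind_id="0"} 1527373
tile_pid{kind="quic",kind_id="1"} 1527371
'''
for name, labels, value in parse_metrics(sample):
    print(labels["kind"], labels["kind_id"], int(value))
# net 0 1527373
# quic 1 1527371
```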

Metric Types

Firedancer reports three metric types, following the Prometheus data model:
  • counter: A cumulative metric representing a monotonically increasing counter.
  • gauge: A single numerical value that can go arbitrarily up or down.
  • histogram: Samples observations, such as packet sizes, and counts them in buckets.
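A gauge is read as-is. Because a counter only grows, the useful quantity is usually its rate of change between scrapes. The sketch below computes a per-second rate from (timestamp, value) samples and treats any drop in value as a counter reset, for example after a restart. This mirrors the reset handling of PromQL's rate(). Unlike rate(), it does no extrapolation to the edges of the window. The counter_rate helper and the sample data are hypothetical.

```python
def counter_rate(samples):
    """Average per-second increase of a counter from (timestamp_s, value) pairs.

    A decrease between consecutive samples is treated as a reset to zero,
    so the new value itself counts as the increase since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed

# Hypothetical samples of a counter at t=0s, 10s, 20s, with a reset before the last one
print(counter_rate([(0, 100), (10, 300), (20, 50)]))  # 12.5
```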

Available Metrics

Link Metrics

Metrics available for all inter-tile communication links:
  • The number of times the link reader has consumed a fragment.
  • The total number of bytes read by the link consumer.
  • The number of fragments that were filtered and not consumed.
  • The total number of bytes read by the link consumer that were filtered.
  • The number of times the link has been overrun while polling.
  • The number of input overruns the consumer detected while reading fragment metadata.
  • The number of times the producer detected this consumer as rate limiting it.
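The per-link counters combine naturally into ratios. The sketch below assumes the consumed and filtered counts are disjoint, since filtered fragments are described as "not consumed". Under that assumption, their sum is every fragment the reader processed. The inputs can be totals since startup or the increase of each counter between two scrapes. The function names are hypothetical. Take the actual metric names from your validator's /metrics output.

```python
def filtered_fraction(consumed, filtered):
    """Share of fragments on a link that were filtered instead of consumed."""
    seen = consumed + filtered  # assumes the two counts are disjoint
    return filtered / seen if seen else 0.0

def mean_bytes_per_fragment(byte_count, fragment_count):
    """Average fragment size from a byte counter and its matching fragment counter."""
    return byte_count / fragment_count if fragment_count else 0.0

print(filtered_fraction(consumed=750, filtered=250))  # 0.25
print(mean_bytes_per_fragment(1200, 10))  # 120.0
```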

Tile Metrics

Metrics available for all tiles:
  • tile_pid (gauge): The process ID of the tile.
  • tile_tid (gauge): The thread ID of the tile. Always the same as the PID in production, but might be different in development.
  • tile_last_cpu (gauge): Index of the CPU the tile last executed on.
  • tile_context_switch_involuntary_count (counter): The number of involuntary context switches.
  • tile_context_switch_voluntary_count (counter): The number of voluntary context switches.
  • tile_page_fault_major_count (counter): The number of major page faults.
  • tile_page_fault_minor_count (counter): The number of minor page faults.
  • tile_status (gauge): The current status of the tile: 0 is booting, 1 is running, 2 is shutdown.
  • tile_heartbeat (gauge): The UNIX timestamp, in nanoseconds, of the tile's most recent heartbeat.
  • tile_in_backpressure (gauge): Whether the tile is currently backpressured: 1 if it is, 0 if not.
  • tile_backpressure_count (counter): The number of times the tile has had to wait for one or more consumers to catch up before it could resume publishing.
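The tile_status, tile_heartbeat, and tile_in_backpressure gauges are enough for a basic per-tile liveness check. The tile_health helper below is a hypothetical sketch, and so is its one-second staleness threshold. It compares the heartbeat with the local clock, which assumes the checker runs on the validator host or has a synchronized clock.

```python
import time

# tile_status values as documented: 0 booting, 1 running, 2 shutdown
TILE_STATUS = {0: "booting", 1: "running", 2: "shutdown"}

def tile_health(status, heartbeat_ns, in_backpressure, now_ns=None, max_age_s=1.0):
    """Return a list of problems for one tile (an empty list means healthy)."""
    if now_ns is None:
        now_ns = time.time_ns()
    # A heartbeat parsed as a float loses sub-microsecond precision; fine at this scale.
    age_s = (now_ns - heartbeat_ns) / 1e9
    problems = []
    state = TILE_STATUS.get(int(status), f"unknown({int(status)})")
    if state != "running":
        problems.append(f"status is {state}")
    if age_s > max_age_s:
        problems.append(f"last heartbeat {age_s:.1f}s ago")
    if int(in_backpressure) == 1:
        problems.append("backpressured")
    return problems

print(tile_health(status=1, heartbeat_ns=0, in_backpressure=1, now_ns=3_000_000_000))
# ['last heartbeat 3.0s ago', 'backpressured']
```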

Tile-Specific Metrics

IPEcho Tile

  • ipecho_current_shred_version (gauge): The current shred version used by the validator.
  • ipecho_connection_count (gauge): The number of active connections to the ipecho service.
  • ipecho_connections_closed_ok (counter): The number of connections that have been made and closed normally.
  • ipecho_connections_closed_error (counter): The number of connections that have been made and closed abnormally.
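The two close counters give the share of ipecho connections that ended abnormally. The abnormal_close_ratio helper below is a small hypothetical sketch. Its inputs can be totals since startup or increases between two scrapes.

```python
def abnormal_close_ratio(closed_ok, closed_error):
    """Fraction of closed ipecho connections that closed abnormally."""
    closed = closed_ok + closed_error
    return closed_error / closed if closed else 0.0

print(abnormal_close_ratio(closed_ok=98, closed_error=2))  # 0.02
```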

Snapshot Control Tile

  • snapct_state (gauge): State of the snapshot control tile.
  • snapct_full_bytes_read (gauge): Number of bytes read so far from the full snapshot. Might decrease if snapshot load is aborted and restarted.
  • snapct_full_bytes_total (gauge): Total size of the full snapshot file.
  • snapct_incremental_bytes_read (gauge): Number of bytes read so far from the incremental snapshot.
  • snapct_predicted_slot (gauge): The predicted slot from which replay starts after snapshot loading finishes.
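Full-snapshot loading progress can be derived from snapct_full_bytes_read and snapct_full_bytes_total. The helpers below are a hypothetical sketch. The bytes-read gauge can decrease when a load is aborted and restarted, so the ETA helper returns None rather than a negative estimate when no forward progress was made between two samples. Treating a zero total as "not yet known" is also an assumption.

```python
def full_snapshot_percent(bytes_read, bytes_total):
    """Percent of the full snapshot read so far."""
    if bytes_total <= 0:
        return None  # assumption: total size not known yet
    return min(100.0, 100.0 * bytes_read / bytes_total)

def eta_seconds(read_before, read_now, interval_s, bytes_total):
    """Seconds remaining at the read rate observed over the last interval_s seconds."""
    rate = (read_now - read_before) / interval_s
    if rate <= 0:
        return None  # stalled, or restarted (bytes read went down)
    return (bytes_total - read_now) / rate

print(full_snapshot_percent(250, 1000))  # 25.0
print(eta_seconds(100, 300, 10, 1300))  # 50.0
```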

Integration with Monitoring Tools

Prometheus

Add Firedancer as a scrape target in your prometheus.yml:
scrape_configs:
  - job_name: 'firedancer'
    static_configs:
      - targets: ['localhost:7999']

Grafana

Once metrics are being scraped by Prometheus, you can create Grafana dashboards to visualize:
  • Tile health and backpressure
  • Link throughput and overruns
  • Context switches and page faults
  • Snapshot loading progress
For a complete list of available metrics, including tile-specific counters for the net, quic, verify, dedup, pack, bank, poh, shred, store, sign, and other tiles, refer to the full metrics output of your running validator.
