## Overview

Avail nodes expose comprehensive metrics through a Prometheus endpoint, enabling real-time monitoring of node performance, block production, and data availability operations.
## Prometheus Endpoint

### Default Configuration

By default, Avail nodes expose metrics on:

- **Host:** `127.0.0.1` (localhost only)
- **Port:** `9615`
- **Endpoint:** `http://127.0.0.1:9615/metrics`
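The endpoint serves plain text in the Prometheus exposition format: comment lines (`# HELP`, `# TYPE`) followed by one sample per line. A minimal parsing sketch — the sample payload below is illustrative, not actual node output:

```python
# Minimal parser for the Prometheus text exposition format.
# The sample payload is illustrative; real output comes from
# scraping http://127.0.0.1:9615/metrics on a running node.

def parse_metrics(text: str) -> dict:
    """Map each metric line (name plus any labels) to its float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP substrate_block_height Block height info of the chain
# TYPE substrate_block_height gauge
substrate_block_height{status="best"} 1234567
substrate_block_height{status="finalized"} 1234560
substrate_peers_count 45
"""

parsed = parse_metrics(sample)
print(parsed["substrate_peers_count"])  # 45.0
```

In practice a Prometheus server does this scraping and parsing for you; the sketch is only meant to show what the raw endpoint returns.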
### Configuring the Prometheus Exporter

**Enable Prometheus (enabled by default)**

Prometheus metrics are enabled automatically; no action is required.

**Customize the port**

```bash
avail-node --chain mainnet --prometheus-port 9616
```

**Expose externally**

Only expose metrics externally on trusted networks, and use firewalls to restrict access.

```bash
avail-node --chain mainnet --prometheus-external
```

**Disable Prometheus**

```bash
avail-node --chain mainnet --no-prometheus
```
## Available Metrics

### Avail-Specific Metrics

Avail extends the standard Substrate metrics with custom metrics for data availability operations:

**Block Import Metrics**

`avail_import_block_total_execution_time`

- Type: Histogram
- Unit: Microseconds
- Description: Total time to import a block, including DA verification

**Header Extension Builder Metrics**

- `avail_header_extension_builder_total_execution_time`
- `avail_header_extension_builder_evaluation_grid_build_time`
- `avail_header_extension_builder_commitment_build_time`
- `avail_header_extension_builder_grid_rows`
- `avail_header_extension_builder_grid_cols`

These histograms cover Kate commitment building and grid construction, tracking execution time for polynomial evaluation and commitment generation.
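Prometheus exports each histogram as `_bucket`, `_sum`, and `_count` series, so a mean execution time can be derived from the latter two. A sketch of that arithmetic — the sample values are made up, not real node output:

```python
# Derive the mean observation (in microseconds) of a histogram metric
# from its _sum and _count series. The values below are illustrative.

def mean_from_histogram(total_sum: float, total_count: float) -> float:
    """Mean observation = sum of observations / number of observations."""
    if total_count == 0:
        return 0.0
    return total_sum / total_count

# e.g. avail_import_block_total_execution_time_sum / ..._count
mean_us = mean_from_histogram(total_sum=9_000_000.0, total_count=1_500.0)
print(mean_us)  # 6000.0 microseconds per imported block
```

In PromQL the equivalent over a window would divide the rate of the `_sum` series by the rate of the `_count` series.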
### Kate RPC Metrics

Enable these with the `--enable-kate-rpc-metrics` flag:

- `avail_kate_rpc_query_rows_execution_time`
- `avail_kate_rpc_query_proof_execution_time`
- `avail_kate_rpc_query_block_length_execution_time`
- `avail_kate_rpc_query_data_proof_execution_time`

All are histograms, measured in microseconds, recording the execution time of the corresponding Kate RPC methods.
### Standard Substrate Metrics

Avail inherits all standard Substrate metrics, including:
- Block height and finality metrics
- Transaction pool statistics
- Network peer counts and bandwidth
- Database I/O operations
- Runtime execution times
- BABE and GRANDPA consensus metrics
## Setting Up Prometheus Server

### Basic Prometheus Configuration

Create a `prometheus.yml` configuration file:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'avail-validator-1'
    static_configs:
      - targets: ['localhost:9615']

  - job_name: 'avail-validator-2'
    static_configs:
      - targets: ['localhost:9617']

  - job_name: 'avail-fullnode'
    static_configs:
      - targets: ['localhost:9619']
```
### Multi-Node Setup

For monitoring multiple validators and full nodes:

```yaml
scrape_configs:
  - job_name: 'validators'
    static_configs:
      - targets:
          - 'validator1.example.com:9615'
          - 'validator2.example.com:9615'
          - 'validator3.example.com:9615'
        labels:
          group: 'validators'

  - job_name: 'fullnodes'
    static_configs:
      - targets:
          - 'fullnode1.example.com:9615'
          - 'fullnode2.example.com:9615'
        labels:
          group: 'fullnodes'
```
### Starting Prometheus

```bash
prometheus --config.file=prometheus.yml
```

Access the Prometheus UI at `http://localhost:9090`.
## Log Collection with Promtail

### Promtail Configuration

For systemd-based deployments, configure Promtail to collect logs:

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: avail-validator
    journal:
      json: false
      max_age: 12h
      path: /var/log/journal
      labels:
        job: avail-validator
    relabel_configs:
      - action: keep
        source_labels: ["__journal__systemd_unit"]
        regex: avail-validator.service
      - source_labels: ["__journal__systemd_unit"]
        target_label: "systemd_unit"
```
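The two `relabel_configs` rules do this: drop journal entries from any unit other than `avail-validator.service`, then copy the unit name into a `systemd_unit` label. A small sketch of that logic with hypothetical entries (Prometheus-style relabeling anchors the regex to the full label value, which `fullmatch` approximates here):

```python
import re

# Simulate the two relabel rules from the Promtail config above:
# 1. action: keep  -> drop entries whose __journal__systemd_unit
#    does not match the regex
# 2. copy __journal__systemd_unit into the systemd_unit label

KEEP_REGEX = re.compile(r"avail-validator\.service")

def relabel(entries):
    kept = []
    for labels in entries:
        unit = labels.get("__journal__systemd_unit", "")
        if not KEEP_REGEX.fullmatch(unit):        # action: keep
            continue
        labels = dict(labels, systemd_unit=unit)  # target_label mapping
        kept.append(labels)
    return kept

entries = [
    {"__journal__systemd_unit": "avail-validator.service"},
    {"__journal__systemd_unit": "sshd.service"},
]
print(relabel(entries))
```

Only the first entry survives, now carrying a queryable `systemd_unit` label in Loki.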
## Grafana Dashboard Setup

### Adding a Prometheus Data Source

**Open Grafana**

Navigate to `http://localhost:3000` (the default Grafana port).

**Add the data source**

- Go to Configuration → Data Sources
- Click "Add data source"
- Select "Prometheus"
- Set the URL to `http://localhost:9090`
- Click "Save & Test"

**Import a Substrate dashboard**

Use the official Substrate/Polkadot dashboard templates as a starting point:

- Dashboard ID: 13759 (Substrate Node Metrics)
## Key Metrics to Monitor

### Validator Health

- `substrate_block_height` - Current block height
- `substrate_finalized_height` - Latest finalized block
- `substrate_ready_transactions_number` - Transaction pool size
- `avail_header_extension_builder_total_execution_time` - Block building performance

### Network Health

- `substrate_peers_count` - Number of connected peers
- `substrate_sub_libp2p_network_bytes_total` - Network traffic
- `substrate_sub_libp2p_connections_opened_total` - Connection metrics

### System Resources

- `substrate_database_cache_bytes` - Database cache usage
- `substrate_memory_usage_bytes` - Memory consumption
- `process_cpu_seconds_total` - CPU usage
## Telemetry

Avail nodes can report to telemetry servers for public monitoring.

**Enable telemetry**

```bash
avail-node --chain mainnet \
  --telemetry-url 'wss://telemetry.example.com/submit 0'
```

**Disable telemetry**

```bash
avail-node --chain mainnet --no-telemetry
```

The number after the URL is the verbosity level; levels range from 0 to 9, with 0 being the least verbose.
## Health Checks

### RPC Health Endpoint

Check node health via RPC:

```bash
curl -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method": "system_health"}' \
  http://localhost:9944
```

Response:

```json
{
  "jsonrpc": "2.0",
  "result": {
    "isSyncing": false,
    "peers": 45,
    "shouldHavePeers": true
  },
  "id": 1
}
```
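A response like the one above can be reduced to a pass/fail health decision, e.g. flagging a node that is still syncing or short on peers. A minimal sketch — the peer threshold is an arbitrary assumption, not a protocol requirement:

```python
# Evaluate a system_health JSON-RPC response. The response dict
# mirrors the example above; min_peers is an assumed threshold.

def is_healthy(response: dict, min_peers: int = 5) -> bool:
    health = response["result"]
    if health["isSyncing"]:
        return False
    if health["shouldHavePeers"] and health["peers"] < min_peers:
        return False
    return True

response = {
    "jsonrpc": "2.0",
    "result": {"isSyncing": False, "peers": 45, "shouldHavePeers": True},
    "id": 1,
}
print(is_healthy(response))  # True
```

The same decision logic is what the shell monitoring script below implements with `curl` and `jq`.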
### System Monitoring Script

Example monitoring script:

```bash
#!/bin/bash

# Check if the node service is running
if ! systemctl is-active --quiet avail-node; then
  echo "CRITICAL: Avail node is not running"
  exit 2
fi

# Check the peer count via the system_health RPC
PEERS=$(curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"system_health","id":1}' \
  http://localhost:9944 | jq -r '.result.peers')

if [ "$PEERS" -lt 5 ]; then
  echo "WARNING: Low peer count: $PEERS"
  exit 1
fi

echo "OK: Node healthy with $PEERS peers"
exit 0
```
## Storage Monitoring

### Enable Storage Monitoring

```bash
avail-node --chain mainnet \
  --db-storage-threshold 10240 \
  --db-storage-polling-period 60
```

- `--db-storage-threshold <MiB>` - Minimum free space required (default: 1024 MiB)
- `--db-storage-polling-period <SECONDS>` - Check interval (default: 5 seconds)

The node will gracefully terminate if available storage drops below the threshold.
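The same check can also run outside the node, e.g. from a cron job that alerts before the node itself shuts down. A sketch using only the standard library — the path and threshold are placeholders, not values the node mandates:

```python
import shutil

# Report whether free space on a volume is below a MiB threshold,
# mirroring the node's --db-storage-threshold check. The path and
# threshold below are placeholders; point the path at the volume
# holding the node's database.

def below_threshold(path: str, threshold_mib: int) -> bool:
    free_mib = shutil.disk_usage(path).free // (1024 * 1024)
    return free_mib < threshold_mib

if below_threshold("/", 1024):  # node's default threshold: 1024 MiB
    print("CRITICAL: free space below threshold")
else:
    print("OK: sufficient free space")
```

Running this a few minutes ahead of the node's own polling period gives operators time to react before a graceful shutdown.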
## Alerting Rules

Example Prometheus alerting rules:

```yaml
groups:
  - name: avail_alerts
    interval: 30s
    rules:
      - alert: AvailNodeDown
        expr: up{job=~"avail.*"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Avail node {{ $labels.instance }} is down"

      - alert: AvailLowPeerCount
        expr: substrate_peers_count < 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count on {{ $labels.instance }}"

      - alert: AvailBlockProductionStalled
        expr: increase(substrate_block_height[5m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Block production stalled on {{ $labels.instance }}"
```
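The stall rule fires when the best block height has not increased over a five-minute window. The core comparison can be sketched from two height samples — the numbers here are made up for illustration:

```python
# Detect stalled block production from two samples of
# substrate_block_height, mirroring increase(...[5m]) == 0.
# Sample heights below are illustrative.

def production_stalled(height_then: int, height_now: int) -> bool:
    """True when the height did not increase between the two samples."""
    return height_now - height_then <= 0

print(production_stalled(1_234_500, 1_234_500))  # True: no new blocks
print(production_stalled(1_234_500, 1_234_515))  # False: chain advancing
```

Prometheus's `increase()` additionally corrects for counter resets; for a monotonically growing block height the simple difference above captures the idea.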
## Best Practices

- **Regular monitoring** - Set up automated alerting for critical metrics
- **Historical data** - Retain metrics for at least 30 days for trend analysis
- **Baseline metrics** - Establish normal operating ranges for your hardware
- **Secure access** - Use authentication and TLS for external metric endpoints
- **Log rotation** - Configure log rotation to prevent disk space issues