IOTA nodes expose comprehensive metrics for monitoring performance, health, and network participation. This guide covers setting up monitoring infrastructure and understanding key metrics.

Metrics Endpoint

Nodes expose Prometheus-compatible metrics on the configured metrics address:
# In node.yaml
metrics-address: "0.0.0.0:9184"
Access metrics at: http://your-node:9184/metrics

Quick Health Check

Verify your node is exposing metrics:
curl http://localhost:9184/metrics
You should see Prometheus-formatted metrics output.
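The exposition format is plain text, one sample per line, which makes it easy to script health checks against. A minimal sketch (pure Python, no dependencies) that extracts samples from a `/metrics` response body; the sample input below is illustrative, not real node output:

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition format into {series: value}.

    Comment lines (# HELP / # TYPE) are skipped; the label set, if
    present, stays part of the key. This is a sketch: it does not
    handle label values containing spaces.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each sample is "<name>{labels} <value>" or "<name> <value>"
        series, _, value = line.rpartition(" ")
        if series:
            samples[series] = float(value)
    return samples

# Illustrative response body; a real node exposes many more series.
body = """\
# HELP uptime Node uptime in seconds
# TYPE uptime counter
uptime{process="fullnode",version="1.0.0"} 86400
"""
samples = parse_metrics(body)
print(samples['uptime{process="fullnode",version="1.0.0"}'])  # 86400.0
```

A check like this can run from cron or a readiness probe to confirm the endpoint is serving parseable data.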

Prometheus Setup

1. Install Prometheus

# Using Docker
docker pull prom/prometheus:latest

# Or download binary from prometheus.io
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
2. Configure Prometheus

Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'iota-node'
    static_configs:
      - targets: ['localhost:9184']
        labels:
          instance: 'node-1'
          chain: 'mainnet'
3. Start Prometheus

docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus:latest
4. Verify data collection

Access Prometheus UI at http://localhost:9090 and query:
uptime
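The same verification can be automated through Prometheus's HTTP API (`GET /api/v1/query?query=<expr>`). A small sketch that builds the query URL and flattens the vector result the API returns; the canned response here stands in for a live server:

```python
from urllib.parse import urlencode

def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

def extract_values(response: dict) -> dict:
    """Flatten an instant-query vector result to {instance: value}."""
    out = {}
    for sample in response["data"]["result"]:
        # Each value is a [timestamp, "value-as-string"] pair.
        out[sample["metric"].get("instance", "")] = float(sample["value"][1])
    return out

print(query_url("http://localhost:9090", "uptime"))
# Canned response in the shape the API returns for a healthy node.
canned = {
    "status": "success",
    "data": {"resultType": "vector",
             "result": [{"metric": {"__name__": "uptime", "instance": "node-1"},
                         "value": [1700000000, "86400"]}]},
}
print(extract_values(canned))  # {'node-1': 86400.0}
```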

Metrics Push Service

For centralized monitoring, configure metrics push:
metrics:
  # Push interval in seconds (default: 60)
  push-interval-seconds: 60
  
  # Remote push endpoint
  push-url: "https://metrics-gateway.example.com/push"
The node will periodically push metrics to the configured endpoint using authenticated requests.

Key Metrics Reference

Node Uptime and Version

# Node uptime in seconds
uptime

# Labels include:
# - process: "validator" or "fullnode"
# - version: binary version
# - chain_identifier: network identifier
# - os_version: operating system
# - is_docker: whether running in Docker

Task and Future Monitoring

# Number of running tasks
monitored_tasks{callsite="..."}

# Number of pending futures
monitored_futures{callsite="..."}

# Active duration of futures in nanoseconds
monitored_future_active_duration_ns{name="..."}

Channel Metrics

# Items in flight in channels
monitored_channel_inflight{name="..."}

# Items sent through channels
monitored_channel_sent{name="..."}

# Items received from channels
monitored_channel_received{name="..."}

Scope Monitoring

# Number of scope entrances
monitored_scope_entrance{name="..."}

# Total scope iterations
monitored_scope_iterations{name="..."}

# Scope duration in nanoseconds
monitored_scope_duration_ns{name="..."}

Thread Stall Detection

# Thread stall duration histogram
thread_stall_duration_sec_bucket
thread_stall_duration_sec_sum
thread_stall_duration_sec_count

System Invariant Violations

# Count of system invariant violations
system_invariant_violations{name="..."}
Any non-zero value for system_invariant_violations indicates a serious issue that requires immediate investigation.

gRPC API Metrics

# In-flight gRPC requests
inflight_grpc{path="..."}

# Total gRPC requests
grpc_requests{path="...", status="..."}

# gRPC request latency histogram
grpc_request_latency_bucket{path="..."}
grpc_request_latency_sum{path="..."}
grpc_request_latency_count{path="..."}

zkLogin Metrics

# JWK requests by provider
jwk_requests{provider="..."}

# JWK request errors
jwk_request_errors{provider="..."}

# Total JWKs
total_jwks{provider="..."}

# Invalid JWKs
invalid_jwks{provider="..."}

# Unique JWKs
unique_jwks{provider="..."}

Hardware Metrics

The iota-metrics crate includes hardware monitoring capabilities:
# CPU usage, memory, disk I/O
# Network interface statistics
# System load averages
These metrics are automatically collected when enabled in the node configuration.

Grafana Dashboard

1. Install Grafana

docker run -d \
  --name grafana \
  -p 3000:3000 \
  grafana/grafana:latest
2. Add Prometheus data source

  1. Access Grafana at http://localhost:3000 (default credentials: admin/admin)
  2. Go to Configuration > Data Sources
  3. Add Prometheus with URL http://prometheus:9090
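Instead of clicking through the UI, the data source can be declared in a file Grafana reads at startup, using Grafana's datasource provisioning format. A sketch (the filename and the `prometheus` hostname are assumptions carried over from the Docker setup above):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
```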
3. Create dashboards

Create custom dashboards tracking:
  • Node uptime and version
  • gRPC request rates and latency
  • Channel queue depths
  • Thread stall events
  • System resource utilization

Example Prometheus Queries

Request Rate (QPS)

# gRPC requests per second by path
rate(grpc_requests[5m])

Error Rate

# gRPC error rate by status code
rate(grpc_requests{status!="Ok"}[5m])
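An absolute error rate is hard to threshold on its own; dividing by the total request rate yields the error ratio, which is the form the alerting rules later in this guide use:

```promql
# Fraction of gRPC requests failing over the last 5 minutes
sum(rate(grpc_requests{status!="Ok"}[5m]))
/
sum(rate(grpc_requests[5m]))
```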

Latency Percentiles

# 95th percentile gRPC latency
histogram_quantile(0.95, 
  rate(grpc_request_latency_bucket[5m])
)

# 99th percentile
histogram_quantile(0.99,
  rate(grpc_request_latency_bucket[5m])
)
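`histogram_quantile` estimates the quantile by linear interpolation within the bucket the rank falls into. A self-contained Python sketch of that estimation, handy for sanity-checking dashboard numbers by hand; the bucket values are illustrative:

```python
def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative Prometheus-style buckets.

    `buckets` is a sorted list of (upper_bound, cumulative_count) pairs,
    ending with (inf, total). Linear interpolation within the matched
    bucket mirrors what PromQL's histogram_quantile does.
    """
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # unbounded bucket: return lower edge
            width = upper_bound - lower_bound
            frac = (rank - lower_count) / (count - lower_count)
            return lower_bound + width * frac
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Illustrative data: 90 requests under 0.1s, 100 under 0.5s, 100 total.
buckets = [(0.1, 90.0), (0.5, 100.0), (float("inf"), 100.0)]
print(histogram_quantile(0.95, buckets))
```

With these buckets the p95 lands halfway into the 0.1–0.5s bucket, so the estimate is 0.3s even though no individual sample was observed there; this is why bucket boundaries matter for quantile accuracy.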

Channel Backlog

# Items waiting in channels
monitored_channel_inflight

# Channel throughput
rate(monitored_channel_sent[5m])

Thread Health

# Thread stalls per minute
rate(thread_stall_duration_sec_count[1m]) * 60

# Average stall duration
rate(thread_stall_duration_sec_sum[5m]) / 
rate(thread_stall_duration_sec_count[5m])

Alerting Rules

Create Prometheus alerting rules for critical conditions:
groups:
  - name: iota_node_alerts
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="iota-node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "IOTA node is down"
      
      - alert: HighGrpcErrorRate
        expr: |
          rate(grpc_requests{status!="Ok"}[5m]) / 
          rate(grpc_requests[5m]) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High gRPC error rate detected"
      
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            rate(grpc_request_latency_bucket[5m])
          ) > 5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High request latency (p95 > 5s)"
      
      - alert: ThreadStalls
        expr: |
          rate(thread_stall_duration_sec_count[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Frequent thread stalls detected"
      
      - alert: InvariantViolation
        expr: |
          increase(system_invariant_violations[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "System invariant violation detected"
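Alert rules live in their own file, which Prometheus loads via `rule_files` in `prometheus.yml` (the filename here is an example):

```yaml
# prometheus.yml
rule_files:
  - "iota_node_alerts.yml"
```

The rule file can be validated before reloading Prometheus with `promtool check rules iota_node_alerts.yml`.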

Admin Interface

The admin interface provides runtime control and diagnostics:
# Default: 127.0.0.1:1337 (localhost only)
admin-interface-address: "127.0.0.1:1337"
The admin interface should only be accessible from localhost or via secure channels. Never expose it publicly.

Logging Configuration

The admin interface allows dynamic tracing and logging configuration:
# Access admin interface
curl http://localhost:1337/admin/logging

Network Metrics

Monitor P2P network health:
# Network message metrics
# Peer connection counts
# State sync progress
# Checkpoint download rates
These metrics are exposed through the P2P subsystem’s metrics integration.

Best Practices

  1. Set up alerting: Don’t rely on manual monitoring; configure alerts for critical conditions
  2. Monitor trends: Track metrics over time to identify degradation before it becomes critical
  3. Correlate metrics: Look at multiple metrics together to diagnose issues
  4. Regular review: Periodically review dashboards and adjust thresholds
  5. Retention: Configure appropriate metrics retention based on your needs
  6. Security: Protect metrics endpoints with authentication in production
