Tessellation exposes Prometheus-compatible metrics via Micrometer, ships structured JSON logs via Logback, and provides a pre-configured observability stack (Prometheus + Grafana + Loki) for Kubernetes deployments.

Metrics: Micrometer + Prometheus

Every node exposes a Prometheus scrape endpoint:
GET /metrics
This endpoint is served on the public HTTP port (default: 9000). Prometheus is configured to discover scrape targets dynamically via the initial validator’s /targets API.
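A direct scrape of this endpoint returns Prometheus text exposition format. The metric below uses the standard Micrometer JVM naming mentioned later in this page; the HELP text, labels, and value are illustrative, not actual node output:

```text
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.34217728E8
```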
# kubernetes/prometheus/prometheus.yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
scrape_configs:
  - job_name: prometheus
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090
  - job_name: dynamic-targets
    http_sd_configs:
      - url: http://l0-initial-validator:9000/targets
      - url: http://l1-initial-validator:9000/targets
The /targets endpoint on each initial validator returns the list of all peer nodes in the cluster as Prometheus service discovery targets. This means Prometheus automatically picks up new peers as they join.
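Prometheus's `http_sd_configs` mechanism expects the URL to return a JSON array of target groups, so a /targets response takes roughly this shape (the hostnames and labels here are illustrative, not taken from the actual API):

```json
[
  {
    "targets": ["node-1:9000", "node-2:9000"],
    "labels": {
      "job": "dag-l0"
    }
  }
]
```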

Key metrics to monitor

| Metric category | What to watch |
| --- | --- |
| Consensus rounds | Round duration, stall frequency, phase transition latency |
| Gossip | Rumor propagation rate, peer rumor lag |
| Cluster | Cluster size (number of active peers), join/leave events |
| JVM | Heap usage, GC pause duration, thread count |
| HTTP | Request latency (p99), error rates per endpoint |
Metrics are instrumented with Micrometer and exported using its Prometheus naming conventions (e.g., jvm_memory_used_bytes, http_server_requests_seconds).
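Given those conventions, typical dashboard or alerting queries look like the following. The JVM and HTTP metric names come from standard Micrometer instrumentation; Tessellation-specific consensus and gossip metric names should be taken from a live /metrics scrape:

```text
# p99 HTTP request latency per endpoint
histogram_quantile(0.99, sum by (le, uri) (rate(http_server_requests_seconds_bucket[5m])))

# Heap usage as a fraction of the configured maximum
sum(jvm_memory_used_bytes{area="heap"}) / sum(jvm_memory_max_bytes{area="heap"})

# Time spent in GC pauses per second
rate(jvm_gc_pause_seconds_sum[5m])
```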

Prometheus deployment

The Prometheus deployment in kubernetes/prometheus/ runs prom/prometheus:v2.36.1 with 24-hour TSDB retention:
# kubernetes/prometheus/prometheus-deployment.yaml (excerpt)
containers:
  - name: prometheus
    image: prom/prometheus:v2.36.1
    args:
      - "--storage.tsdb.retention.time=24h"
      - "--config.file=/etc/prometheus/prometheus.yaml"
      - "--storage.tsdb.path=/prometheus/"
    ports:
      - containerPort: 9090
        name: http
Deploy with Kustomize:
kubectl apply -k kubernetes/prometheus/

Grafana dashboards

Grafana 9.1.6 is deployed with anonymous admin access and two pre-provisioned dashboards:
| Dashboard | File | Content |
| --- | --- | --- |
| Tessellation | dashboards/tessellation.json | Node-specific metrics: consensus, gossip, cluster |
| JVM Micrometer | dashboards/jvm-micrometer_rev9.json | JVM health: heap, GC, threads, CPU |
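Grafana loads pre-provisioned dashboards through a file-based provider. A minimal provider config for the dashboards above might look like this (the provider name and mount path are assumptions, not taken from the repo):

```yaml
# illustrative dashboard provider config
apiVersion: 1
providers:
  - name: tessellation-dashboards
    type: file
    options:
      path: /var/lib/grafana/dashboards
```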
Datasources are provisioned automatically:
# kubernetes/grafana/datasources/datasource.yaml
datasources:
  - name: prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
  - name: loki
    type: loki
    url: http://loki:3100
    jsonData:
      maxLines: 1000
Deploy Grafana:
kubectl apply -k kubernetes/grafana/
Grafana is available at port 3000. The readiness probe checks GET /api/health.
The natel-discrete-panel plugin is pre-installed in the Grafana deployment for discrete/state-timeline visualizations of consensus round phases.

Loki log aggregation

Loki aggregates structured JSON logs from all validator pods. A Promtail sidecar container runs in each validator pod and ships logs to Loki.

Log format

Tessellation uses Logback with the Logstash JSON encoder. Each log line is a JSON object written to /tessellation/logs/json_logs/*.json.log. Promtail parses these fields:
| Field | Description |
| --- | --- |
| @timestamp | ISO 8601 timestamp (RFC3339Nano) |
| message | Log message body |
| level | Log level (INFO, WARN, ERROR, etc.) |
| logger_name | Fully-qualified logger name |
| ip | Node IP address |
| application | Application/module name |
| peer_id_short | Abbreviated peer node ID |
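Putting the fields together, a single log line looks roughly like this (all values, including the logger name, are illustrative):

```json
{
  "@timestamp": "2024-01-15T12:34:56.789Z",
  "message": "Consensus round finished",
  "level": "INFO",
  "logger_name": "org.tessellation.consensus",
  "ip": "10.0.0.12",
  "application": "dag-l0",
  "peer_id_short": "abc123"
}
```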

Promtail configuration

# kubernetes/base/promtail/config.yaml
client:
  url: "http://loki:3100/loki/api/v1/push"

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          __path__: /var/log/app/json_logs/*.json.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: '"@timestamp"'
            message: message
            level: level
            logger_name: logger_name
            ip: ip
            application: application
            peer_id_short: peer_id_short
      - timestamp:
          source: timestamp
          format: RFC3339Nano
      - labels:
          level:
          ip:
          application:
          peer_id_short:

Loki configuration

Loki runs in single-binary mode with filesystem storage:
# kubernetes/loki/config.yaml
server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
Deploy Loki:
kubectl apply -k kubernetes/loki/

Querying logs in Grafana

With the Loki datasource configured, use LogQL to query validator logs:
# All ERROR logs from the l0 application
{application="dag-l0", level="ERROR"}

# Consensus-related logs for a specific peer
{application="dag-l0", peer_id_short="abc123"} |= "consensus"

# Log rate by level
sum by (level) (rate({application="dag-l0"}[1m]))
Docker-based deployments write logs to a different path (/tessellation/logs/) than the Kubernetes Promtail path (/var/log/app/). For Docker, check logs directly with docker logs <container-name> or mount a host volume to access the log files.
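Because each log line is a self-contained JSON object, standard JSON tooling works on Docker output too. A sketch with jq, where the sample line stands in for real `docker logs <container-name>` output:

```shell
# Filter structured JSON log lines by level with jq.
# In practice, pipe `docker logs <container-name>` through the same filter;
# the sample line below is illustrative log output.
sample='{"@timestamp":"2024-01-15T12:34:56Z","level":"ERROR","message":"rumor validation failed","application":"dag-l0"}'
echo "$sample" | jq -r 'select(.level == "ERROR") | "\(."@timestamp") [\(.application)] \(.message)"'
# -> 2024-01-15T12:34:56Z [dag-l0] rumor validation failed
```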
