
Monitoring & Observability

Comprehensive monitoring solutions for tracking service health, performance metrics, logs, and uptime.

Available Services

Grafana

Port: 3150 | Memory: 256 MB | Maturity: Stable

Open-source analytics and interactive visualization platform for metrics, logs, and traces with rich dashboards and alerting.

Features:
  • Beautiful dashboards
  • Multiple data sources
  • Alert management
  • Template variables
  • Plugin ecosystem
  • Team collaboration
OpenClaw Integration:
  • Environment: GRAFANA_HOST, GRAFANA_PORT
Requires: Prometheus

Prometheus

Port: 9090 | Memory: 256 MB | Maturity: Stable

Open-source systems monitoring and alerting toolkit for collecting and querying time-series metrics with PromQL.

Features:
  • Time-series database
  • Powerful query language (PromQL)
  • Service discovery
  • Alertmanager integration
  • Pull-based metrics
  • Exporters ecosystem
Recommends: Grafana

Uptime Kuma

Port: 3001 | Memory: 256 MB | Maturity: Stable

Self-hosted monitoring tool for tracking uptime of websites, APIs, and services with a sleek dashboard and multi-notification support.

Features:
  • HTTP/HTTPS monitoring
  • TCP/Ping checks
  • Status pages
  • Multi-language support
  • 90+ notification channels
  • Certificate monitoring

Loki

Port: 3100 | Memory: 512 MB | Maturity: Stable

Like Prometheus, but for logs. A highly available, multi-tenant log aggregation system.

Features:
  • Log aggregation
  • Label-based indexing
  • LogQL query language
  • Grafana integration
  • Multi-tenancy
  • Cost-effective storage
Recommends: Grafana, Prometheus

SigNoz

Port: 3301 | Memory: 1024 MB | Maturity: Stable

Open-source observability platform that serves as a lighter alternative to Datadog, combining metrics, traces, and logs in one tool.

Features:
  • Metrics, traces, logs
  • APM capabilities
  • Service maps
  • OpenTelemetry native
  • Query builder
  • Alerting

Usage Examples

DevOps Monitoring Stack

npx create-better-openclaw --preset devops --yes
This includes: n8n, PostgreSQL, Redis, Uptime Kuma, Grafana, Prometheus

Full Observability Stack

npx create-better-openclaw \
  --services grafana,prometheus,loki,uptime-kuma \
  --yes

Lightweight Monitoring

npx create-better-openclaw \
  --services uptime-kuma,beszel \
  --yes

Complete Observability Platform

npx create-better-openclaw \
  --services signoz,grafana,prometheus \
  --yes

Monitoring Stack Comparison

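Service     | Metrics              | Logs | Traces | Alerts | Memory
Grafana     | ✅ (via datasources) | ✅   | ✅     | ✅     | 256 MB
Prometheus  | ✅                   | -    | -      | ✅     | 256 MB
Uptime Kuma | ✅ (uptime)          | -    | -      | ✅     | 256 MB
Loki        | -                    | ✅   | -      | -      | 512 MB
SigNoz      | ✅                   | ✅   | ✅     | ✅     | 1024 MB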

Architecture Patterns

Classic Stack (Grafana + Prometheus + Loki)

Services → Prometheus (metrics) ┐
Services → Loki (logs)          ├→ Grafana (visualization)
Services → Traces               ┘

All-in-One (SigNoz)

Services → SigNoz (metrics, logs, traces, visualization)

Uptime Monitoring

Websites/APIs → Uptime Kuma (monitoring + status page)

Grafana Configuration

Add Prometheus Data Source

  1. Navigate to Configuration → Data Sources (Connections → Data sources in recent Grafana releases)
  2. Add Prometheus
  3. URL: http://prometheus:9090
  4. Save & Test
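
The same data source can also be declared in a provisioning file instead of through the UI. The following is a minimal sketch, assuming Grafana reads provisioning files from its default /etc/grafana/provisioning/datasources/ directory; the file name is hypothetical.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (file name is an assumption)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend proxies the queries
    url: http://prometheus:9090   # same URL as in step 3 above
    isDefault: true
```

Grafana reads data source provisioning files at startup, so a restart is needed after adding or changing the file.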

Import Dashboards

# Popular dashboards (grafana.com dashboard IDs, entered on Grafana's dashboard Import page):
# - Node Exporter Full: 1860
# - Docker Container Monitoring: 193
# - Redis Dashboard: 11835
# - PostgreSQL Database: 9628
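
Dashboards downloaded as JSON from grafana.com can also be loaded from disk rather than imported by hand. The following sketch shows a dashboard provider, assuming the JSON files are mounted at /var/lib/grafana/dashboards; that path and the provider name are assumptions, not something this page specifies.

```yaml
# /etc/grafana/provisioning/dashboards/default.yml (file name is an assumption)
apiVersion: 1

providers:
  - name: default
    orgId: 1
    folder: ''                    # empty string places dashboards in the general folder
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30     # how often Grafana rescans the directory
    options:
      path: /var/lib/grafana/dashboards   # assumed mount point for the JSON files
```

Community dashboards often reference a data source placeholder that is normally filled in during a UI import, so the JSON may need its data source name adjusted before it renders when loaded this way.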

Create Dashboard

  1. Create → Dashboard
  2. Add Panel
  3. Write PromQL query
  4. Configure visualization
  5. Save dashboard

Prometheus Configuration

Basic prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  
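  # Redis does not expose Prometheus metrics itself, so scrape a redis_exporter
  # instead (the hostname "redis-exporter" is an assumption; 9121 is the exporter's default port)
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']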

Common PromQL Queries

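# CPU usage (%) per instance, from node-exporter
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

# Memory usage (fraction of total), from node-exporter
1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# Request rate (requests per second; assumes the app exports http_requests_total)
sum(rate(http_requests_total[5m]))

# Error rate (share of requests that returned 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# 95th percentile latency
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))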
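
Queries that dashboards evaluate on every refresh can be precomputed with recording rules. The following is a sketch only: the file name, group name, and recorded series names are hypothetical, and the metrics are the same application-level ones assumed in the queries above.

```yaml
# recording.rules.yml (hypothetical file name; must be listed under rule_files in prometheus.yml)
groups:
  - name: http_recording_rules
    interval: 30s
    rules:
      # Per-job request rate, stored as a new series
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Per-job 95th percentile latency
      - record: job:http_request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

Dashboards can then query job:http_requests:rate5m directly, which is cheaper than recomputing the rate over raw series.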

Uptime Kuma Configuration

Add Monitors

  1. Add New Monitor
  2. Choose monitor type:
    • HTTP(s)
    • TCP Port
    • Ping
    • DNS
    • Docker Container
  3. Configure settings
  4. Set notification channels

Notification Channels

  • Email (SMTP)
  • Slack
  • Discord
  • Telegram
  • Webhooks
  • PagerDuty
  • 90+ other services

Status Pages

  1. Status Pages → Add Status Page
  2. Select monitors to include
  3. Customize appearance
  4. Share public URL

Loki Configuration

Send Logs to Loki

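# promtail-config.yml
server:
  http_listen_port: 9080

# Where Promtail records how far it has read in each file
positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: containers
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      # Unwrap Docker's JSON log format so queries match the raw message
      - docker: {}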
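
The static configuration above attaches only a job label, whereas the LogQL queries in this section select on a container label. One way to produce that label is Promtail's Docker service discovery, sketched below. It assumes a Promtail version that supports docker_sd_configs and that the Docker socket is mounted into the Promtail container; neither is stated on this page.

```yaml
# Alternative scrape_configs entry for promtail-config.yml
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock   # assumes the socket is mounted into Promtail
        refresh_interval: 5s
    relabel_configs:
      # Docker reports names as "/postgresql"; strip the leading slash
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
```

With this in place, a selector such as {container="postgresql"} matches logs from the container of that name.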

LogQL Queries

# All logs from a container
{container="postgresql"}

# Filter by level
{container="postgresql"} |= "ERROR"

# Count errors per minute
sum(count_over_time({container="postgresql"} |= "ERROR" [1m]))

# Pattern extraction (status code from nginx's default combined log format)
{container="nginx"} | pattern `<_> - - <_> "<method> <uri> <_>" <status> <_>`
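
LogQL metric queries can also drive alerts through Loki's ruler, which accepts Prometheus-style rule files. The sketch below assumes the ruler is enabled and connected to an Alertmanager, which this page does not cover; the rule name and the threshold of 10 are illustrative.

```yaml
# Loki ruler rule file (location depends on the ruler storage configuration)
groups:
  - name: postgresql-logs
    rules:
      - alert: PostgresErrorBurst             # hypothetical alert name
        # More than 10 ERROR lines in the last 5 minutes (threshold is an assumption)
        expr: sum(count_over_time({container="postgresql"} |= "ERROR" [5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: PostgreSQL is logging errors at an elevated rate
```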

SigNoz Features

Application Monitoring

  • Service latency
  • Request rate
  • Error rate
  • Service map
  • Database calls

Infrastructure Monitoring

  • CPU, memory, disk
  • Network metrics
  • Container stats
  • Host metrics

Log Management

  • Structured logs
  • Full-text search
  • Log aggregation
  • Correlation with traces

Alert Configuration

Grafana Alerts

  1. Edit Panel → Alert
  2. Define condition
  3. Set evaluation interval
  4. Configure notification channel
  5. Save

Prometheus Alerts

# alert.rules.yml
groups:
  - name: example
    rules:
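      - alert: HighCPUUsage
        # CPU busy percentage per instance, derived from node-exporter's idle counter
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"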
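
Prometheus only evaluates a rule file that prometheus.yml references, and it only delivers notifications when an Alertmanager is configured. The additions below are a sketch; the file path and the alertmanager:9093 target are assumptions, since this page does not list Alertmanager as a service.

```yaml
# Additions to prometheus.yml
rule_files:
  - /etc/prometheus/alert.rules.yml   # assumed mount path for the rules file above

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed service name; 9093 is Alertmanager's default port
```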

Uptime Kuma Alerts

  1. Configure in monitor settings
  2. Set notification channels
  3. Define alert conditions:
    • Down
    • Slow response time
    • Certificate expiring

Best Practices

Metrics Collection

  1. Naming: Use consistent metric naming conventions
  2. Labels: Add relevant labels for filtering
  3. Cardinality: Avoid high-cardinality labels
  4. Retention: Configure appropriate retention periods
  5. Sampling: Use appropriate scrape intervals
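
Points 3 and 5 can be applied per scrape job. The following sketch uses a hypothetical job (app:8080) and a hypothetical high-cardinality label (request_id) to show a job-level scrape interval and a label drop.

```yaml
scrape_configs:
  - job_name: 'app'                 # hypothetical job
    scrape_interval: 30s            # overrides the global 15s for this job only
    static_configs:
      - targets: ['app:8080']       # hypothetical target
    metric_relabel_configs:
      # Remove a per-request label before samples are stored
      - regex: 'request_id'
        action: labeldrop
```

Dropping a label is only safe when the remaining labels still identify each series uniquely. Otherwise Prometheus sees duplicate series within the scrape.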

Dashboard Design

  1. Hierarchy: Organize dashboards by service/team
  2. Variables: Use template variables for flexibility
  3. Annotations: Mark deployments and incidents
  4. Performance: Limit number of panels per dashboard
  5. Documentation: Add panel descriptions

Alerting

  1. Thresholds: Set realistic alert thresholds
  2. Priorities: Classify alerts by severity
  3. Routing: Route alerts to appropriate teams
  4. Notification Fatigue: Avoid too many alerts
  5. Runbooks: Link to runbooks in alerts
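
Severity-based routing and grouping are typically configured in Alertmanager. The sketch below assumes Alertmanager is deployed alongside Prometheus; the receiver names and webhook URLs are placeholders.

```yaml
# alertmanager.yml
route:
  receiver: team-default
  group_by: ['alertname', 'job']   # one notification per alert and job, not per instance
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h              # limits repeat notifications for alerts that keep firing
  routes:
    - matchers:
        - severity="critical"
      receiver: team-oncall

receivers:
  - name: team-default
    webhook_configs:
      - url: http://alert-router.example.internal/default   # placeholder URL
  - name: team-oncall
    webhook_configs:
      - url: http://alert-router.example.internal/oncall    # placeholder URL
```

Grouping and a long repeat interval address notification fatigue, and the severity matcher handles routing by priority.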

Integration Examples

Full Stack Monitoring

npx create-better-openclaw \
  --services grafana,prometheus,loki,uptime-kuma,postgresql,redis,n8n \
  --yes

Cloud-Native Observability

npx create-better-openclaw \
  --services signoz,postgresql,redis \
  --yes

Minimal Monitoring

npx create-better-openclaw \
  --services uptime-kuma,beszel \
  --yes

Metrics to Monitor

System Metrics

  • CPU usage
  • Memory usage
  • Disk I/O
  • Network traffic
  • Open file descriptors

Application Metrics

  • Request rate
  • Response time
  • Error rate
  • Active connections
  • Queue depth

Database Metrics

  • Query latency
  • Connection pool
  • Cache hit rate
  • Replication lag
  • Lock contention

Container Metrics

  • Container count
  • Resource limits
  • Restart count
  • Image pull time
  • Volume usage
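
Container and database metrics are not produced by Prometheus itself; they usually come from exporters. The sketch below adds scrape jobs for cAdvisor and postgres_exporter. Neither is in this page's service list, so the hostnames are assumptions and only the ports are the tools' defaults.

```yaml
# Additional scrape_configs entries for prometheus.yml
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']            # assumed hostname; 8080 is cAdvisor's default port

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']   # assumed hostname; 9187 is postgres_exporter's default port
```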
