S2 Lite provides built-in observability features including Prometheus metrics, structured logging, and health endpoints.

Health Checks

S2 Lite exposes a /health endpoint for readiness and liveness checks.

Health Endpoint

curl http://localhost:8080/health
Responses:
  • 200 OK with body "OK" - Server is healthy and database is accessible
  • 503 Service Unavailable - Database status check failed
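The two responses make the endpoint easy to script for an external watchdog. A minimal sketch (the `health_ok` helper is hypothetical, not part of S2 Lite; the live curl command is shown as a comment):

```shell
# Hypothetical helper: map the /health HTTP status to an exit code
# (200 = healthy, anything else = unhealthy).
health_ok() {
  [ "$1" -eq 200 ]
}

# In a live deployment, capture the status with curl, e.g.:
#   status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/health)
status=503   # simulated "database check failed" response
if health_ok "$status"; then
  echo "healthy"
else
  echo "unhealthy"
fi
```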

Configuration

docker-compose.yml
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:80/health"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 10s
Allow generous startup time: on Kubernetes, the startup probe permits up to 10 minutes for initialization, which is important when using object storage with high latency or large datasets.
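On Kubernetes, a 10-minute allowance corresponds to a startup probe along these lines (a sketch; the field values here are assumptions, so check your chart's defaults):

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 80
  periodSeconds: 10
  failureThreshold: 60   # 60 x 10s = up to 10 minutes for initialization
```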

Prometheus Metrics

S2 Lite exposes Prometheus metrics at /metrics in text format.

Metrics Endpoint

curl http://localhost:8080/metrics

Available Metrics

Append Metrics

s2_append_permit_latency_seconds
  • Type: Histogram
  • Description: Time waiting for append permit (backpressure indicator)
  • Buckets: 0.005, 0.010, 0.025, 0.050, 0.100, 0.250, 0.500, 1.000, 2.500 seconds
s2_append_ack_latency_seconds
  • Type: Histogram
  • Description: Time from append request to acknowledgment
  • Buckets: 0.005, 0.010, 0.025, 0.050, 0.100, 0.250, 0.500, 1.000, 2.500 seconds
s2_append_batch_records
  • Type: Histogram
  • Description: Number of records per append batch
  • Buckets: 1, 10, 50, 100, 250, 500, 1000 records
s2_append_batch_bytes
  • Type: Histogram
  • Description: Size in bytes of append batches
  • Buckets: 512, 1024, 4096, 16384, 65536, 262144, 524288, 1048576 bytes
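Each histogram is exported in the standard Prometheus text exposition format as `_bucket`, `_sum`, and `_count` series, which is why the queries later in this page reference names like `s2_append_batch_records_count`. Illustrative (not real) output:

```text
# TYPE s2_append_ack_latency_seconds histogram
s2_append_ack_latency_seconds_bucket{le="0.005"} 1204
s2_append_ack_latency_seconds_bucket{le="0.01"} 3511
# ... remaining buckets elided ...
s2_append_ack_latency_seconds_bucket{le="+Inf"} 4096
s2_append_ack_latency_seconds_sum 18.42
s2_append_ack_latency_seconds_count 4096
```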

Process Metrics

Standard Prometheus process metrics are automatically included:
  • process_cpu_seconds_total - CPU time
  • process_resident_memory_bytes - Resident memory
  • process_virtual_memory_bytes - Virtual memory
  • process_open_fds - Open file descriptors
  • process_max_fds - Maximum file descriptors
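For example, CPU utilization can be derived from the counter with a standard PromQL rate (the `job` label value assumes the scrape config shown below):

```promql
# Average CPU cores used over the last 5 minutes
rate(process_cpu_seconds_total{job="s2-lite"}[5m])
```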

Scraping Configuration

prometheus.yml
scrape_configs:
  - job_name: 's2-lite'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics
    scrape_interval: 30s
    scrape_timeout: 10s

Helm Chart Integration

The S2 Lite Helm chart supports automatic ServiceMonitor creation:
values.yaml
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s
    labels:
      release: prometheus  # Match your Prometheus operator label
For TLS-enabled deployments:
values.yaml
metrics:
  serviceMonitor:
    enabled: true
    tlsConfig:
      # For self-signed certificates
      insecureSkipVerify: true
      # Or for CA-signed certificates
      # ca:
      #   secret:
      #     name: s2-lite-tls
      #     key: tls.crt

Logging

S2 Lite uses structured logging with configurable levels.

Log Levels

Configure via the RUST_LOG environment variable (error, warn, info, debug, or trace):
export RUST_LOG=info
s2 lite --port 8080
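RUST_LOG also accepts the standard Rust tracing per-module directive syntax, so you can raise verbosity for S2 Lite's own crate (the `s2_lite` target appears in the log samples below) while keeping other crates quieter. A sketch:

```shell
# Debug logging for S2 Lite itself, info for all other crates
export RUST_LOG=s2_lite=debug,info
echo "$RUST_LOG"
```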

Log Format

Logs are output in a structured format:
2024-03-03T12:00:00.123456Z  INFO s2_lite::server: using s3 object store bucket="my-bucket"
2024-03-03T12:00:00.234567Z  INFO s2_lite::server: pipelining enabled on append sessions up to 25MiB
2024-03-03T12:00:00.345678Z  INFO s2_lite::server: starting plain http server addr="0.0.0.0:8080"

Docker Logging

View logs:
docker logs -f s2-lite
With timestamps:
docker logs -f --timestamps s2-lite

Kubernetes Logging

View logs:
kubectl logs -l app.kubernetes.io/name=s2-lite --follow
Include all containers in each matched pod:
kubectl logs -l app.kubernetes.io/name=s2-lite --follow --all-containers

Systemd Logging

View logs:
sudo journalctl -u s2-lite -f
With filters:
# Last hour
sudo journalctl -u s2-lite --since "1 hour ago"

# Errors only
sudo journalctl -u s2-lite -p err

Grafana Dashboards

Example Dashboard

Here’s a basic Grafana dashboard configuration for S2 Lite:
s2-lite-dashboard.json
{
  "dashboard": {
    "title": "S2 Lite Metrics",
    "panels": [
      {
        "title": "Append Latency (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(s2_append_ack_latency_seconds_bucket[5m]))"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Append Rate",
        "targets": [
          {
            "expr": "rate(s2_append_batch_records_count[5m])"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Append Throughput (bytes/sec)",
        "targets": [
          {
            "expr": "rate(s2_append_batch_bytes_sum[5m])"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Memory Usage",
        "targets": [
          {
            "expr": "process_resident_memory_bytes"
          }
        ],
        "type": "graph"
      }
    ]
  }
}

Key Queries

Append latency percentiles:
# p50
histogram_quantile(0.50, rate(s2_append_ack_latency_seconds_bucket[5m]))

# p95
histogram_quantile(0.95, rate(s2_append_ack_latency_seconds_bucket[5m]))

# p99
histogram_quantile(0.99, rate(s2_append_ack_latency_seconds_bucket[5m]))
Append throughput:
# Batches (append operations) per second
rate(s2_append_batch_records_count[5m])

# Records per second
rate(s2_append_batch_records_sum[5m])

# Bytes per second
rate(s2_append_batch_bytes_sum[5m])

# Average batch size in records
rate(s2_append_batch_records_sum[5m]) / rate(s2_append_batch_records_count[5m])
Backpressure indicator:
# High permit latency indicates backpressure
histogram_quantile(0.95, rate(s2_append_permit_latency_seconds_bucket[5m]))

Alerting

Prometheus Alert Rules

alerts.yml
groups:
  - name: s2_lite
    interval: 30s
    rules:
      - alert: S2LiteDown
        expr: up{job="s2-lite"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "S2 Lite instance is down"
          description: "S2 Lite instance {{ $labels.instance }} has been down for more than 1 minute."

      - alert: S2LiteHighAppendLatency
        expr: histogram_quantile(0.95, rate(s2_append_ack_latency_seconds_bucket[5m])) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "S2 Lite high append latency"
          description: "S2 Lite p95 append latency is {{ $value }}s on {{ $labels.instance }}."

      - alert: S2LiteHighBackpressure
        expr: histogram_quantile(0.95, rate(s2_append_permit_latency_seconds_bucket[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "S2 Lite experiencing backpressure"
          description: "S2 Lite permit latency is {{ $value }}s, indicating backpressure."

      - alert: S2LiteHighMemory
        expr: process_resident_memory_bytes{job="s2-lite"} > 2e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "S2 Lite high memory usage"
          description: "S2 Lite is using {{ $value | humanize }}B of memory."

Health Check Monitoring

Monitor the health endpoint with your monitoring system:
docker-compose.yml
services:
  s2-lite:
    # ... other config ...
    labels:
      - "com.datadoghq.ad.check_names=[\"http_check\"]"
      - "com.datadoghq.ad.init_configs=[{}]"
      - "com.datadoghq.ad.instances=[{\"name\":\"s2-lite\",\"url\":\"http://%%host%%:80/health\",\"timeout\":5}]"

Performance Monitoring

Key Performance Indicators

  1. Append Latency: Time to acknowledge writes
  2. Permit Latency: Backpressure / queueing time
  3. Throughput: Records and bytes per second
  4. Memory Usage: Track for memory leaks
  5. CPU Usage: Detect resource constraints
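One way to track these KPIs continuously is with Prometheus recording rules built from the queries above; the rule names below are an illustrative naming convention, not part of S2 Lite:

```yaml
groups:
  - name: s2_lite_kpis
    interval: 30s
    rules:
      # 1. Append latency (p95)
      - record: s2:append_ack_latency_seconds:p95
        expr: histogram_quantile(0.95, rate(s2_append_ack_latency_seconds_bucket[5m]))
      # 2. Permit latency / backpressure (p95)
      - record: s2:append_permit_latency_seconds:p95
        expr: histogram_quantile(0.95, rate(s2_append_permit_latency_seconds_bucket[5m]))
      # 3. Throughput in bytes per second
      - record: s2:append_bytes:rate5m
        expr: rate(s2_append_batch_bytes_sum[5m])
      # 4. Resident memory
      - record: s2:process_resident_memory_bytes
        expr: process_resident_memory_bytes{job="s2-lite"}
      # 5. CPU cores used
      - record: s2:process_cpu:rate5m
        expr: rate(process_cpu_seconds_total{job="s2-lite"}[5m])
```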

Benchmarking

Use the built-in benchmark tool:
# Create basin
s2 create-basin benchmark --create-stream-on-append

# Run benchmark
s2 bench benchmark \
  --target-mibps 10 \
  --duration 30s \
  --catchup-delay 0s
Monitor metrics during the benchmark to establish baselines.
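For example, a query along these lines compares observed throughput (converted to MiB/s) against the --target-mibps setting; this is a sketch, not S2 tooling:

```promql
# Observed append throughput in MiB/s over the last minute
rate(s2_append_batch_bytes_sum[1m]) / (1024 * 1024)
```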

Tracing

S2 Lite includes HTTP request tracing via tower-http:
  • Request/response logging at INFO level
  • Detailed request info at DEBUG level
  • Trace IDs in structured logs
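Assuming tower-http's trace layer uses the usual tracing targets, the per-request detail can be surfaced with a directive like the following (an illustrative sketch):

```shell
# DEBUG for tower-http's request/response spans, INFO elsewhere
export RUST_LOG=tower_http=debug,info
echo "$RUST_LOG"
```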
Distributed tracing (OpenTelemetry) is not currently supported but is planned for a future release.

Next Steps

  • Configuration - Configure S2 Lite settings
  • Deployment - Deploy to production
