
Overview

The microservices-app includes a full observability stack for monitoring, tracing, and logging:
  • OpenTelemetry (OTEL): Unified telemetry collection
  • Prometheus: Metrics storage and querying
  • Grafana: Visualization and dashboards
  • Loki: Log aggregation
  • Tempo: Distributed tracing
  • Hubble UI: Network flow visualization (Cilium)

Architecture

Services (Go/Node.js)
  ├─→ Traces → OTEL Collector → Tempo
  ├─→ Metrics → OTEL Collector → Prometheus
  └─→ Logs → stdout → Promtail → Loki

Traefik → Traces → OTEL Collector → Tempo

Grafana
  ├─→ Queries Prometheus (metrics)
  ├─→ Queries Loki (logs)
  └─→ Queries Tempo (traces)

OpenTelemetry Integration

All services send telemetry to the OTEL collector via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable.

Configuration

Docker Compose (docker-compose.yml):
greeter:
  environment:
    OTEL_EXPORTER_OTLP_ENDPOINT: ""
    OTEL_SERVICE_NAME: "greeter-service"
Note: OTEL is disabled in Docker Compose by default (empty endpoint). To enable it, set the endpoint to http://otel-collector:4317 and deploy the observability stack.

Kubernetes (deploy/k8s/greeter.nix):
env = {
  OTEL_EXPORTER_OTLP_ENDPOINT.value = "http://otel-collector.observability:4317";
  OTEL_SERVICE_NAME.value = "greeter-service";
};
The OTEL collector runs in the observability namespace and receives OTLP telemetry on:
  • Port 4317 (OTLP over gRPC)
  • Port 4318 (OTLP over HTTP)

Service Instrumentation

Go services use the official OTEL Go SDK with automatic instrumentation:
import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
)

// Initialize tracer provider
func initTracer(ctx context.Context) (*trace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(ctx)
    if err != nil {
        return nil, err
    }
    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.Default()),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}
Node.js services use @opentelemetry/sdk-node:
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  }),
  serviceName: process.env.OTEL_SERVICE_NAME,
});

sdk.start();

Accessing Observability UIs

Service     | URL                    | Credentials
Grafana     | http://localhost:30300 | admin / admin
Prometheus  | http://localhost:30090 | None
Hubble UI   | http://localhost:31235 | None
Note: These URLs are only available when using Kind + Tilt with full-bootstrap.

Grafana

Grafana provides unified visualization for metrics, logs, and traces.

Data Sources

Pre-configured in the microservices-infra bootstrap:
  • Prometheus: http://prometheus.observability:9090
  • Loki: http://loki.observability:3100
  • Tempo: http://tempo.observability:3100

Dashboards

Import pre-built dashboards for:
  • Kubernetes cluster metrics
  • Service-level metrics (request rate, latency, error rate)
  • Traefik routing metrics
  • Go runtime metrics

Querying Logs

Loki LogQL examples:
# All logs from greeter-service
{app="greeter-service"}

# Errors only
{app="greeter-service"} |= "error"

# Structured JSON logs
{app="greeter-service"} | json | level="error"

Querying Traces

Use the Explore tab to query Tempo:
  • Search by trace ID
  • Search by service name
  • Search by duration
  • Correlate with logs (click trace → view logs)

Prometheus

Prometheus scrapes metrics from:
  • Kubernetes nodes
  • Traefik
  • OTEL collector
  • Service exporters

Querying Metrics

Example PromQL queries:
# Request rate per service
rate(http_requests_total[5m])

# 95th percentile latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate
rate(http_requests_total{status=~"5.."}[5m])
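histogram_quantile works on cumulative buckets: it finds the first bucket whose count reaches the target rank, then interpolates linearly inside it. A dependency-free sketch of that arithmetic (bucket bounds and counts invented for illustration; not Prometheus' exact implementation):

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket: count observations <= le.
type bucket struct {
	le    float64 // upper bound in seconds
	count float64 // cumulative count
}

// quantile mimics PromQL's histogram_quantile: locate the bucket where the
// target rank falls, then interpolate linearly within that bucket.
func quantile(q float64, buckets []bucket) float64 {
	total := buckets[len(buckets)-1].count
	rank := q * total
	lowerBound, lowerCount := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank {
			return lowerBound + (b.le-lowerBound)*(rank-lowerCount)/(b.count-lowerCount)
		}
		lowerBound, lowerCount = b.le, b.count
	}
	return buckets[len(buckets)-1].le
}

func main() {
	// Cumulative buckets: 90 of 100 requests <= 0.1s, 99 <= 0.5s, 100 <= 1s.
	buckets := []bucket{{0.1, 90}, {0.5, 99}, {1.0, 100}}
	// p95 falls partway into the 0.1–0.5s bucket.
	fmt.Println(quantile(0.95, buckets))
}
```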

Service Discovery

Prometheus discovers targets via Kubernetes service discovery:
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

Loki

Loki aggregates logs from all pods via Promtail.

Log Collection

Promtail runs as a DaemonSet and scrapes logs from:
  • Container stdout/stderr
  • Kubernetes events
  • Traefik access logs

Labels

Logs are indexed by:
  • namespace
  • pod
  • container
  • app (from app.kubernetes.io/name label)
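These labels can be combined in a single LogQL selector; for example (label values assumed):

```logql
{namespace="microservices", app="greeter-service", container="greeter"}
```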

Retention

Default retention: 7 days (configured in microservices-infra).

Tempo

Tempo stores distributed traces from OTEL.

Trace Correlation

Logs → Traces: If your log entries include a trace_id field, Grafana can automatically link to the trace:
{"level":"info","msg":"request processed","trace_id":"abc123"}
Traces → Logs: Click a span in Tempo and select “View Logs” to see correlated log entries.

Trace Sampling

Default: 100% sampling (all traces). Adjust in OTEL collector config for production:
processors:
  probabilistic_sampler:
    sampling_percentage: 10  # 10% sampling
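Probabilistic samplers typically hash the trace ID so the keep/drop decision is consistent for every span in a trace. A dependency-free sketch of the idea (not the collector's exact algorithm):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sampled hashes the trace ID into a bucket in [0,100) and keeps the trace
// when the bucket falls under the sampling percentage. Hashing the ID means
// the same trace always gets the same decision across services.
func sampled(traceID string, percentage uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return h.Sum32()%100 < percentage
}

func main() {
	kept := 0
	for i := 0; i < 1000; i++ {
		if sampled(fmt.Sprintf("trace-%d", i), 10) {
			kept++
		}
	}
	fmt.Printf("kept %d of 1000 traces\n", kept)
}
```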

Hubble UI

Hubble visualizes network flows between services using Cilium.

Features

  • Service Map: Visual graph of service dependencies
  • Flow Logs: Detailed network flow data (source, destination, protocol, status)
  • DNS Queries: Track DNS lookups
  • Policy Enforcement: Visualize NetworkPolicy effects

Accessing Hubble

Browser: http://localhost:31235

CLI:
hubble observe --namespace microservices

Use Cases

  • Identify service dependencies
  • Debug connectivity issues
  • Verify NetworkPolicy enforcement
  • Analyze traffic patterns

Traefik Observability

Traefik sends traces to Tempo via OTEL:
tracing.otlp.grpc = {
  enabled = true;
  endpoint = "otel-collector.observability:4317";
  insecure = true;
};
This provides:
  • Request traces from edge to backend
  • Middleware timing (CORS, auth, rate limiting)
  • Error tracking

Service Metrics

Go Services

Use promhttp to expose metrics:
import "github.com/prometheus/client_golang/prometheus/promhttp"

http.Handle("/metrics", promhttp.Handler())
Add Prometheus scrape annotation:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
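What Prometheus ultimately scrapes from /metrics is the plain-text exposition format. A dependency-free sketch of the payload promhttp renders (metric name and value invented for illustration):

```go
package main

import "fmt"

// expositionBody renders one counter in the Prometheus text exposition
// format — the same wire format promhttp.Handler() serves for every
// registered collector.
func expositionBody(requests int) string {
	return fmt.Sprintf(
		"# HELP http_requests_total Total HTTP requests.\n"+
			"# TYPE http_requests_total counter\n"+
			"http_requests_total{status=\"200\"} %d\n", requests)
}

func main() {
	fmt.Print(expositionBody(42))
}
```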

Node.js Services

Use prom-client:
const client = require('prom-client');
const register = new client.Registry();

client.collectDefaultMetrics({ register });

app.get('/metrics', async (req, res) => {
  // register.metrics() returns a Promise in prom-client v13+
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

Custom Instrumentation

Adding Spans

Go:
ctx, span := tracer.Start(ctx, "operation-name")
defer span.End()

// Add attributes
span.SetAttributes(
    attribute.String("user.id", userID),
    attribute.Int("item.count", count),
)
Node.js:
const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('my-service');

const span = tracer.startSpan('operation-name');
span.setAttribute('user.id', userId);
span.end();

Structured Logging

Use structured logs for better Loki parsing.

Go (zerolog):
log.Info().
    Str("trace_id", traceID).
    Str("user_id", userID).
    Msg("request processed")
Node.js (pino):
logger.info({ traceId, userId }, 'request processed');

Disabling Observability

For lightweight local development:
  • Docker Compose: already disabled (empty OTEL_EXPORTER_OTLP_ENDPOINT)
  • Kubernetes: use bootstrap instead of full-bootstrap

Troubleshooting

No Traces Appearing

  1. Check OTEL collector logs:
    kubectl logs -n observability deployment/otel-collector
    
  2. Verify endpoint configuration:
    kubectl get pods -n microservices -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="OTEL_EXPORTER_OTLP_ENDPOINT")].value}'
    
  3. Test connectivity:
    kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
      curl -v http://otel-collector.observability:4317
    

High Memory Usage

Reduce retention periods in microservices-infra:
  • Prometheus: Default 15d → 7d
  • Loki: Default 7d → 3d
  • Tempo: Default 7d → 1d

Missing Metrics

Verify Prometheus scrape targets:
# Port-forward Prometheus
kubectl port-forward -n observability svc/prometheus 9090:9090

# Open http://localhost:9090/targets
