OpenTelemetry (OTel) provides standardized collection, processing, and export of telemetry data (metrics, traces, and logs) for PentAGI. It serves as the central hub for all observability data flowing through the system.

Overview

OpenTelemetry is a vendor-neutral observability framework that provides:
  • Unified Collection: Single endpoint for all telemetry data
  • Data Processing: Transform, filter, and enrich observability data
  • Multiple Exporters: Send data to various backends simultaneously
  • Standards-Based: Industry-standard OTLP protocol support
  • Extensible: Rich ecosystem of receivers, processors, and exporters

Architecture

The OpenTelemetry Collector acts as the central data pipeline: it receives OTLP data from PentAGI, scrapes Prometheus-compatible endpoints, and exports traces to Jaeger, logs to Loki, and metrics to VictoriaMetrics.

Setup

Step 1: Configure OpenTelemetry Endpoint

Enable OTel in your .env file:
.env
# OpenTelemetry configuration
OTEL_HOST=otelcol:8148

# OTel Collector ports
OTEL_GRPC_LISTEN_PORT=8148
OTEL_HTTP_LISTEN_PORT=4318
OTEL_GRPC_LISTEN_IP=127.0.0.1
OTEL_HTTP_LISTEN_IP=127.0.0.1
PentAGI will automatically send telemetry to the OTel collector when OTEL_HOST is set.
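
The OTEL_HOST value is a bare host:port pair with no scheme. A minimal sketch of splitting it into its parts (the helper name is ours, not PentAGI's):

```python
def parse_otel_host(value: str) -> tuple[str, int]:
    """Split a host:port pair such as 'otelcol:8148' into (host, port)."""
    host, _, port = value.rpartition(":")
    return host, int(port)

host, port = parse_otel_host("otelcol:8148")
print(host, port)  # otelcol 8148
```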
Step 2: Deploy Observability Stack

The OpenTelemetry Collector is included in the observability stack:
curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-observability.yml
docker compose -f docker-compose.yml -f docker-compose-observability.yml up -d
Step 3: Verify Data Collection

Check that OTel is receiving data:
# View collector logs
docker compose logs -f otel

# Check health endpoint
curl http://localhost:13133/

Configuration

The OTel Collector is configured via /observability/otel/config.yml:

Receivers

Data collection endpoints:
config.yml
receivers:
  # OTLP protocol (from PentAGI)
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:8148
      http:
        endpoint: 0.0.0.0:4318
  
  # Prometheus scraping (system metrics)
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['node-exporter:9100']
        - job_name: 'clickhouse-collector'
          static_configs:
            - targets: ['clickstore:9363']
        - job_name: 'jaeger-collector'
          static_configs:
            - targets: ['jaeger:14269', 'jaeger:9090']
  
  # Docker metrics
  prometheus/docker:
    config:
      scrape_configs:
        - job_name: 'docker-container-collector'
          static_configs:
            - targets: ['cadvisor:8080']

Processors

Data transformation and filtering:
config.yml
processors:
  # Batch processing for efficiency
  batch:
    timeout: 5s
    send_batch_size: 1000
  
  # Attribute manipulation
  attributes:
    actions:
      - key: service_name_extracted
        action: delete
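
To see what the two batch triggers do, here is a toy Python model of the flush logic: flush when the batch reaches send_batch_size, or when the oldest queued item is older than the timeout. It is illustrative only, not the collector's implementation:

```python
def batch_flushes(event_times, timeout=5.0, send_batch_size=1000):
    """Return the sizes of flushed batches for a sequence of event timestamps (seconds)."""
    batches, current, started = [], 0, None
    for t in event_times:
        if started is None:
            started = t  # timestamp of the oldest item in the current batch
        current += 1
        # Flush on either trigger: batch full, or oldest item past the timeout.
        if current >= send_batch_size or t - started >= timeout:
            batches.append(current)
            current, started = 0, None
    if current:
        batches.append(current)  # final partial batch
    return batches

# 2500 events arriving at once flush as two full batches plus a remainder.
print(batch_flushes([0.0] * 2500))  # [1000, 1000, 500]
```

Larger batches mean fewer export calls; the timeout bounds how stale a small batch can get.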

Exporters

Data output destinations:
config.yml
exporters:
  # Traces to Jaeger
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
  
  # Logs to Loki
  otlphttp:
    endpoint: http://loki:3100/otlp
  
  # Metrics to VictoriaMetrics
  prometheusremotewrite/local:
    endpoint: http://victoriametrics:8428/api/v1/write

Pipelines

Data flow configuration:
config.yml
service:
  pipelines:
    # Traces pipeline
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    
    # Logs pipeline
    logs:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [otlphttp]
    
    # Metrics pipeline
    metrics:
      receivers: [otlp, prometheus, prometheus/docker]
      processors: [batch]
      exporters: [prometheusremotewrite/local]
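
One frequent misconfiguration is a pipeline referencing an exporter that was never defined. The wiring above can be cross-checked mechanically; a Python sketch using the names from this config:

```python
# Pipeline wiring copied from the service section above.
pipelines = {
    "traces":  {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["otlp"]},
    "logs":    {"receivers": ["otlp"], "processors": ["attributes", "batch"], "exporters": ["otlphttp"]},
    "metrics": {"receivers": ["otlp", "prometheus", "prometheus/docker"],
                "processors": ["batch"], "exporters": ["prometheusremotewrite/local"]},
}
defined_exporters = {"otlp", "otlphttp", "prometheusremotewrite/local"}

# Any exporter referenced by a pipeline but missing from the exporters section.
missing = {e for p in pipelines.values() for e in p["exporters"]} - defined_exporters
print(missing)  # set() -> every referenced exporter is defined
```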

Telemetry Types

Traces

Distributed tracing data:
Source: PentAGI application spans
Flow: PentAGI → OTel → Jaeger
Usage: Track request flow through the system
// Example: PentAGI automatically creates spans
ctx, span := tracer.Start(ctx, "agent.execute")
defer span.End()

Metrics

Numerical measurements over time.
Sources:
  • PentAGI application metrics (OTLP)
  • Node Exporter (system metrics)
  • cAdvisor (container metrics)
  • Component health checks
Flow: Sources → OTel → VictoriaMetrics
Usage: Monitor performance and resource usage

Logs

Structured log events:
Source: PentAGI application logs
Flow: PentAGI → OTel → Loki
Usage: Debug issues and audit operations

Integration

PentAGI Integration

PentAGI automatically sends telemetry when configured:
.env
OTEL_HOST=otelcol:8148
The service will:
  1. Create spans for agent operations
  2. Export application metrics
  3. Send structured logs
  4. Include trace context in all operations

Langfuse Integration

Connect Langfuse to OTel for unified observability:
.env
LANGFUSE_OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318
LANGFUSE_OTEL_SERVICE_NAME=langfuse
This enables:
  • LLM traces in Jaeger
  • Langfuse metrics in Grafana
  • Unified log aggregation

Monitoring

Collector Health

Built-in health endpoints:
# Check collector status
curl http://localhost:13133/

# View metrics
curl http://localhost:8888/metrics

# Check zpages (detailed internal state)
curl http://localhost:55679/debug/tracez
curl http://localhost:55679/debug/servicez

Performance Metrics

Key metrics to monitor:
Metric                                     Description
otelcol_receiver_accepted_spans            Spans received
otelcol_receiver_refused_spans             Spans rejected
otelcol_exporter_sent_spans                Spans exported
otelcol_processor_batch_batch_send_size    Batch sizes
otelcol_processor_batch_timeout_trigger    Batch timeouts
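
A useful derived signal is the refusal ratio: refused spans divided by all incoming spans. A sustained ratio above a few percent means the collector is shedding data (illustrative calculation):

```python
def refusal_ratio(accepted: int, refused: int) -> float:
    """Fraction of incoming spans the receiver rejected."""
    total = accepted + refused
    return refused / total if total else 0.0

print(refusal_ratio(9_900, 100))  # 0.01 -> 1% of spans refused
```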

Resource Usage

Monitor collector resource consumption:
# Check memory and CPU
docker stats otel

# View detailed metrics
docker exec otel curl localhost:8888/metrics | grep process

Troubleshooting

No Data Flowing

Verify collector is receiving data:
# Check OTLP receivers
docker compose logs otel | grep "Starting OTLP"

# Test gRPC endpoint
grpcurl -plaintext localhost:8148 list

# Test HTTP endpoint
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{}'
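
The empty '{}' body above only verifies that the endpoint responds. A slightly more realistic probe posts an empty resourceSpans envelope, which is a valid OTLP/HTTP JSON document. A Python sketch (the port assumes the defaults on this page; the actual request line is commented out so the snippet runs without a collector):

```python
import json
import urllib.request

# Minimal OTLP/HTTP trace envelope: valid JSON shape, zero spans.
payload = json.dumps({"resourceSpans": []}).encode()

req = urllib.request.Request(
    "http://localhost:4318/v1/traces",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment with a running collector
```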

Connection Refused

Check network connectivity:
# From PentAGI container
docker exec pentagi ping otelcol
docker exec pentagi telnet otelcol 8148

# Verify networks
docker network inspect observability-network
docker network inspect pentagi-network

High Memory Usage

Optimize collector configuration:
config.yml
processors:
  batch:
    timeout: 1s          # Flush more frequently
    send_batch_size: 100 # Smaller batches
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 512       # Limit memory usage
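
A rough way to choose limit_mib is to estimate how much memory queued batches can hold. The average span size and in-flight batch count below are assumptions for illustration, not measured values:

```python
def batch_memory_mib(send_batch_size: int, avg_span_bytes: int, in_flight_batches: int) -> float:
    """Back-of-envelope upper bound on memory held by queued batches."""
    return send_batch_size * avg_span_bytes * in_flight_batches / (1024 * 1024)

# 100-span batches at an assumed 2 KiB/span, 10 batches in flight.
print(round(batch_memory_mib(100, 2_048, 10), 1))  # 2.0 (MiB)
```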

Export Failures

Debug exporter issues:
# Enable debug logging
docker compose logs otel | grep -i error

# Check exporter connectivity
docker exec otel curl http://victoriametrics:8428/health
docker exec otel curl http://loki:3100/ready

Advanced Configuration

Sampling

Reduce trace volume:
config.yml
processors:
  probabilistic_sampler:
    sampling_percentage: 10.0  # Sample 10% of traces

service:
  pipelines:
    traces:
      processors: [probabilistic_sampler, batch]
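
Probabilistic sampling decides per trace ID, so every span of a given trace shares the same fate. A toy hash-based model of the idea (not the collector's exact algorithm):

```python
import hashlib

def sampled(trace_id: str, percentage: float) -> bool:
    """Keep a trace iff its hashed ID falls below the sampling threshold."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < percentage * 100  # percentage 10.0 -> keep ~10% of IDs

# Same trace ID always gets the same decision; overall rate approaches 10%.
keep = sum(sampled(f"trace-{i}", 10.0) for i in range(10_000))
print(f"kept {keep} of 10000")  # roughly 1000
```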

Filtering

Drop unwanted data:
config.yml
processors:
  filter:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - ^go_.*  # Exclude Go runtime metrics
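
Before deploying an exclusion pattern, it can be sanity-checked with an ordinary regex; the leading ^ anchors the match to the start of the metric name:

```python
import re

EXCLUDE = re.compile(r"^go_.*")  # same pattern as the filter processor above

metrics = ["go_goroutines", "go_gc_duration_seconds", "otelcol_receiver_accepted_spans"]
kept = [m for m in metrics if not EXCLUDE.match(m)]
print(kept)  # ['otelcol_receiver_accepted_spans']
```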

Enrichment

Add context to telemetry:
config.yml
processors:
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert
      - key: cluster
        value: pentagi-prod-01
        action: insert
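
The upsert and insert actions differ only when the attribute already exists: upsert overwrites, insert leaves the existing value alone. Modeled on plain dicts (the helper is ours, not collector code):

```python
def apply_action(attrs: dict, key: str, value: str, action: str) -> dict:
    """Apply a resource-processor style attribute action to a copy of attrs."""
    out = dict(attrs)
    if action == "upsert":      # always set, overwriting any existing value
        out[key] = value
    elif action == "insert":    # set only if the key is absent
        out.setdefault(key, value)
    return out

attrs = {"environment": "staging"}
print(apply_action(attrs, "environment", "production", "upsert"))  # {'environment': 'production'}
print(apply_action(attrs, "environment", "production", "insert"))  # {'environment': 'staging'}
```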

Multiple Backends

Export to multiple destinations:
config.yml
exporters:
  otlp/primary:
    endpoint: jaeger:4317
  otlp/backup:
    endpoint: backup-collector:4317

service:
  pipelines:
    traces:
      exporters: [otlp/primary, otlp/backup]

Best Practices

Configuration Management

  • Version control your config.yml
  • Use environment variables for secrets
  • Document custom configuration changes
  • Test changes in development first
  • Keep backups of working configurations

Performance Optimization

  • Enable batching for all pipelines
  • Use appropriate batch sizes (100-1000)
  • Configure memory limiters
  • Monitor collector resource usage
  • Scale horizontally if needed

Security

  • Use TLS for production deployments
  • Restrict network access to OTel ports
  • Sanitize sensitive data in processors
  • Implement authentication on receivers
  • Audit configuration regularly

Reliability

  • Configure retry policies for exporters
  • Use persistent queues for critical data
  • Monitor collector health continuously
  • Set up redundant collectors
  • Test failover scenarios
