Overview

PentAGI provides enterprise-grade observability through OpenTelemetry integration, Langfuse LLM analytics, and comprehensive monitoring dashboards. Track every aspect of your AI-powered penetration testing from agent performance to system metrics.

LLM Observability

Track AI agent performance with Langfuse

System Metrics

Real-time infrastructure monitoring

Distributed Tracing

End-to-end request tracing with Jaeger

Log Aggregation

Centralized logging with Loki

Architecture

Monitoring Stack

PentAGI uses a comprehensive observability stack:

Components

OpenTelemetry Collector

Purpose: Unified telemetry data collection and processing

Capabilities:
  • Metrics collection and aggregation
  • Distributed trace processing
  • Log collection and forwarding
  • Data enrichment and filtering

Configuration:
.env
OTEL_HOST=otelcol:8148  # OpenTelemetry collector endpoint

Langfuse

Purpose: LLM observability and performance analytics

Capabilities:
  • Trace LLM API calls
  • Monitor token usage and costs
  • Analyze prompt performance
  • Track agent execution
  • Score generation quality

Configuration:
.env
LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key

Access: http://localhost:4000

VictoriaMetrics

Purpose: High-performance time-series metrics storage

Capabilities:
  • Long-term metrics retention
  • Efficient storage compression
  • Fast queries and aggregations
  • Prometheus-compatible

Metrics Collected:
  • Request rates and latencies
  • Resource utilization (CPU, memory)
  • Agent execution times
  • Tool invocation counts
  • Error rates

Jaeger

Purpose: Distributed tracing for debugging

Capabilities:
  • End-to-end request tracing
  • Service dependency visualization
  • Performance bottleneck identification
  • Error propagation tracking

Trace Components:
  • Spans for each operation
  • Parent-child relationships
  • Timing information
  • Contextual attributes

Loki

Purpose: Scalable log aggregation

Capabilities:
  • Centralized log collection
  • Label-based log indexing
  • Efficient log storage
  • Powerful query language (LogQL)

Log Sources:
  • Application logs
  • Agent execution logs
  • System logs
  • Docker container logs

Grafana

Purpose: Unified monitoring dashboards

Capabilities:
  • Custom dashboard creation
  • Multi-source data visualization
  • Alerting and notifications
  • Correlation of metrics, traces, and logs

Access: http://localhost:3000

LLM Observability

Langfuse Integration

PentAGI automatically tracks all LLM interactions:
type Observer interface {
    NewObservation(
        context.Context, 
        ...langfuse.ObservationContextOption
    ) (context.Context, langfuse.Observation)
}

Observation Types

Different observation types for different components:
// For environment and search tools
toolObservation := observation.Tool(
    langfuse.WithToolName(name),
    langfuse.WithToolInput(args),
    langfuse.WithToolMetadata(metadata),
)

// End with result
toolObservation.End(
    langfuse.WithToolOutput(result),
    langfuse.WithToolStatus("success"),
)

Observation Metadata

Rich metadata for filtering and analysis:
metadata := langfuse.Metadata{
    "tool_name":     name,
    "tool_category": GetToolType(name).String(),
    "flow_id":       flowID,
    "task_id":       taskID,
    "subtask_id":    subtaskID,
    "agent_type":    "pentester",
}

LLM Metrics Tracked

Token Usage

  • Input tokens per request
  • Output tokens per request
  • Total tokens per session
  • Cost estimation
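Cost estimation from these token counters can be sketched with a small helper. The pricing values below are placeholders, not actual provider rates:

```go
package main

import "fmt"

// EstimateCost approximates the USD cost of a single LLM request from
// token counts. The per-1K-token prices are illustrative placeholders;
// real prices depend on the provider and model.
func EstimateCost(inputTokens, outputTokens int, inPer1K, outPer1K float64) float64 {
	return float64(inputTokens)/1000.0*inPer1K + float64(outputTokens)/1000.0*outPer1K
}

func main() {
	fmt.Printf("$%.4f\n", EstimateCost(1200, 300, 0.003, 0.006))
}
```

Langfuse performs this calculation automatically when model pricing is configured; the helper only illustrates the arithmetic behind the dashboard numbers.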

Latency

  • First token latency (TTFT)
  • Total generation time
  • API call duration
  • Queue wait time

Quality Metrics

  • Generation scores
  • Hallucination detection
  • Output relevance
  • Task completion rate

Error Tracking

  • API failures
  • Rate limit errors
  • Timeout occurrences
  • Invalid responses

Langfuse Dashboard Views

  1. Traces: Complete agent execution traces
  2. Sessions: User session analytics
  3. Generations: Individual LLM generations
  4. Scores: Quality scoring and evaluation
  5. Datasets: Prompt testing and validation
  6. Analytics: Aggregate metrics and trends

System Metrics

Metrics Collection

Automatic metrics collection:
type Collector interface {
    StartProcessMetricCollect(attrs ...attribute.KeyValue) error
    StartGoRuntimeMetricCollect(attrs ...attribute.KeyValue) error
    StartDumperMetricCollect(stats Dumper, attrs ...attribute.KeyValue) error
}

// Initialize collectors
Observer.StartProcessMetricCollect()
Observer.StartGoRuntimeMetricCollect()

Process Metrics

OS-level process metrics:
  • CPU Usage: Process CPU utilization percentage
  • Memory Usage: RSS, VMS, heap allocations
  • File Descriptors: Open file descriptor count
  • Threads: Thread count and goroutine count
  • Network: Bytes sent/received

Go Runtime Metrics

Go-specific runtime metrics:
  • Goroutines: Number of active goroutines
  • Memory: Heap usage, GC stats, allocations
  • GC: Garbage collection frequency and pause times
  • Scheduler: Goroutine scheduling latency
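Several of these values can be sampled directly from the standard library, which is roughly what the runtime collector exports. A dependency-free sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeSnapshot gathers a few of the Go runtime metrics listed above
// using only the standard library; the OTEL collector exports richer
// versions of the same data automatically.
func runtimeSnapshot() (goroutines int, heapAlloc uint64, gcCycles uint32) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtime.NumGoroutine(), m.HeapAlloc, m.NumGC
}

func main() {
	g, heap, gc := runtimeSnapshot()
	fmt.Printf("goroutines=%d heap_alloc_bytes=%d gc_cycles=%d\n", g, heap, gc)
}
```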

Custom Metrics

Create custom metrics for specific monitoring:
// Track event counts
counter, err := Observer.NewInt64Counter(
    "pentagi.agent.executions",
    otelmetric.WithDescription("Total agent executions"),
)

// Increment counter
counter.Add(ctx, 1, 
    otelmetric.WithAttributes(
        attribute.String("agent_type", "pentester"),
        attribute.String("status", "success"),
    ),
)

Agent Performance Metrics

Tracked automatically for all agents:
// Agent execution duration
type AgentDuration struct {
    AgentType       string
    TaskID          int64
    SubtaskID       *int64
    DurationSeconds float64
    Status          string
}

// Available agent types
const (
    MsgchainTypeAgent         = "agent"         // Primary orchestrator
    MsgchainTypePentester     = "pentester"     // Security testing
    MsgchainTypeCoder         = "coder"         // Code development
    MsgchainTypeInstaller     = "installer"     // Tool maintenance
    MsgchainTypeSearcher      = "searcher"      // Research
    MsgchainTypeMemorist      = "memorist"      // Memory search
    MsgchainTypeReporter      = "reporter"      // Report generation
    MsgchainTypeGenerator     = "generator"     // Task generation
    MsgchainTypeRefiner       = "refiner"       // Task refinement
    MsgchainTypeEnricher      = "enricher"      // Context enrichment
    MsgchainTypeAdviser       = "adviser"       // Strategic advice
)

Distributed Tracing

Trace Context

Automatic trace propagation:
// Create new span
ctx, span := Observer.NewSpan(
    ctx,
    SpanKindInternal,
    "execute_agent",
    oteltrace.WithAttributes(
        attribute.String("agent.type", "pentester"),
        attribute.Int64("task.id", taskID),
    ),
)
defer span.End()

Span Hierarchy

Automatic parent-child relationships:
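For example, a single pentester step might produce a trace shaped like this (span names other than execute_agent are illustrative):

```
execute_flow
└── execute_agent (agent.type=pentester)
    ├── llm_call (provider API request)
    ├── tool_execution (tool.name=terminal)
    │   └── container_exec
    └── process_result
```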

Span Attributes

Rich span metadata:
span.SetAttributes(
    // Service info
    semconv.ServiceNameKey.String("pentagi"),
    semconv.ServiceVersionKey.String(version),
    
    // Component info
    attribute.String("span.component", "agent_executor"),
    
    // Operation info
    attribute.String("agent.type", "pentester"),
    attribute.String("tool.name", "terminal"),
    attribute.Int64("flow.id", flowID),
    attribute.Int64("task.id", taskID),
)

Error Tracking

Automatic error recording:
if err != nil {
    span.SetStatus(codes.Error, err.Error())
    span.RecordError(err,
        oteltrace.WithAttributes(
            semconv.ExceptionTypeKey.String(reflect.TypeOf(err).String()),
            semconv.ExceptionMessageKey.String(err.Error()),
            semconv.ExceptionStacktraceKey.String(stackTrace),
        ),
    )
}

Log Aggregation

Structured Logging

Automatic log enrichment:
logrus.WithContext(ctx).WithFields(logrus.Fields{
    "agent_type": "pentester",
    "tool_name":  "terminal",
    "task_id":    taskID,
}).Info("Executing security tool")

Log Levels

Supported log levels:
  • Trace: Very detailed debugging
  • Debug: Detailed debugging information
  • Info: General informational messages
  • Warn: Warning messages
  • Error: Error messages
  • Fatal: Critical errors causing termination

Log Integration

Logs automatically correlated with traces:
// Logs include trace context
record.AddAttributes(
    otellog.String("trace.id", spanCtx.TraceID().String()),
    otellog.String("span.id", spanCtx.SpanID().String()),
)

Query Logs

LogQL examples for common queries:
{component="agent_executor"} |= "error" | json | severity="ERROR"
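A few more illustrative queries; the component, task_id, and container label names are assumptions based on the logging fields shown above, and may differ in your deployment:

```
# Logs for a specific task, parsed from JSON
{component="agent_executor"} | json | task_id="42"

# Per-container log stream, reformatted to just the message
{container="pentagi"} | json | line_format "{{.msg}}"

# Error-log rate over a 5-minute window
rate({component="agent_executor"} |= "error" [5m])
```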

Configuration

Enable Monitoring

Configure observability components:
.env
# OpenTelemetry
OTEL_HOST=otelcol:8148  # Enable metrics and traces

# Langfuse
LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PUBLIC_KEY=pk-xxx
LANGFUSE_SECRET_KEY=sk-xxx

# Langfuse OTEL Integration
LANGFUSE_OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318

Start Monitoring Stack

# Start with observability
docker compose -f docker-compose.yml \
  -f docker-compose-langfuse.yml \
  -f docker-compose-observability.yml \
  up -d

# Access dashboards
# Langfuse: http://localhost:4000
# Grafana:  http://localhost:3000

Observability Levels

Configure verbosity:
// Log levels for OTEL hook
levels := []logrus.Level{
    logrus.ErrorLevel,  // Errors only
    logrus.WarnLevel,   // Warnings and errors
    logrus.InfoLevel,   // Info, warnings, and errors
}

InitObserver(ctx, lfclient, otelclient, levels)

Grafana Dashboards

Pre-built Dashboards

Included dashboard examples:
  1. Agent Performance
    • Agent execution times
    • Success/failure rates
    • Tool usage distribution
  2. System Health
    • CPU and memory usage
    • Request rates
    • Error rates
  3. LLM Analytics
    • Token usage trends
    • API latency
    • Cost tracking
  4. Container Metrics
    • Docker container stats
    • Network I/O
    • Resource limits

Custom Dashboards

Create custom visualizations:
  1. Navigate to Grafana (http://localhost:3000)
  2. Click “Create” → “Dashboard”
  3. Add panels with queries:
    • VictoriaMetrics: Prometheus queries
    • Loki: LogQL queries
    • Jaeger: Trace queries

Best Practices

Performance Monitoring
  • Monitor agent execution times
  • Track tool invocation patterns
  • Set alerts for slow operations
  • Review token usage regularly
  • Optimize expensive operations

Error Management
  • Review error logs daily
  • Set up alerts for critical errors
  • Track error rate trends
  • Investigate error spikes promptly
  • Document common error resolutions

Cost Optimization
  • Monitor LLM token usage
  • Track API costs per agent
  • Identify expensive prompts
  • Optimize context sizes
  • Use cheaper models where appropriate

Capacity Planning
  • Monitor resource utilization trends
  • Track concurrent flow counts
  • Plan for peak usage periods
  • Set resource limits appropriately
  • Scale horizontally when needed

Next Steps

  • Autonomous Testing: understand agent execution flow
  • Security Tools: tool execution tracking
  • Reporting: report generation metrics
  • Architecture: system architecture overview
