Overview

PentAGI provides enterprise-grade observability through OpenTelemetry integration, Langfuse LLM analytics, and comprehensive monitoring dashboards. Track every aspect of your AI-powered penetration testing from agent performance to system metrics.

LLM Observability

Track AI agent performance with Langfuse

System Metrics

Real-time infrastructure monitoring

Distributed Tracing

End-to-end request tracing with Jaeger

Log Aggregation

Centralized logging with Loki

Architecture

Monitoring Stack

PentAGI uses a comprehensive observability stack:

Components

OpenTelemetry Collector

Purpose: Unified telemetry data collection and processing

Capabilities:
  • Metrics collection and aggregation
  • Distributed trace processing
  • Log collection and forwarding
  • Data enrichment and filtering

Configuration:
.env
OTEL_HOST=otelcol:8148  # OpenTelemetry collector endpoint

Langfuse

Purpose: LLM observability and performance analytics

Capabilities:
  • Trace LLM API calls
  • Monitor token usage and costs
  • Analyze prompt performance
  • Track agent execution
  • Score generation quality

Configuration:
.env
LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key

Access: http://localhost:4000

VictoriaMetrics

Purpose: High-performance time-series metrics storage

Capabilities:
  • Long-term metrics retention
  • Efficient storage compression
  • Fast queries and aggregations
  • Prometheus-compatible

Metrics Collected:
  • Request rates and latencies
  • Resource utilization (CPU, memory)
  • Agent execution times
  • Tool invocation counts
  • Error rates

Jaeger

Purpose: Distributed tracing for debugging

Capabilities:
  • End-to-end request tracing
  • Service dependency visualization
  • Performance bottleneck identification
  • Error propagation tracking

Trace Components:
  • Spans for each operation
  • Parent-child relationships
  • Timing information
  • Contextual attributes

Loki

Purpose: Scalable log aggregation

Capabilities:
  • Centralized log collection
  • Label-based log indexing
  • Efficient log storage
  • Powerful query language (LogQL)

Log Sources:
  • Application logs
  • Agent execution logs
  • System logs
  • Docker container logs

Grafana

Purpose: Unified monitoring dashboards

Capabilities:
  • Custom dashboard creation
  • Multi-source data visualization
  • Alerting and notifications
  • Correlation of metrics, traces, and logs

Access: http://localhost:3000

LLM Observability

Langfuse Integration

PentAGI automatically tracks all LLM interactions:
type Observer interface {
    NewObservation(
        context.Context, 
        ...langfuse.ObservationContextOption
    ) (context.Context, langfuse.Observation)
}

Observation Types

Different observation types for different components:
// For environment and search tools
toolObservation := observation.Tool(
    langfuse.WithToolName(name),
    langfuse.WithToolInput(args),
    langfuse.WithToolMetadata(metadata),
)

// End with result
toolObservation.End(
    langfuse.WithToolOutput(result),
    langfuse.WithToolStatus("success"),
)

Observation Metadata

Rich metadata for filtering and analysis:
metadata := langfuse.Metadata{
    "tool_name":     name,
    "tool_category": GetToolType(name).String(),
    "flow_id":       flowID,
    "task_id":       taskID,
    "subtask_id":    subtaskID,
    "agent_type":    "pentester",
}

LLM Metrics Tracked

Token Usage

  • Input tokens per request
  • Output tokens per request
  • Total tokens per session
  • Cost estimation
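Cost estimation from these token counters can be sketched with a small helper. The pricing values below are placeholders, not actual provider rates:

```go
package main

import "fmt"

// EstimateCost approximates the USD cost of a single LLM request from
// token counts. The per-1K-token prices are illustrative placeholders;
// real prices depend on the provider and model.
func EstimateCost(inputTokens, outputTokens int, inPer1K, outPer1K float64) float64 {
	return float64(inputTokens)/1000.0*inPer1K + float64(outputTokens)/1000.0*outPer1K
}

func main() {
	fmt.Printf("$%.4f\n", EstimateCost(1200, 300, 0.003, 0.006))
}
```

Langfuse performs this calculation automatically when model pricing is configured; the helper only illustrates the arithmetic behind the dashboard numbers.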

Latency

  • First token latency (TTFT)
  • Total generation time
  • API call duration
  • Queue wait time

Quality Metrics

  • Generation scores
  • Hallucination detection
  • Output relevance
  • Task completion rate

Error Tracking

  • API failures
  • Rate limit errors
  • Timeout occurrences
  • Invalid responses

Langfuse Dashboard Views

  1. Traces: Complete agent execution traces
  2. Sessions: User session analytics
  3. Generations: Individual LLM generations
  4. Scores: Quality scoring and evaluation
  5. Datasets: Prompt testing and validation
  6. Analytics: Aggregate metrics and trends

System Metrics

Metrics Collection

Automatic metrics collection:
type Collector interface {
    StartProcessMetricCollect(attrs ...attribute.KeyValue) error
    StartGoRuntimeMetricCollect(attrs ...attribute.KeyValue) error
    StartDumperMetricCollect(stats Dumper, attrs ...attribute.KeyValue) error
}

// Initialize collectors
Observer.StartProcessMetricCollect()
Observer.StartGoRuntimeMetricCollect()

Process Metrics

OS-level process metrics:
  • CPU Usage: Process CPU utilization percentage
  • Memory Usage: RSS, VMS, heap allocations
  • File Descriptors: Open file descriptor count
  • Threads: Thread count and goroutine count
  • Network: Bytes sent/received

Go Runtime Metrics

Go-specific runtime metrics:
  • Goroutines: Number of active goroutines
  • Memory: Heap usage, GC stats, allocations
  • GC: Garbage collection frequency and pause times
  • Scheduler: Goroutine scheduling latency
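Several of these values can be sampled directly from the standard library, which is roughly what the runtime collector exports. A dependency-free sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeSnapshot gathers a few of the Go runtime metrics listed above
// using only the standard library; the OTEL collector exports richer
// versions of the same data automatically.
func runtimeSnapshot() (goroutines int, heapAlloc uint64, gcCycles uint32) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtime.NumGoroutine(), m.HeapAlloc, m.NumGC
}

func main() {
	g, heap, gc := runtimeSnapshot()
	fmt.Printf("goroutines=%d heap_alloc_bytes=%d gc_cycles=%d\n", g, heap, gc)
}
```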

Custom Metrics

Create custom metrics for specific monitoring:
// Track event counts
counter, err := Observer.NewInt64Counter(
    "pentagi.agent.executions",
    otelmetric.WithDescription("Total agent executions"),
)

// Increment counter
counter.Add(ctx, 1, 
    otelmetric.WithAttributes(
        attribute.String("agent_type", "pentester"),
        attribute.String("status", "success"),
    ),
)

Agent Performance Metrics

Tracked automatically for all agents:
// Agent execution duration
type AgentDuration struct {
    AgentType       string
    TaskID          int64
    SubtaskID       *int64
    DurationSeconds float64
    Status          string
}

// Available agent types
const (
    MsgchainTypeAgent         = "agent"         // Primary orchestrator
    MsgchainTypePentester     = "pentester"     // Security testing
    MsgchainTypeCoder         = "coder"         // Code development
    MsgchainTypeInstaller     = "installer"     // Tool maintenance
    MsgchainTypeSearcher      = "searcher"      // Research
    MsgchainTypeMemorist      = "memorist"      // Memory search
    MsgchainTypeReporter      = "reporter"      // Report generation
    MsgchainTypeGenerator     = "generator"     // Task generation
    MsgchainTypeRefiner       = "refiner"       // Task refinement
    MsgchainTypeEnricher      = "enricher"      // Context enrichment
    MsgchainTypeAdviser       = "adviser"       // Strategic advice
)

Distributed Tracing

Trace Context

Automatic trace propagation:
// Create new span
ctx, span := Observer.NewSpan(
    ctx,
    SpanKindInternal,
    "execute_agent",
    oteltrace.WithAttributes(
        attribute.String("agent.type", "pentester"),
        attribute.Int64("task.id", taskID),
    ),
)
defer span.End()

Span Hierarchy

Automatic parent-child relationships:
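For example, a single pentester step might produce a trace shaped like this (span names other than execute_agent are illustrative):

```
execute_flow
└── execute_agent (agent.type=pentester)
    ├── llm_call (provider API request)
    ├── tool_execution (tool.name=terminal)
    │   └── container_exec
    └── process_result
```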

Span Attributes

Rich span metadata:
span.SetAttributes(
    // Service info
    semconv.ServiceNameKey.String("pentagi"),
    semconv.ServiceVersionKey.String(version),
    
    // Component info
    attribute.String("span.component", "agent_executor"),
    
    // Operation info
    attribute.String("agent.type", "pentester"),
    attribute.String("tool.name", "terminal"),
    attribute.Int64("flow.id", flowID),
    attribute.Int64("task.id", taskID),
)

Error Tracking

Automatic error recording:
if err != nil {
    span.SetStatus(codes.Error, err.Error())
    span.RecordError(err,
        oteltrace.WithAttributes(
            semconv.ExceptionTypeKey.String(reflect.TypeOf(err).String()),
            semconv.ExceptionMessageKey.String(err.Error()),
            semconv.ExceptionStacktraceKey.String(stackTrace),
        ),
    )
}

Log Aggregation

Structured Logging

Automatic log enrichment:
logrus.WithContext(ctx).WithFields(logrus.Fields{
    "agent_type": "pentester",
    "tool_name":  "terminal",
    "task_id":    taskID,
}).Info("Executing security tool")

Log Levels

Supported log levels:
  • Trace: Very detailed debugging
  • Debug: Detailed debugging information
  • Info: General informational messages
  • Warn: Warning messages
  • Error: Error messages
  • Fatal: Critical errors causing termination

Log Integration

Logs automatically correlated with traces:
// Logs include trace context
record.AddAttributes(
    otellog.String("trace.id", spanCtx.TraceID().String()),
    otellog.String("span.id", spanCtx.SpanID().String()),
)

Query Logs

LogQL examples for common queries:
{component="agent_executor"} |= "error" | json | severity="ERROR"
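A few more illustrative queries; the component, task_id, and container label names are assumptions based on the logging fields shown above, and may differ in your deployment:

```
# Logs for a specific task, parsed from JSON
{component="agent_executor"} | json | task_id="42"

# Per-container log stream, reformatted to just the message
{container="pentagi"} | json | line_format "{{.msg}}"

# Error-log rate over a 5-minute window
rate({component="agent_executor"} |= "error" [5m])
```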

Configuration

Enable Monitoring

Configure observability components:
.env
# OpenTelemetry
OTEL_HOST=otelcol:8148  # Enable metrics and traces

# Langfuse
LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PUBLIC_KEY=pk-xxx
LANGFUSE_SECRET_KEY=sk-xxx

# Langfuse OTEL Integration
LANGFUSE_OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318

Start Monitoring Stack

# Start with observability
docker compose -f docker-compose.yml \
  -f docker-compose-langfuse.yml \
  -f docker-compose-observability.yml \
  up -d

# Access dashboards
# Langfuse: http://localhost:4000
# Grafana:  http://localhost:3000

Observability Levels

Configure verbosity:
// Log levels for OTEL hook
levels := []logrus.Level{
    logrus.ErrorLevel,  // Errors only
    logrus.WarnLevel,   // Warnings and errors
    logrus.InfoLevel,   // Info, warnings, and errors
}

InitObserver(ctx, lfclient, otelclient, levels)

Grafana Dashboards

Pre-built Dashboards

Included dashboard examples:
  1. Agent Performance
    • Agent execution times
    • Success/failure rates
    • Tool usage distribution
  2. System Health
    • CPU and memory usage
    • Request rates
    • Error rates
  3. LLM Analytics
    • Token usage trends
    • API latency
    • Cost tracking
  4. Container Metrics
    • Docker container stats
    • Network I/O
    • Resource limits

Custom Dashboards

Create custom visualizations:
  1. Navigate to Grafana (http://localhost:3000)
  2. Click “Create” → “Dashboard”
  3. Add panels with queries:
    • VictoriaMetrics: Prometheus queries
    • Loki: LogQL queries
    • Jaeger: Trace queries

Best Practices

Performance Monitoring
  • Monitor agent execution times
  • Track tool invocation patterns
  • Set alerts for slow operations
  • Review token usage regularly
  • Optimize expensive operations

Error Management
  • Review error logs daily
  • Set up alerts for critical errors
  • Track error rate trends
  • Investigate error spikes promptly
  • Document common error resolutions

Cost Optimization
  • Monitor LLM token usage
  • Track API costs per agent
  • Identify expensive prompts
  • Optimize context sizes
  • Use cheaper models where appropriate

Capacity Planning
  • Monitor resource utilization trends
  • Track concurrent flow counts
  • Plan for peak usage periods
  • Set resource limits appropriately
  • Scale horizontally when needed

Next Steps

  • Autonomous Testing: understand agent execution flow
  • Security Tools: tool execution tracking
  • Reporting: report generation metrics
  • Architecture: system architecture overview
