Overview

ADK-TS includes comprehensive OpenTelemetry integration for observing agent behavior, tracking performance, and debugging issues in production. The telemetry system captures traces, metrics, and events across the entire agent lifecycle.

Quick Start

import { telemetryService } from '@iqai/adk';

// Initialize with OTLP endpoint
await telemetryService.initialize({
  appName: 'my-agent-app',
  appVersion: '1.0.0',
  otlpEndpoint: 'http://localhost:4318/v1/traces',
  enableTracing: true,
  enableMetrics: true,
});

// Your agent code here
const agent = new AgentBuilder()
  .withModel('gpt-4')
  .buildLlm();

await agent.ask('Hello!');

// Shutdown gracefully
await telemetryService.shutdown();

Telemetry Configuration

interface TelemetryConfig {
  // Application identity
  appName: string;              // Service name
  appVersion?: string;          // Service version
  environment?: string;         // 'development' | 'production' | 'staging'
  
  // OTLP exporter
  otlpEndpoint?: string;        // OTLP endpoint URL
  otlpHeaders?: Record<string, string>; // Custom headers
  
  // Feature flags
  enableTracing?: boolean;      // Enable distributed tracing
  enableMetrics?: boolean;      // Enable metrics collection
  enableAutoInstrumentation?: boolean; // Auto-instrument HTTP, etc.
  
  // Sampling and performance
  samplingRatio?: number;       // 0.0 to 1.0 (default: 1.0)
  metricExportIntervalMs?: number; // Metric export interval
  
  // Resource attributes
  resourceAttributes?: Record<string, any>;
  
  // Debug mode
  debug?: boolean;              // Enable in-memory span collection
}
Source: packages/adk/src/telemetry/types.ts

Example Configurations

await telemetryService.initialize({
  appName: 'my-agent',
  appVersion: '0.1.0',
  environment: 'development',
  enableTracing: true,
  enableMetrics: true,
  debug: true, // Enable in-memory debugging
});
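A production-leaning counterpart might look like the sketch below. The endpoint variable, region attribute, and interval values are illustrative assumptions, not ADK defaults:

```typescript
// Hypothetical production configuration (values illustrative)
await telemetryService.initialize({
  appName: 'my-agent',
  appVersion: '1.0.0',
  environment: 'production',
  otlpEndpoint: process.env.OTLP_ENDPOINT,
  enableTracing: true,
  enableMetrics: true,
  samplingRatio: 0.1,              // keep 10% of traces to control cost
  metricExportIntervalMs: 60_000,  // export metrics once a minute
  resourceAttributes: {
    'deployment.region': process.env.REGION ?? 'unknown',
  },
});
```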

Tracing

The tracing system automatically captures agent operations:

Automatic Traces

ADK-TS automatically traces:

✅ LLM Calls: Model requests/responses with token counts
✅ Tool Executions: Function calls with arguments and results
✅ Agent Invocations: Complete agent runs
✅ Agent Transfers: Multi-agent handoffs
✅ Memory Operations: Search and insert operations
✅ Plugin Hooks: Plugin lifecycle events

Source: packages/adk/src/telemetry/tracing.ts:27

Trace Attributes

All traces include standard OpenTelemetry GenAI semantic conventions:
// LLM traces include:
{
  'gen_ai.provider.name': 'openai',
  'gen_ai.operation.name': 'chat',
  'gen_ai.request.model': 'gpt-4',
  'gen_ai.response.model': 'gpt-4',
  'gen_ai.usage.input_tokens': 150,
  'gen_ai.usage.output_tokens': 75,
  'gen_ai.response.finish_reasons': ['stop'],
  
  // ADK-specific attributes
  'adk.session_id': 'session_abc123',
  'adk.user_id': 'user_456',
  'adk.agent_name': 'MyAgent',
  'adk.environment': 'production',
}
Source: packages/adk/src/telemetry/tracing.ts:220

Custom Spans

import { telemetryService } from '@iqai/adk';

// Wrap async operations
await telemetryService.withSpan(
  'custom_operation',
  async (span) => {
    // Your code here
    span.setAttribute('custom.attribute', 'value');
    
    const result = await performOperation();
    
    span.addEvent('operation_completed', {
      result_count: result.length,
    });
    
    return result;
  },
  {
    // Initial attributes
    'operation.type': 'data_processing',
  }
);

Async Generator Tracing

// Trace streaming operations
async function* processStream() {
  for await (const item of dataStream) {
    yield item;
  }
}

const tracedStream = telemetryService.traceAsyncGenerator(
  'process_stream',
  processStream(),
  { stream_type: 'data' }
);

for await (const item of tracedStream) {
  console.log(item);
}
Source: packages/adk/src/telemetry/tracing.ts:373

Metrics

The metrics system tracks quantitative data:

Automatic Metrics

✅ LLM Token Usage: Input/output tokens by model
✅ LLM Call Count: Total LLM invocations
✅ LLM Duration: Request latency
✅ Tool Call Count: Function execution frequency
✅ Error Count: Failures by category

Source: packages/adk/src/telemetry/metrics.ts

Recording Custom Metrics

import { telemetryService } from '@iqai/adk';

// Record LLM tokens
telemetryService.recordLlmTokens(
  100, // promptTokens
  50,  // completionTokens
  {
    model: 'gpt-4',
    agentName: 'MyAgent',
    environment: 'production',
    status: 'success',
  }
);

// Record LLM call
telemetryService.recordLlmCall({
  model: 'gpt-4',
  agentName: 'MyAgent',
  environment: 'production',
  status: 'success',
});

// Record LLM duration
telemetryService.recordLlmDuration(1500, { // milliseconds
  model: 'gpt-4',
  agentName: 'MyAgent',
  status: 'success',
});

// Record tool call
telemetryService.recordToolCall(
  'web_search', // toolName
  {
    agentName: 'SearchAgent',
    status: 'success',
  }
);

// Record errors
telemetryService.recordError(
  'llm',                 // category
  'rate_limit_exceeded', // errorType
);

Metric Dimensions

All metrics support dimensions for filtering and aggregation:
interface MetricAttributes {
  model?: string;           // LLM model name
  agentName?: string;       // Agent name
  environment?: string;     // Environment
  status?: 'success' | 'error';
  toolName?: string;        // Tool name
  errorType?: string;       // Error type
}
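These dimensions are what your metrics backend filters and aggregates on. The helper below is a hypothetical illustration (not part of the ADK) of how a backend groups recorded points by one attribute and sums their values:

```typescript
// Illustrative: group metric points by a dimension and sum their values,
// the way a metrics backend aggregates (e.g. token usage per model).
type Point = { value: number; attrs: Record<string, string> };

function sumBy(points: Point[], key: string): Record<string, number> {
  const out: Record<string, number> = {};
  for (const p of points) {
    const k = p.attrs[key] ?? 'unknown';
    out[k] = (out[k] ?? 0) + p.value;
  }
  return out;
}
```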

Content Capture

Control whether to capture full request/response content:
// Enable content capture (default: disabled)
process.env.ADK_TELEMETRY_CAPTURE_CONTENT = 'true';

await telemetryService.initialize({ ... });

// When enabled, traces include:
// - Full LLM prompts
// - Complete LLM responses
// - Tool arguments
// - Tool results
Privacy Warning: Content capture may log sensitive data. Only enable in non-production environments or ensure data is properly sanitized.
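One way to honor this warning is to derive the flag from the runtime environment instead of hard-coding it. The helper name below is illustrative, not part of the ADK:

```typescript
// Sketch: never capture prompt/response content in production.
// shouldCaptureContent is a hypothetical helper, not an ADK export.
function shouldCaptureContent(env: string | undefined): boolean {
  return env !== 'production';
}

process.env.ADK_TELEMETRY_CAPTURE_CONTENT = String(
  shouldCaptureContent(process.env.NODE_ENV)
);
```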

OpenTelemetry Backends

Jaeger (Development)

# Start Jaeger
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

await telemetryService.initialize({
  appName: 'my-agent',
  otlpEndpoint: 'http://localhost:4318/v1/traces',
  enableTracing: true,
});
View traces at: http://localhost:16686

Honeycomb

await telemetryService.initialize({
  appName: 'my-agent',
  otlpEndpoint: 'https://api.honeycomb.io/v1/traces',
  otlpHeaders: {
    'x-honeycomb-team': process.env.HONEYCOMB_API_KEY!,
    'x-honeycomb-dataset': 'my-agent-dataset',
  },
  enableTracing: true,
  enableMetrics: true,
});

Datadog

// Requires Datadog Agent with OTLP enabled
await telemetryService.initialize({
  appName: 'my-agent',
  otlpEndpoint: 'http://localhost:4318/v1/traces',
  resourceAttributes: {
    'service.namespace': 'ai-agents',
  },
  enableTracing: true,
});

New Relic

await telemetryService.initialize({
  appName: 'my-agent',
  otlpEndpoint: 'https://otlp.nr-data.net:4318/v1/traces',
  otlpHeaders: {
    'api-key': process.env.NEW_RELIC_LICENSE_KEY!,
  },
  enableTracing: true,
});

Grafana Cloud

await telemetryService.initialize({
  appName: 'my-agent',
  otlpEndpoint: 'https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/traces',
  otlpHeaders: {
    'Authorization': `Basic ${Buffer.from(
      `${process.env.GRAFANA_INSTANCE_ID}:${process.env.GRAFANA_API_KEY}`
    ).toString('base64')}`,
  },
  enableTracing: true,
  enableMetrics: true,
});

Debugging with In-Memory Exporter

import { telemetryService } from '@iqai/adk';

await telemetryService.initialize({
  appName: 'my-agent',
  debug: true, // Enables in-memory span collection
});

// Run your agent
await agent.ask('Hello');

// Inspect captured spans
const exporter = telemetryService.getInMemoryExporter();
const spans = exporter.getFinishedSpans();

for (const span of spans) {
  console.log('Span:', span.name);
  console.log('Attributes:', span.attributes);
  console.log('Duration:', span.duration);
}

// Clear spans
exporter.reset();
Source: packages/adk/src/telemetry/in-memory-exporter.ts

Advanced Tracing Patterns

Error Tracing

import { telemetryService } from '@iqai/adk';

try {
  await riskyOperation();
} catch (error) {
  telemetryService.traceError(
    error as Error,
    'tool_error',
    true,  // recoverable
    true,  // retry recommended
  );
  
  throw error;
}
Source: packages/adk/src/telemetry/tracing.ts:598

Memory Operations

telemetryService.traceMemoryOperation(
  'search',           // operation
  'session_123',      // sessionId
  'user preferences', // query
  5,                  // resultsCount
  invocationContext,
);
Source: packages/adk/src/telemetry/tracing.ts:633

Agent Transfers

telemetryService.traceAgentTransfer(
  'MainAgent',                      // sourceAgent
  'SpecialistAgent',                // targetAgent
  ['RootAgent', 'MainAgent'],       // transferChain
  2,                                // transferDepth
  'Requires specialized knowledge', // reason
  invocationContext,
);
Source: packages/adk/src/telemetry/tracing.ts:507

Sampling Strategies

Control trace volume with sampling:
// Sample 10% of traces
await telemetryService.initialize({
  appName: 'my-agent',
  samplingRatio: 0.1,
});

// Sample based on conditions (illustrative pattern — this class is not
// part of the SDK and must be wired into your own instrumentation)
class SmartSampler {
  shouldSample(context: InvocationContext): boolean {
    // Always sample errors
    if (context.hasError) {
      return true;
    }
    
    // Always sample slow requests
    if (context.durationMs > 5000) {
      return true;
    }
    
    // Sample 1% of normal requests
    return Math.random() < 0.01;
  }
}
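For context, OpenTelemetry's standard ratio sampler decides deterministically from the trace id rather than from `Math.random()`, so all spans of one trace get the same verdict. A rough self-contained sketch of that decision (the helper name and hashing detail are simplifying assumptions, not the SDK's exact algorithm):

```typescript
// Sketch of a trace-id ratio sampling decision: map the trace id to a
// value in [0, 1) and keep the trace when it falls under the ratio.
function shouldSampleTrace(traceId: string, ratio: number): boolean {
  // Use the low 8 hex chars of the 32-char trace id as a fraction.
  const value = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return value < ratio;
}
```

Because the input is the trace id, a child span in another service reaches the same decision as its parent, which keeps traces complete under sampling.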

Best Practices

  1. Initialize Early: Set up telemetry before creating agents
  2. Use Structured Attributes: Add meaningful dimensions to traces/metrics
  3. Sample Appropriately: Balance observability with cost
  4. Handle Shutdown: Always call telemetryService.shutdown() on exit
  5. Secure Credentials: Never log API keys or tokens
  6. Monitor Performance: Watch for telemetry overhead
  7. Test Locally: Use Jaeger for development
Telemetry adds minimal overhead (less than 1% in production). The debug mode with in-memory export has higher overhead and should only be used in development.

Graceful Shutdown

import { telemetryService } from '@iqai/adk';

process.on('SIGTERM', async () => {
  console.log('Shutting down...');
  
  // Flush pending telemetry
  await telemetryService.flush();
  
  // Shutdown telemetry
  await telemetryService.shutdown();
  
  process.exit(0);
});
Source: packages/adk/src/telemetry/setup.ts:340

Next Steps

Flows & Processors

Build custom request/response processors

Examples

See telemetry in action
