Stagehand provides built-in observability features to monitor performance, track token usage, and measure costs for your browser automation workflows.

Metrics Overview

Access comprehensive metrics for all Stagehand operations:
import { Stagehand } from '@browserbasehq/stagehand';

const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();

// Perform operations
await stagehand.act("click the login button");
const data = await stagehand.extract("get product details", schema); // schema: your Zod schema

// Get metrics
const metrics = stagehand.metrics; // synchronous getter
console.log(metrics);

StagehandMetrics Interface

The metrics object provides detailed usage statistics:
interface StagehandMetrics {
  // Act operation metrics
  actPromptTokens: number;           // Input tokens for act()
  actCompletionTokens: number;       // Output tokens for act()
  actReasoningTokens: number;        // Reasoning tokens for act()
  actCachedInputTokens: number;      // Cached input tokens for act()
  actInferenceTimeMs: number;        // Total inference time for act()
  
  // Extract operation metrics
  extractPromptTokens: number;
  extractCompletionTokens: number;
  extractReasoningTokens: number;
  extractCachedInputTokens: number;
  extractInferenceTimeMs: number;
  
  // Observe operation metrics
  observePromptTokens: number;
  observeCompletionTokens: number;
  observeReasoningTokens: number;
  observeCachedInputTokens: number;
  observeInferenceTimeMs: number;
  
  // Agent operation metrics
  agentPromptTokens: number;
  agentCompletionTokens: number;
  agentReasoningTokens: number;
  agentCachedInputTokens: number;
  agentInferenceTimeMs: number;
  
  // Totals across all operations
  totalPromptTokens: number;
  totalCompletionTokens: number;
  totalReasoningTokens: number;
  totalCachedInputTokens: number;
  totalInferenceTimeMs: number;
}

Token Usage Tracking

Monitor token consumption for cost optimization:
const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();

// Perform multiple operations
await stagehand.act("fill in the form");
await stagehand.act("click submit");
const result = await stagehand.extract("extract form data", schema);

// Check token usage
const metrics = stagehand.metrics;

console.log(`Total input tokens: ${metrics.totalPromptTokens}`);
console.log(`Total output tokens: ${metrics.totalCompletionTokens}`);
console.log(`Cached tokens: ${metrics.totalCachedInputTokens}`);
console.log(`Total inference time: ${metrics.totalInferenceTimeMs}ms`);

Per-Operation Metrics

Track metrics for specific operation types:
const metrics = stagehand.metrics;

// Act operation statistics
console.log('Act Operations:');
console.log(`  Input tokens: ${metrics.actPromptTokens}`);
console.log(`  Output tokens: ${metrics.actCompletionTokens}`);
console.log(`  Inference time: ${metrics.actInferenceTimeMs}ms`);

// Extract operation statistics
console.log('Extract Operations:');
console.log(`  Input tokens: ${metrics.extractPromptTokens}`);
console.log(`  Output tokens: ${metrics.extractCompletionTokens}`);
console.log(`  Inference time: ${metrics.extractInferenceTimeMs}ms`);

// Agent operation statistics
console.log('Agent Operations:');
console.log(`  Input tokens: ${metrics.agentPromptTokens}`);
console.log(`  Output tokens: ${metrics.agentCompletionTokens}`);
console.log(`  Reasoning tokens: ${metrics.agentReasoningTokens}`);
console.log(`  Inference time: ${metrics.agentInferenceTimeMs}ms`);

Cost Calculation

Calculate costs based on token usage:
const metrics = stagehand.metrics;

// GPT-4o pricing (illustrative rates; check your provider's current pricing)
const INPUT_COST_PER_1M = 5.00;    // $5 per 1M input tokens
const OUTPUT_COST_PER_1M = 15.00;  // $15 per 1M output tokens
const CACHED_COST_PER_1M = 2.50;   // $2.50 per 1M cached input tokens

const inputCost = (metrics.totalPromptTokens / 1_000_000) * INPUT_COST_PER_1M;
const outputCost = (metrics.totalCompletionTokens / 1_000_000) * OUTPUT_COST_PER_1M;
const cachedCost = (metrics.totalCachedInputTokens / 1_000_000) * CACHED_COST_PER_1M;

const totalCost = inputCost + outputCost + cachedCost;

console.log(`Total cost: $${totalCost.toFixed(4)}`);
console.log(`  Input tokens cost: $${inputCost.toFixed(4)}`);
console.log(`  Output tokens cost: $${outputCost.toFixed(4)}`);
console.log(`  Cached tokens cost: $${cachedCost.toFixed(4)}`);
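The arithmetic above can be packaged into a reusable helper. This is a sketch: the rates are illustrative, and it assumes cached tokens are reported separately from `totalPromptTokens` — if your provider reports prompt tokens inclusive of cached tokens, subtract the cached count from the input line to avoid double counting.

```typescript
// Illustrative cost helper; rates are examples, not current pricing.
interface TokenTotals {
  totalPromptTokens: number;
  totalCompletionTokens: number;
  totalCachedInputTokens: number;
}

const RATES_PER_1M = { input: 5.0, output: 15.0, cached: 2.5 };

function calculateCost(m: TokenTotals): number {
  const input = (m.totalPromptTokens / 1_000_000) * RATES_PER_1M.input;
  const output = (m.totalCompletionTokens / 1_000_000) * RATES_PER_1M.output;
  const cached = (m.totalCachedInputTokens / 1_000_000) * RATES_PER_1M.cached;
  return input + output + cached;
}

console.log(calculateCost({
  totalPromptTokens: 1_000_000,
  totalCompletionTokens: 1_000_000,
  totalCachedInputTokens: 1_000_000,
})); // 5 + 15 + 2.5 = 22.5
```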

History Tracking

Access the complete history of operations:
const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();

// Perform operations
await stagehand.act("click login");
await stagehand.extract("get user data", schema);

// Get operation history
const history = stagehand.history; // synchronous getter

for (const entry of history) {
  console.log(`Operation: ${entry.method}`);
  console.log(`Timestamp: ${entry.timestamp}`);
  if (entry.endTime) {
    console.log(`Duration: ${entry.endTime - entry.timestamp}ms`);
  }

  if (entry.tokenUsage) {
    const { inputTokens = 0, outputTokens = 0, cost } = entry.tokenUsage;
    console.log(`Tokens used: ${inputTokens + outputTokens}`);
    if (cost !== undefined) {
      console.log(`Cost: $${cost}`);
    }
  }
}
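To see where time goes across a whole run, the history entries can be aggregated per method. A small sketch (field names follow the HistoryEntry interface below; entries without an `endTime` are skipped, and the sample data is illustrative):

```typescript
// Aggregate total duration per operation type from history entries.
interface Entry { method: string; timestamp: number; endTime?: number; }

function durationByMethod(history: Entry[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of history) {
    if (e.endTime === undefined) continue; // still running or not recorded
    totals[e.method] = (totals[e.method] ?? 0) + (e.endTime - e.timestamp);
  }
  return totals;
}

const sample: Entry[] = [
  { method: 'act', timestamp: 0, endTime: 1200 },
  { method: 'act', timestamp: 2000, endTime: 2800 },
  { method: 'extract', timestamp: 3000, endTime: 7500 },
  { method: 'observe', timestamp: 8000 }, // no endTime: skipped
];

console.log(durationByMethod(sample)); // { act: 2000, extract: 4500 }
```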

HistoryEntry Interface

interface HistoryEntry {
  method: string;                        // Operation name (act, extract, observe, etc.)
  parameters: Record<string, unknown>;   // Operation parameters
  result: Record<string, unknown>;       // Operation result
  timestamp: number;                     // Start timestamp (ms since epoch)
  endTime?: number;                      // End timestamp (ms since epoch)
  tokenUsage?: {                         // Token usage for this operation
    inputTokens?: number;
    outputTokens?: number;
    timeMs?: number;
    cost?: number;
  };
}

Replay Metrics

Retrieve session replay data with detailed action-level metrics:
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
});

await stagehand.init();

// Perform operations
await stagehand.act("navigate and login");

// Get replay metrics (Browserbase only)
const sessionId = stagehand.browserbaseSessionID;

// Access via Browserbase API or internal methods
// Replay data includes page URLs, actions, timestamps, and token usage

Custom Monitoring Integration

Integrate Stagehand metrics with your monitoring stack:

Datadog Example

import { StatsD } from 'hot-shots';

const dogstatsd = new StatsD();

const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();

// Perform operations
await stagehand.act("click button");

// Send metrics to Datadog
const metrics = stagehand.metrics;

dogstatsd.gauge('stagehand.tokens.input', metrics.totalPromptTokens);
dogstatsd.gauge('stagehand.tokens.output', metrics.totalCompletionTokens);
dogstatsd.gauge('stagehand.tokens.cached', metrics.totalCachedInputTokens);
dogstatsd.timing('stagehand.inference.time', metrics.totalInferenceTimeMs);

Prometheus Example

import client from 'prom-client';

const tokenCounter = new client.Counter({
  name: 'stagehand_tokens_total',
  help: 'Total tokens used by Stagehand',
  labelNames: ['type', 'operation'],
});

const inferenceHistogram = new client.Histogram({
  name: 'stagehand_inference_duration_ms',
  help: 'LLM inference duration in milliseconds',
  labelNames: ['operation'],
});

const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();
await stagehand.act("perform action");

const metrics = stagehand.metrics;

// Track metrics in Prometheus
tokenCounter.inc({ type: 'input', operation: 'act' }, metrics.actPromptTokens);
tokenCounter.inc({ type: 'output', operation: 'act' }, metrics.actCompletionTokens);
inferenceHistogram.observe({ operation: 'act' }, metrics.actInferenceTimeMs);

CloudWatch Example

import { CloudWatch } from '@aws-sdk/client-cloudwatch';

const cloudwatch = new CloudWatch({ region: 'us-east-1' });

const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();
await stagehand.act("perform action");

const metrics = stagehand.metrics;

// Send metrics to CloudWatch
await cloudwatch.putMetricData({
  Namespace: 'Stagehand',
  MetricData: [
    {
      MetricName: 'InputTokens',
      Value: metrics.totalPromptTokens,
      Unit: 'Count',
    },
    {
      MetricName: 'OutputTokens',
      Value: metrics.totalCompletionTokens,
      Unit: 'Count',
    },
    {
      MetricName: 'InferenceTime',
      Value: metrics.totalInferenceTimeMs,
      Unit: 'Milliseconds',
    },
  ],
});

Monitoring Best Practices

Track Cost Trends

Monitor token usage over time to identify cost optimization opportunities.
const metrics = stagehand.metrics;
const cost = calculateCost(metrics);              // your cost helper
logToMonitoring({ cost, timestamp: Date.now() }); // your monitoring hook

Set Cost Alerts

Configure alerts when token usage exceeds thresholds.
if (metrics.totalPromptTokens > 1_000_000) {
  console.warn('High token usage detected'); // or page your alerting system
}

Performance Monitoring

Track inference time to identify slow operations.
if (metrics.totalInferenceTimeMs > 10000) {
  console.warn('Slow inference detected');
}
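Beyond a single threshold, it can help to rank operation types by their inference time. A sketch with illustrative numbers — in practice, populate the fields from the actInferenceTimeMs, extractInferenceTimeMs, observeInferenceTimeMs, and agentInferenceTimeMs values on `stagehand.metrics`:

```typescript
// Rank operation types by total inference time to find bottlenecks.
type OpTimings = { act: number; extract: number; observe: number; agent: number };

function slowestOperation(t: OpTimings): keyof OpTimings {
  const ranked = (Object.entries(t) as [keyof OpTimings, number][])
    .sort((a, b) => b[1] - a[1]); // descending by time
  return ranked[0][0];
}

// Illustrative values, not real measurements:
const timings: OpTimings = { act: 1200, extract: 8400, observe: 300, agent: 4500 };
console.log(slowestOperation(timings)); // "extract"
```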

Cache Efficiency

Monitor cached token usage to measure cache effectiveness.
const cacheHitRate = metrics.totalCachedInputTokens / 
  (metrics.totalPromptTokens || 1);
console.log(`Cache hit rate: ${(cacheHitRate * 100).toFixed(2)}%`);

Agent Replay Tracking

Track agent execution for replay and debugging:
const stagehand = new Stagehand({
  env: "LOCAL",
  model: "gpt-4o",
});

await stagehand.init();

const agent = stagehand.agent();
await agent.execute("complete the checkout process");

// Check if agent replay is active
if (stagehand.isAgentReplayActive()) {
  console.log('Agent replay is being recorded');
}

// Access history for replay
const history = stagehand.history;
for (const step of history) {
  console.log(`Step: ${step.method}`);
  if (step.endTime) {
    console.log(`Duration: ${step.endTime - step.timestamp}ms`);
  }
}

Debugging with Metrics

Find which operations consume the most tokens:
const metrics = stagehand.metrics;

const operations = [
  { name: 'act', tokens: metrics.actPromptTokens + metrics.actCompletionTokens },
  { name: 'extract', tokens: metrics.extractPromptTokens + metrics.extractCompletionTokens },
  { name: 'observe', tokens: metrics.observePromptTokens + metrics.observeCompletionTokens },
  { name: 'agent', tokens: metrics.agentPromptTokens + metrics.agentCompletionTokens },
];

operations.sort((a, b) => b.tokens - a.tokens);
console.log('Most expensive operations:', operations);
Calculate cache hit rates (as in the snippet above, this assumes totalPromptTokens includes cached input tokens, which is how most providers report usage):
const metrics = stagehand.metrics;
const cacheHitRate = metrics.totalCachedInputTokens /
  (metrics.totalPromptTokens || 1);

console.log(`Cache hit rate: ${(cacheHitRate * 100).toFixed(2)}%`);
console.log(`Cached tokens: ${metrics.totalCachedInputTokens} (billed at a reduced rate)`);
Find operations with high inference time:
const history = stagehand.history;
const slowOps = history.filter(entry =>
  entry.endTime !== undefined && (entry.endTime - entry.timestamp) > 5000
);

console.log('Slow operations:', slowOps);
Best Practice: Regularly review metrics in production to identify optimization opportunities and control costs.
