How Stagehand Works

Stagehand combines AI decision-making with direct browser control to create reliable automation that adapts to real-world web applications. Here’s how it works under the hood.

Architecture Overview

Stagehand’s V3 architecture orchestrates multiple components that work together:

V3 Core

The main orchestrator that manages browser lifecycle, handles method routing, and coordinates between all components.

Handlers

Specialized classes (ActHandler, ExtractHandler, ObserveHandler) that translate user instructions into browser actions.

Context & Pages

Manages CDP connections, frame trees, and page lifecycle across both local Chrome and Browserbase.

Cache System

Self-healing cache that replays successful actions without LLM calls.

The AI + Code Pipeline

When you call a Stagehand method, here’s what happens:

1. Instruction Processing

You provide a natural language instruction:

await stagehand.act("click the submit button");

Instructions are processed alongside optional custom system prompts, allowing you to guide the AI’s behavior.
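As a rough illustration, the sketch below shows one way an instruction and an optional custom system prompt could be combined into chat messages before inference. `ChatMessage`, `BASE_PROMPT`, and `buildActMessages` are hypothetical names for this example, not Stagehand's internal API.

```typescript
// Hypothetical message shape; real LLM clients define their own types.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Placeholder for built-in instructions (not Stagehand's actual prompt text).
const BASE_PROMPT = "You are helping the user automate the browser.";

// Appends optional user guidance to the base prompt and wraps the
// natural-language instruction as the user turn.
function buildActMessages(instruction: string, customSystemPrompt?: string): ChatMessage[] {
  const system = customSystemPrompt
    ? `${BASE_PROMPT}\n\nAdditional guidance:\n${customSystemPrompt}`
    : BASE_PROMPT;
  return [
    { role: "system", content: system },
    { role: "user", content: `Instruction: ${instruction}` },
  ];
}

const messages = buildActMessages("click the submit button", "Prefer buttons inside the checkout form.");
console.log(messages[1].content); // Instruction: click the submit button
```

Keeping the custom prompt as an addition to the base prompt, rather than a replacement, means the model still receives the core task description.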

2. DOM Snapshot Capture

Stagehand captures a hybrid accessibility tree that combines:

Semantic structure from the accessibility tree
Interactive elements from the DOM
Shadow DOM piercing to access elements inside web components

v3.ts:155-158

const { combinedTree, combinedXpathMap } = await captureHybridSnapshot(
  page,
  { experimental: true },
);

This creates a compact, LLM-friendly representation of the page that focuses on actionable elements rather than overwhelming the model with every DOM node.
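A simplified sketch of the idea: walk a merged tree, keep only actionable nodes, and emit compact text lines plus an id-to-XPath map so the model can refer to elements by short ids. The `SnapshotNode` shape, the `ACTIONABLE_ROLES` set, and `serializeSnapshot` are assumptions for illustration; Stagehand's real hybrid tree also keeps semantic structure and handles frames and shadow roots.

```typescript
// Hypothetical node shape for a merged accessibility/DOM tree.
interface SnapshotNode {
  role: string;
  name: string;
  xpath: string;
  children: SnapshotNode[];
}

// Roles treated as actionable in this sketch (an assumption, not Stagehand's list).
const ACTIONABLE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Depth-first walk: one indented line per actionable node, plus an
// id -> XPath map so the model can answer with a short id.
function serializeSnapshot(root: SnapshotNode): { tree: string; xpathMap: Record<string, string> } {
  const lines: string[] = [];
  const xpathMap: Record<string, string> = {};
  let nextId = 0;

  const visit = (node: SnapshotNode, depth: number): void => {
    let childDepth = depth;
    if (ACTIONABLE_ROLES.has(node.role)) {
      const id = String(nextId++);
      xpathMap[id] = node.xpath;
      lines.push(`${"  ".repeat(depth)}[${id}] ${node.role}: ${node.name}`);
      childDepth = depth + 1;
    }
    // Non-actionable wrappers are skipped but their children are still visited.
    for (const child of node.children) visit(child, childDepth);
  };

  visit(root, 0);
  return { tree: lines.join("\n"), xpathMap };
}

const page: SnapshotNode = {
  role: "document", name: "", xpath: "/html", children: [
    { role: "generic", name: "", xpath: "/html/body/div", children: [
      { role: "textbox", name: "Email", xpath: "/html/body/div/input", children: [] },
      { role: "button", name: "Submit", xpath: "/html/body/div/button", children: [] },
    ] },
  ],
};

const { tree, xpathMap } = serializeSnapshot(page);
console.log(tree);
// [0] textbox: Email
// [1] button: Submit
```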

3. LLM Inference

The instruction and DOM snapshot are sent to the LLM with a carefully crafted prompt:

prompt.ts:150-169

const actSystemPrompt = `
You are helping the user automate the browser by finding elements based on what action the user wants to take on the page

You will be given:
1. a user defined instruction about what action to take
2. a hierarchical accessibility tree showing the semantic structure of the page.

Return the element that matches the instruction if it exists. Otherwise, return an empty object.`;

The LLM returns structured data identifying:

Element selector (XPath)
Action method (click, type, scroll, etc.)
Arguments (text to type, keys to press, etc.)
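A hedged sketch of validating such a structured answer before acting on it. It assumes the model replies with JSON containing a hypothetical `elementId` that indexes into an id-to-XPath map, plus `method` and `arguments`; the field names, `ALLOWED_METHODS`, and `parseActResponse` are illustrative assumptions. An empty object is treated as "no match", following the prompt's instruction.

```typescript
// Hypothetical validated action; field names are assumptions.
interface ProposedAction {
  selector: string; // XPath resolved from the snapshot's id map
  method: string;
  arguments: string[];
}

const ALLOWED_METHODS = new Set(["click", "fill", "type", "press", "scroll"]);

// Parses raw model output; returns null for "{}" (no matching element)
// and throws on anything malformed instead of guessing.
function parseActResponse(raw: string, xpathMap: Record<string, string>): ProposedAction | null {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("expected a JSON object");
  const obj = data as Record<string, unknown>;
  if (Object.keys(obj).length === 0) return null;

  const { elementId, method, arguments: args = [] } = obj;
  if (typeof elementId !== "string" || !(elementId in xpathMap)) {
    throw new Error(`unknown element id: ${String(elementId)}`);
  }
  if (typeof method !== "string" || !ALLOWED_METHODS.has(method)) {
    throw new Error(`unsupported method: ${String(method)}`);
  }
  if (!Array.isArray(args) || !args.every((a) => typeof a === "string")) {
    throw new Error("arguments must be an array of strings");
  }
  return { selector: xpathMap[elementId], method, arguments: args };
}

const action = parseActResponse(
  '{"elementId":"1","method":"click","arguments":[]}',
  { "1": "/html/body/div/button" },
);
console.log(action?.selector); // /html/body/div/button
```

Rejecting unknown ids and methods up front keeps a hallucinated answer from ever reaching the browser.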

4. Deterministic Execution

Once the LLM identifies the action, Stagehand executes it using deterministic browser control:

actHandler.ts:191-197

const firstResult = await this.takeDeterministicAction(
  firstAction,
  page,
  this.defaultDomSettleTimeoutMs,
  llmClient,
  ensureTimeRemaining,
  variables,
);

This separation is crucial:

AI makes decisions about what to do

Code executes actions deterministically via CDP

The AI never directly controls the browser—it identifies targets, and reliable code handles the actual interactions.
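A minimal sketch of that separation, assuming a hypothetical `Driver` interface in place of Stagehand's CDP-backed code: once a plan of selector, method, and arguments exists, execution is a plain lookup in a dispatch table with no model involved. `Driver`, `PlannedAction`, `executors`, `executeAction`, and `RecordingDriver` are all illustrative names.

```typescript
// Hypothetical low-level driver standing in for Stagehand's CDP-backed code.
interface Driver {
  click(xpath: string): Promise<void>;
  fill(xpath: string, text: string): Promise<void>;
  press(xpath: string, key: string): Promise<void>;
}

// The model's decision: what to act on, how, and with which arguments.
interface PlannedAction {
  selector: string;
  method: string;
  arguments: string[];
}

type Executor = (driver: Driver, action: PlannedAction) => Promise<void>;

// Plain code for each supported method; nothing here calls an LLM.
const executors = new Map<string, Executor>([
  ["click", (d, a) => d.click(a.selector)],
  ["fill", (d, a) => d.fill(a.selector, a.arguments[0] ?? "")],
  ["press", (d, a) => d.press(a.selector, a.arguments[0] ?? "Enter")],
]);

async function executeAction(driver: Driver, action: PlannedAction): Promise<void> {
  const run = executors.get(action.method);
  if (!run) throw new Error(`no executor for method "${action.method}"`);
  await run(driver, action);
}

// Records calls instead of touching a browser.
class RecordingDriver implements Driver {
  readonly calls: string[] = [];
  async click(xpath: string): Promise<void> { this.calls.push(`click ${xpath}`); }
  async fill(xpath: string, text: string): Promise<void> { this.calls.push(`fill ${xpath} ${JSON.stringify(text)}`); }
  async press(xpath: string, key: string): Promise<void> { this.calls.push(`press ${xpath} ${key}`); }
}
```

Because the same plan always produces the same driver calls, a recording driver like this one is also a convenient way to unit-test execution logic without a browser.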

5. Self-Healing

If an element has moved or changed, Stagehand can automatically adapt:

Initial attempt using the cached selector fails
Re-capture the current DOM state
Diff the trees to find where the element moved
Update the selector and retry
Update the cache with the new selector

actCache.ts:261-269

if (
  success &&
  actions.length > 0 &&
  this.haveActionsChanged(entry.actions, actions)
) {
  await this.refreshCacheEntry(context, {
    ...entry,
    actions,
  });
}
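The refresh condition in the excerpt can be mirrored in a small standalone sketch that also shows the cache-first lookup. The in-memory `Map`, the URL-plus-instruction key, and the helpers `resolveActions`, `haveActionsChanged`, and `recordReplay` are assumptions written for this example; they borrow one method name from the excerpt but are not Stagehand's implementation.

```typescript
// Hypothetical cached step: the selector/method/arguments triple.
interface CachedAction {
  selector: string;
  method: string;
  arguments: string[];
}

interface CacheEntry {
  instruction: string;
  url: string;
  actions: CachedAction[];
}

// In-memory cache keyed by URL + instruction (an assumption for this sketch).
const cache = new Map<string, CacheEntry>();
const cacheKey = (instruction: string, url: string): string => `${url}::${instruction}`;

// Cache-first: replay stored actions if present, otherwise ask the model once and store.
async function resolveActions(
  instruction: string,
  url: string,
  planWithLlm: () => Promise<CachedAction[]>,
): Promise<{ actions: CachedAction[]; fromCache: boolean }> {
  const hit = cache.get(cacheKey(instruction, url));
  if (hit) return { actions: hit.actions, fromCache: true };
  const actions = await planWithLlm();
  cache.set(cacheKey(instruction, url), { instruction, url, actions });
  return { actions, fromCache: false };
}

const sameArgs = (x: string[], y: string[]): boolean =>
  x.length === y.length && x.every((v, i) => v === y[i]);

// Order-sensitive comparison of the cached plan and what actually ran.
function haveActionsChanged(before: CachedAction[], after: CachedAction[]): boolean {
  if (before.length !== after.length) return true;
  return before.some((a, i) => {
    const b = after[i];
    return a.selector !== b.selector || a.method !== b.method || !sameArgs(a.arguments, b.arguments);
  });
}

// Refresh only after a successful, non-empty replay whose actions differ
// from the cached ones (e.g. a selector had to be re-resolved).
function recordReplay(entry: CacheEntry, success: boolean, actions: CachedAction[]): boolean {
  if (success && actions.length > 0 && haveActionsChanged(entry.actions, actions)) {
    cache.set(cacheKey(entry.instruction, entry.url), { ...entry, actions });
    return true;
  }
  return false;
}
```

Guarding on `success` keeps a failed replay from overwriting a cache entry that may still be correct.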

Browser Connection Modes

Stagehand supports two execution environments:

Local Chrome

For development and testing:

const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 2,
});

Stagehand launches and controls a local Chrome instance via chrome-launcher. Useful for:

Local development
Debugging with a visible browser
Testing on your machine

Source: v3.ts:717-873

Browserbase

For production and scale:

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
});

Connects to a remote browser session on Browserbase’s infrastructure. Benefits:

Cloud-based execution
Session recording and debugging
No local Chrome installation needed
Advanced stealth mode

Source: v3.ts:876-996

Both modes drive the browser over the Chrome DevTools Protocol (CDP), so the same code runs in either environment.
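When one script should run in both environments, a small helper can choose the mode from environment variables. This is a sketch: `connectionOptionsFromEnv` is hypothetical, and only the `env`, `apiKey`, `projectId`, and `verbose` fields are taken from the constructor examples in this section.

```typescript
// Field names mirror the constructor examples; the helper is hypothetical.
interface ConnectionOptions {
  env: "LOCAL" | "BROWSERBASE";
  apiKey?: string;
  projectId?: string;
  verbose?: number;
}

// Uses Browserbase only when both credentials are present; otherwise local Chrome.
function connectionOptionsFromEnv(vars: Record<string, string | undefined>): ConnectionOptions {
  const apiKey = vars.BROWSERBASE_API_KEY;
  const projectId = vars.BROWSERBASE_PROJECT_ID;
  if (apiKey && projectId) {
    return { env: "BROWSERBASE", apiKey, projectId };
  }
  return { env: "LOCAL", verbose: 2 };
}

// Possible usage (assumes the constructor accepts these fields, as in the examples):
// const stagehand = new Stagehand(connectionOptionsFromEnv(process.env));
```

Requiring both credentials avoids a half-configured Browserbase session failing at connect time.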

CDP Connection Management

All browser control flows through a single CdpConnection managed by V3Context:

context.ts:153-172

static async create(
  wsUrl: string,
  opts?: {
    env?: "LOCAL" | "BROWSERBASE";
    apiClient?: StagehandAPIClient | null;
  },
): Promise<V3Context> {
  const conn = await CdpConnection.connect(wsUrl);
  const ctx = new V3Context(conn, opts?.env ?? "LOCAL");
  await ctx.bootstrap();
  await ctx.waitForFirstTopLevelPage(getFirstTopLevelPageTimeoutMs());
  return ctx;
}

V3Context handles target lifecycle, frame events, and OOPIF (out-of-process iframes) automatically, so you don’t have to think about it.
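CDP itself is JSON messages over a WebSocket: requests carry an `id`, a `method`, and optional `params`; responses echo the `id` with a `result` or an `error`; events have a `method` but no `id`. With flattened sessions, messages for a particular target (for example an OOPIF) also carry a `sessionId`. The sketch below shows the id-matching core any connection class needs. `MiniCdpConnection` is hypothetical, hides the WebSocket behind `send` and `receive`, and does not reproduce Stagehand's `CdpConnection`.

```typescript
// Shapes of Chrome DevTools Protocol messages (JSON text over a WebSocket).
interface CdpRequest {
  id: number;
  method: string;
  params?: object;
  sessionId?: string; // present when addressing a specific attached target
}

interface CdpEvent {
  method: string;
  params?: unknown;
  sessionId?: string;
}

// Anything the browser sends: a response (has id) or an event (method only).
interface CdpIncoming {
  id?: number;
  result?: unknown;
  error?: { code: number; message: string };
  method?: string;
  params?: unknown;
  sessionId?: string;
}

type Waiter = { resolve: (value: unknown) => void; reject: (err: Error) => void };

// Hypothetical transport-agnostic core: assigns request ids, matches
// responses to pending promises, and forwards events to a listener.
class MiniCdpConnection {
  private nextId = 1;
  private readonly pending = new Map<number, Waiter>();

  constructor(
    private readonly send: (raw: string) => void,
    private readonly onEvent: (event: CdpEvent) => void = () => {},
  ) {}

  call(method: string, params?: object, sessionId?: string): Promise<unknown> {
    const request: CdpRequest = { id: this.nextId++, method, params, sessionId };
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(request.id, { resolve, reject });
      this.send(JSON.stringify(request)); // JSON.stringify drops undefined fields
    });
  }

  // Feed every incoming WebSocket frame through this method.
  receive(raw: string): void {
    const msg = JSON.parse(raw) as CdpIncoming;
    if (typeof msg.id === "number") {
      const waiter = this.pending.get(msg.id);
      if (!waiter) return; // unknown or already settled id
      this.pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(`${msg.error.code}: ${msg.error.message}`));
      else waiter.resolve(msg.result);
    } else if (typeof msg.method === "string") {
      this.onEvent({ method: msg.method, params: msg.params, sessionId: msg.sessionId });
    }
  }
}
```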

Handler Architecture

Each Stagehand method is backed by a specialized handler:

Handler	Purpose	Returns
ActHandler	Performs actions (click, type, etc.)	`ActResult` with success status and executed actions
ExtractHandler	Extracts data from the page	Structured data matching your schema
ObserveHandler	Finds actionable elements	Array of `Action` objects
V3AgentHandler	Multi-step autonomous execution	`AgentResult` with full execution history

Each handler:

Accepts high-level instructions
Captures DOM snapshots
Queries the LLM
Executes deterministic actions (act, agent) or returns structured results (extract, observe)
Reports metrics and results
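The shared steps in this list can be sketched as one generic function with injected dependencies. `HandlerDeps`, `HandlerReport`, and `runHandler` are hypothetical; in an extract-style handler, `execute` might simply return parsed data instead of touching the page.

```typescript
// Hypothetical dependencies; each stands in for a real component.
interface HandlerDeps<Plan, Result> {
  captureSnapshot(): Promise<string>;
  queryLlm(
    instruction: string,
    snapshot: string,
  ): Promise<{ plan: Plan; promptTokens: number; completionTokens: number }>;
  execute(plan: Plan): Promise<Result>;
}

interface HandlerReport<Result> {
  result: Result;
  promptTokens: number;
  completionTokens: number;
  inferenceTimeMs: number;
}

// Shared skeleton: snapshot the page, ask the model, run plain code on the
// model's plan, and report usage alongside the result.
async function runHandler<Plan, Result>(
  instruction: string,
  deps: HandlerDeps<Plan, Result>,
  now: () => number = Date.now,
): Promise<HandlerReport<Result>> {
  const snapshot = await deps.captureSnapshot();
  const started = now();
  const { plan, promptTokens, completionTokens } = await deps.queryLlm(instruction, snapshot);
  const inferenceTimeMs = now() - started; // times only the model call
  const result = await deps.execute(plan);
  return { result, promptTokens, completionTokens, inferenceTimeMs };
}
```

Injecting the clock and each step keeps the skeleton testable with fakes, and lets each concrete handler differ only in its plan and result types.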

Event Bus System

Stagehand uses an EventEmitter for internal communication:

v3.ts:155

public readonly bus: EventEmitter = new EventEmitter();

This enables:

Screenshot capture events during agent execution
Page lifecycle notifications
Error propagation across components
Plugin hooks (future feature)
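A sketch of a typed wrapper around Node's `EventEmitter`, which keeps event names and payloads consistent across components. The event names and payload shapes in `BusEvents` are invented for illustration and are not Stagehand's actual events.

```typescript
import { EventEmitter } from "events";

// Illustrative event catalog; names and payloads are assumptions.
interface BusEvents {
  "agent:screenshot": { stepIndex: number; base64: string };
  "page:created": { targetId: string };
  // Deliberately not named "error": Node's EventEmitter throws when an
  // "error" event is emitted with no listener attached.
  "component:error": { source: string; message: string };
}

// Typed wrapper so emitters and listeners agree on payload shapes at compile time.
class TypedBus {
  private readonly emitter = new EventEmitter();

  on<K extends keyof BusEvents>(event: K, listener: (payload: BusEvents[K]) => void): this {
    this.emitter.on(event, listener);
    return this;
  }

  // Returns true if at least one listener received the event.
  emit<K extends keyof BusEvents>(event: K, payload: BusEvents[K]): boolean {
    return this.emitter.emit(event, payload);
  }
}
```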

Metrics & Observability

Stagehand tracks detailed metrics for every LLM call:

v3.ts:241-267

public stagehandMetrics: StagehandMetrics = {
  actPromptTokens: 0,
  actCompletionTokens: 0,
  actReasoningTokens: 0,
  actCachedInputTokens: 0,
  actInferenceTimeMs: 0,
  extractPromptTokens: 0,
  extractCompletionTokens: 0,
  // ... more metrics
};

Access them at any time:

const metrics = await stagehand.metrics;
console.log(`Total tokens used: ${metrics.totalPromptTokens + metrics.totalCompletionTokens}`);
console.log(`Cached input tokens: ${metrics.totalCachedInputTokens}`);
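To show how per-call numbers could roll up into totals like these, here is a hedged sketch of an accumulator. `CallUsage`, `MetricsTracker`, and the per-method breakdown are assumptions. In LLM usage reports, cached input tokens usually reflect the provider's prompt caching, which is separate from Stagehand's action cache.

```typescript
// Usage figures for one LLM call (hypothetical shape).
interface CallUsage {
  promptTokens: number;
  completionTokens: number;
  cachedInputTokens?: number;
  inferenceTimeMs: number;
}

type Totals = Required<CallUsage>;
type Method = "act" | "extract" | "observe";

const zero = (): Totals => ({ promptTokens: 0, completionTokens: 0, cachedInputTokens: 0, inferenceTimeMs: 0 });

// Keeps per-method running totals and derives grand totals on demand.
class MetricsTracker {
  private readonly byMethod = new Map<Method, Totals>();

  record(method: Method, usage: CallUsage): void {
    const cur = this.byMethod.get(method) ?? zero();
    this.byMethod.set(method, {
      promptTokens: cur.promptTokens + usage.promptTokens,
      completionTokens: cur.completionTokens + usage.completionTokens,
      cachedInputTokens: cur.cachedInputTokens + (usage.cachedInputTokens ?? 0),
      inferenceTimeMs: cur.inferenceTimeMs + usage.inferenceTimeMs,
    });
  }

  forMethod(method: Method): Totals {
    return this.byMethod.get(method) ?? zero();
  }

  totals(): Totals {
    const sum = zero();
    this.byMethod.forEach((m) => {
      sum.promptTokens += m.promptTokens;
      sum.completionTokens += m.completionTokens;
      sum.cachedInputTokens += m.cachedInputTokens;
      sum.inferenceTimeMs += m.inferenceTimeMs;
    });
    return sum;
  }
}
```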

Key Design Principles

AI for Intelligence, Code for Reliability

The LLM identifies elements and plans actions, but all browser control uses deterministic CDP commands. This gives you the best of both worlds: adaptability from AI, reliability from code.

Cache-First Execution

Successful actions are cached and replayed without LLM calls. The cache self-heals when pages change, providing speed and reliability.

Unified API Surface

Whether you’re running locally or on Browserbase, the API stays the same. The V3 class abstracts away all environment differences.

Observable & Debuggable

Every action is logged, every metric is tracked, and session recordings are available. You always know what Stagehand is doing.

Next Steps

Write Effective AI Rules

Learn how to guide the AI with clear instructions

Understand Browser Contexts

Master pages, frames, and context management

Leverage Caching

Speed up execution with self-healing cache

See Examples

Explore real-world usage patterns