
Overview

The agent() method creates an autonomous agent that can perform multi-step browser automation tasks. Agents can navigate websites, interact with elements, extract data, and make decisions to complete complex workflows.

Method Signature

agent(config?: AgentConfig): AgentInstance

Parameters

config (AgentConfig, optional)
Configuration for the agent.

Agent Instance Methods

execute()

Executes the agent with a given instruction.
execute(
  instructionOrOptions: string | AgentExecuteOptions
): Promise<AgentResult>

instructionOrOptions (string | AgentExecuteOptions, required)
The task instruction as a string, or a full options object.
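
Because execute() accepts either form, a wrapper that applies defaults typically normalizes the argument first. The following is a minimal sketch, not library code: `ExecuteOptions` is a trimmed local stand-in for AgentExecuteOptions (only `instruction` and `maxSteps`, both used in the examples on this page), and `toExecuteOptions` is a hypothetical helper name.

```typescript
// Local stand-in for AgentExecuteOptions; only the fields used here (assumption).
interface ExecuteOptions {
  instruction: string;
  maxSteps?: number;
}

// Hypothetical helper: accept either call form and return an options object.
function toExecuteOptions(
  input: string | ExecuteOptions,
  defaults: Partial<ExecuteOptions> = {}
): ExecuteOptions {
  const options = typeof input === "string" ? { instruction: input } : input;
  // Fields supplied by the caller win over the defaults.
  return { ...defaults, ...options };
}

const fromString = toExecuteOptions("Find the top story", { maxSteps: 10 });
const fromObject = toExecuteOptions(
  { instruction: "Log in", maxSteps: 5 },
  { maxSteps: 10 }
);
console.log(fromString, fromObject);
```

The normalized object can then be passed straight to agent.execute(), so the rest of the wrapper only has to deal with one shape.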

Return Value

Returns a Promise<AgentResult>:
interface AgentResult {
  success: boolean;      // Whether the task completed successfully
  message: string;       // Agent's final message
  actions: AgentAction[]; // Actions taken by the agent
  completed: boolean;    // Whether agent called the done tool
  messages?: ModelMessage[]; // Conversation messages (for continuation)
  output?: Record<string, unknown>; // Custom output data (if schema provided)
  usage?: {              // Token usage statistics
    input_tokens: number;
    output_tokens: number;
    reasoning_tokens?: number;
    cached_input_tokens?: number;
    inference_time_ms: number;
  };
}
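
Note that success and completed are separate flags, and usage is optional, so code that reports on a run should check each one. A minimal sketch, assuming a local `ResultLike` type that copies only the fields read here and a hypothetical helper named `summarizeResult`:

```typescript
// Local copy of the AgentResult fields read below (see the interface above).
interface ResultLike {
  success: boolean;
  message: string;
  actions: unknown[];
  completed: boolean;
  usage?: {
    input_tokens: number;
    output_tokens: number;
    inference_time_ms: number;
  };
}

// Hypothetical helper: build a one-line summary of a run.
function summarizeResult(result: ResultLike): string {
  const status = result.success
    ? "succeeded"
    : result.completed
      ? "called done without success"
      : "stopped before calling done";
  // usage is optional, so guard before reading token counts.
  const tokens = result.usage
    ? `${result.usage.input_tokens + result.usage.output_tokens} tokens`
    : "usage not reported";
  return `${status}: ${result.actions.length} action(s), ${tokens}`;
}

const summary = summarizeResult({
  success: true,
  message: "ok",
  actions: [{}, {}],
  completed: true,
  usage: { input_tokens: 1200, output_tokens: 300, inference_time_ms: 4500 },
});
console.log(summary); // succeeded: 2 action(s), 1500 tokens
```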

Usage Examples

Basic Agent Task

import { Stagehand } from "@stagehand/api";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
});

await stagehand.init();
const page = stagehand.context.pages()[0];

await page.goto("https://news.ycombinator.com");

// Create and execute agent
const agent = stagehand.agent();

const result = await agent.execute(
  "Find the top story and click on it"
);

if (result.success) {
  console.log("Task completed:", result.message);
  console.log("Actions taken:", result.actions.length);
} else {
  console.error("Task failed:", result.message);
}

With Custom Model

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  executionModel: "google/gemini-2.0-flash", // Fast model for tool execution
});

const result = await agent.execute({
  instruction: "Search for 'web scraping' and extract the first 5 results",
  maxSteps: 15,
});

Streaming Mode

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true, // Enable streaming
});

const agentRun = await agent.execute(
  "Go to Amazon and search for 'laptop'"
);

// Stream text output
for await (const delta of agentRun.textStream) {
  process.stdout.write(delta);
}

// Wait for final result
const result = await agentRun.result;
console.log("\nFinal result:", result);

With Custom Output Schema

import { z } from "zod";

const agent = stagehand.agent();

const result = await agent.execute({
  instruction: "Find the cheapest laptop on this page",
  output: z.object({
    name: z.string().describe("Product name"),
    price: z.string().describe("Product price"),
    rating: z.number().describe("Product rating out of 5"),
  }),
});

if (result.output) {
  console.log(`Found: ${result.output.name}`);
  console.log(`Price: ${result.output.price}`);
  console.log(`Rating: ${result.output.rating}/5`);
}

Conversation Continuation

const agent = stagehand.agent();

// First task
const result1 = await agent.execute(
  "Go to GitHub and search for 'stagehand'"
);

// Continue the conversation
const result2 = await agent.execute({
  instruction: "Now click on the first repository",
  messages: result1.messages, // Continue from previous state
});

// Another continuation
const result3 = await agent.execute({
  instruction: "Read the README and summarize it",
  messages: result2.messages,
});

With Variables

const agent = stagehand.agent();

await page.goto("https://example.com/login");

const result = await agent.execute({
  instruction: "Log in using the provided credentials",
  variables: {
    username: {
      value: process.env.USERNAME,
      description: "User's email address",
    },
    password: {
      value: process.env.PASSWORD,
      description: "User's password",
    },
  },
});

With Tool Exclusions

const agent = stagehand.agent();

const result = await agent.execute({
  instruction: "Navigate to the product page and click buy",
  excludeTools: ["screenshot", "extract"], // Faster execution
});

With Abort Signal

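// AgentAbortError is assumed to be exported by the same package as Stagehand
import { AgentAbortError } from "@stagehand/api";

const agent = stagehand.agent();

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 60000); // Abort after 1 minute

try {
  const result = await agent.execute({
    instruction: "Complete the checkout process",
    signal: controller.signal,
  });
  console.log("Checkout finished:", result.message);
} catch (error) {
  if (error instanceof AgentAbortError) {
    console.log("Agent was aborted");
  } else {
    throw error; // Surface unexpected failures
  }
} finally {
  clearTimeout(timeoutId); // Always clear the timer, even when execute() throws
}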

Hybrid Mode (Coordinate-Based)

const agent = stagehand.agent({
  mode: "hybrid", // Use coordinate-based tools
  model: "google/gemini-2.0-flash",
});

await page.goto("https://example.com");

const result = await agent.execute({
  instruction: "Click on the blue button in the top right",
  highlightCursor: true, // Show cursor movements
});

CUA Mode (Computer Use Agent)

const agent = stagehand.agent({
  mode: "cua",
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const result = await agent.execute(
  "Navigate to the settings page and enable dark mode"
);

With Callbacks

const agent = stagehand.agent();

const result = await agent.execute({
  instruction: "Search for products and add to cart",
  callbacks: {
    onStepFinish: async (step) => {
      console.log("Step completed:", step.finishReason);
      if (step.toolCalls) {
        step.toolCalls.forEach((call) => {
          console.log(`Tool: ${call.toolName}`);
        });
      }
    },
  },
});
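
The same callback can accumulate a tally instead of logging each step. This is a sketch under assumptions: `StepLike` models only the two fields read in the example above (`finishReason` and `toolCalls[].toolName`), and `createToolTally` is a hypothetical helper, not part of the library.

```typescript
// Assumed step shape, limited to the fields the example above reads.
interface StepLike {
  finishReason: string;
  toolCalls?: { toolName: string }[];
}

// Hypothetical helper: returns a callback plus the counts it fills in.
function createToolTally() {
  const counts: Record<string, number> = {};
  const onStepFinish = (step: StepLike): void => {
    // toolCalls may be absent on steps that only produced text.
    for (const call of step.toolCalls ?? []) {
      counts[call.toolName] = (counts[call.toolName] ?? 0) + 1;
    }
  };
  return { counts, onStepFinish };
}

// Simulated steps standing in for a real agent run.
const tally = createToolTally();
tally.onStepFinish({
  finishReason: "tool-calls",
  toolCalls: [{ toolName: "act" }, { toolName: "goto" }],
});
tally.onStepFinish({ finishReason: "tool-calls", toolCalls: [{ toolName: "act" }] });
tally.onStepFinish({ finishReason: "stop" });
console.log(tally.counts); // { act: 2, goto: 1 }
```

In a real run, `tally.onStepFinish` would be passed as `callbacks.onStepFinish`, and `tally.counts` read after execute() resolves.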

Agent Modes

DOM Mode (Default)

Best for structured page interactions. Available tools:
  • act - Semantic actions (click, type)
  • fillForm - Fill form fields
  • ariaTree - Get accessibility tree
  • extract - Extract data
  • goto - Navigate to URL
  • scroll - Scroll with semantic directions
  • keys - Press keyboard keys
  • navback - Navigate back
  • screenshot - Take screenshot
  • think - Agent reasoning
  • wait - Wait for time/condition
  • done - Mark task complete
  • search - Web search (requires BRAVE_API_KEY)

Hybrid Mode

Best for visual/screenshot-based interactions. Available tools:
  • click - Click at coordinates
  • type - Type at coordinates
  • dragAndDrop - Drag between points
  • clickAndHold - Click and hold
  • fillFormVision - Fill forms using vision
  • Plus all DOM mode tools
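
To see how the two lists above combine with excludeTools, here is a small model of the tool sets. It is an illustration only: the tool names are copied from this page, the mode string "dom" for the default mode is an assumption, `availableTools` is a hypothetical helper, and the library itself decides which tools are actually registered (for example, search depends on BRAVE_API_KEY).

```typescript
// Tool names copied from the DOM and Hybrid lists above.
const DOM_TOOLS = [
  "act", "fillForm", "ariaTree", "extract", "goto", "scroll", "keys",
  "navback", "screenshot", "think", "wait", "done", "search",
];
const HYBRID_ONLY_TOOLS = [
  "click", "type", "dragAndDrop", "clickAndHold", "fillFormVision",
];

// "dom" as the name of the default mode is an assumption.
type Mode = "dom" | "hybrid";

// Hypothetical helper: tools left after applying an excludeTools list.
function availableTools(mode: Mode, excludeTools: string[] = []): string[] {
  const all =
    mode === "hybrid" ? [...HYBRID_ONLY_TOOLS, ...DOM_TOOLS] : [...DOM_TOOLS];
  const excluded = new Set(excludeTools);
  return all.filter((name) => !excluded.has(name));
}

console.log(availableTools("dom", ["screenshot", "extract"]).length); // 11
console.log(availableTools("hybrid").length); // 18
```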

CUA Mode

Uses the provider’s native computer use capabilities. Supported models:
  • openai/computer-use-preview
  • anthropic/claude-sonnet-4-5-20250929
  • google/gemini-2.5-computer-use-preview-10-2025
  • And more - see documentation

Best Practices

  1. Clear instructions - Be specific about the goal
    // Good
    await agent.execute(
      "Find the product with the lowest price and add it to cart"
    );
    
    // Too vague
    await agent.execute("buy something");
    
  2. Set appropriate maxSteps - Prevent runaway executions
    await agent.execute({
      instruction: "...",
      maxSteps: 10, // Simple task
    });
    
  3. Use output schemas - Get structured data
    await agent.execute({
      instruction: "...",
      output: z.object({ ... }),
    });
    
  4. Handle errors gracefully
    const result = await agent.execute(instruction);
    
    if (!result.success) {
      console.error("Failed:", result.message);
      // Retry or handle error
    }
    
  5. Use variables for sensitive data
    await agent.execute({
      instruction: "Log in with credentials",
      variables: { 
        username: process.env.USER,
        password: process.env.PASS 
      },
    });
    
  6. Monitor with callbacks
    await agent.execute({
      instruction: "...",
      callbacks: {
        onStepFinish: (step) => logStep(step),
      },
    });
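
The "retry or handle error" step in practice 4 can be sketched as a small wrapper. This is one possible approach rather than a library feature: `executeWithRetry` is a hypothetical helper, and `RetryResult` models only the `success` and `message` fields of AgentResult.

```typescript
// Only the AgentResult fields this helper reads.
interface RetryResult {
  success: boolean;
  message: string;
}

// Hypothetical helper: re-run a task until it reports success
// or the attempts run out. Thrown errors are not caught here.
async function executeWithRetry<T extends RetryResult>(
  run: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let last: T | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await run();
    if (last.success) return last;
    console.warn(`Attempt ${attempt} failed: ${last.message}`);
  }
  if (last === undefined) throw new Error("maxAttempts must be at least 1");
  return last; // Final failed result, for the caller to inspect
}

// Stand-in for agent.execute(): fails twice, then succeeds.
let demoCalls = 0;
const demoRun = async (): Promise<RetryResult> => {
  demoCalls += 1;
  const ok = demoCalls >= 3;
  return { success: ok, message: ok ? "done" : "not yet" };
};
executeWithRetry(demoRun).then((result) => console.log(result.message));
```

With a real agent the call would look like `executeWithRetry(() => agent.execute({ instruction, maxSteps: 10 }))`. Each retry starts from whatever page state the previous attempt left behind, so tasks with side effects (submitting an order, for instance) should not be retried blindly.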
    

Error Handling

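// Both error classes are assumed to be exported by the same package as Stagehand
import {
  AgentAbortError,
  StreamingCallbacksInNonStreamingModeError,
} from "@stagehand/api";

try {
  const result = await agent.execute(instruction);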
  
  if (!result.success) {
    console.error("Agent failed:", result.message);
  }
} catch (error) {
  if (error instanceof AgentAbortError) {
    console.log("Agent was aborted");
  } else if (error instanceof StreamingCallbacksInNonStreamingModeError) {
    console.error("Invalid callback usage");
  } else {
    console.error("Unexpected error:", error);
  }
}

Performance Tips

  1. Use faster models for execution
    const agent = stagehand.agent({
      model: "anthropic/claude-sonnet-4-5-20250929", // Reasoning
      executionModel: "google/gemini-2.0-flash", // Fast tools
    });
    
  2. Exclude unnecessary tools
    await agent.execute({
      instruction: "...",
      excludeTools: ["screenshot", "extract"],
    });
    
  3. Set reasonable maxSteps
    await agent.execute({ instruction: "...", maxSteps: 10 });
    
  4. Use conversation continuation - Reuse context
    const result1 = await agent.execute("First task");
    const result2 = await agent.execute({
      instruction: "Next task",
      messages: result1.messages,
    });
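
Tip 4 extends naturally to a list of instructions, with each call receiving the messages returned by the previous one. A minimal sketch, assuming a hypothetical helper `runChain` that takes the execute function as a parameter so it can be tried without a browser:

```typescript
// Hypothetical helper: run instructions in order, passing each
// result's messages into the next call.
async function runChain<M, R extends { success: boolean; messages?: M[] }>(
  execute: (options: { instruction: string; messages?: M[] }) => Promise<R>,
  instructions: string[]
): Promise<R[]> {
  const results: R[] = [];
  let messages: M[] | undefined;
  for (const instruction of instructions) {
    const result = await execute({ instruction, messages });
    results.push(result);
    if (!result.success) break; // Do not build on a failed step
    messages = result.messages;
  }
  return results;
}

// Stand-in for agent.execute(): appends the instruction to the history.
const demoExecute = async (options: {
  instruction: string;
  messages?: string[];
}) => ({
  success: options.instruction !== "fail",
  messages: [...(options.messages ?? []), options.instruction],
});

runChain(demoExecute, ["First task", "Next task"]).then((results) =>
  console.log(results.map((r) => r.messages.length))
);
```

With a real agent, the first argument would be `(options) => agent.execute(options)`. Stopping at the first failed step is a design choice of this sketch; a different policy may suit other workflows.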
    
