Overview

Stagehand’s AI-powered operations can incur LLM costs. This guide covers strategies to optimize costs while maintaining reliability.

Use Caching

Stagehand includes built-in caching that drastically reduces LLM costs for repeated operations.

Enable Action Caching

Cache frequently repeated actions:
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./stagehand-cache", // Enable caching
  verbose: 1
});
How it works:
  • First execution: Full LLM inference
  • Subsequent executions: Replay cached selectors (0 tokens)
  • Self-healing: Auto-updates cache if DOM changes

Cache Benefits

// First run: ~1000 tokens
await stagehand.act("click the login button");

// Second run: 0 tokens (replays cached selector)
await stagehand.act("click the login button");
Caching is based on instruction + URL + page state, so changes to any of these will require a fresh LLM call.
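Conceptually, you can think of the cache key as a hash over those three inputs. The exact internal derivation is not documented here; the `cacheKey` function below is a hypothetical sketch to illustrate why changing any input forces a fresh LLM call:

```typescript
import { createHash } from "node:crypto";

// Hypothetical illustration: a cache key derived from the three inputs the
// cache is sensitive to. If any of them changes, the key changes, the cached
// selector no longer matches, and a fresh LLM call is required.
function cacheKey(instruction: string, url: string, pageState: string): string {
  return createHash("sha256")
    .update(`${instruction}\n${url}\n${pageState}`)
    .digest("hex");
}

const a = cacheKey("click the login button", "https://example.com/login", "state-v1");
const b = cacheKey("click the login button", "https://example.com/login", "state-v2");
console.log(a === b); // false: a changed page state invalidates the cached entry
```

The same reasoning applies to the instruction text itself: even a trivially reworded instruction ("click login" vs. "click the login button") is a different cache entry.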

Choose Cost-Effective Models

Model Selection Strategy

// For simple interactions: use mini models
const stagehand = new Stagehand({
  env: "LOCAL",
  model: "openai/gpt-4.1-mini", // far cheaper than full-size models (see table below)
});

// For complex tasks: use standard models
const complexStagehand = new Stagehand({
  env: "LOCAL",
  model: "anthropic/claude-sonnet-4",
});

Model Cost Comparison

Model                      | Input Cost | Output Cost | Best For
openai/gpt-4.1-mini        | $0.15/1M   | $0.60/1M    | Simple clicks, forms
openai/gpt-4.1             | $2.50/1M   | $10/1M      | Complex reasoning
anthropic/claude-haiku-4-5 | $0.80/1M   | $4/1M       | Fast, cost-effective
anthropic/claude-sonnet-4  | $3/1M      | $15/1M      | Advanced tasks
Start with mini models and only upgrade if you encounter reliability issues.
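One way to operationalize this advice is a cheap-first escalation wrapper: run the task with the mini-model instance, and only retry with the expensive instance if the cheap attempt throws. `runWithEscalation` below is a generic, hypothetical helper, not a Stagehand API:

```typescript
// Hypothetical helper: try each attempt in order (cheapest first) and return
// the first result that succeeds. The expensive model is only billed when the
// cheap model actually fails.
async function runWithEscalation<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // cheap attempt failed; fall through to the next tier
    }
  }
  throw lastError;
}

// Usage sketch: miniStagehand / bigStagehand would be two Stagehand instances,
// one configured with "openai/gpt-4.1-mini" and one with "anthropic/claude-sonnet-4".
// await runWithEscalation([
//   () => miniStagehand.act("click the login button"),
//   () => bigStagehand.act("click the login button"),
// ]);
```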

Reduce Token Usage

Use Specific Selectors

Reduce DOM complexity by being specific:
// ❌ High token usage (processes entire page)
await stagehand.extract("get all product data");

// ✅ Lower token usage (focused extraction)
await stagehand.extract(
  "get the price and title from the product card",
  { page } // Uses focused DOM processing
);

Limit Agent Steps

Set realistic maxSteps limits:
const agent = stagehand.agent();

// ❌ Wasteful: allows unnecessary exploration
await agent.execute({
  instruction: "click the login button",
  maxSteps: 50 // Overkill for a simple task
});

// ✅ Cost-effective: right-sized limit
await agent.execute({
  instruction: "click the login button",
  maxSteps: 3 // Sufficient for this task
});

Disable Verbose Logging

Reduce API overhead:
const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 0, // Minimal logging
  logInferenceToFile: false, // Don't persist logs
});

Batch Operations

Group related actions to reduce round trips:
// ❌ Multiple LLM calls
await stagehand.act("type 'john' in first name");
await stagehand.act("type 'doe' in last name");
await stagehand.act("type 'john.doe@example.com' in email");

// ✅ Single LLM call with observe
const actions = await stagehand.observe(
  "return actions to fill the form: first name 'john', last name 'doe', email 'john.doe@example.com'"
);
for (const action of actions) {
  await stagehand.act(action);
}

Use Deterministic Methods

When possible, use Playwright’s native methods:
// ✅ No LLM cost: direct Playwright API
const page = stagehand.context.pages()[0];
await page.click("button[type='submit']");
await page.fill("#email", "user@example.com");

// Only use Stagehand when you need AI
await stagehand.act("click the 'Add to Cart' button");

Monitor Usage

Track token consumption:
const result = await stagehand.extract("get product details");

console.log("Tokens used:", result.usage);
// {
//   input_tokens: 1234,
//   output_tokens: 456,
//   cached_input_tokens: 0
// }
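For a running total across a whole workflow, a small accumulator over these usage objects is enough. The `Usage` shape mirrors the fields shown above; `UsageTracker` itself is a hypothetical helper, not part of Stagehand:

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cached_input_tokens: number;
}

// Hypothetical helper: accumulate per-call usage and report an estimated cost
// given per-million-token rates.
class UsageTracker {
  private total: Usage = { input_tokens: 0, output_tokens: 0, cached_input_tokens: 0 };

  record(usage: Usage): void {
    this.total.input_tokens += usage.input_tokens;
    this.total.output_tokens += usage.output_tokens;
    this.total.cached_input_tokens += usage.cached_input_tokens;
  }

  costUSD(inputPerM: number, outputPerM: number): number {
    return (
      (this.total.input_tokens / 1_000_000) * inputPerM +
      (this.total.output_tokens / 1_000_000) * outputPerM
    );
  }
}

const tracker = new UsageTracker();
tracker.record({ input_tokens: 1234, output_tokens: 456, cached_input_tokens: 0 });
// e.g. gpt-4.1-mini rates ($0.15/1M input, $0.60/1M output):
console.log(tracker.costUSD(0.15, 0.6));
```

Call `tracker.record(result.usage)` after each operation to see where a workflow's tokens actually go before deciding what to cache or batch.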

Environment-Specific Optimization

Development

const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./dev-cache", // Aggressive caching
  verbose: 2, // Full logging for debugging
  model: "openai/gpt-4.1-mini", // Cheap model
});

Production

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  cacheDir: "./prod-cache", // Cache for repeated workflows
  verbose: 0, // Minimal logging
  model: "anthropic/claude-sonnet-4", // Reliable model
});

Cost Estimation

Typical token usage:
  • act(): 500-2,000 tokens per call
  • extract(): 1,000-3,000 tokens per call
  • observe(): 800-2,500 tokens per call
  • agent.execute(): 3,000-10,000+ tokens (depends on steps)
Example: 1,000 runs of the same cached action (≈1,000 tokens per call) with gpt-4.1-mini, at a blended input/output rate of ~$0.75/1M tokens:
  • Without cache: ~1M tokens ≈ $0.75
  • With cache: ~1,000 tokens (first run only) ≈ $0.001
  • Savings: 99.9%
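The arithmetic above generalizes to any action. A back-of-the-envelope function (hypothetical, using the blended ~$0.75/1M rate implied by the example) shows why cache hit rate dominates cost:

```typescript
// Hypothetical back-of-the-envelope: cost of `runs` executions of one action,
// assuming `tokensPerCall` tokens per LLM call at a blended per-1M-token rate.
// With caching, only the first run hits the LLM; replays cost 0 tokens.
function estimatedCostUSD(
  runs: number,
  tokensPerCall: number,
  blendedRatePerM: number,
  cached: boolean
): number {
  const billedCalls = cached ? 1 : runs;
  return (billedCalls * tokensPerCall * blendedRatePerM) / 1_000_000;
}

const withoutCache = estimatedCostUSD(1000, 1000, 0.75, false); // 0.75
const withCache = estimatedCostUSD(1000, 1000, 0.75, true);
console.log(`savings: ${(100 * (1 - withCache / withoutCache)).toFixed(1)}%`); // "savings: 99.9%"
```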
Always test caching thoroughly before production use. Cache invalidation happens automatically when URLs or DOM structure changes significantly.