Overview

Stagehand’s AI-powered operations can incur LLM costs. This guide covers strategies to optimize costs while maintaining reliability.

Use Caching

Stagehand includes built-in caching that drastically reduces LLM costs for repeated operations.

Enable Action Caching

Cache frequently repeated actions:
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./stagehand-cache", // Enable caching
  verbose: 1
});
How it works:
  • First execution: Full LLM inference
  • Subsequent executions: Replay cached selectors (0 tokens)
  • Self-healing: Auto-updates cache if DOM changes

Cache Benefits

// First run: ~1000 tokens
await stagehand.act("click the login button");

// Second run: 0 tokens (replays cached selector)
await stagehand.act("click the login button");
Caching is based on instruction + URL + page state, so changes to any of these will require a fresh LLM call.
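Conceptually, you can think of the cache key as a hash over those three inputs. The exact internal derivation is not documented here; the `cacheKey` function below is a hypothetical sketch to illustrate why changing any input forces a fresh LLM call:

```typescript
import { createHash } from "node:crypto";

// Hypothetical illustration: a cache key derived from the three inputs the
// cache is sensitive to. If any of them changes, the key changes, the cached
// selector no longer matches, and a fresh LLM call is required.
function cacheKey(instruction: string, url: string, pageState: string): string {
  return createHash("sha256")
    .update(`${instruction}\n${url}\n${pageState}`)
    .digest("hex");
}

const a = cacheKey("click the login button", "https://example.com/login", "state-v1");
const b = cacheKey("click the login button", "https://example.com/login", "state-v2");
console.log(a === b); // false: a changed page state invalidates the cached entry
```

The same reasoning applies to the instruction text itself: even a trivially reworded instruction ("click login" vs. "click the login button") is a different cache entry.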

Choose Cost-Effective Models

Model Selection Strategy

// For simple interactions: use mini models
const stagehand = new Stagehand({
  env: "LOCAL",
  model: "openai/gpt-4.1-mini", // far cheaper than full-size models (see table below)
});

// For complex tasks: use standard models
const complexStagehand = new Stagehand({
  env: "LOCAL",
  model: "anthropic/claude-sonnet-4",
});

Model Cost Comparison

Model                      | Input Cost | Output Cost | Best For
openai/gpt-4.1-mini        | $0.15/1M   | $0.60/1M    | Simple clicks, forms
openai/gpt-4.1             | $2.50/1M   | $10/1M      | Complex reasoning
anthropic/claude-haiku-4-5 | $0.80/1M   | $4/1M       | Fast, cost-effective
anthropic/claude-sonnet-4  | $3/1M      | $15/1M      | Advanced tasks
Start with mini models and only upgrade if you encounter reliability issues.
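One way to operationalize this advice is a cheap-first escalation wrapper: run the task with the mini-model instance, and only retry with the expensive instance if the cheap attempt throws. `runWithEscalation` below is a generic, hypothetical helper, not a Stagehand API:

```typescript
// Hypothetical helper: try each attempt in order (cheapest first) and return
// the first result that succeeds. The expensive model is only billed when the
// cheap model actually fails.
async function runWithEscalation<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // cheap attempt failed; fall through to the next tier
    }
  }
  throw lastError;
}

// Usage sketch: miniStagehand / bigStagehand would be two Stagehand instances,
// one configured with "openai/gpt-4.1-mini" and one with "anthropic/claude-sonnet-4".
// await runWithEscalation([
//   () => miniStagehand.act("click the login button"),
//   () => bigStagehand.act("click the login button"),
// ]);
```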

Reduce Token Usage

Use Specific Selectors

Reduce DOM complexity by being specific:
// ❌ High token usage (processes entire page)
await stagehand.extract("get all product data");

// ✅ Lower token usage (focused extraction)
await stagehand.extract(
  "get the price and title from the product card",
  { page } // Uses focused DOM processing
);

Limit Agent Steps

Set realistic maxSteps limits:
const agent = stagehand.agent();

// ❌ Wasteful: allows unnecessary exploration
await agent.execute({
  instruction: "click the login button",
  maxSteps: 50 // Overkill for a simple task
});

// ✅ Cost-effective: right-sized limit
await agent.execute({
  instruction: "click the login button",
  maxSteps: 3 // Sufficient for this task
});

Disable Verbose Logging

Reduce API overhead:
const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 0, // Minimal logging
  logInferenceToFile: false, // Don't persist logs
});

Batch Operations

Group related actions to reduce round trips:
// ❌ Multiple LLM calls
await stagehand.act("type 'john' in first name");
await stagehand.act("type 'doe' in last name");
await stagehand.act("type 'john.doe@example.com' in email");

// ✅ Single LLM call with observe
const actions = await stagehand.observe(
  "return actions to fill the form: first name 'john', last name 'doe', email 'john.doe@example.com'"
);
for (const action of actions) {
  await stagehand.act(action);
}

Use Deterministic Methods

When possible, use Playwright’s native methods:
// ✅ No LLM cost: direct Playwright API
const page = stagehand.context.pages()[0];
await page.click("button[type='submit']");
await page.fill("#email", "user@example.com");

// Only use Stagehand when you need AI
await stagehand.act("click the 'Add to Cart' button");

Monitor Usage

Track token consumption:
const result = await stagehand.extract("get product details");

console.log("Tokens used:", result.usage);
// {
//   input_tokens: 1234,
//   output_tokens: 456,
//   cached_input_tokens: 0
// }
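For a running total across a whole workflow, a small accumulator over these usage objects is enough. The `Usage` shape mirrors the fields shown above; `UsageTracker` itself is a hypothetical helper, not part of Stagehand:

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cached_input_tokens: number;
}

// Hypothetical helper: accumulate per-call usage and report an estimated cost
// given per-million-token rates.
class UsageTracker {
  private total: Usage = { input_tokens: 0, output_tokens: 0, cached_input_tokens: 0 };

  record(usage: Usage): void {
    this.total.input_tokens += usage.input_tokens;
    this.total.output_tokens += usage.output_tokens;
    this.total.cached_input_tokens += usage.cached_input_tokens;
  }

  costUSD(inputPerM: number, outputPerM: number): number {
    return (
      (this.total.input_tokens / 1_000_000) * inputPerM +
      (this.total.output_tokens / 1_000_000) * outputPerM
    );
  }
}

const tracker = new UsageTracker();
tracker.record({ input_tokens: 1234, output_tokens: 456, cached_input_tokens: 0 });
// e.g. gpt-4.1-mini rates ($0.15/1M input, $0.60/1M output):
console.log(tracker.costUSD(0.15, 0.6));
```

Call `tracker.record(result.usage)` after each operation to see where a workflow's tokens actually go before deciding what to cache or batch.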

Environment-Specific Optimization

Development

const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./dev-cache", // Aggressive caching
  verbose: 2, // Full logging for debugging
  model: "openai/gpt-4.1-mini", // Cheap model
});

Production

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  cacheDir: "./prod-cache", // Cache for repeated workflows
  verbose: 0, // Minimal logging
  model: "anthropic/claude-sonnet-4", // Reliable model
});

Cost Estimation

Typical token usage:
  • act(): 500-2,000 tokens per call
  • extract(): 1,000-3,000 tokens per call
  • observe(): 800-2,500 tokens per call
  • agent.execute(): 3,000-10,000+ tokens (depends on steps)
Example: 1,000 runs of the same cached action (≈1,000 tokens per call) with gpt-4.1-mini, at a blended input/output rate of ~$0.75/1M tokens:
  • Without cache: ~1M tokens ≈ $0.75
  • With cache: ~1,000 tokens (first run only) ≈ $0.001
  • Savings: 99.9%
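The arithmetic above generalizes to any action. A back-of-the-envelope function (hypothetical, using the blended ~$0.75/1M rate implied by the example) shows why cache hit rate dominates cost:

```typescript
// Hypothetical back-of-the-envelope: cost of `runs` executions of one action,
// assuming `tokensPerCall` tokens per LLM call at a blended per-1M-token rate.
// With caching, only the first run hits the LLM; replays cost 0 tokens.
function estimatedCostUSD(
  runs: number,
  tokensPerCall: number,
  blendedRatePerM: number,
  cached: boolean
): number {
  const billedCalls = cached ? 1 : runs;
  return (billedCalls * tokensPerCall * blendedRatePerM) / 1_000_000;
}

const withoutCache = estimatedCostUSD(1000, 1000, 0.75, false); // 0.75
const withCache = estimatedCostUSD(1000, 1000, 0.75, true);
console.log(`savings: ${(100 * (1 - withCache / withoutCache)).toFixed(1)}%`); // "savings: 99.9%"
```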
Always test caching thoroughly before production use. Cache invalidation happens automatically when URLs or DOM structure changes significantly.