## Overview
Stagehand’s AI-powered operations can incur LLM costs. This guide covers strategies to optimize costs while maintaining reliability.
## Use Caching
Stagehand includes built-in caching that drastically reduces LLM costs for repeated operations.
### Enable Action Caching
Cache frequently repeated actions:
```typescript
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./stagehand-cache", // Enable caching
  verbose: 1,
});
```
How it works:
- First execution: Full LLM inference
- Subsequent executions: Replay cached selectors (0 tokens)
- Self-healing: Auto-updates cache if DOM changes
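The replay loop described above can be sketched as follows. This is a minimal illustration only; `runAction`, `clickSelector`, and `inferSelector` are hypothetical stand-ins, not Stagehand's real internals:

```typescript
// Hypothetical sketch of cached-action replay with self-healing.
// `Cache`, `clickSelector`, and `inferSelector` are illustrative stand-ins.
type Cache = Map<string, string>; // instruction -> cached selector

async function runAction(
  instruction: string,
  cache: Cache,
  clickSelector: (sel: string) => Promise<boolean>, // true if the element was found
  inferSelector: (instr: string) => Promise<string>, // full LLM inference
): Promise<string> {
  const cached = cache.get(instruction);
  if (cached && (await clickSelector(cached))) {
    return cached; // Cache hit: 0 LLM tokens spent
  }
  // Cache miss, or the DOM changed and the old selector is stale:
  // re-infer the selector and update the cache ("self-healing").
  const fresh = await inferSelector(instruction);
  await clickSelector(fresh);
  cache.set(instruction, fresh);
  return fresh;
}
```

The key property is that LLM inference runs only on a miss or a stale selector; every successful replay is free.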
### Cache Benefits
```typescript
// First run: ~1,000 tokens
await stagehand.act("click the login button");

// Second run: 0 tokens (replays cached selector)
await stagehand.act("click the login button");
```
Caching is keyed on the instruction, the URL, and the page state; a change to any of these invalidates the cached entry and triggers a fresh LLM call.
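One way to picture that cache key is a hash over the three components. The exact key format is internal to Stagehand; this sketch is purely illustrative:

```typescript
import { createHash } from "node:crypto";

// Illustrative only: derive a cache key from the three inputs mentioned
// above. Stagehand's real key derivation is an internal detail.
function cacheKey(instruction: string, url: string, pageStateHash: string): string {
  return createHash("sha256")
    .update(`${instruction}\u0000${url}\u0000${pageStateHash}`) // NUL-separated to avoid collisions
    .digest("hex");
}
```

Changing any one component yields a different key, which is why editing an instruction or navigating to a different URL forces a fresh inference.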
## Choose Cost-Effective Models

### Model Selection Strategy
```typescript
// For simple interactions: use mini models
const stagehand = new Stagehand({
  env: "LOCAL",
  model: "openai/gpt-4.1-mini", // Far cheaper than full-size models (see table below)
});

// For complex tasks: use standard models
const complexStagehand = new Stagehand({
  env: "LOCAL",
  model: "anthropic/claude-sonnet-4",
});
```
### Model Cost Comparison

| Model | Input Cost | Output Cost | Best For |
|---|---|---|---|
| `openai/gpt-4.1-mini` | $0.15 / 1M tokens | $0.60 / 1M tokens | Simple clicks, forms |
| `openai/gpt-4.1` | $2.50 / 1M tokens | $10 / 1M tokens | Complex reasoning |
| `anthropic/claude-haiku-4-5` | $0.80 / 1M tokens | $4 / 1M tokens | Fast, cost-effective |
| `anthropic/claude-sonnet-4` | $3 / 1M tokens | $15 / 1M tokens | Advanced tasks |
Start with mini models and only upgrade if you encounter reliability issues.
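That escalation strategy can be wrapped in a small helper that tries the cheap model first and retries with a stronger one only on failure. This is a generic sketch, not a Stagehand API; `withModelFallback` is a hypothetical name:

```typescript
// Hypothetical helper: run a task with the cheapest model first and
// escalate to a more capable (more expensive) model only if it throws.
async function withModelFallback<T>(
  models: string[], // ordered cheapest -> most capable
  run: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await run(model);
    } catch (err) {
      lastError = err; // this model failed; escalate to the next one
    }
  }
  throw lastError; // every model failed
}
```

In practice `run` would construct a Stagehand instance with the given model, e.g. `withModelFallback(["openai/gpt-4.1-mini", "anthropic/claude-sonnet-4"], (model) => ...)`, so most runs pay only mini-model rates.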
## Reduce Token Usage

### Use Specific Selectors
Reduce DOM complexity by being specific:
```typescript
// ❌ High token usage (processes the entire page)
await stagehand.extract("get all product data");

// ✅ Lower token usage (focused extraction)
await stagehand.extract(
  "get the price and title from the product card",
  { page } // Uses focused DOM processing
);
```
### Limit Agent Steps

Set realistic `maxSteps` limits:
```typescript
const agent = stagehand.agent();

// ❌ Wasteful: allows unnecessary exploration
await agent.execute({
  instruction: "click the login button",
  maxSteps: 50, // Overkill for a simple task
});

// ✅ Cost-effective: right-sized limit
await agent.execute({
  instruction: "click the login button",
  maxSteps: 3, // Sufficient for this task
});
```
### Disable Verbose Logging

Reduce logging overhead:
```typescript
const stagehand = new Stagehand({
  env: "LOCAL",
  verbose: 0, // Minimal logging
  logInferenceToFile: false, // Don't persist inference logs
});
```
## Batch Operations
Group related actions to reduce round trips:
```typescript
// ❌ Multiple LLM calls
await stagehand.act("type 'john' in first name");
await stagehand.act("type 'doe' in last name");
await stagehand.act("type '[email protected]' in email");

// ✅ Single LLM call with observe
const actions = await stagehand.observe(
  "return actions to fill the form: first name 'john', last name 'doe', email '[email protected]'"
);
for (const action of actions) {
  await stagehand.act(action);
}
```
## Use Deterministic Methods
When possible, use Playwright’s native methods:
```typescript
// ✅ No LLM cost: direct Playwright API
const page = stagehand.context.pages()[0];
await page.click("button[type='submit']");
await page.fill("#email", "[email protected]");

// Only use Stagehand when you need AI
await stagehand.act("click the 'Add to Cart' button");
```
## Monitor Usage
Track token consumption:
```typescript
const result = await stagehand.extract("get product details");
console.log("Tokens used:", result.usage);
// {
//   input_tokens: 1234,
//   output_tokens: 456,
//   cached_input_tokens: 0
// }
```
## Environment-Specific Optimization

### Development
```typescript
const stagehand = new Stagehand({
  env: "LOCAL",
  cacheDir: "./dev-cache", // Aggressive caching
  verbose: 2, // Full logging for debugging
  model: "openai/gpt-4.1-mini", // Cheap model
});
```
### Production
```typescript
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  cacheDir: "./prod-cache", // Cache for repeated workflows
  verbose: 0, // Minimal logging
  model: "anthropic/claude-sonnet-4", // Reliable model
});
```
## Cost Estimation
Typical token usage:
- `act()`: 500-2,000 tokens per call
- `extract()`: 1,000-3,000 tokens per call
- `observe()`: 800-2,500 tokens per call
- `agent.execute()`: 3,000-10,000+ tokens (depends on the number of steps)
Example: 1,000 repeated actions with GPT-4.1-mini, at roughly 1,000 tokens per uncached call:
- Without cache: ~1M tokens ≈ $0.75 (assuming a mix of input and output tokens)
- With cache: only the first run pays (~1,000 tokens) ≈ $0.001
- Savings: ~99.9%
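The arithmetic above generalizes to a small helper that applies per-million-token rates like those in the comparison table. A sketch (rates are passed in, so it works for any model):

```typescript
// Estimate cost in dollars from token counts and per-1M-token rates,
// e.g. gpt-4.1-mini: inputRatePerM = 0.15, outputRatePerM = 0.60.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputRatePerM: number,
  outputRatePerM: number,
): number {
  return (
    (inputTokens / 1_000_000) * inputRatePerM +
    (outputTokens / 1_000_000) * outputRatePerM
  );
}
```

For instance, 1M input tokens at mini rates costs `estimateCost(1_000_000, 0, 0.15, 0.60)`, i.e. $0.15; the blended dollar figures above depend on your actual input/output split.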
Always test caching thoroughly before production use. Cache invalidation happens automatically when URLs or DOM structure changes significantly.
## Related