Overview
Caching strategies in Stagehand:
- Prompt caching - Cache system prompts and static content
- Image compression - Reduce token usage in conversation history
- Conversation management - Maintain context while minimizing tokens
- Provider-specific optimizations - Leverage native caching features
Prompt Caching
Anthropic Prompt Caching
Anthropic supports caching with cache control blocks. Stagehand automatically uses this for system prompts and accessibility trees.
How it works:
- System prompts are cached across requests
- Reduces input token costs by ~90% for cached content
- Cache persists for 5 minutes of inactivity
- Particularly effective for accessibility trees
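A minimal sketch of what a cache-enabled Anthropic Messages API request looks like. The `cache_control: { type: "ephemeral" }` field is Anthropic's documented mechanism; the model name and prompt text here are illustrative, not Stagehand's actual values.

```typescript
// Illustrative Anthropic Messages API request body with prompt caching.
// The large static system block is marked as a cache breakpoint.
const request = {
  model: "claude-sonnet-4-20250514", // illustrative model id
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are a browser automation agent. <large accessibility tree here>",
      cache_control: { type: "ephemeral" }, // cached for ~5 minutes of inactivity
    },
  ],
  messages: [{ role: "user", content: "Click the login button." }],
};
```

On subsequent requests with an identical system block, Anthropic reports the reuse via `cache_read_input_tokens` in the response's usage object.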
OpenAI Prompt Caching
OpenAI does not currently support explicit prompt caching, but Stagehand optimizes requests by:
- Reusing system prompts across calls
- Minimizing message history
- Structuring requests for potential future caching support
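The key idea behind the optimizations above is keeping the static prefix of every request byte-identical, so any server-side prefix caching can apply. A sketch under that assumption (names are illustrative, not Stagehand's actual API):

```typescript
// Build the static system prompt once and reuse it verbatim, so every
// request shares an identical prefix.
const SYSTEM_PROMPT = "You are a browser automation agent. Follow instructions precisely.";

function buildRequest(userTurn: string) {
  return {
    model: "gpt-4o", // illustrative model id
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // identical prefix on every call
      { role: "user", content: userTurn },
    ],
  };
}
```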
Google Prompt Caching
Google’s caching is handled automatically by the model. Stagehand optimizes by:
- Structuring system instructions consistently
- Reusing conversation history format
- Minimizing changes to cached content
Image Compression
Anthropic Image Compression
Location: packages/core/lib/v3/agent/utils/imageCompression.ts
Strategy:
- Keep first 2 images in conversation at full quality
- Compress all subsequent images to 25% quality
- Reduces token usage while maintaining context
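The strategy above can be sketched as a simple quality policy over the images in the conversation. Types and names here are illustrative, not the actual `imageCompression.ts` API:

```typescript
// Sketch: keep the first two images at full quality, compress the rest.
interface ImageBlock {
  type: "image";
  data: string; // base64 payload
  quality: number; // 1.0 = original
}

const FULL_QUALITY_COUNT = 2; // first N images left untouched (per the strategy above)
const COMPRESSED_QUALITY = 0.25; // 25% quality for older images

function compressOlderImages(images: ImageBlock[]): ImageBlock[] {
  return images.map((img, i) =>
    i < FULL_QUALITY_COUNT ? img : { ...img, quality: COMPRESSED_QUALITY }
  );
}
```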
Google Image Compression
Location: packages/core/lib/v3/agent/utils/imageCompression.ts
Implementation:
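Since the Google client shares the same compression utility, a plausible sketch is walking Gemini-style content parts and recompressing inline image data past the first two occurrences. Part shapes and helper names are assumptions, not the real implementation:

```typescript
// Illustrative sketch: recompress inline images beyond the first two.
interface Part {
  inlineData?: { mimeType: string; data: string };
}

function compressImageParts(
  parts: Part[],
  recompress: (base64: string) => string, // e.g. re-encode JPEG at 25% quality
  keepFullQuality = 2
): Part[] {
  let seen = 0;
  return parts.map((part) => {
    if (!part.inlineData) return part;
    seen += 1;
    if (seen <= keepFullQuality) return part; // leave recent/leading images intact
    return {
      ...part,
      inlineData: { ...part.inlineData, data: recompress(part.inlineData.data) },
    };
  });
}
```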
Conversation History Management
CUA Conversation History
All CUA clients maintain conversation history to preserve context (see the Anthropic pattern in AnthropicCUAClient.ts).
History Truncation Strategies
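A minimal sketch of a "keep recent messages" truncation: retain the first message (the original task context) plus the last N turns. The function and parameter names are illustrative, not Stagehand's actual API:

```typescript
// Sketch: bound the history at the first message plus the most recent turns.
interface Msg {
  role: "user" | "assistant";
  content: string;
}

function truncateHistory(history: Msg[], maxRecent = 6): Msg[] {
  if (history.length <= maxRecent + 1) return history; // already small enough
  return [history[0], ...history.slice(-maxRecent)]; // first + last N
}
```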
Keep recent messages: drop the oldest turns once the history grows past a size budget, so recent context stays intact while token usage stays bounded.
Provider-Specific Optimizations
Anthropic Cache Control
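Beyond the system prompt, Anthropic also allows `cache_control` on individual message content blocks, so a large stable block such as an accessibility tree snapshot can serve as a cache breakpoint. The content below is illustrative:

```typescript
// Illustrative message with a cache breakpoint on the large static block.
const cachedMessages = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "<accessibility tree snapshot>",
        cache_control: { type: "ephemeral" }, // everything up to this block is cacheable
      },
      { type: "text", text: "Click the login button." }, // dynamic tail, not cached
    ],
  },
];
```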
Google Content Reuse
OpenAI Response Chaining
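Response chaining uses the OpenAI Responses API's `previous_response_id` field, which lets the server reuse prior context instead of the client resending the whole history. The id and prompt below are illustrative:

```typescript
// Illustrative follow-up request chained to an earlier response.
const followUp = {
  model: "computer-use-preview", // illustrative model id
  previous_response_id: "resp_abc123", // id returned by the prior response (illustrative)
  input: [{ role: "user", content: "Now submit the form." }],
};
```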
Performance Monitoring
Track Token Usage
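Token tracking can aggregate the usage fields returned per request. `cache_read_input_tokens` and `cache_creation_input_tokens` are the field names Anthropic documents; the aggregation itself is an illustrative sketch:

```typescript
// Sketch: accumulate input/output/cached token counts across requests.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

function summarizeUsage(usages: Usage[]) {
  return usages.reduce(
    (acc, u) => ({
      input: acc.input + u.input_tokens,
      output: acc.output + u.output_tokens,
      cached: acc.cached + (u.cache_read_input_tokens ?? 0),
    }),
    { input: 0, output: 0, cached: 0 }
  );
}
```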
Log Compression Results
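A sketch of logging before/after sizes when an image is compressed (formatting and function name are illustrative, not Stagehand's actual logger):

```typescript
// Illustrative log line for a single compression result.
function logCompression(beforeBytes: number, afterBytes: number): string {
  const pct = ((1 - afterBytes / beforeBytes) * 100).toFixed(1);
  return `compressed image: ${beforeBytes}B -> ${afterBytes}B (${pct}% smaller)`;
}
```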
Best Practices
- Use prompt caching: Mark static content with cache_control
- Compress images: Keep first 2 at full quality, compress rest
- Truncate history: Don’t let conversation grow unbounded
- Monitor token usage: Track input/output/cached tokens
- Structure consistently: Consistent structure improves caching
- Batch operations: Fewer requests = better cache utilization
- Use appropriate models: Faster models for cached content
Cost Optimization
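As a worked example of potential savings, assume illustrative prices of $3 per million input tokens, cache writes at 1.25x, and cache reads at 0.1x (these multipliers match Anthropic's published pricing model, but treat the numbers as assumptions, not current rates):

```typescript
// Illustrative cost arithmetic for a 10,000-token cached prompt reused
// across 10 requests.
const PRICE_PER_TOKEN = 3 / 1_000_000; // $3 / MTok (illustrative)
const CACHE_WRITE_MULT = 1.25; // first request pays a write premium
const CACHE_READ_MULT = 0.1; // later requests pay ~10% for cached reads

const promptTokens = 10_000;
const requests = 10;

const withoutCaching = requests * promptTokens * PRICE_PER_TOKEN; // $0.30
const withCaching =
  promptTokens * PRICE_PER_TOKEN * CACHE_WRITE_MULT + // first request writes the cache
  (requests - 1) * promptTokens * PRICE_PER_TOKEN * CACHE_READ_MULT; // 9 cached reads

const savingsPct = (1 - withCaching / withoutCaching) * 100; // ~78.5% under these assumptions
```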
Example savings with caching: because cached reads cost roughly a tenth of normal input tokens, a large system prompt reused across many requests can cut total input cost dramatically.
References
- Image Compression: packages/core/lib/v3/agent/utils/imageCompression.ts
- Anthropic CUA: packages/core/lib/v3/agent/AnthropicCUAClient.ts:351
- Google CUA: packages/core/lib/v3/agent/GoogleCUAClient.ts:357
- OpenAI CUA: packages/core/lib/v3/agent/OpenAICUAClient.ts:420