Token Caching
Qwen Code supports prompt caching (also called context caching) to significantly reduce costs and latency when working with large, repetitive contexts such as codebases, documentation, and long conversations.
Overview
Prompt caching allows LLM providers to store and reuse portions of the prompt context, reducing:
- Costs: Cached tokens are charged at a fraction of the price (typically 10% of input token costs)
- Latency: Cached content doesn’t need to be reprocessed
- Token usage: Significantly reduces effective prompt token consumption
Supported Providers
- Anthropic (Claude): Full support with ephemeral caching
- OpenAI: Supported via the cached_tokens usage field
- Google (Gemini): Limited support depending on the model
How It Works
Anthropic Prompt Caching
Qwen Code implements Anthropic’s prompt caching by adding cache_control markers to specific parts of the prompt.
System Instruction Caching
From packages/core/src/core/anthropicContentGenerator/converter.ts:61:
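The snippet itself is not reproduced on this page. The pattern it implements can be sketched as follows: wrap the system instruction in a content block and attach an ephemeral cache_control marker, which is the shape Anthropic's Messages API expects. This is an illustrative sketch, not the actual converter code:

```typescript
// Illustrative sketch only -- not the real converter implementation.
type CacheControl = { type: "ephemeral" };

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Wrap the system instruction in a content block and mark it cacheable;
// Anthropic caches the prompt prefix up to (and including) this block.
function toCachedSystemBlocks(systemInstruction: string): SystemBlock[] {
  return [
    {
      type: "text",
      text: systemInstruction,
      cache_control: { type: "ephemeral" },
    },
  ];
}
```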
Tool Definition Caching
From converter.ts:123:
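Again as a hedged sketch rather than the real code: because Anthropic caches the prompt prefix up to a breakpoint, marking only the last tool definition covers the entire tool list with a single cache entry. Type and function names here are illustrative:

```typescript
// Illustrative sketch: mark the final tool definition as a cache breakpoint
// so the whole tool list falls inside one ephemeral cache entry.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: { type: "ephemeral" };
}

function markLastToolCacheable(tools: AnthropicTool[]): AnthropicTool[] {
  if (tools.length === 0) return tools;
  return tools.map((tool, i): AnthropicTool =>
    i === tools.length - 1
      ? { ...tool, cache_control: { type: "ephemeral" } }
      : tool,
  );
}
```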
Message History Caching
From converter.ts:549:
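The idea for message history is the same: place a breakpoint on the most recent message so the whole conversation prefix can be reused on the next turn. A minimal sketch (shapes are illustrative, not the actual converter):

```typescript
// Illustrative sketch: put a cache breakpoint on the last content block of
// the most recent message, covering the entire history before it.
interface MessageBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: "user" | "assistant";
  content: MessageBlock[];
}

function markHistoryCacheable(messages: Message[]): Message[] {
  if (messages.length === 0) return messages;
  return messages.map((msg, i): Message => {
    if (i !== messages.length - 1 || msg.content.length === 0) return msg;
    const content = msg.content.map((block, j): MessageBlock =>
      j === msg.content.length - 1
        ? { ...block, cache_control: { type: "ephemeral" } }
        : block,
    );
    return { ...msg, content };
  });
}
```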
OpenAI Prompt Caching
OpenAI’s prompt caching is tracked via usage metadata. From packages/core/src/core/openaiContentGenerator/converter.ts:356:
From converter.ts:891:
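The referenced snippets are not reproduced here. Unlike Anthropic, OpenAI needs no request-side markers: the API caches automatically and reports hits in the response's usage metadata. A sketch of reading that metadata (the field names follow OpenAI's published Chat Completions usage shape, not necessarily Qwen Code's internal types):

```typescript
// Sketch: extract cached-token counts from an OpenAI-style usage object.
// prompt_tokens_details.cached_tokens is how the Chat Completions API
// reports prompt-cache hits; it may be absent, so default to zero.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

function extractCachedTokens(usage: OpenAIUsage): number {
  return usage.prompt_tokens_details?.cached_tokens ?? 0;
}
```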
Token Counting
Cached tokens are tracked separately in telemetry. From packages/core/src/telemetry/types.ts:320:
From packages/core/src/telemetry/uiTelemetry.ts:189:
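In the spirit of the telemetry types referenced above, tracking cached tokens separately amounts to accumulating two counters per session. Field names in this sketch are illustrative, not the real telemetry schema:

```typescript
// Sketch: accumulate cached vs. total prompt tokens separately so cache
// effectiveness can be reported later. Names are illustrative.
interface TokenMetrics {
  prompt: number; // total prompt tokens seen
  cached: number; // portion of prompt tokens served from cache
}

function addUsage(
  metrics: TokenMetrics,
  promptTokens: number,
  cachedTokens: number,
): TokenMetrics {
  return {
    prompt: metrics.prompt + promptTokens,
    cached: metrics.cached + cachedTokens,
  };
}
```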
Configuration
Enable Caching
Caching is typically enabled automatically for supported models. The configuration happens during content generator initialization.
Anthropic Configuration
Cache Behavior
From Anthropic’s documentation:
- Cache Duration: 5 minutes of inactivity
- Cache Key: Based on exact prompt content
- Minimum Size: 1024 tokens for caching to activate
- Cost: ~10% of input token cost for cache hits
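Taken together, these numbers imply large savings once most of the prompt is cached. A worked sketch, using assumed illustrative prices ($3.00 per million input tokens, with cache reads at 10% of that):

```typescript
// Assumed illustrative prices in USD per token -- not official pricing.
const INPUT_PRICE = 3.0 / 1_000_000;        // regular input tokens
const CACHE_READ_PRICE = INPUT_PRICE * 0.1; // ~10% of input cost on cache hits

// Cost of one request's prompt, given how many tokens came from cache.
function promptCost(totalPromptTokens: number, cachedTokens: number): number {
  const uncached = totalPromptTokens - cachedTokens;
  return uncached * INPUT_PRICE + cachedTokens * CACHE_READ_PRICE;
}
```

Under these assumptions, a 100,000-token prompt with 90,000 tokens cached costs about $0.057 instead of $0.30, an 81% reduction.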
Monitoring Cache Performance
Qwen Code tracks caching effectiveness through telemetry.
Cache Hit Rate
From packages/cli/src/ui/utils/computeStats.ts:31:
Total Cached Tokens
From computeStats.ts:50:
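The hit-rate computation is, at its core, a ratio of cached prompt tokens to total prompt tokens. A sketch of that calculation (not the actual computeStats code):

```typescript
// Sketch: cache hit rate as a percentage of prompt tokens served from
// cache, guarding against division by zero for empty sessions.
function cacheHitRate(cachedTokens: number, totalPromptTokens: number): number {
  if (totalPromptTokens <= 0) return 0;
  return (cachedTokens / totalPromptTokens) * 100;
}
```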
Viewing Cache Metrics
Optimizing for Cache Performance
1. Structure Prompts for Caching
Best Practice: Place stable context first, variable content last.
2. Maximize Cache Window
Keep sessions active to maintain cache:
3. Chat Compression with Caching
From packages/core/src/services/sessionService.ts:584:
4. Memory System Integration
From packages/core/src/utils/memoryDiscovery.ts, hierarchical memory files are loaded and can benefit from caching:
Resume and Token Restoration
When resuming sessions, token counts are restored from checkpoints. From sessionService.ts:670:
From sessionService.ts:676:
Cost Analysis
Example Savings
Typical Anthropic pricing (Claude 3.5 Sonnet):
JSON Output Format
From packages/cli/src/nonInteractive/io/BaseJsonOutputAdapter.ts:212:
Advanced Topics
Cache Invalidation
Cache is invalidated when:
- Content changes: Any modification to cached portions
- 5-minute timeout: Inactivity exceeds cache duration
- Context length: Cache size limits reached
- Model changes: Switching to a different model
Multi-Turn Caching Strategy
For long conversations:
Breakpoints
Anthropic supports up to 4 cache breakpoints:
- End of system instructions
- End of tool definitions
- End of most recent cached message
- Custom location (if needed)
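Because the limit applies per request, a converter has to budget its markers across system instructions, tools, and messages. A small sketch of enforcing that budget (illustrative only, not the actual implementation):

```typescript
// Sketch: count cache_control markers across all request parts and enforce
// Anthropic's documented limit of 4 breakpoints per request.
type Marked = { cache_control?: { type: "ephemeral" } };

const MAX_BREAKPOINTS = 4;

function countBreakpoints(parts: Marked[]): number {
  return parts.filter((p) => p.cache_control !== undefined).length;
}

function assertWithinLimit(parts: Marked[]): void {
  if (countBreakpoints(parts) > MAX_BREAKPOINTS) {
    throw new Error("Anthropic allows at most 4 cache breakpoints per request");
  }
}
```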
Troubleshooting
Low Cache Hit Rate
Problem: Cache hit rate below 50%
Causes:
- Prompt structure changes between requests
- Short session duration (cache expires)
- Dynamic content in system prompt
- Context size below 1024 token minimum
Cache Not Activating
Problem: cached_content_token_count is always 0
Causes:
- Context size below 1024 tokens
- Caching not enabled for model
- Provider does not support caching
Unexpected Cache Misses
Problem: Cache misses despite identical prompts
Causes:
- Whitespace differences
- Tool order changes
- Hidden formatting differences
Source Code References
- Anthropic caching: packages/core/src/core/anthropicContentGenerator/converter.ts:61, 123, 529, 549
- OpenAI caching: packages/core/src/core/openaiContentGenerator/converter.ts:356, 891
- Telemetry: packages/core/src/telemetry/types.ts:320, uiTelemetry.ts:189
- Stats computation: packages/cli/src/ui/utils/computeStats.ts:31, 50
- Session restore: packages/core/src/services/sessionService.ts:670, 676
