Context Strategies
Context strategies determine how conversation history is managed when it exceeds the model’s context window. They ensure your LLM doesn’t run out of memory while preserving the most relevant conversation context.

Why Context Strategies?
LLMs have a maximum context length (e.g., 2048, 4096, or 8192 tokens). When your conversation grows beyond this limit, you need a strategy to:

- Trim old messages while keeping recent context
- Preserve important messages (like system prompts)
- Maintain conversation coherence
- Prevent out-of-memory errors
Available Strategies
React Native ExecuTorch provides three built-in context strategies:

NoopContextStrategy
No filtering or trimming - uses the entire message history as-is.

Use when:

- You manually manage conversation length
- You have very short conversations
- You’re certain the context won’t exceed limits

How it works:

- Prepends the system prompt to the message history
- No message removal or filtering
- Ignores `maxContextLength` and token counts
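The behavior can be sketched in a few lines (an illustrative model, not the library’s source; the `Message` shape and the method name `apply` are assumptions for illustration):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Illustrative sketch: prepend the system prompt and pass the history
// through untouched. No trimming, no token counting.
class NoopContextStrategySketch {
  apply(systemPrompt: string, history: Message[]): Message[] {
    return [{ role: 'system', content: systemPrompt }, ...history];
  }
}
```

With this strategy, the caller is fully responsible for keeping the resulting context within the model’s limits.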
MessageCountContextStrategy

Retains a fixed number of the most recent messages.

Parameters:

- `windowLength`: Maximum number of recent messages to keep (default: 5)

Use when:

- You want simple, predictable context management
- Message length is relatively uniform
- You need fast, token-count-free trimming

How it works:

- Keeps the last `windowLength` messages
- Removes older messages beyond the window
- System prompt is always included
- Does not consider actual token count
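The trimming rule above can be sketched as follows (an illustrative model, not the library’s source; the `Message` shape and the method name `apply` are assumptions):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Illustrative sketch: keep only the last `windowLength` messages and
// always prepend the system prompt. Token counts are never consulted.
class MessageCountContextStrategySketch {
  constructor(private windowLength: number = 5) {}

  apply(systemPrompt: string, history: Message[]): Message[] {
    const recent = history.slice(-this.windowLength);
    return [{ role: 'system', content: systemPrompt }, ...recent];
  }
}
```

Because only message count matters, a single very long message can still overflow the context window, which is why this strategy suits conversations with fairly uniform message lengths.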
SlidingWindowContextStrategy (Recommended)

Dynamically trims messages based on actual token count to fit within the model’s context window.

Parameters:

- `bufferTokens`: Number of tokens to reserve for model generation (e.g., 1000)
- `allowOrphanedAssistantMessages`: Whether to allow assistant responses without their preceding user message

Use when:

- You want optimal context utilization (recommended for most cases)
- Messages vary in length
- You want to prevent context overflow errors
- You need to maximize context usage while leaving room for generation

How it works:

- Calculates the exact token count of the formatted messages
- Removes the oldest messages until the tokens fit within `maxContextLength - bufferTokens`
- Optionally preserves user-assistant message pairs
- System prompt is always included
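The core loop can be sketched like this (an illustrative model, not the library’s source; the `Message` shape and the method name `apply` are assumptions, and `getTokenCount` stands in for the real tokenizer callback):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Illustrative sketch of token-aware trimming: drop the oldest message
// until the formatted context fits within the token budget.
class SlidingWindowContextStrategySketch {
  constructor(private bufferTokens: number) {}

  apply(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[] {
    // Reserve bufferTokens for the model's generated response.
    const budget = maxContextLength - this.bufferTokens;
    const trimmed = [...history];
    while (
      trimmed.length > 0 &&
      getTokenCount([{ role: 'system', content: systemPrompt }, ...trimmed]) > budget
    ) {
      trimmed.shift(); // remove the oldest message
    }
    return [{ role: 'system', content: systemPrompt }, ...trimmed];
  }
}
```

Because the budget is computed from real token counts, long and short messages are handled correctly, at the cost of one token-count pass per trim step.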
Comparison
| Strategy | Token-Aware | Preserves Pairs | Complexity | Best For |
|---|---|---|---|---|
| NoopContextStrategy | No | N/A | O(1) | Manual management, short conversations |
| MessageCountContextStrategy | No | No | O(1) | Simple apps, uniform messages |
| SlidingWindowContextStrategy | Yes | Optional | O(n) | Production apps, optimal context usage |
Implementation Details
Context Strategy Interface
All strategies implement this interface:

Parameters:

- `systemPrompt`: The system instructions for the model
- `history`: Complete conversation history
- `maxContextLength`: Maximum tokens the model can handle
- `getTokenCount`: Callback to calculate the token count of messages

Returns:

- Array of messages optimized for the context window
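Expressed as a TypeScript signature, the contract described above might look like this (a sketch built from the parameter list; the actual type and method names in the library may differ):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Sketch of the strategy contract described above.
interface ContextStrategy {
  apply(
    systemPrompt: string, // system instructions for the model
    history: Message[], // complete conversation history
    maxContextLength: number, // maximum tokens the model can handle
    getTokenCount: (messages: Message[]) => number // token-count callback
  ): Message[]; // messages optimized for the context window
}

// A trivial implementation, just to illustrate the contract:
const passthrough: ContextStrategy = {
  apply: (systemPrompt, history) => [
    { role: 'system', content: systemPrompt },
    ...history,
  ],
};
```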
Orphaned Assistant Messages
When `allowOrphanedAssistantMessages` is false in `SlidingWindowContextStrategy`, the strategy ensures that every retained assistant message is still preceded by the user message that prompted it; assistant responses whose user message was trimmed away are removed as well.
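One way to express that guarantee (an illustrative helper, not the library’s code; the `Message` shape and function name are assumptions):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Drop assistant messages that lost their preceding user message,
// e.g. after the oldest messages were trimmed away.
function dropOrphanedAssistantMessages(messages: Message[]): Message[] {
  return messages.filter(
    (m, i) => m.role !== 'assistant' || messages[i - 1]?.role === 'user'
  );
}
```

Keeping question-answer pairs intact this way usually improves coherence, since an assistant reply without its question gives the model little useful context.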
Practical Examples
Short Conversations
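For short, manually managed conversations, no trimming is needed. A hypothetical setup (whether the class is exported this way, and its constructor shape, are assumptions; consult the library’s API reference):

```typescript
// Hypothetical usage - NoopContextStrategy needs no configuration.
const strategy = new NoopContextStrategy();
```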
Simple Chat App
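A simple chat app with fairly uniform message lengths can use count-based trimming. A hypothetical configuration (the options-object constructor shape is an assumption; consult the library’s API reference):

```typescript
// Hypothetical usage - keep the 10 most recent messages.
const strategy = new MessageCountContextStrategy({ windowLength: 10 });
```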
Production App (Recommended)
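For production, token-aware trimming is the safest default. A hypothetical configuration using the parameters described above (the options-object constructor shape is an assumption; consult the library’s API reference):

```typescript
// Hypothetical usage - reserve ~1000 tokens for generation and keep
// user-assistant Q&A pairs together.
const strategy = new SlidingWindowContextStrategy({
  bufferTokens: 1000,
  allowOrphanedAssistantMessages: false,
});
```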
Long-Context Model
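With a long-context model, the same strategy applies, but a larger generation buffer leaves room for longer responses. A hypothetical configuration (constructor shape is an assumption; consult the library’s API reference):

```typescript
// Hypothetical usage - a larger buffer for longer generated responses.
const strategy = new SlidingWindowContextStrategy({
  bufferTokens: 2000,
  allowOrphanedAssistantMessages: false,
});
```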
Custom Context Strategy
You can implement your own strategy by implementing the `ContextStrategy` interface:
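For example, a strategy that keeps only user messages could look like this (a self-contained sketch against the interface shape described above; the `Message` shape and the method name `apply` are assumptions):

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Custom strategy sketch: keep the system prompt plus only the user
// messages, discarding assistant responses entirely.
class UserOnlyContextStrategy {
  apply(
    systemPrompt: string,
    history: Message[],
    _maxContextLength: number, // unused by this simple strategy
    _getTokenCount: (messages: Message[]) => number // unused as well
  ): Message[] {
    const users = history.filter((m) => m.role === 'user');
    return [{ role: 'system', content: systemPrompt }, ...users];
  }
}
```

A real custom strategy would normally also respect `maxContextLength` via `getTokenCount`, as `SlidingWindowContextStrategy` does, so that the returned messages are guaranteed to fit.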
Best Practices
- Use SlidingWindowContextStrategy for production - It provides the most reliable context management
- Set appropriate buffer tokens - Reserve enough tokens for the model’s response (1000-2000 is typical)
- Consider conversation patterns - Set `allowOrphanedAssistantMessages: false` to preserve Q&A pairs
- Monitor token usage - Use `getTotalTokenCount()` to understand your token consumption
- Test with long conversations - Ensure your strategy handles extended conversations gracefully