Context Limits
Context limits control how much information from linked documents is included when using Enhanced Actions (RAG). Choosing the right limit is crucial for balancing answer quality, performance, and cost.

Available Presets
Local GPT offers four context limit presets, each designed for different use cases:

Local models
10,000 characters. Optimized for local models with smaller context windows (e.g., 8K-32K tokens).
Cloud models
32,000 characters. Suitable for standard cloud models with medium context windows (32K-64K tokens).
Top: GPT, Claude, Gemini
100,000 characters. For advanced models with large context windows (100K+ tokens).
No limits (danger)
3,000,000 characters. Effectively unlimited. Use with extreme caution.
Configuration
Context limits are configured globally in the plugin settings. The setting is stored in settings.defaults.contextLimit with the values "local", "cloud", "advanced", or "max".

Implementation Details
The context limit is resolved in src/main.ts:783-796 and enforced during context assembly in src/rag.ts:368-390.
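As a rough sketch (the actual code in src/main.ts uses its own names and structure), the preset-to-character-limit resolution might look like:

```typescript
// Hypothetical sketch of the preset resolution; names are assumptions,
// only the preset values and character limits come from the docs above.
type ContextLimitPreset = "local" | "cloud" | "advanced" | "max";

const CONTEXT_LIMITS: Record<ContextLimitPreset, number> = {
  local: 10_000,      // local models (8K-32K token windows)
  cloud: 32_000,      // standard cloud models
  advanced: 100_000,  // GPT-4 Turbo, Claude 3, Gemini 1.5 Pro
  max: 3_000_000,     // effectively unlimited
};

function resolveContextLimit(preset: ContextLimitPreset): number {
  // Fall back to the conservative "local" preset for safety.
  return CONTEXT_LIMITS[preset] ?? CONTEXT_LIMITS.local;
}
```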
How to Choose the Right Preset
Local Models (10K characters)
Best for:
- Ollama models (Llama 3, Mistral, Gemma, etc.)
- LM Studio
- Other local inference servers
- Models with 8K-32K token context windows

Why this limit:
- Most local models have limited context windows
- Prevents out-of-memory errors
- Maintains fast inference speed
- Focuses on only the most relevant chunks
Cloud Models (32K characters)
Best for:
- GPT-3.5-turbo
- Claude 3 Haiku
- Gemini 1.5 Flash
- Standard API models

Why this limit:
- Balances context richness with cost
- Fits comfortably in most cloud model windows
- Good performance/quality trade-off
Advanced Models (100K characters)
Best for:
- GPT-4 Turbo (128K context)
- Claude 3 Opus/Sonnet (200K context)
- Gemini 1.5 Pro (1M+ context)
- Specialized long-context models

Why this limit:
- Leverages extended context capabilities
- Provides rich, comprehensive context
- Enables complex reasoning across many documents
No Limits / Max (3M characters)
Best for:
- Extreme edge cases
- Testing and development
- Models with multi-million token contexts

Behavior:
- Essentially removes the limit
- Includes all retrieved chunks
Impact on Performance
Quality

A context limit can be too small, optimal, or too large for your model. Symptoms of a limit that is too small:
- AI lacks necessary context
- Answers are generic or incomplete
- Important linked information is missed
Speed
Processing time increases with context size. The total includes:
- Document extraction
- Chunking
- Embedding generation
- Vector search
- API request time
Cost
For paid APIs, token usage directly impacts cost:

| Preset | Est. Tokens | Cost Multiplier |
|---|---|---|
| Local (10K) | ~2.5K | 1x (baseline) |
| Cloud (32K) | ~8K | ~3x |
| Advanced (100K) | ~25K | ~10x |
| Max (3M) | Variable | Up to 300x+ |
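The token estimates in the table assume the common heuristic of roughly 4 characters per token for English text; actual tokenization varies by model. A hypothetical helper (not part of the plugin) illustrating the arithmetic:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Actual tokenization varies by model and language.
function estimateTokens(characters: number): number {
  return Math.round(characters / 4);
}

// Cost multiplier relative to the Local (10K) baseline in the table.
function costMultiplier(characters: number): number {
  return characters / 10_000;
}
```

With these assumptions, the Cloud preset's 32,000 characters come out to roughly 8,000 tokens, matching the table above.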
Monitoring Context Usage
In development mode (NODE_ENV=development in your build), Local GPT logs context statistics.
Advanced: When Context is Truncated
When the context limit is reached:
- Graceful Truncation: The system stops adding chunks mid-file if needed
- No Partial Chunks: Individual chunks are never split
- Highest Scores First: Within each file group, highest-scoring chunks are prioritized
- File Order Preserved: Newer files (by creation time) are processed first
For example, with the Cloud preset (32,000 characters):
- File A (created today): 15K characters, all included
- File B (yesterday): 12K characters, all included
- File C (last week): 8K characters, 5K included, 3K truncated
- File D & E: Excluded entirely
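The truncation rules above can be sketched as follows. This is an illustrative reconstruction, not the actual code from src/rag.ts; the Chunk shape and the function name are assumptions:

```typescript
// Illustrative sketch of the truncation rules; real names/types may differ.
interface Chunk {
  file: string;
  createdAt: number; // file creation time (ms since epoch)
  score: number;     // retrieval relevance score
  text: string;
}

function assembleContext(chunks: Chunk[], limit: number): Chunk[] {
  // Group chunks by source file.
  const byFile = new Map<string, Chunk[]>();
  for (const c of chunks) {
    const group = byFile.get(c.file);
    if (group) group.push(c);
    else byFile.set(c.file, [c]);
  }

  // Newer files (by creation time) are processed first.
  const files = [...byFile.values()].sort(
    (a, b) => b[0].createdAt - a[0].createdAt
  );

  const included: Chunk[] = [];
  let used = 0;
  outer: for (const fileChunks of files) {
    // Within each file group, highest-scoring chunks are prioritized.
    fileChunks.sort((a, b) => b.score - a.score);
    for (const chunk of fileChunks) {
      // Individual chunks are never split: once the next chunk no longer
      // fits, stop entirely (remaining files are excluded).
      if (used + chunk.text.length > limit) break outer;
      included.push(chunk);
      used += chunk.text.length;
    }
  }
  return included;
}
```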
Migration from Previous Versions
If you upgraded from an earlier version, your context limit is set to "local" (10K) by default. This migration happens in src/main.ts:1102-1112.
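A hedged sketch of what this migration might do; everything beyond the settings.defaults.contextLimit field and the "local" default is an assumption:

```typescript
// Hypothetical sketch of the migration in src/main.ts:1102-1112;
// type and function names are assumptions for illustration.
interface LocalGptSettings {
  defaults: { contextLimit?: "local" | "cloud" | "advanced" | "max" };
}

function migrateContextLimit(settings: LocalGptSettings): LocalGptSettings {
  // Settings saved by older versions have no contextLimit; default to the
  // conservative "local" preset (10,000 characters).
  if (settings.defaults.contextLimit === undefined) {
    settings.defaults.contextLimit = "local";
  }
  return settings;
}
```

Existing values are left untouched, so users who already chose a preset keep it after upgrading.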
Best Practices
Start Conservative
Begin with “Local models” or “Cloud models” and increase only if needed.
Match Your Model
Choose the preset that matches your AI model’s context window.
Monitor Quality
If answers lack context, increase the limit. If they’re unfocused, decrease it.
Consider Cost
Higher limits = more tokens = higher costs for paid APIs.
Next Steps
RAG System
Learn how the RAG system processes and ranks context
Troubleshooting
Fix issues with context limits and embedding