What counts toward context?
Every token in a Cline session consumes context:
- Conversation history — every message you and Cline have exchanged in the current task
- File contents — full contents of any file Cline has read
- Tool outputs — results from terminal commands, search results, test output
- System prompts — Cline’s internal instructions (relatively small, a few thousand tokens)
Token estimates
| Content type | Approximate tokens per KB |
|---|---|
| Source code | 250–400 |
| JSON | 300–500 |
| Markdown | 200–300 |
| Plain text | 200–250 |
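The per-kilobyte figures above can be turned into a rough file-size-based estimator. This is only a sketch restating the table, not Cline's actual tokenizer; the `RATES` values and function name are illustrative.

```python
import os

# Approximate tokens per KB, restating the table above (illustrative ranges).
RATES = {
    "source code": (250, 400),
    "json": (300, 500),
    "markdown": (200, 300),
    "plain text": (200, 250),
}

def estimate_tokens(path: str, content_type: str = "source code") -> tuple[int, int]:
    """Return a (low, high) token estimate for a file based on its size on disk."""
    kb = os.path.getsize(path) / 1024
    low_rate, high_rate = RATES[content_type]
    return round(kb * low_rate), round(kb * high_rate)
```

For example, a 40 KB source file works out to roughly 10,000–16,000 tokens, which is why reading a handful of large files can consume a meaningful slice of a 128K window.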
Context window sizes by model
| Model | Context window | Practical range | Notes |
|---|---|---|---|
| Claude Sonnet 4.5 | 200K (1M variant available) | ~100K–500K | Best quality at high context |
| Claude Opus 4 | 200K (1M variant available) | ~100K–500K | Most capable reasoning |
| GPT-5 | 400K | ~200K–300K | Three performance modes |
| GPT-4o | 128K | ~80K | Multimodal, fast |
| Gemini 2.5 Pro | 1M+ | ~600K | Excellent document handling |
| Gemini 1.5 Pro | 2M | ~1M | Largest available |
| DeepSeek V3 | 128K | ~100K | Best at mid-range context |
| Qwen3 Coder | 256K | ~200K | Good balance |
| Qwen3 Coder 30B (local) | 256K | ~150K | Depends on num_ctx setting |
“Practical range” reflects where models typically maintain high-quality, coherent outputs. All models degrade to some degree near their hard limit — plan for the practical range, not the ceiling.
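For local models like Qwen3 Coder 30B, the usable window depends on the `num_ctx` setting mentioned in the table. If you serve the model through Ollama, that setting lives in a Modelfile; the sketch below uses an illustrative model tag and value, so check your own model's name and memory budget before applying it.

```
FROM qwen3-coder:30b
PARAMETER num_ctx 131072
```

Larger `num_ctx` values consume more RAM or VRAM, so the practical ceiling on local hardware is often lower than the model's advertised window.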
Why context windows matter for coding
Coding tasks are context-heavy by nature:
- Multi-file features require Cline to read several files before writing a single line
- Debugging sessions accumulate error messages, stack traces, and attempted fixes
- Refactoring means holding the current state of the code alongside the target state
- Large codebases have deeply interconnected files where understanding one requires reading several others
How Cline manages context
Cline includes several mechanisms to help you stay within limits:
Context meter
The Cline interface shows a token usage indicator for the current task. Watch this as your session grows — when it approaches 80% of the model’s limit, consider compacting or starting a new task.
Auto-compact
Cline can automatically summarize long conversations to free up context while preserving the essential information:
- Go to Cline Settings → Features.
- Enable Auto-compact.
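The 80% guideline above can be expressed as a small check. This is an illustrative sketch only; in practice the token count comes from Cline's usage indicator, not from your own code.

```python
def should_compact(tokens_used: int, context_window: int, threshold: float = 0.8) -> bool:
    """True when usage crosses the point where compacting or a new task is advisable."""
    return tokens_used >= context_window * threshold

# Example: 170K tokens used in a 200K-token window is past the 80% mark.
```

The same arithmetic explains why the symptom table below matters: quality problems tend to appear well before the hard limit is actually hit.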
Selective file reading
Cline reads files only when needed. Instead of loading entire directories, it reads specific files in response to your requests or when it determines a file is relevant. You can influence this with @ mentions:
- @filename.ts — include a specific file when you know it’s relevant
- Ask Cline to search for a function rather than reading the whole file
- Reference specific line numbers when discussing a bug
Starting fresh
For new features or unrelated tasks, start a new Cline task with /new in the chat. A fresh context means the model has full capacity and no irrelevant history.
Signs you’re hitting context limits
| Symptom | What it means | What to do |
|---|---|---|
| “Context window exceeded” error | Hard limit reached | Start a new task or enable auto-compact |
| Suggestions contradict recent changes | Context overflow or truncation | Start fresh or compact the conversation |
| Repetitive or circular responses | Model losing coherence at high context | Summarize and continue in a new task |
| Missing recent edits | Earlier context dropped | Start a new task, reference key files explicitly |
| Noticeably slower responses | Model processing very large context | Reduce files included, or switch to a model with a larger window |
Practical tips by project size
Small projects (under 50 files)
Any model works well. You can freely include relevant files without hitting limits in a normal session. A 128K context window is sufficient.
Recommended models: DeepSeek V3, GPT-4o, Claude Haiku
Medium projects (50–500 files)
Use a model with at least 128K–200K context. Be selective about which files you load. Start new tasks when switching between unrelated features.
Recommended models: Claude Sonnet 4.5, Qwen3 Coder, GPT-5
Tips:
- Use search instead of reading entire directories
- Keep sessions focused on one feature or bug at a time
- Enable auto-compact
Large projects (500+ files)
Use a model with 200K+ context, ideally 1M. Focus sessions on specific modules rather than the full codebase. Break large tasks into smaller, independently completable steps.
Recommended models: Claude Sonnet 4.5 (1M variant), Gemini 2.5 Pro
Tips:
- Reference specific functions or classes, not entire files
- Use Plan Mode to outline the approach before loading files in Act Mode
- Summarize completed work before moving to the next phase
- Consider using Gemini 2.5 Pro if you consistently need to load large amounts of code
Plan Mode and Act Mode
Cline’s Plan/Act mode split is useful for context management:
- Plan Mode — discussion and reasoning, minimal file loading. Use a smaller, cheaper model.
- Act Mode — file reading and code writing. Use a model with a large, reliable context window.
Frequently asked questions
Why do responses get worse near the end of long sessions?
Models degrade near their hard context limit. The “effective window” where quality is high is typically 50–70% of the advertised limit. Beyond that, the model struggles to attend to all the information and may drop or confuse details from earlier in the session.
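As a rough illustration of that 50–70% figure, the practical range can be derived from the advertised limit. The function below is a sketch of that rule of thumb, not a property of any specific model.

```python
def effective_window(advertised: int, low: float = 0.5, high: float = 0.7) -> tuple[int, int]:
    """Estimate the token range where output quality typically stays high."""
    return round(advertised * low), round(advertised * high)

# A 200K-token model would have an effective window of roughly 100K–140K tokens.
```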
Should I always use the largest context window available?
Not necessarily. Larger contexts increase cost and can subtly reduce quality because the model has more information to sort through. Match the context size to your task — a simple bug fix doesn’t need a 1M-token window.
What's the difference between the advertised and effective context window?
The advertised context window is the hard technical limit. The effective window is where the model maintains high-quality, coherent responses. Models often start to lose coherence 30–50% before their hard limit. Check the table above for practical ranges.
How can I tell how much context I've used?
Cline shows a token usage indicator in the chat interface for the current task. It updates as the session grows.
What happens when the context limit is exceeded?
Cline will do one of the following:
- Automatically compact the conversation (if auto-compact is enabled)
- Show an error and prompt you to start a new task
- Truncate older messages with a warning
Next steps
Model selection guide
Find the right model for your context window needs and coding style.
Cloud providers
Set up API keys for providers with the largest context windows.