## Core principle

Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.

Context engineering is the practice of actively managing what an AI model sees. It is not a one-time task — it is an iterative discipline applied every time you pass information to the model.
## Context engineering vs. prompt engineering

These two practices are related but distinct:

| | Prompt engineering | Context engineering |
|---|---|---|
| What it is | Writing and organizing LLM instructions for optimal outcomes | Curating and maintaining the optimal token set during inference |
| When it happens | Primarily a one-time authoring task | Iterative — happens every turn |
| What it manages | System instructions and their wording | System instructions, tools, external data, message history, runtime retrieval |
The full context includes:

- System instructions
- Tools and tool schemas
- Model Context Protocol (MCP) connections
- External data and documents
- Message history
- Runtime data retrieved on demand
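The components above all compete for the same token budget. A minimal sketch of how they combine into one context (all names and structures hypothetical, for illustration only):

```python
# Hypothetical sketch: everything the model attends to is one assembled string.
def assemble_context(system_prompt, tool_schemas, documents, history):
    """Concatenate every component the model will see this turn."""
    parts = [system_prompt]
    parts += [schema["description"] for schema in tool_schemas]
    parts += documents
    parts += [msg["content"] for msg in history]
    return "\n\n".join(parts)

context = assemble_context(
    system_prompt="You are a helpful coding agent.",
    tool_schemas=[{"name": "read_file", "description": "Read a file by path."}],
    documents=["CLAUDE.md: project conventions..."],
    history=[{"role": "user", "content": "Fix the failing test."}],
)
```

Every element added here, including tool schemas and old messages, consumes attention budget on every subsequent turn.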
## The problem: context rot

LLMs have a finite attention budget, and it degrades as context grows:

- Every token attends to every other token — an n² relationship
- As context length increases, model accuracy on earlier content decreases
- Models have less training experience with very long sequences
- More context is not always better — it has diminishing (and eventually negative) marginal returns
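The quadratic point above can be made concrete: doubling context length quadruples the number of attention pairs. A tiny illustration (a cost sketch, not a model of any specific architecture):

```python
# Self-attention cost sketch: every token attends to every other token,
# so the number of attention pairs grows quadratically with context length.
def attention_pairs(n_tokens: int) -> int:
    return n_tokens * n_tokens
```

This is why trimming context from 100k tokens to 50k tokens removes far more than half of the attention work.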
## System prompts: finding the right altitude

A well-crafted system prompt lives in a “Goldilocks zone” — specific enough to guide behavior, flexible enough to stay useful across edge cases.

### Too prescriptive
Hardcoded if-else logic. Brittle and fragile. High maintenance burden as the system evolves.
### Too vague
High-level guidance without concrete signals. Falsely assumes shared context. Lacks actionable direction.
### Just right
Specific enough to guide behavior effectively. Flexible enough to provide strong heuristics. Minimal but sufficient.
## Best practices for system prompts
- Use simple, direct language
- Organize into distinct sections with XML tags or Markdown headers (e.g., `<background_information>`, `<instructions>`, `## Tool guidance`)
- Start with a minimal prompt and add to it based on observed failure modes
- Note: “minimal” does not mean “short” — provide sufficient information upfront to prevent failures
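Applying the sectioning advice above, a system prompt might be organized like this (the content is a hypothetical example, not a recommended prompt):

```python
# Illustrative system prompt using the XML-tag and Markdown-header sectioning
# described above. All section contents are hypothetical.
SYSTEM_PROMPT = """\
<background_information>
You assist engineers working in a Python monorepo.
</background_information>

<instructions>
- Prefer minimal diffs.
- Run the test suite before declaring a task done.
</instructions>

## Tool guidance
Use read_file before edit_file; never guess file contents.
"""
```

Distinct sections make the prompt easier to maintain: a failure mode observed in testing maps to one section to revise.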
## Tools: minimal and clear

Tools add tokens to every request through their schemas and descriptions. Keep them precise.

### Design principles
- Self-contained: Each tool has a single, clear purpose
- Robust to error: Handle edge cases gracefully without crashing or producing ambiguous output
- Extremely clear: Intended use is unambiguous from the description alone
- Token-efficient: Returns relevant information without bloat
- Descriptive parameters: Use unambiguous input names (`user_id`, not `user`)
### Common failure modes to avoid
- Bloated tool sets covering too much functionality in one schema
- Tools with overlapping purposes that force the model to guess
- Ambiguous parameter names that introduce uncertainty at call time
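A tool definition that follows these principles might look like the sketch below, using the JSON-Schema-style shape common to LLM tool-use APIs (the tool name, fields, and description are hypothetical):

```python
# A single-purpose tool schema: one clear job, a concise description, and an
# unambiguous parameter name (user_id, not user). All names are illustrative.
GET_USER_TOOL = {
    "name": "get_user_profile",
    "description": (
        "Fetch one user's profile by their unique ID. "
        "Returns name and email, or an error string if not found."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "user_id": {
                "type": "string",
                "description": "Unique identifier of the user, e.g. 'u_12345'.",
            }
        },
        "required": ["user_id"],
    },
}
```

Note that the description states both the purpose and the error behavior, so intended use is clear from the schema alone.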
## Examples: diverse, not exhaustive

When providing few-shot examples, curate quality over quantity.

### Do this
Curate a small set of diverse, canonical examples that show expected behavior effectively. Think of each example as a picture worth a thousand words.
### Avoid this
Stuffing in an exhaustive list of edge cases. Trying to articulate every possible rule. Overwhelming the model with scenarios it won’t encounter.
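A curated few-shot set might look like this sketch: three diverse, canonical examples rather than dozens of edge cases (the routing task and all example text are hypothetical):

```python
# A small, diverse few-shot set for a hypothetical ticket-routing task.
# Each example is canonical for one category; edge cases are deliberately omitted.
FEW_SHOT = [
    {"input": "Refund order #123", "output": "route: billing"},
    {"input": "App crashes on login", "output": "route: engineering"},
    {"input": "How do I export my data?", "output": "route: support"},
]

def render_examples(examples):
    """Format examples for inclusion in a prompt."""
    return "\n".join(f"Q: {e['input']}\nA: {e['output']}" for e in examples)
```

Three well-chosen examples establish the pattern; the model generalizes from there.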
## Context retrieval strategies

How you retrieve data into the context window has a large impact on quality and cost.

### Just-in-time context (recommended for agents)

Approach: Maintain lightweight identifiers — file paths, queries, links — and dynamically load full data at runtime only when needed.

Benefits:
- Avoids context pollution from irrelevant content
- Enables progressive disclosure patterns
- Mirrors human cognition (we don’t memorize everything ahead of time)
- Lets agents discover context incrementally, using metadata like file names and timestamps as signals
Tradeoffs:

- Slower than pre-computed retrieval
- Requires clear tool guidance to prevent dead-ends
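The identifier-first pattern can be sketched in a few lines: the agent reasons over cheap metadata (file names) and pays the full token cost only for files it chooses to open (function names are illustrative):

```python
from pathlib import Path

# Just-in-time context sketch: keep only lightweight identifiers in context,
# and load full contents on demand when the agent decides it needs them.
def list_identifiers(root: Path) -> list[str]:
    """Cheap metadata the agent can reason over without loading any content."""
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.md"))

def load_on_demand(root: Path, identifier: str) -> str:
    """Fetch full content only when actually needed."""
    return (root / identifier).read_text()
```

The listing costs a few tokens per file; a full load may cost thousands, so deferring it until needed avoids context pollution.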
### Pre-inference retrieval (traditional RAG)

Approach: Use embedding-based retrieval to surface relevant context before inference begins.

When to use: Static content that won’t change during the interaction, or when the retrieval query is well-defined and the result set is small.
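The retrieve-then-infer flow can be sketched as follows. A bag-of-words cosine similarity stands in here for real embedding similarity, purely to keep the example self-contained:

```python
from collections import Counter
import math

# Pre-inference retrieval sketch: rank documents against the query BEFORE the
# model is called. Bag-of-words cosine similarity is a stand-in for embeddings.
def similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents to place in context before inference."""
    return sorted(docs, key=lambda d: similarity(query, d), reverse=True)[:k]
```

Only the top-ranked documents enter the context window; the rest never cost a token.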
### Hybrid strategy (best of both)

Approach: Retrieve some data upfront for known requirements, and enable autonomous exploration on demand for unknown requirements.

Example: Claude Code loads `CLAUDE.md` files upfront at session start, then uses glob and grep tools for just-in-time retrieval as work proceeds.

Rule of thumb: “Do the simplest thing that works.”

## Long-horizon tasks: three techniques

Extended interactions accumulate context. These three techniques help manage it across long-running tasks.

### Compaction
When a conversation is nearing the context limit, pass the message history to the model for compression, then reinitiate with the summary.

Implementation:
- Preserve critical details: architectural decisions, bugs, implementation choices
- Discard redundant tool outputs and intermediate reasoning
- Continue with the compressed context plus recently accessed files
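The compaction loop described above can be sketched as follows; `summarize` is a placeholder for a real model call, and the word-count token estimate is deliberately crude:

```python
# Compaction sketch: when history nears the limit, replace older messages with
# a model-generated summary and keep only the most recent turns verbatim.
def summarize(messages: list[str]) -> str:
    # Placeholder for a model call that preserves decisions, bugs, and choices
    # while discarding redundant tool outputs and intermediate reasoning.
    return f"Summary of {len(messages)} earlier messages."

def compact(history: list[str], limit_tokens: int, keep_recent: int = 2) -> list[str]:
    # Crude token estimate: whitespace-separated words.
    if sum(len(m.split()) for m in history) <= limit_tokens:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent
```

In practice the summarizer prompt carries the real weight: it must be told explicitly which details (architecture, bugs, decisions) to preserve.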
### Structured note-taking (agentic memory)

Have the agent write notes that are persisted outside the context window and retrieved later as needed.

Examples:
- To-do lists for multi-step tasks
- `NOTES.md` files for tracking progress
- Game state or long-running process logs
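A minimal sketch of the note-taking pattern: notes live on disk, outside the context window, and survive across sessions (the file name and helper names are illustrative):

```python
from pathlib import Path

# Structured note-taking sketch: the agent appends notes to a file outside the
# context window and re-reads them in a later session.
def append_note(notes_path: Path, note: str) -> None:
    with notes_path.open("a") as f:
        f.write(f"- {note}\n")

def load_notes(notes_path: Path) -> list[str]:
    if not notes_path.exists():
        return []
    return [line[2:].strip() for line in notes_path.read_text().splitlines()]
```

Because the store is external, notes cost zero context tokens until a session actually loads them.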
### Sub-agent architectures

Delegate focused tasks to specialized sub-agents that each have a clean, targeted context window.

How it works:
- Main agent maintains the high-level plan and coordinates work
- Sub-agents perform deep technical work within their own context
- Sub-agents can explore extensively (tens of thousands of tokens internally)
- Sub-agents return condensed summaries (1,000–2,000 tokens) to the main agent
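The key property of this architecture, that the coordinator never sees sub-agent scratchpads, can be sketched in a few lines (all agent logic here is a stand-in for real model calls):

```python
# Sub-agent sketch: each sub-agent may burn many tokens internally but returns
# only a condensed summary to the coordinator. Logic is illustrative only.
def sub_agent(task: str) -> str:
    internal_scratchpad = []  # in a real system, tens of thousands of tokens
    for step in range(3):     # stand-in for deep, exploratory work
        internal_scratchpad.append(f"step {step} exploring: {task}")
    # Only this condensed result crosses back into the coordinator's context.
    return f"Findings for {task!r}: 3 exploration steps completed."

def main_agent(plan: list[str]) -> list[str]:
    """The coordinator holds the plan and sees only summaries."""
    return [sub_agent(task) for task in plan]
```

Each sub-agent's exploration is discarded after it returns, so the main agent's context stays clean regardless of how deep the sub-tasks went.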
## Quick decision framework
| Scenario | Recommended approach |
|---|---|
| Static content | Pre-inference retrieval or hybrid |
| Dynamic exploration needed | Just-in-time context |
| Extended back-and-forth | Compaction |
| Iterative development | Structured note-taking |
| Complex research | Sub-agent architectures |
| Rapid model improvement | “Do the simplest thing that works” |
## Anti-patterns to avoid

- Dumping raw history or full document sets into every request
- Bloated or overlapping tool sets that force the model to guess
- Over-prescriptive system prompts full of hardcoded if-else logic
- Exhaustive example lists that try to articulate every possible rule
## Key takeaways
- Context is finite: Treat it as a precious resource with a depletable attention budget.
- Think holistically: Consider the entire state available to the model, not just the system prompt.
- Stay minimal: More context is not always better — each token has a cost.
- Be iterative: Context curation happens every time you pass information to the model, not just at setup.
- Design for autonomy: As models improve, let them act intelligently rather than over-specifying behavior.
- Start simple: Test with minimal setup, add complexity based on observed failure modes.
> “Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents.”

Context engineering will evolve, but the core principle stays the same: optimize the signal-to-noise ratio in your token budget.
## How Claude Mem addresses these challenges

Claude Mem applies context engineering principles at the system level so individual sessions start with a curated, high-signal token set rather than a raw dump of history.

- Progressive disclosure: The session start hook injects a compact index (~800 tokens) rather than full observation text, giving the agent information about what exists before it decides what to fetch. See Progressive disclosure.
- Just-in-time retrieval: MCP search tools (`search`, `timeline`, `get_observations`) let the agent fetch full observation details on demand, paying the token cost only for content that’s actually relevant.
- Semantic compression: Observations are stored with concise, searchable titles that compress hundreds of tokens of content into ~10 words — enough to make a retrieval decision without fetching.
- Structured note-taking: The `PostToolUse` hook captures observations automatically, creating an external memory store that persists outside the context window across sessions.
## Further reading

- Progressive disclosure: The philosophy behind Claude Mem’s 3-layer context priming strategy and why it works.
- Architecture overview: How Claude Mem’s hooks, worker, and MCP tools fit together.