Core principle
Show what exists and its retrieval cost first. Let the agent decide what to fetch based on relevance and need.

Progressive disclosure is an information architecture pattern where complexity is revealed gradually rather than all at once. In the context of AI agents, it means building a retrieval hierarchy: lightweight metadata first, full content only on demand. This mirrors how humans work. We scan headlines before reading articles, review a table of contents before diving into chapters, and check file names before opening files.
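As a sketch, the hierarchy reduces to two tiers. The type and function names here (`IndexEntry`, `Observation`, `worthFetching`) are illustrative, not Claude Mem's actual API:

```typescript
// Hypothetical types; Claude Mem's real interfaces may differ.

// Tier 1: lightweight metadata the agent always sees.
interface IndexEntry {
  id: number;
  timestamp: string;
  type: string;      // e.g. "gotcha", "decision"
  title: string;     // ~10-word semantic compression of the observation
  tokenCost: number; // price of fetching the full content
}

// Tier 2: full content, fetched only on demand.
interface Observation extends IndexEntry {
  body: string;
}

// The agent scans cheap entries first, then pays for detail selectively.
function worthFetching(entry: IndexEntry, taskKeywords: string[]): boolean {
  const title = entry.title.toLowerCase();
  return taskKeywords.some((kw) => title.includes(kw.toLowerCase()));
}
```

The key property is that a fetch decision needs only the metadata tier; the body never enters the context window until the agent asks for it.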
The problem: context pollution
Traditional RAG (Retrieval-Augmented Generation) systems fetch everything upfront. The result is a context window full of content that may or may not be relevant to the current task.

- 94% of the attention budget is spent on irrelevant content
- The user’s actual prompt gets buried under a mountain of history
- The agent must process everything before understanding what the task requires
- There’s no way to know what’s useful until after reading it — defeating the purpose of fetching it early
Claude Mem’s solution
- The agent controls its own context consumption
- Every token fetched is directly relevant to the current task
- The agent can fetch more if needed, or skip everything if nothing is relevant
- The retrieval cost is visible before the fetch decision is made
The 3-layer workflow
Claude Mem implements progressive disclosure through a structured 3-layer pattern. Each layer provides progressively more detail at progressively higher token cost.

Layer 1 — Search (index)
Start by searching to get a compact index of matching observations with their IDs.

Returns: A compact list of matching entries (ID, timestamp, type icon, title, and token cost).

Cost: ~50–100 tokens per result.
Value: The agent can scan and decide which observations are worth fetching — without committing to any of them.
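A minimal sketch of what a Layer 1 search could look like, assuming an in-memory store and a rough ~4-characters-per-token cost heuristic (`searchIndex` and the field names are hypothetical):

```typescript
interface IndexHit {
  id: number;
  title: string;
  tokenCost: number;
}

// Hypothetical Layer 1 search: return only compact metadata, never bodies.
function searchIndex(
  store: { id: number; title: string; body: string }[],
  query: string
): IndexHit[] {
  const q = query.toLowerCase();
  return store
    .filter((o) => o.title.toLowerCase().includes(q))
    .map((o) => ({
      id: o.id,
      title: o.title,
      // Rough cost estimate: ~4 characters per token.
      tokenCost: Math.ceil(o.body.length / 4),
    }));
}
```

Note that the result deliberately omits `body`: the agent sees what exists and what it would cost, nothing more.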
Layer 2 — Timeline (context)
Get chronological context around an observation of interest to understand the narrative arc.

Returns: A chronological view of what happened before, during, and after observation #2543.

Cost: Variable based on depth.
Value: Reveals the sequence of events that led to and followed the observation — useful for understanding decisions in context.

Layer 3 — Fetch (full observation)

Fetch the complete observation by ID once the index or timeline shows it is relevant.

Returns: The full observation content.

Cost: The per-entry token count shown in the index (for example, ~155 tokens for #2543).

Value: Full detail is paid for only after the agent has decided it is worth the tokens.
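Beyond the search index, the rest of the 3-layer workflow reduces to two small operations: chronological context around an ID, and a full fetch of a single, already-vetted ID. This is a sketch with an in-memory store and illustrative names (`timeline`, `fetchObservation`):

```typescript
// Hypothetical in-memory store of observations in chronological order.
type Obs = { id: number; body: string };

const store: Obs[] = [
  { id: 2542, body: "Investigated slow PostToolUse hook" },
  { id: 2543, body: "Hook timeout: 60s too short for npm install" },
  { id: 2544, body: "Raised hook timeout to accommodate npm install" },
];

// Timeline layer: chronological context around an observation of interest.
// depth controls how many neighbors to include on each side.
function timeline(store: Obs[], id: number, depth: number): Obs[] {
  const i = store.findIndex((o) => o.id === id);
  if (i === -1) return [];
  return store.slice(Math.max(0, i - depth), i + depth + 1);
}

// Fetch layer: pay full token cost only for one specific, vetted ID.
function fetchObservation(store: Obs[], id: number): Obs | undefined {
  return store.find((o) => o.id === id);
}
```

The cost asymmetry is the point: `timeline` scales with the depth the agent chooses, and `fetchObservation` costs exactly one entry.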
The index format
Every session start provides a compact index. Each index entry answers four questions:

- What exists: The title compresses the full observation into ~10 words
- When it happened: Timestamps provide temporal context
- What type: Icons indicate observation category and signal importance
- Retrieval cost: Token counts enable informed budget decisions
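One way to render such an entry, assuming a rough ~4-characters-per-token estimate (`formatIndexLine` and `estimateTokens` are hypothetical helpers, not Claude Mem's actual code):

```typescript
// Hypothetical index entry carrying the four pieces of information above.
interface Entry {
  id: number;
  time: string;
  icon: string;  // e.g. "🔴" for gotcha
  title: string;
  body: string;  // full content, never shown in the index itself
}

// Rough token estimate (~4 chars/token) so the cost is visible up front.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Render one compact index line: what, when, type, and retrieval cost.
function formatIndexLine(e: Entry): string {
  return `#${e.id} | ${e.time} | ${e.icon} | ${e.title} | ~${estimateTokens(e.body)}`;
}
```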
The observation type legend
| Icon | Type | Meaning |
|---|---|---|
| 🎯 | session-request | User’s original goal for the session |
| 🔴 | gotcha | Critical edge case or pitfall — often worth fetching immediately |
| 🟡 | problem-solution | Bug fix or workaround |
| 🔵 | how-it-works | Technical explanation |
| 🟢 | what-changed | Code or architecture change |
| 🟣 | discovery | Learning or insight |
| 🟠 | why-it-exists | Design rationale |
| 🟤 | decision | Architecture decision |
| ⚖️ | trade-off | Deliberate compromise |
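The legend translates directly into a lookup table. This is a sketch, not Claude Mem's internal representation:

```typescript
// Icon-to-type mapping from the legend above.
const OBSERVATION_TYPES: Record<string, string> = {
  "🎯": "session-request",
  "🔴": "gotcha",
  "🟡": "problem-solution",
  "🔵": "how-it-works",
  "🟢": "what-changed",
  "🟣": "discovery",
  "🟠": "why-it-exists",
  "🟤": "decision",
  "⚖️": "trade-off",
};
```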
Context as currency
Think of the context window as a budget to spend wisely:

| Approach | Metaphor | Outcome |
|---|---|---|
| Dump everything | Spending your entire paycheck on groceries you might need someday | Waste, clutter, can’t afford what you actually need |
| Fetch nothing | Refusing to spend any money | Starvation, can’t accomplish tasks |
| Progressive disclosure | Check your pantry, make a shopping list, buy only what you need | Efficiency, room for unexpected needs |
The attention budget in practice
LLMs have finite attention:

- Every token attends to every other token (n² relationships)
- A 100,000-token window does not provide 100,000 tokens of useful attention
- Context “rot” happens as the window fills with low-signal content
- Later tokens receive less relative attention than earlier ones
With progressive disclosure, the arithmetic looks like this:

- Start with ~1,000 tokens of index
- Agent has ~99,000 tokens free for the actual task
- Agent fetches ~200 tokens when a specific observation is relevant
- Final budget: ~98,800 tokens available for real work
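The arithmetic above is simple enough to state directly (`WINDOW` and `remainingBudget` are illustrative names, with the window size assumed to be 100,000 tokens):

```typescript
// Assumed window size for the example above.
const WINDOW = 100_000;

// Budget left after paying for the index and each targeted fetch.
function remainingBudget(indexTokens: number, fetchedTokens: number[]): number {
  return WINDOW - indexTokens - fetchedTokens.reduce((a, b) => a + b, 0);
}
```

With a 1,000-token index and a single ~200-token fetch, `remainingBudget(1000, [200])` is 98,800, matching the numbers above.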
Real-world example
Without progressive disclosure
With progressive disclosure
The decision tree
When the agent encounters this index entry:

Design principles
Make costs visible
Every index entry shows the approximate token count for fetching the full observation:

Use semantic compression
The quality of an observation title determines whether the system works or fails. The agent must be able to make a fetch decision from the title alone.

Bad title
Good title
Group by context
Observations are grouped by date and by file path. If an agent is working on `src/hooks/context-hook.ts`, the index immediately surfaces related observations:
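Grouping by file is a straightforward bucketing step. A sketch with hypothetical names (`FileObs`, `groupByFile`):

```typescript
// Hypothetical observation shape carrying the file it relates to.
interface FileObs {
  id: number;
  file: string;
  title: string;
}

// Bucket observations by file path so the index can surface entries
// related to whatever file the agent is currently working on.
function groupByFile(observations: FileObs[]): Map<string, FileObs[]> {
  const groups = new Map<string, FileObs[]>();
  for (const o of observations) {
    const bucket = groups.get(o.file) ?? [];
    bucket.push(o);
    groups.set(o.file, bucket);
  }
  return groups;
}
```

Grouping by date works the same way, keyed on a date string instead of a path.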
Design for agent autonomy
Progressive disclosure treats the agent as an intelligent information forager, not a passive recipient of pre-selected context.

Cognitive load theory
Progressive disclosure is grounded in how humans (and LLMs) process information.

Intrinsic load — unavoidable task complexity
Extraneous load — noise added by poor presentation
Traditional RAG adds extraneous load: scanning irrelevant observations, filtering noise, remembering what to ignore, re-contextualizing after each section.

Progressive disclosure minimizes extraneous load: scan titles with low effort, fetch only relevant content with targeted effort, maintain full attention on the current task.
Germane load — building useful mental models
Progressive disclosure supports germane load through consistent structure (legend, grouping), clear categorization (types, icons), semantic compression (good titles), and explicit costs (token counts). These patterns reduce the overhead of learning how to use the system and let the agent focus on the content.
Anti-patterns to avoid
Verbose titles
Bad: Investigation into the issue where hooks time out

Good: 🔴 Hook timeout: 60s too short for npm install

Titles must be scannable and self-contained. The agent shouldn’t need to fetch the observation to decide if it’s relevant.

Hiding retrieval costs
Bad: | #2543 | 2:14 PM | 🔴 | Hook timeout issue |

Good: | #2543 | 2:14 PM | 🔴 | Hook timeout issue | ~155 |

Without costs, the agent can’t make informed ROI decisions about what to fetch.

No retrieval path
Bad: Show an index with no instructions on how to fetch full details.

Good: Include explicit guidance: Use MCP search tools to fetch full observation details on-demand.

The index is useless without a clear retrieval mechanism.
Skipping the index layer

Bad: Fetch full observations immediately, without scanning the index first.

Good: Start from the index; fetch only entries whose titles and costs justify it.
Measuring success
Progressive disclosure is working when these conditions hold:

| Signal | Target |
|---|---|
| Relevance ratio: Relevant tokens / total context tokens | >80% |
| Selective fetching: Observations fetched vs. index shown | 2–5 out of 50 |
| Time to relevant context | Faster than scanning all context |
| Depth scaling: Depth of fetch matches task complexity | Simple → index only; complex → 5–10 observations + code reads |
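The first two signals are easy to compute. A sketch (function names are illustrative):

```typescript
// Fraction of context tokens that were actually relevant to the task.
function relevanceRatio(relevantTokens: number, totalTokens: number): number {
  return totalTokens === 0 ? 0 : relevantTokens / totalTokens;
}

// Fraction of index entries the agent chose to fetch in full.
function selectiveFetchRate(fetched: number, shown: number): number {
  return shown === 0 ? 0 : fetched / shown;
}
```

Healthy values per the table: `relevanceRatio` above 0.8, and `selectiveFetchRate` in the rough range of 2–5 fetches per 50 entries shown.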
Future enhancements
Adaptive index size
Vary index size based on session context:
Relevance scoring
Pre-sort index entries by semantic similarity to the current task:
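A sketch of such pre-sorting, assuming embedding vectors are produced by some external model (`cosine` and `sortByRelevance` are hypothetical names, not an existing Claude Mem feature):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

// Sort index entries by similarity to the current task, most relevant first.
function sortByRelevance<T extends { embedding: number[] }>(
  entries: T[],
  taskEmbedding: number[]
): T[] {
  return [...entries].sort(
    (x, y) => cosine(y.embedding, taskEmbedding) - cosine(x.embedding, taskEmbedding)
  );
}
```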
Cost forecasting
Surface budget estimates in the index itself:
Progressive detail levels
Extend the 3 layers to 4 by adding an intermediate summary layer:
Key takeaways
- Show, don’t tell: The index reveals what exists without forcing consumption.
- Cost-conscious: Make retrieval costs visible for informed decisions.
- Agent autonomy: Let the agent decide what’s relevant — it knows the task, the system doesn’t.
- Semantic compression: Good titles make or break the system.
- Consistent structure: Predictable patterns reduce cognitive overhead.
- Two-tier everything: Index first, details on-demand.
- Context as currency: Spend wisely on high-value information.
“The best interface is one that disappears when not needed, and appears exactly when it is.”

Progressive disclosure respects the agent’s intelligence and autonomy. We provide the map; the agent chooses the path.
This philosophy emerged from real-world usage of Claude Mem across hundreds of coding sessions. The pattern works because it aligns with both human cognition and LLM attention mechanics.
Further reading
Context engineering
Foundational principles for curating optimal token sets for AI agents.
Architecture overview
How Claude Mem’s hooks, worker, and MCP tools implement this philosophy.