Architecture evolution
The problem
Goal: Make Claude smarter across sessions without the user noticing the memory system exists.
Challenge: How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time — all without slowing down or interfering with the main workflow?
This is the story of how Claude Mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
v1–v2: The naive approach
Dump everything
| Symptom | Root cause |
|---|---|
| Context window polluted | Raw tool outputs are verbose — 35,000 tokens for a typical session |
| Nothing relevant | Only ~500 of those tokens related to the current task (1.4%) |
| No search | Linear scan required |
| Concept proved | Memory across sessions is genuinely valuable |
v3: Smart compression, wrong architecture
The breakthrough: AI-powered compression
The core insight: use Claude itself to compress observations.
- Compression ratio: 10:1 to 100:1
- Semantic understanding (not just keyword matching)
- Background processing (hooks stayed fast)
- Search became useful
| Problem | Impact |
|---|---|
| Still loaded everything upfront | Context still bloated |
| Session ID management broken | SDK session IDs change every turn — observations got orphaned |
| Aggressive cleanup | SessionEnd → DELETE /worker/session interrupted summaries mid-process |
| Multiple SDK sessions per conversation | 100+ short SDK sessions instead of one long one |
The key realizations
Realization 1: Progressive disclosure
Problem: Even compressed observations pollute context if you load them all.
Insight: Humans don’t read an entire codebase before starting work. Why should AI?
Solution: Show an index first, fetch details on-demand.
Realization 2: Session ID chaos
Problem: SDK session IDs change on every turn.
Solution: Capture the current SDK session ID from the system.init message and update the database on each turn. Use INSERT OR IGNORE with the Claude Code session ID (from hook stdin) as the unique key — it never changes within a conversation.
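A minimal in-memory sketch of this idempotency pattern; the Map stands in for the SQLite table, and registerTurn and the row shape are illustrative rather than Claude Mem's actual code:

```typescript
// Illustrative sketch: one row per conversation, keyed by the stable
// Claude Code session ID, while the volatile SDK ID is refreshed per turn.
type SessionRow = { claudeSessionId: string; sdkSessionId: string };

const sessions = new Map<string, SessionRow>(); // stand-in for the SQLite table

function registerTurn(claudeSessionId: string, sdkSessionId: string): SessionRow {
  const existing = sessions.get(claudeSessionId);
  if (!existing) {
    // First turn: behaves like INSERT OR IGNORE, creating the row once.
    const row: SessionRow = { claudeSessionId, sdkSessionId };
    sessions.set(claudeSessionId, row);
    return row;
  }
  // Later turns: only the volatile SDK session ID is updated.
  existing.sdkSessionId = sdkSessionId;
  return existing;
}
```

Because every observation is keyed by the stable ID, nothing gets orphaned when the SDK ID rolls over mid-conversation.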
Realization 3: Graceful vs aggressive cleanup
v3 approach (broken): SessionEnd fired DELETE /worker/session immediately, interrupting summaries mid-process. v4 instead records a completed_at timestamp and lets the worker finish its pending work before exiting.
Realization 4: One session, not many
Problem: Creating a new SDK session per observation meant 100+ short sessions per conversation.
Solution: One long-running SDK session with streaming input.
v4: The architecture that works
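In v4, observations feed one long-lived session as streaming input. A sketch of that shape, assuming streaming input can be modeled as an async iterable; the generator and runSession are illustrative stand-ins for the SDK's streaming API:

```typescript
// Illustrative: the streaming side yields each tool event as one message.
async function* observationStream(observations: string[]) {
  for (const obs of observations) {
    yield obs;
  }
}

// Illustrative consumer: one session stays open and drains the stream,
// instead of spinning up a fresh session per observation.
async function runSession(stream: AsyncIterable<string>): Promise<number> {
  let handled = 0;
  for await (const _obs of stream) {
    handled++; // stand-in for compressing the observation in-session
  }
  return handled;
}
```

The session ends only when the stream closes, so a whole conversation maps to a single SDK session.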
Core design
The five-hook architecture
- SessionStart
- UserPromptSubmit
- PostToolUse
- Stop
- SessionEnd
Purpose: Inject context from previous sessions
Timing: When Claude Code starts
What it does:
- Queries last 10 session summaries
- Formats as progressive disclosure index with token counts
- Injects into context via hookSpecificOutput.additionalContext
- Index format (not full details)
- Token counts visible in the index
- MCP search instructions included
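A sketch of what the SessionStart hook could emit, assuming a summary record with a title and token count; the field path hookSpecificOutput.additionalContext matches the list above, while everything else is illustrative:

```typescript
// Illustrative summary shape; the real schema is not shown in this doc.
type SessionSummary = { title: string; tokens: number };

// Progressive disclosure: an index line per past session, with token
// counts so the agent can budget what it fetches in full.
function buildIndex(summaries: SessionSummary[]): string {
  return summaries
    .map((s, i) => `${i + 1}. ${s.title} (~${s.tokens} tokens)`)
    .join("\n");
}

// Hooks communicate with Claude Code over stdout as JSON.
function hookOutput(summaries: SessionSummary[]): string {
  return JSON.stringify({
    hookSpecificOutput: {
      additionalContext: buildIndex(summaries), // index only, details on demand
    },
  });
}
```

Only the index reaches the context window; full observations are fetched later via MCP search.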
Database schema evolution
v3 used a flat schema.
Critical bug fixes in v4
Context injection pollution (v4.3.1)
npm install output was mixing with hook stdout, corrupting the JSON that Claude Code expected. Fix: add --loglevel=silent to the npm install command so only the hook JSON reaches stdout.
Double shebang issue (v4.3.1)
Source files had #!/usr/bin/env node — and esbuild added another one during the build step. The resulting executables had two shebangs and failed to parse.
Fix: remove shebangs from source files and let esbuild add them during the build.
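A hedged build-config sketch of that fix, assuming esbuild's JS API with its banner option; the entry point and output paths are illustrative:

```typescript
// Build-config sketch, not the project's actual build file: let esbuild
// own the shebang so it appears exactly once in the bundled executable.
import { build } from "esbuild";

await build({
  entryPoints: ["src/hooks/session-start.ts"], // illustrative path
  bundle: true,
  platform: "node",
  outfile: "dist/session-start.js", // illustrative path
  banner: { js: "#!/usr/bin/env node" }, // added here, removed from sources
});
```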
FTS5 injection vulnerability (v4.2.3)
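A sketch of one mitigation, assuming FTS5's quoted-string syntax (a double quote inside a quoted term is escaped by doubling it); this is an illustrative approach, not necessarily the project's exact patch:

```typescript
// Illustrative FTS5 escaping helper: quote every user term so it is
// matched literally rather than parsed as FTS5 query syntax.
function fts5Quote(userInput: string): string {
  return userInput
    .split(/\s+/)
    .filter((term) => term.length > 0)
    // Double embedded quotes, FTS5's escape rule for quoted strings.
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" ");
}
```

Even with quoting, the resulting string should still reach MATCH as a bound parameter rather than be spliced into the SQL text.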
User input was passed directly into the FTS5 MATCH query string, so crafted input could be parsed as query syntax instead of literal search terms.
NOT NULL constraint violation (v4.2.8)
Session creation failed when the user prompt was empty (e.g., automated tool invocations). Fixed by changing user_prompt TEXT NOT NULL to user_prompt TEXT (nullable).
v5: Maturity and user experience
v5.0.0: Hybrid search (October 2025)
Added optional Chroma vector database alongside SQLite FTS5:
- FTS5: fast keyword matching, no extra dependencies
- Chroma: semantic understanding, finds related concepts
- Graceful degradation: works without Chroma (FTS5 only)
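A sketch of the degradation logic with stub backends standing in for FTS5 and Chroma; the merge-by-score step and all names are illustrative:

```typescript
type Hit = { id: number; score: number };

// Stub: keyword search is always available (SQLite FTS5, no extra deps).
function searchFts5(query: string): Hit[] {
  return [{ id: 1, score: 0.9 }];
}

// Stub: the optional vector backend, simulated here as unavailable.
function searchChroma(query: string): Hit[] {
  throw new Error("Chroma not running");
}

// Prefer hybrid results; fall back to FTS5 alone instead of failing.
function hybridSearch(query: string): { hits: Hit[]; mode: "hybrid" | "fts5" } {
  const keyword = searchFts5(query);
  try {
    const semantic = searchChroma(query);
    const merged = [...keyword, ...semantic].sort((a, b) => b.score - a.score);
    return { hits: merged, mode: "hybrid" };
  } catch {
    return { hits: keyword, mode: "fts5" }; // degrade, don't fail
  }
}
```

The try/catch boundary is what makes Chroma genuinely optional: a missing backend changes the mode, not the outcome.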
v5.0.2: Worker health checks (October 2025)
More robust worker startup and monitoring.
v5.0.3: Smart install caching (October 2025)
Version-based caching eliminated the 2–5 second npm install on every startup.
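A sketch of version-based caching, assuming a marker file that records which plugin version last ran npm install; paths and names are illustrative:

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Install only when the recorded version differs from the current one.
function needsInstall(markerPath: string, currentVersion: string): boolean {
  try {
    return readFileSync(markerPath, "utf8").trim() !== currentVersion;
  } catch {
    return true; // no marker yet: first run, install
  }
}

// After a successful install, record the version so later startups skip it.
function recordInstall(markerPath: string, currentVersion: string): void {
  writeFileSync(markerPath, currentVersion);
}
```

On a cache hit, startup skips npm entirely, which is where the ~10ms cached hook time in the v5 table comes from.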
v5.1.0: Web-based viewer UI (October 2025)
Real-time visualization of the memory stream:
- React web UI at http://localhost:37777
- Server-Sent Events (SSE) for live updates
- Infinite scroll pagination
- Project filtering and settings persistence
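A sketch of the SSE framing such live updates rely on; the event name and payload are illustrative, while the data: line plus blank-line terminator is the standard SSE wire format:

```typescript
// Serialize one server-sent event: an optional event name, a data line,
// and a blank line that terminates the frame.
function sseFrame(event: string, payload: unknown): string {
  const data = JSON.stringify(payload);
  return `event: ${event}\ndata: ${data}\n\n`;
}
```

A browser EventSource listening on the stream dispatches each frame to the handler registered for its event name.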
v5.1.2: Theme toggle (November 2025)
Added light/dark/system theme preference to the viewer UI with localStorage persistence.
MCP architecture simplification (December 2025)
Before: 9+ overlapping MCP tools
After: 4 tools, 3-layer workflow
- MCP server: 2,718 lines → 312 lines (88% reduction)
- Removed: 19 skill files (~2,744 lines)
- Net: ~5,150 lines of code removed
Skill-based search → MCP-only (v5.4.0+)
Before v5.4.0, Claude Mem used skill files (17 Markdown files) and HTTP API calls via curl to implement search. This was replaced with native MCP tools:
- Works with both Claude Desktop and Claude Code
- No curl dependency
- Simpler to maintain
- All 19 mem-search skill files removed (~2,744 lines)
Performance comparison across versions
v3 baseline
| Metric | Value |
|---|---|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
v4 improvements
| Metric | Value | vs v3 |
|---|---|---|
| Context usage per session | ~1,100 tokens | −96% |
| Relevant context | ~1,100 tokens (100%) | +12× relevance |
| Hook execution time | ~45ms | 4× faster |
| Search latency | ~15ms (FTS5) | 33× faster |
v5 improvements
| Metric | Value | vs v4 |
|---|---|---|
| Context usage per session | ~1,100 tokens | Same |
| Hook execution time | ~10ms (cached) | 4× faster |
| Search latency | ~12ms FTS5 / ~25ms hybrid | Slightly faster |
| Viewer UI load time | ~50ms (bundled HTML) | New |
| SSE update latency | ~5ms | New |
PM2 → Bun migration (v7.1.0, December 2025)
Version 7.1.0 replaced PM2 (external process manager) with a custom Bun-based ProcessManager:
| Aspect | PM2 (old) | Bun ProcessManager (new) |
|---|---|---|
| External dependency | Yes (pm2 npm package) | No |
| Native compilation | Via better-sqlite3 | No (bun:sqlite built in) |
| Windows issues | PATH and ENOENT errors | Handled by Bun |
| PID file location | ~/.pm2/pids/ | ~/.claude-mem/.worker.pid |
| Log location | ~/.pm2/logs/ | ~/.claude-mem/logs/ |
| Migration | Automatic on first hook trigger | — |
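A sketch of the PID-file handling such a ProcessManager needs; the class and paths here are illustrative (the real file lives at ~/.claude-mem/.worker.pid per the table), and process.kill(pid, 0) is the standard liveness probe:

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative PID-file manager, not the project's actual ProcessManager.
class PidFile {
  private path: string;

  constructor(path: string) {
    this.path = path;
  }

  write(pid: number): void {
    writeFileSync(this.path, String(pid));
  }

  read(): number | null {
    try {
      return Number(readFileSync(this.path, "utf8"));
    } catch {
      return null; // no file: worker was never started (or was cleaned up)
    }
  }

  // Signal 0 delivers nothing; it only checks whether the PID exists.
  isAlive(): boolean {
    const pid = this.read();
    if (pid === null) return false;
    try {
      process.kill(pid, 0);
      return true;
    } catch {
      return false; // stale PID file left by a crashed worker
    }
  }

  clear(): void {
    rmSync(this.path, { force: true });
  }
}
```

Treating an unreadable or stale PID file as "not running" is what lets the first hook trigger restart the worker cleanly after a crash.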
Lessons learned
Context is finite — respect the budget
Every token in the context window costs attention and money. Progressive disclosure reduces waste by 87% and gives the agent control over what it loads.
AI is the compressor
Manual extraction rules can’t match semantic AI compression. Compression ratios of 10:1 to 100:1 are achievable, with semantic understanding rather than keyword extraction.
Session state is complicated
The SDK handles conversation state better than manual reconstruction. Track session IDs from system.init messages; use INSERT OR IGNORE for idempotency.
Graceful beats aggressive
Let processes finish their work before terminating. Aggressive DELETE calls interrupt summaries and lose pending observations. A simple completed_at timestamp lets workers exit cleanly.
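A sketch of that pattern, with a completed_at flag checked only between queue items so in-flight work is never cut off; the worker and observation shapes are illustrative:

```typescript
type Observation = { id: number; summarized: boolean };

// Illustrative worker: shutdown is a flag, never a kill.
class ObservationWorker {
  queue: Observation[];
  completedAt: number | null = null;

  constructor(queue: Observation[]) {
    this.queue = queue;
  }

  // SessionEnd hook: request completion instead of DELETEing the session.
  requestShutdown(): void {
    this.completedAt = Date.now();
  }

  // Process one item fully; nothing checks the flag mid-item.
  processNext(): boolean {
    const obs = this.queue.shift();
    if (!obs) return false;
    obs.summarized = true; // stand-in for the real summarization call
    return true;
  }

  // Exit only once shutdown was requested AND the queue is drained.
  shouldExit(): boolean {
    return this.completedAt !== null && this.queue.length === 0;
  }
}
```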
Progressive everything
Show metadata first, fetch details on-demand. This applies to context injection, search results, and MCP tool design.
Visibility matters
v5 added a real-time viewer UI. Users don’t need to see the memory system working, but being able to inspect it builds trust and aids debugging.
Migration guide: v3 → v5
Update the plugin
- Updates dependencies (including Chroma for v5.0.0+)
- Runs database schema migrations
- Restarts the worker service with new code
- Activates smart install caching (v5.0.3+)
Further reading
- Hooks Architecture — How hooks power the system
- Hook Scripts Reference — Complete technical reference for all 7 hooks
- PM2 to Bun Migration — v7.1.0 process management migration