## Memory Injection
On every user prompt, CEMS automatically searches for relevant memories and injects them as context:

- Prompt interception - IDE hooks capture user prompts before they reach the LLM
- Relevance search - The search pipeline finds memories related to the query
- Context injection - Selected memories are added to the prompt with metadata (timestamp, category, confidence score)
- Token budgeting - The assembly algorithm ensures the context fits within token limits (default: 2000 tokens)
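The token-budgeting step can be sketched as a greedy packer. This is an illustrative sketch, not the actual algorithm in `src/cems/memory/retrieval.py`; the `Memory` fields mirror the metadata listed above, and the characters-per-token heuristic is an assumption.

```python
# Sketch of token-budgeted context assembly (illustrative; the real
# algorithm lives in src/cems/memory/retrieval.py).
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    category: str
    confidence: float
    timestamp: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption).
    return max(1, len(text) // 4)

def assemble_context(memories: list[Memory], budget: int = 2000) -> str:
    """Greedily pack the highest-confidence memories under the token budget."""
    lines: list[str] = []
    used = 0
    for mem in sorted(memories, key=lambda m: m.confidence, reverse=True):
        line = f"[{mem.timestamp} | {mem.category} | {mem.confidence:.2f}] {mem.text}"
        cost = estimate_tokens(line)
        if used + cost > budget:
            continue  # skip memories that would overflow the budget
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Each injected line carries the memory's timestamp, category, and confidence score, matching the metadata described above.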
## Hook Integration
CEMS uses IDE-specific hooks to intercept prompts:

- Claude Code: `cems_user_prompts_submit.py` hook
- Cursor: `cems_agent_response.py` hook
- Codex: Commands with memory recall
- Goose: MCP integration via `config.yaml`
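A prompt-interception hook in the style of `cems_user_prompts_submit.py` might look like the following sketch. The stdin/stdout contract, the server URL, and the payload field names are assumptions for illustration, not the shipped script.

```python
# Sketch of a user-prompt-submit hook: read the hook event from stdin,
# fetch relevant memories, and print them so the IDE injects them as
# context. URL and field names are hypothetical.
import json
import sys
import urllib.request

SEARCH_URL = "http://localhost:8765/api/search"  # hypothetical server address

def fetch_memories(prompt: str) -> list:
    """Ask the CEMS server for memories relevant to this prompt."""
    req = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps({"query": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp).get("results", [])

def format_context(memories: list) -> str:
    """Render memories as lines the IDE will prepend to the prompt."""
    return "\n".join(f"[memory] {m['content']}" for m in memories)

def main() -> None:
    payload = json.load(sys.stdin)  # hook event delivered by the IDE
    print(format_context(fetch_memories(payload.get("prompt", ""))))

# A real hook script would invoke main() when run by the IDE.
```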
## Automatic vs Manual Recall
Automatic injection happens transparently on every prompt. Users can also trigger manual recall using skills.

## Session Learning

At the end of each session, CEMS extracts and stores key learnings.

### Extraction Process
- Session end hook triggers (`cems_stop.py` in Claude Code)
- Transcript analysis - LLM reviews the full session transcript
- Learning extraction - Identifies:
  - User preferences discovered
  - Decisions made
  - Patterns observed
  - Problems solved
- Storage - Learnings stored as memories with `category: learnings`
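The extraction step can be sketched as a prompt-plus-parse pair. The prompt wording and the JSON response schema below are assumptions, not the real logic behind `/api/session/summarize`.

```python
# Sketch of session-learning extraction (prompt wording and response
# schema are hypothetical).
import json

EXTRACTION_PROMPT = """Review this session transcript and reply as JSON with:
- "preferences": user preferences discovered
- "decisions": decisions made
- "patterns": patterns observed
- "problems_solved": problems solved

Transcript:
{transcript}
"""

def build_prompt(transcript: str) -> str:
    return EXTRACTION_PROMPT.format(transcript=transcript)

def to_memories(llm_json: str) -> list:
    """Convert the LLM's JSON reply into memory records tagged as learnings."""
    data = json.loads(llm_json)
    memories = []
    for kind, items in data.items():
        for item in items:
            memories.append({"content": item, "category": "learnings", "kind": kind})
    return memories
```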
## Tool Learning

The `cems_post_tool_use.py` hook captures tool-specific learnings:
- Successful tool usage patterns
- Tool parameter preferences
- Error recovery strategies
- Tool combinations that work well
These are stored as memories with `category: patterns` for future reference.
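One way to picture what the post-tool-use hook records is the sketch below; the event field names are assumptions, not the actual hook payload.

```python
# Sketch of turning a tool-use event into a pattern memory
# (event field names are hypothetical).
def tool_event_to_memory(event: dict):
    """Record which tool ran, with what parameters, and whether it worked."""
    if event.get("tool_name") is None:
        return None  # nothing to learn from
    outcome = "succeeded" if event.get("success") else "failed"
    content = (f"Tool {event['tool_name']} {outcome} with "
               f"params {event.get('parameters', {})}")
    return {"content": content, "category": "patterns"}
```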
## Observational Memory

The observer daemon runs continuously in the background to capture high-level workflow patterns.

### How It Works
- Transcript monitoring - Polls JSONL files in `~/.claude/projects/*/` every 30 seconds
- Accumulation - Waits until 50KB of new content accumulates
- Batch extraction - Sends the transcript batch to the server
- Observation generation - The server uses Gemini 2.5 Flash to extract high-level insights
- Storage - Observations stored as memories with `category: context`
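The poll/accumulate loop can be sketched as follows. This is an assumed structure, not the implementation in `src/cems/observer.py`: it tracks a read offset per transcript file and flushes a batch once 50KB of new content has accumulated.

```python
# Sketch of the observer daemon's poll/accumulate loop (assumed
# structure; see src/cems/observer.py for the real implementation).
import glob
import os
import time

BATCH_THRESHOLD = 50 * 1024  # 50KB of new transcript content
POLL_INTERVAL = 30           # seconds between polls

def read_new_content(offsets: dict,
                     pattern: str = "~/.claude/projects/*/*.jsonl") -> str:
    """Read bytes appended to each transcript since the last poll."""
    chunks = []
    for path in glob.glob(os.path.expanduser(pattern)):
        pos = offsets.get(path, 0)
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            f.seek(pos)
            chunks.append(f.read())
            offsets[path] = f.tell()  # remember where we stopped
    return "".join(chunks)

def run(send_batch) -> None:
    """Daemon loop: accumulate new content, flush once the threshold is hit."""
    offsets: dict = {}
    pending = ""
    while True:
        pending += read_new_content(offsets)
        if len(pending.encode()) >= BATCH_THRESHOLD:
            send_batch(pending)  # server extracts observations via LLM
            pending = ""
        time.sleep(POLL_INTERVAL)
```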
### What Gets Observed
The observer identifies workflow patterns like:

- Deployment workflows (“User deploys via Coolify”)
- Technology stack (“Project uses PostgreSQL + pgvector”)
- Development patterns (“User prefers TypeScript for new features”)
- Infrastructure choices (“Uses Docker Compose for local development”)
## Scheduled Maintenance

CEMS runs automated maintenance jobs to keep memory quality high:

| Job | Schedule | Purpose |
|---|---|---|
| Consolidation | Nightly 3 AM | Merge semantic duplicates (cosine similarity ≥ 0.92) |
| Observation Reflection | Nightly 3:30 AM | Condense observations per project |
| Summarization | Weekly Sunday 4 AM | Compress old memories, prune stale entries |
| Re-indexing | Monthly 1st 5 AM | Rebuild embeddings, archive dead memories |
### Consolidation

Finds near-duplicate memories using vector similarity:

- Compares embeddings with cosine similarity
- Merges duplicates with similarity ≥ 0.92
- Keeps metadata from the more recent/accessed version
- Updates access counts and timestamps
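The duplicate-detection step above can be sketched with plain cosine similarity and the 0.92 threshold from the maintenance table; the function names and the pairwise scan are illustrative, not the scheduler's actual implementation.

```python
# Sketch of near-duplicate detection by cosine similarity (threshold
# from the maintenance table; illustrative, not the real job).
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def find_duplicates(embeddings: dict, threshold: float = 0.92):
    """Yield ID pairs whose embeddings meet the merge threshold."""
    ids = list(embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                yield (a, b)
```

A production job would use the vector index rather than an O(n²) scan, but the merge criterion is the same.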
### Summarization

Compresses old, low-priority memories:

- Targets memories older than 90 days with low access counts
- LLM generates concise summaries preserving key information
- Original content archived (soft-delete with `archived: true`)
- Reduces storage while maintaining searchability
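The eligibility check might look like this sketch; the 90-day cutoff comes from above, while the access-count threshold and field names are assumptions.

```python
# Sketch of the summarization eligibility check (the 90-day cutoff is
# documented; the access-count threshold and field names are assumed).
from datetime import datetime, timedelta, timezone

def eligible_for_summarization(mem: dict, max_access: int = 3) -> bool:
    """Old, rarely accessed, and not already soft-deleted."""
    age = datetime.now(timezone.utc) - mem["created_at"]
    return (age > timedelta(days=90)
            and mem.get("access_count", 0) <= max_access
            and not mem.get("archived", False))
```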
## Manual Maintenance

Users can trigger maintenance manually.

## Memory Lifecycle

A typical memory flows through these stages.

## Configuration

Memory behavior can be configured through environment variables, with `~/.cems/credentials` for credential storage and IDE-specific settings in `~/.cems/install.conf`.
## Integration Points

### Code References
- Memory injection: `src/cems/memory/retrieval.py:retrieve_for_inference()`
- Session learning: `src/cems/api/endpoints.py:/api/session/summarize`
- Observer daemon: `src/cems/observer.py`
- Maintenance jobs: `src/cems/maintenance/scheduler.py`
## Related Concepts
- Memory Types - Categories and organization
- Search Pipeline - How memories are retrieved
- Architecture - System components and storage