CEMS provides persistent memory for AI coding assistants through four key mechanisms that work together to maintain context across sessions.

Memory Injection

On every user prompt, CEMS automatically searches for relevant memories and injects them as context:
  1. Prompt interception - IDE hooks capture user prompts before they reach the LLM
  2. Relevance search - The search pipeline finds memories related to the query
  3. Context injection - Selected memories are added to the prompt with metadata (timestamp, category, confidence score)
  4. Token budgeting - Assembly algorithm ensures context fits within token limits (default: 2000 tokens)
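The budgeting step above can be sketched as a greedy packer: sort candidate memories by relevance and add them until the token budget is exhausted. This is an illustrative sketch, not the CEMS implementation; the memory fields, the ~4-characters-per-token estimate, and the metadata line format are assumptions.

```python
# Illustrative sketch of token-budgeted context assembly (not CEMS internals).

def assemble_context(memories, budget_tokens=2000):
    """Greedily pack the most relevant memories into the token budget."""
    def estimate_tokens(text):
        return len(text) // 4  # rough heuristic: ~4 characters per token

    selected, used = [], 0
    for mem in sorted(memories, key=lambda m: m["score"], reverse=True):
        # Each memory is injected with its metadata (timestamp, category, score)
        line = f"[{mem['timestamp']} | {mem['category']} | {mem['score']:.2f}] {mem['text']}"
        cost = estimate_tokens(line)
        if used + cost > budget_tokens:
            continue  # skip memories that would overflow the budget
        selected.append(line)
        used += cost
    return "\n".join(selected)

memories = [
    {"text": "Prefers Python for backend work", "score": 0.91,
     "category": "learnings", "timestamp": "2025-01-10"},
    {"text": "Project uses PostgreSQL + pgvector", "score": 0.84,
     "category": "context", "timestamp": "2025-01-12"},
]
print(assemble_context(memories))
```

A real assembler would use the model's tokenizer rather than a character heuristic, but the shape of the algorithm is the same: rank, estimate, pack, stop at the budget.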

Hook Integration

CEMS uses IDE-specific hooks to intercept prompts:
  • Claude Code: cems_user_prompts_submit.py hook
  • Cursor: cems_agent_response.py hook
  • Codex: Commands with memory recall
  • Goose: MCP integration via config.yaml
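As a rough illustration of what a prompt-interception hook like cems_user_prompts_submit.py might do, the sketch below follows Claude Code's UserPromptSubmit hook contract (JSON payload on stdin, JSON with additionalContext on stdout). The search_memories function is a hypothetical placeholder for the CEMS search pipeline, and its canned return value is made up for the example.

```python
# Hypothetical sketch of a prompt-interception hook; search_memories is a
# placeholder, not the real CEMS search pipeline.
import json

def search_memories(query):
    """Placeholder for the CEMS relevance search; returns formatted memories."""
    return ["[2025-01-10 | learnings | 0.91] Prefers Python for backend development"]

def build_hook_output(payload):
    """Build the JSON that injects memories ahead of the user prompt, or None."""
    prompt = payload.get("prompt", "")
    found = search_memories(prompt)
    if not found:
        return None
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": "\n".join(found),
        }
    }

# Simulate the payload the IDE would pass on stdin:
print(json.dumps(build_hook_output({"prompt": "What are my coding preferences?"})))
```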

Automatic vs Manual Recall

Automatic injection happens transparently on every prompt. Users can also trigger manual recall using skills:
/recall What are my coding preferences?
/remember I prefer Python for backend development

Session Learning

At the end of each session, CEMS extracts and stores key learnings:

Extraction Process

  1. Session end hook triggers (cems_stop.py in Claude Code)
  2. Transcript analysis - LLM reviews the full session transcript
  3. Learning extraction - Identifies:
    • User preferences discovered
    • Decisions made
    • Patterns observed
    • Problems solved
  4. Storage - Learnings stored as memories with category: learnings
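The extraction step might look like the following sketch, which turns an LLM's JSON reply into memory records. The prompt wording, the response schema, and the parse_learnings helper are illustrative assumptions, not the actual CEMS code.

```python
# Hypothetical sketch of learning extraction from a session transcript.
import json

EXTRACTION_PROMPT = """Review this session transcript and answer as JSON with keys:
- "preferences": user preferences discovered
- "decisions": decisions made
- "patterns": patterns observed
- "solutions": problems solved
Transcript:
{transcript}"""

def parse_learnings(llm_response):
    """Turn the LLM's JSON reply into memory records with category: learnings."""
    data = json.loads(llm_response)
    records = []
    for key in ("preferences", "decisions", "patterns", "solutions"):
        for item in data.get(key, []):
            records.append({"category": "learnings", "text": item})
    return records

# Example LLM reply (fabricated for illustration):
sample = '{"preferences": ["Prefers pytest over unittest"], "decisions": []}'
print(parse_learnings(sample))
```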

Tool Learning

The cems_post_tool_use.py hook captures tool-specific learnings:
  • Successful tool usage patterns
  • Tool parameter preferences
  • Error recovery strategies
  • Tool combinations that work well
These are stored as memories with category: patterns for future reference.
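A hypothetical sketch of how such a hook could classify tool events: the payload field names follow Claude Code's PostToolUse hook input, but the classification rules themselves are assumptions for illustration.

```python
# Illustrative classification of a PostToolUse payload into a candidate
# "patterns" memory; the rules here are assumptions, not CEMS logic.
def classify_tool_event(payload):
    """Map a tool-use payload to a candidate patterns memory, or None."""
    tool = payload.get("tool_name", "")
    response = payload.get("tool_response", {})
    if isinstance(response, dict) and response.get("error"):
        # Failed calls are worth remembering for error recovery strategies
        return {"category": "patterns",
                "text": f"{tool} failed: {response['error']}"}
    if tool and payload.get("tool_input"):
        # Successful calls capture parameter preferences
        return {"category": "patterns",
                "text": f"{tool} succeeded with input keys {sorted(payload['tool_input'])}"}
    return None
```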

Observational Memory

The observer daemon runs continuously in the background to capture high-level workflow patterns:

How It Works

cems-observer  # Background process
  1. Transcript monitoring - Polls ~/.claude/projects/*/ JSONL files every 30 seconds
  2. Accumulation - Waits until 50KB of new content accumulates
  3. Batch extraction - Sends transcript batch to server
  4. Observation generation - Server uses Gemini 2.5 Flash to extract high-level insights
  5. Storage - Observations stored as memories with category: context
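The poll-accumulate-flush loop above can be sketched as follows. The glob pattern, 30-second interval, and 50KB threshold come from the description; poll_once, send_batch, and the per-file offset bookkeeping are hypothetical.

```python
# Hypothetical sketch of the observer daemon's polling loop (not the real cems-observer).
import glob
import os
import time

POLL_INTERVAL = 30      # seconds between transcript checks
BATCH_SIZE = 50_000     # bytes of new content before a batch is flushed
TRANSCRIPT_GLOB = os.path.expanduser("~/.claude/projects/*/*.jsonl")

def send_batch(data):
    """Placeholder for the HTTP call that ships a batch to the CEMS server."""
    pass

def poll_once(pattern, offsets, batch):
    """Read bytes appended since the last poll; flush when the batch is full."""
    for path in glob.glob(pattern):
        size = os.path.getsize(path)
        start = offsets.get(path, 0)
        if size > start:
            with open(path, "rb") as f:
                f.seek(start)            # resume where the last poll left off
                batch.append(f.read(size - start))
            offsets[path] = size
    if sum(len(chunk) for chunk in batch) >= BATCH_SIZE:
        send_batch(b"".join(batch))      # server extracts observations via LLM
        batch.clear()

def run():
    """Daemon loop: poll transcripts every POLL_INTERVAL seconds."""
    offsets, batch = {}, []
    while True:
        poll_once(TRANSCRIPT_GLOB, offsets, batch)
        time.sleep(POLL_INTERVAL)
```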

What Gets Observed

The observer identifies workflow patterns like:
  • Deployment workflows (“User deploys via Coolify”)
  • Technology stack (“Project uses PostgreSQL + pgvector”)
  • Development patterns (“User prefers TypeScript for new features”)
  • Infrastructure choices (“Uses Docker Compose for local development”)
Unlike session learning (which captures specific decisions), observational memory captures patterns across multiple sessions.

Scheduled Maintenance

CEMS runs automated maintenance jobs to keep memory quality high:
  • Consolidation (nightly at 3:00 AM): merge semantic duplicates (cosine similarity ≥ 0.92)
  • Observation Reflection (nightly at 3:30 AM): condense observations per project
  • Summarization (weekly, Sunday at 4:00 AM): compress old memories, prune stale entries
  • Re-indexing (monthly, 1st at 5:00 AM): rebuild embeddings, archive dead memories

Consolidation

Finds near-duplicate memories using vector similarity:
  • Compares embeddings with cosine similarity
  • Merges duplicates with similarity ≥ 0.92
  • Keeps metadata from the more recently created or more frequently accessed version
  • Updates access counts and timestamps
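A minimal sketch of the duplicate-detection step, assuming pure-Python embedding vectors. The 0.92 threshold comes from the schedule above; the helper names and pairwise loop are illustrative (a production job would use an index such as pgvector rather than an O(n²) scan).

```python
# Illustrative near-duplicate detection via cosine similarity.
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_duplicates(memories, threshold=0.92):
    """Return index pairs whose embeddings exceed the similarity threshold."""
    pairs = []
    for i in range(len(memories)):
        for j in range(i + 1, len(memories)):
            if cosine(memories[i]["embedding"], memories[j]["embedding"]) >= threshold:
                pairs.append((i, j))
    return pairs
```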

Summarization

Compresses old, low-priority memories:
  • Targets memories older than 90 days with low access counts
  • LLM generates concise summaries preserving key information
  • Original content archived (soft-delete with archived: true)
  • Reduces storage while maintaining searchability
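Candidate selection for this job might look like the sketch below. The 90-day window comes from the text above; the access-count cutoff and the field names are assumptions.

```python
# Hypothetical selection of old, low-priority memories for summarization.
from datetime import datetime, timedelta, timezone

def summarization_candidates(memories, max_age_days=90, max_access=2, now=None):
    """Pick old, rarely-accessed memories as targets for LLM compression."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [m for m in memories
            if m["created_at"] < cutoff and m["access_count"] <= max_access]
```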

Manual Maintenance

Users can trigger maintenance manually:
cems maintenance --job consolidation
cems maintenance --job summarization
cems maintenance --job reindexing

Memory Lifecycle

A typical memory flows through these stages: capture (via session learning, tool learning, or observation), storage under a category, retrieval through automatic injection or manual recall, consolidation with near-duplicates, and eventual summarization or archival.

Configuration

Memory behavior can be configured through environment variables:
# Auto-update control
CEMS_AUTO_UPDATE=0  # Disable auto-updates

# Observer settings
CEMS_OBSERVER_POLL_INTERVAL=30  # Seconds between transcript checks
CEMS_OBSERVER_BATCH_SIZE=50000  # Bytes before extraction

# Maintenance schedules
CEMS_CONSOLIDATION_SCHEDULE="0 3 * * *"  # Cron format
Credentials are stored in ~/.cems/credentials; IDE-specific settings live in ~/.cems/install.conf.

Integration Points

Code References

  • Memory injection: src/cems/memory/retrieval.py:retrieve_for_inference()
  • Session learning: src/cems/api/endpoints.py:/api/session/summarize
  • Observer daemon: src/cems/observer.py
  • Maintenance jobs: src/cems/maintenance/scheduler.py
