System Overview
Services
PostgreSQL + pgvector (Port 5432)
Image: pgvector/pgvector:pg16
Purpose: Unified storage for vectors, metadata, and authentication.
Extensions:
- pgvector - Vector similarity search (HNSW index)
- pg_trgm - Trigram-based fuzzy matching
CEMS Server (Port 8765)
Image: Built from Dockerfile
Purpose: Python REST API for memory operations.
Stack:
- Framework: Starlette (ASGI)
- Server: uvicorn
- Language: Python 3.12+
Features:
- Memory operations (add, search, forget, update)
- Session summarization
- Tool learning extraction
- Admin user management
- Scheduled maintenance (APScheduler)
CEMS MCP (Port 8766)
Image: Built from mcp-wrapper/
Purpose: Model Context Protocol wrapper for MCP-compatible clients.
Stack:
- Framework: Express.js
- Runtime: Node.js
- Protocol: MCP over Streamable HTTP
Tools:
- memory_add - Store a memory
- memory_search - Search with full pipeline
- memory_get - Retrieve by ID
- memory_forget - Delete/archive
- memory_update - Update content
- memory_maintenance - Trigger maintenance
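As an illustration of how an MCP client might invoke one of these tools, the sketch below builds a JSON-RPC 2.0 `tools/call` message in Python. The helper name `build_tool_call` and the argument name `query` are assumptions for illustration; the wrapper's actual input schemas are not shown in this document.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a hypothetical argument name, used only for illustration.
payload = build_tool_call("memory_search", {"query": "deployment checklist"})
```

A client would send this message body to the MCP endpoint over Streamable HTTP after completing the usual MCP initialization handshake.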
Storage Schema
memory_documents Table
Stores complete document content and metadata.
memory_chunks Table
Stores chunked content with embeddings for search.
HNSW index parameters:
- m = 16 - Number of bi-directional links per node (higher = better recall, more memory)
- ef_construction = 64 - Size of dynamic candidate list during construction (higher = better quality, slower build)
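For reference, an HNSW index with m = 16 and ef_construction = 64 can be declared through pgvector's `WITH (...)` index options. The sketch below only builds the SQL string; the column name `embedding`, the generated index name, and the choice of `vector_cosine_ops` are assumptions, not details confirmed by the CEMS schema.

```python
def hnsw_index_sql(
    table: str = "memory_chunks",
    column: str = "embedding",  # hypothetical column name
    m: int = 16,
    ef_construction: int = 64,
) -> str:
    """Build a pgvector CREATE INDEX statement for an HNSW index (cosine distance)."""
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw_idx "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


sql = hnsw_index_sql()
```

The resulting statement can be run with any PostgreSQL client once the pgvector extension is enabled.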
users Table
Stores user accounts and API key authentication.
API key format: cems_usr_<32-char-random> (SHA256 hashed with bcrypt)
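A minimal sketch of generating and checking a key in the cems_usr_<32-char-random> format, using only the Python standard library. The alphabet (letters and digits) and the function names are assumptions, and only a SHA-256 digest is computed here; the bcrypt step mentioned above needs a third-party package and is left out.

```python
import hashlib
import hmac
import secrets
import string

KEY_PREFIX = "cems_usr_"
ALPHABET = string.ascii_letters + string.digits  # assumed character set


def generate_api_key() -> str:
    """Return a key of the form cems_usr_<32 random characters>."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(32))
    return KEY_PREFIX + suffix


def digest_key(api_key: str) -> str:
    """SHA-256 hex digest of the key (bcrypt is omitted in this sketch)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_digest: str) -> bool:
    """Compare the presented key's digest with the stored one in constant time."""
    return hmac.compare_digest(digest_key(presented), stored_digest)


key = generate_api_key()
stored = digest_key(key)
```

Only the digest would be persisted; the plaintext key is shown to the user once at creation time.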
teams Table
Stores team information for shared memories.
category_summaries Table
Caches category summaries for fast profile generation.
Embeddings
Provider
Default: OpenRouter API
Model: text-embedding-3-small (OpenAI)
Dimensions: 1536
Cost: ~$0.02 per 1M tokens
Alternative: llama.cpp Server
Backend: CEMS_EMBEDDING_BACKEND=llamacpp_server
Model: nomic-embed-text-v1.5 (768 dimensions)
Endpoint: http://localhost:8080/v1/embeddings
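The endpoint above follows the OpenAI-style embeddings format, so a request can be sketched with the standard library alone. The helper names below are hypothetical, and the request is built but not sent, so no running server is required to try it.

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/embeddings"


def build_embedding_request(
    texts: list, model: str = "nomic-embed-text-v1.5"
) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body: bytes) -> list:
    """Extract vectors from an OpenAI-style response, ordered by their index field."""
    data = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


request = build_embedding_request(["hello world"])
# To send it: urllib.request.urlopen(request).read()
```

With nomic-embed-text-v1.5, each returned vector would be expected to have 768 entries.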
Configuration:
Batch Support
From src/cems/embedding.py:embed_batch():
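The following is not the code from that file. It is a generic sketch of how batched embedding usually works: split the inputs into fixed-size groups, call the embedding backend once per group, and keep the results in input order. The names `batched` and `embed_in_batches`, the batch size of 64, and the stand-in embedder are all assumptions.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,  # hypothetical default
) -> List[List[float]]:
    """Call embed_fn once per batch and return vectors in input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in embedder for demonstration: a one-element vector holding the text length."""
    return [[float(len(text))] for text in batch]


result = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
```

Batching keeps the number of API calls low, which matters when the provider enforces per-minute request limits.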
Authentication
API Key Flow
Admin Operations
Admin key: Set via the CEMS_ADMIN_KEY environment variable.
Protected endpoints:
- POST /admin/users - Create user
- GET /admin/users - List users
- PATCH /admin/users/{id} - Update user
- DELETE /admin/users/{id} - Delete user
- POST /admin/users/{id}/reset-key - Reset API key
- GET /admin/teams - List teams
- POST /admin/teams - Create team
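A guard for these endpoints could look roughly like the sketch below. It assumes the admin key arrives as an `Authorization: Bearer <key>` header, which this document does not confirm, and the function name `is_admin_request` is hypothetical.

```python
import hmac
import os


def is_admin_request(headers: dict) -> bool:
    """Return True when the bearer token matches the CEMS_ADMIN_KEY environment variable."""
    expected = os.environ.get("CEMS_ADMIN_KEY", "")
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if not expected or scheme.lower() != "bearer":
        # Reject when no admin key is configured or the header is missing/malformed.
        return False
    # Constant-time comparison avoids leaking the key through timing differences.
    return hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8"))
```

Rejecting requests when the variable is unset means a misconfigured deployment fails closed rather than exposing the admin routes.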
API Endpoints
Public API (User authentication)
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/memory/add | Add a memory |
| POST | /api/memory/search | Search memories |
| GET | /api/memory/get | Get full document by ID |
| GET | /api/memory/list | List all memories (paginated) |
| POST | /api/memory/forget | Delete or archive memory |
| POST | /api/memory/update | Update memory content |
| POST | /api/memory/log-shown | Log which memories were shown (feedback) |
| POST | /api/memory/maintenance | Trigger maintenance job |
| GET | /api/memory/status | System status (memory count, categories) |
| GET | /api/memory/profile | Session profile context |
| GET | /api/memory/foundation | Foundation guidelines |
| GET | /api/memory/gate-rules | Gate rules by project |
| POST | /api/session/summarize | Session summary (observer) |
| POST | /api/tool/learning | Tool learning extraction |
| POST | /api/index/repo | Index git repository |
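To show how a client might address the endpoints in this table, the sketch below builds requests against the CEMS Server port (8765). The host `localhost`, the `Authorization: Bearer` header, the example key, and the `query` field are assumptions; the document does not specify request bodies or the header format.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8765"  # assumed host; port taken from the Services section


def build_api_request(
    method: str, path: str, api_key: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the public API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth header format
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# "query" is a hypothetical field name, used only for illustration.
search = build_api_request(
    "POST", "/api/memory/search", "cems_usr_example", {"query": "release notes"}
)
status = build_api_request("GET", "/api/memory/status", "cems_usr_example")
```

Passing either object to `urllib.request.urlopen` would send it to a running server.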
Admin API (Admin key required)
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /admin/users | List all users |
| POST | /admin/users | Create new user |
| GET | /admin/users/{id} | Get user details |
| PATCH | /admin/users/{id} | Update user |
| DELETE | /admin/users/{id} | Delete user |
| POST | /admin/users/{id}/reset-key | Reset API key |
| GET | /admin/teams | List all teams |
| POST | /admin/teams | Create new team |
Scheduled Maintenance
From src/cems/maintenance/scheduler.py:
APScheduler Configuration
Job Details
Consolidation:
- Finds pairs with cosine similarity ≥ 0.92
- Merges content (keeps newer/more-accessed version)
- Updates metadata (sums access counts, keeps max priority)
- Soft-deletes duplicate
- Groups observations by source_ref (project)
- LLM condenses observations into high-level insights
- Stores as new memory with category: context
- Archives original observations
- Targets memories: age > 90 days AND access_count < 3
- LLM generates 2-3 sentence summary
- Replaces original content with summary
- Sets archived: true on original
- Rebuilds HNSW index for vector search
- Regenerates tsvector for full-text search
- Permanently deletes memories with archived: true AND age > 180 days
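The selection rules in this list reduce to three small predicates. The sketch below expresses them in plain Python, assuming a simplified `Memory` record with hypothetical field names; the actual jobs presumably run these checks as database queries.

```python
import math
from dataclasses import dataclass

DUPLICATE_THRESHOLD = 0.92  # cosine similarity at or above this counts as a duplicate
STALE_AGE_DAYS = 90
STALE_MAX_ACCESS = 3
PRUNE_AGE_DAYS = 180


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Memory:
    """Simplified stand-in for a stored memory; field names are assumptions."""
    age_days: int
    access_count: int
    archived: bool = False


def is_duplicate_pair(a, b) -> bool:
    """Consolidation candidate: similarity >= 0.92."""
    return cosine_similarity(a, b) >= DUPLICATE_THRESHOLD


def should_summarize(memory: Memory) -> bool:
    """Summarization target: age > 90 days AND access_count < 3."""
    return memory.age_days > STALE_AGE_DAYS and memory.access_count < STALE_MAX_ACCESS


def should_prune(memory: Memory) -> bool:
    """Permanent deletion: archived AND age > 180 days."""
    return memory.archived and memory.age_days > PRUNE_AGE_DAYS
```

Note that both age comparisons are strict, so a memory exactly 90 or 180 days old is left alone until the next run.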
Client Components
CEMS CLI
Installation: uv tool install cems
Commands:
Observer Daemon
Process: cems-observer (background)
Configuration:
- Polls ~/.claude/projects/*/ every 30 seconds
- Tracks read position per project in ~/.cems/observer_state.json
- Accumulates until 50KB threshold
- Sends to POST /api/session/summarize
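The offset-tracking and threshold behaviour described above can be sketched as follows. This is an illustration under assumptions, not the observer's implementation: it tracks offsets per file rather than per project, keeps the pending buffer in memory only, and the class name `ReadTracker` is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

THRESHOLD_BYTES = 50 * 1024  # 50KB accumulation threshold


class ReadTracker:
    """Tracks per-file read offsets and buffers new content until a size threshold."""

    def __init__(self, state_path: Path):
        self.state_path = state_path
        # Resume from persisted offsets if a state file already exists.
        self.offsets = json.loads(state_path.read_text()) if state_path.exists() else {}
        self.buffer = b""

    def poll(self, path: Path) -> None:
        """Read bytes appended to path since the last poll and add them to the buffer."""
        offset = self.offsets.get(str(path), 0)
        with path.open("rb") as fh:
            fh.seek(offset)
            chunk = fh.read()
        self.buffer += chunk
        self.offsets[str(path)] = offset + len(chunk)
        self.state_path.write_text(json.dumps(self.offsets))

    def take_if_ready(self) -> Optional[bytes]:
        """Return and clear the buffer once it reaches the threshold, else None."""
        if len(self.buffer) < THRESHOLD_BYTES:
            return None
        ready, self.buffer = self.buffer, b""
        return ready
```

A daemon loop would call `poll` for each session file every 30 seconds and, whenever `take_if_ready` returns data, send it to POST /api/session/summarize.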
IDE Hooks
Claude Code (~/.claude/hooks/):
- cems_session_start.py - Profile injection
- cems_user_prompts_submit.py - Memory search
- cems_post_tool_use.py - Tool learning
- cems_pre_tool_use.py - Gate rules
- cems_stop.py - Session analysis
- cems_pre_compact.py - Pre-compaction hook
Cursor (~/.cursor/hooks/):
- cems_session_start.py
- cems_agent_response.py
- cems_stop.py
Codex (~/.codex/commands/):
- recall.md, remember.md, foundation.md
Goose (~/.config/goose/config.yaml):
- MCP extension configuration
Deployment
Quick Start
Environment Variables
Required:
Production Considerations
Security:
- Change POSTGRES_PASSWORD from the default
- Use a strong CEMS_ADMIN_KEY (32+ chars)
- Enable SSL for PostgreSQL in production
- Use reverse proxy (nginx/Caddy) for HTTPS
Scaling:
- Scale PostgreSQL vertically (memory for HNSW index)
- Consider read replicas for high read loads
- Monitor embedding API rate limits (OpenRouter: 50 req/min)
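One way to stay under a 50 requests/minute limit is a client-side sliding-window limiter placed in front of embedding calls. The sketch below is a generic illustration; the class name and its parameters are assumptions and do not describe anything CEMS ships.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_requests calls within any window_seconds interval."""

    def __init__(self, max_requests: int = 50, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of requests inside the current window

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            return False
        self.sent.append(now)
        return True
```

Callers that get `False` can wait briefly and retry, or queue the text for the next batch.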
Related Concepts
- How It Works - System lifecycle and integration
- Memory Types - Schema details and categories
- Search Pipeline - Retrieval implementation