When to Deploy a Server
CEMS can run in two modes:

- Client Mode (Individual) - Users connect to an existing CEMS server with an API key
- Server Mode (Team/Company) - Self-hosted server for multi-user, multi-team deployment

Deploy a server when:

- Your team needs shared memory across members
- You want centralized user management with API keys
- You need team-scoped memories separate from personal memories
- You require audit logs and compliance tracking
- You want to self-host for data privacy and control
Architecture
CEMS uses a three-service architecture.

Services
PostgreSQL + pgvector
- Image: pgvector/pgvector:pg16
- Port: 5432
- Stores vectors (1536-dim), metadata, users, teams
- HNSW index for fast vector search
- Full-text search (BM25) with tsvector
cems-server
- Built from Dockerfile
- Port: 8765
- Python REST API (Starlette + uvicorn)
- Handles memory CRUD, search, maintenance
- Admin API for user/team management
cems-mcp
- Built from mcp-wrapper/Dockerfile
- Port: 8766
- Express.js MCP wrapper
- Exposes 6 MCP tools (memory_add, memory_search, etc.)
- Streamable HTTP transport
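The three services might be wired together in a docker-compose.yml along these lines. This is a sketch, not the project's actual compose file: the images, build contexts, and ports come from this page, while the service names, volume name, and environment variables are assumptions.

```yaml
services:
  postgres:
    image: pgvector/pgvector:pg16    # image named on this page
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: change-me   # assumption; set your own secret
    volumes:
      - pgdata:/var/lib/postgresql/data

  cems-server:
    build: .                         # built from Dockerfile (per this page)
    ports:
      - "8765:8765"
    depends_on:
      - postgres

  cems-mcp:
    build: mcp-wrapper               # built from mcp-wrapper/Dockerfile (per this page)
    ports:
      - "8766:8766"
    depends_on:
      - cems-server

volumes:
  pgdata:
```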
Storage
All data lives in PostgreSQL with pgvector:

| Table | Purpose |
|---|---|
| users | User accounts with bcrypt-hashed API keys |
| teams | Team/company groupings |
| team_members | User-team memberships with roles |
| memory_documents | Memory documents with metadata |
| memory_chunks | Chunked content with 1536-dim embeddings |
| memory_relations | Memory relationships |
| audit_log | Compliance and activity tracking |
Embeddings
By default, CEMS uses:

- Model: openai/text-embedding-3-small via OpenRouter
- Dimensions: 1536
- Backend: OpenRouter API (CEMS_EMBEDDING_BACKEND=openrouter)
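For illustration, an embedding request to this backend would carry a payload roughly like the one below. The wire format is assumed to be OpenAI-style; only the model id comes from this page, and the function name is hypothetical.

```python
def build_embedding_request(texts):
    """Sketch of an OpenAI-style embeddings payload.

    The model id is the CEMS default named on this page; the payload
    shape is an assumption based on the OpenAI embeddings wire format.
    """
    return {
        "model": "openai/text-embedding-3-small",
        "input": list(texts),
    }

payload = build_embedding_request(["deploy checklist for the staging cluster"])
print(payload["model"])
```

The response would contain one 1536-dimensional vector per input string, which CEMS stores in memory_chunks.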
Search Pipeline
CEMS implements a multi-stage retrieval system:

- Query Understanding - LLM routes to vector or hybrid strategy
- Query Synthesis - LLM expands query into 2-5 search terms
- HyDE - Generates hypothetical ideal answer for better matching
- Candidate Retrieval - pgvector HNSW (vector) + tsvector (BM25 full-text)
- RRF Fusion - Reciprocal Rank Fusion combines result lists
- Relevance Filtering - Removes results below threshold
- Scoring Adjustments - Time decay, priority boost, project-scoped boost
- Token-Budgeted Assembly - Greedy selection within token budget (default: 2000)
Search strategies: vector (fast), hybrid (thorough), auto (smart routing)
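The RRF Fusion stage above can be sketched in a few lines of Python. Each candidate's fused score is the sum of 1 / (k + rank) across the result lists it appears in; k=60 is the conventional default from the RRF literature, not a value this page specifies.

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion over several best-first result lists.

    Items ranked highly in multiple lists accumulate the largest
    scores, which is how the vector and BM25 lists get combined.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort memory ids by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]  # e.g. pgvector HNSW order
bm25_hits = ["m3", "m9", "m1"]    # e.g. tsvector full-text order
print(rrf_fuse([vector_hits, bm25_hits]))
```

The fused list then flows into relevance filtering, scoring adjustments, and token-budgeted assembly.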
Maintenance
Scheduled jobs via APScheduler:

| Job | Schedule | Purpose |
|---|---|---|
| Consolidation | Nightly 3 AM | Merge semantic duplicates (cosine >= 0.92) |
| Observation Reflection | Nightly 3:30 AM | Condense observations per project |
| Summarization | Weekly Sun 4 AM | Compress old memories, prune stale |
| Re-indexing | Monthly 1st 5 AM | Rebuild embeddings, archive dead memories |
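The consolidation job's duplicate test (cosine >= 0.92, per the table above) boils down to a cosine similarity between chunk embeddings. A pure-Python sketch, with the function names being illustrative rather than CEMS internals:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_semantic_duplicate(emb_a, emb_b, threshold=0.92):
    # 0.92 is the consolidation threshold from the table above.
    return cosine(emb_a, emb_b) >= threshold

# Nearly parallel vectors clear the threshold; orthogonal ones do not.
print(is_semantic_duplicate([1.0, 0.0], [1.0, 0.05]))  # → True
print(is_semantic_duplicate([1.0, 0.0], [0.0, 1.0]))   # → False
```

In production the comparison runs over 1536-dim vectors in pgvector rather than toy 2-dim lists, but the threshold logic is the same.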
Next Steps
- Docker Compose Setup - Launch services with docker compose
- Configuration - Environment variables and settings
- User Management - Create users and manage API keys