When to Deploy a Server

CEMS can run in two modes:
  1. Client Mode (Individual) - Users connect to an existing CEMS server with an API key
  2. Server Mode (Team/Company) - Self-hosted server for multi-user, multi-team deployment

Deploy a CEMS server when:
  • Your team needs shared memory across members
  • You want centralized user management with API keys
  • You need team-scoped memories separate from personal memories
  • You require audit logs and compliance tracking
  • You want to self-host for data privacy and control

Architecture

CEMS uses a three-service architecture:
┌─────────────┐
│   Client    │  (IDE hooks, MCP clients)
│  (Any IDE)  │
└──────┬──────┘
       │ HTTPS + Bearer token

┌─────────────┐
│  cems-mcp   │  Port 8766 (Express.js)
│   Wrapper   │  MCP-over-HTTP server
└──────┬──────┘
       │ HTTP

┌─────────────┐
│ cems-server │  Port 8765 (Python/Starlette)
│  REST API   │  Memory operations + Admin API
└──────┬──────┘
       │ PostgreSQL

┌─────────────┐
│  postgres   │  Port 5432 (pgvector/pg16)
│  + pgvector │  Vectors, metadata, users, teams
└─────────────┘
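
Clients authenticate to the cems-mcp wrapper with a Bearer token over HTTPS. The sketch below shows that flow in Python; the /health path, hostname, and key format are illustrative assumptions, not part of the documented API.

  # Minimal sketch of the client-side auth flow against the cems-mcp wrapper.
  # The /health path and key format are assumptions for illustration.
  import requests

  CEMS_MCP_URL = "https://cems.example.com:8766"   # cems-mcp wrapper
  API_KEY = "cems_xxxxxxxx"                        # issued by a server admin

  resp = requests.get(
      f"{CEMS_MCP_URL}/health",                    # hypothetical endpoint
      headers={"Authorization": f"Bearer {API_KEY}"},
      timeout=10,
  )
  resp.raise_for_status()
  print(resp.json())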

Services

PostgreSQL + pgvector

  • Image: pgvector/pgvector:pg16
  • Port: 5432
  • Stores vectors (1536-dim), metadata, users, teams
  • HNSW index for fast vector search
  • Full-text search (BM25) with tsvector
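
The HNSW and full-text indexes above can be pictured with the sketch below; the memory_chunks table name comes from the storage tables later on this page, but the embedding and content_tsv column names and the exact DDL are assumptions, not the shipped schema.

  # Illustrative index DDL for the search features described above,
  # issued through psycopg; not the exact schema CEMS ships.
  import psycopg

  with psycopg.connect("postgresql://cems:cems@localhost:5432/cems") as conn:
      conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
      # HNSW index over 1536-dim chunk embeddings (cosine distance)
      conn.execute(
          "CREATE INDEX IF NOT EXISTS memory_chunks_embedding_hnsw "
          "ON memory_chunks USING hnsw (embedding vector_cosine_ops)"
      )
      # GIN index over a tsvector column for full-text search
      conn.execute(
          "CREATE INDEX IF NOT EXISTS memory_chunks_tsv_gin "
          "ON memory_chunks USING gin (content_tsv)"
      )
      conn.commit()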

cems-server

  • Built from Dockerfile
  • Port: 8765
  • Python REST API (Starlette + uvicorn)
  • Handles memory CRUD, search, maintenance
  • Admin API for user/team management
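
A direct call to the cems-server REST API might look like the following; the /memories/search path and request fields are hypothetical, shown only to illustrate the shape of a search request against port 8765.

  # Hypothetical memory-search request against the cems-server REST API.
  import requests

  resp = requests.post(
      "http://localhost:8765/memories/search",     # hypothetical path
      headers={"Authorization": "Bearer cems_xxxxxxxx"},
      json={"query": "deployment checklist", "mode": "hybrid", "limit": 5},
      timeout=30,
  )
  resp.raise_for_status()
  for hit in resp.json().get("results", []):       # response shape is an assumption
      print(hit)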

cems-mcp

  • Built from mcp-wrapper/Dockerfile
  • Port: 8766
  • Express.js MCP wrapper
  • Exposes 6 MCP tools (memory_add, memory_search, etc.)
  • Streamable HTTP transport
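
In practice clients use an MCP SDK, but a raw tools/call request over the streamable HTTP transport looks roughly like this; the /mcp path and the memory_search argument schema are assumptions, while memory_search itself is one of the documented tools.

  # Sketch of a raw MCP tools/call against the cems-mcp wrapper.
  import requests

  payload = {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "tools/call",
      "params": {
          "name": "memory_search",
          "arguments": {"query": "postgres connection settings"},  # hypothetical schema
      },
  }
  resp = requests.post(
      "http://localhost:8766/mcp",                 # hypothetical MCP endpoint
      headers={
          "Authorization": "Bearer cems_xxxxxxxx",
          "Accept": "application/json, text/event-stream",
      },
      json=payload,
      timeout=30,
  )
  print(resp.status_code, resp.text[:500])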

Storage

All data lives in PostgreSQL with pgvector:
Table               Purpose
users               User accounts with bcrypt-hashed API keys
teams               Team/company groupings
team_members        User-team memberships with roles
memory_documents    Memory documents with metadata
memory_chunks       Chunked content with 1536-dim embeddings
memory_relations    Memory relationships
audit_log           Compliance and activity tracking
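
API keys in the users table are stored as bcrypt hashes, so the server never keeps the raw key. A minimal sketch of that hash-and-verify step, assuming the Python bcrypt package and a hypothetical api_key_hash column:

  # Illustrative bcrypt hashing/verification for API keys.
  import bcrypt

  def hash_api_key(raw_key: str) -> bytes:
      # value stored in users.api_key_hash (column name is an assumption)
      return bcrypt.hashpw(raw_key.encode(), bcrypt.gensalt())

  def verify_api_key(raw_key: str, stored_hash: bytes) -> bool:
      return bcrypt.checkpw(raw_key.encode(), stored_hash)

  stored = hash_api_key("cems_xxxxxxxx")
  assert verify_api_key("cems_xxxxxxxx", stored)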

Embeddings

By default, CEMS uses:
  • Model: openai/text-embedding-3-small via OpenRouter
  • Dimensions: 1536
  • Backend: OpenRouter API (CEMS_EMBEDDING_BACKEND=openrouter)
Alternative: llama.cpp server for local embeddings (768-dim)
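
A request for a default 1536-dim embedding would go through an OpenAI-compatible embeddings route on the configured backend; whether this exact /embeddings path is what CEMS calls internally is an assumption, so treat the sketch as illustrative only.

  # Illustrative 1536-dim embedding request via an OpenAI-compatible endpoint.
  import os
  import requests

  resp = requests.post(
      "https://openrouter.ai/api/v1/embeddings",   # assumed OpenAI-compatible route
      headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
      json={"model": "openai/text-embedding-3-small", "input": "hello, memory"},
      timeout=30,
  )
  resp.raise_for_status()
  vector = resp.json()["data"][0]["embedding"]
  print(len(vector))  # expected: 1536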

Search Pipeline

CEMS implements a multi-stage retrieval system:
  1. Query Understanding - LLM routes to vector or hybrid strategy
  2. Query Synthesis - LLM expands query into 2-5 search terms
  3. HyDE - Generates hypothetical ideal answer for better matching
  4. Candidate Retrieval - pgvector HNSW (vector) + tsvector (BM25 full-text)
  5. RRF Fusion - Reciprocal Rank Fusion combines result lists
  6. Relevance Filtering - Removes results below threshold
  7. Scoring Adjustments - Time decay, priority boost, project-scoped boost
  8. Token-Budgeted Assembly - Greedy selection within token budget (default: 2000)
Search modes: vector (fast), hybrid (thorough), auto (smart routing)
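
Step 5, Reciprocal Rank Fusion, can be sketched in a few lines: each retriever's ranked list contributes 1 / (k + rank) per document, and the summed scores decide the fused order. The k = 60 constant is the common default from the RRF literature, not necessarily what CEMS uses.

  # Minimal Reciprocal Rank Fusion over ranked ID lists from each retriever.
  def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
      scores: dict[str, float] = {}
      for results in result_lists:
          for rank, doc_id in enumerate(results, start=1):
              scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
      return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

  vector_hits = ["m3", "m7", "m1"]   # IDs from pgvector HNSW search
  bm25_hits = ["m7", "m2", "m3"]     # IDs from full-text search
  print(rrf_fuse([vector_hits, bm25_hits]))  # m7 and m3 rank highest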

Maintenance

Scheduled jobs via APScheduler:
Job                     Schedule            Purpose
Consolidation           Nightly 3 AM        Merge semantic duplicates (cosine >= 0.92)
Observation Reflection  Nightly 3:30 AM     Condense observations per project
Summarization           Weekly Sun 4 AM     Compress old memories, prune stale
Re-indexing             Monthly 1st 5 AM    Rebuild embeddings, archive dead memories
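
The consolidation job's duplicate test is a cosine-similarity threshold over embeddings; a plain-Python sketch of that check (the helper names are illustrative):

  # Two embeddings are treated as semantic duplicates when cosine >= 0.92.
  import math

  def cosine(a: list[float], b: list[float]) -> float:
      dot = sum(x * y for x, y in zip(a, b))
      norm_a = math.sqrt(sum(x * x for x in a))
      norm_b = math.sqrt(sum(y * y for y in b))
      return dot / (norm_a * norm_b)

  def is_duplicate(a: list[float], b: list[float], threshold: float = 0.92) -> bool:
      return cosine(a, b) >= threshold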

Next Steps

Docker Compose Setup

Launch services with docker compose

Configuration

Environment variables and settings

User Management

Create users and manage API keys
