CEMS uses a multi-stage retrieval pipeline to find relevant memories. Understanding how to tune retrieval parameters can significantly improve search quality.

Search Pipeline Overview

CEMS's retrieval pipeline has eight stages:

1. Query Understanding: an LLM routes the query to the vector or hybrid strategy based on complexity
2. Query Synthesis: an LLM expands the query into 2-5 search terms for better coverage
3. HyDE: generates a hypothetical ideal answer for better vector matching
4. Candidate Retrieval: pgvector HNSW (vector) + tsvector BM25 (full-text)
5. RRF Fusion: Reciprocal Rank Fusion combines the result lists
6. Relevance Filtering: removes results below the relevance threshold
7. Scoring Adjustments: time decay, priority boost, project-scoped boost
8. Token-Budgeted Assembly: greedy selection within a token budget (default: 2000)

Search Modes

CEMS supports three search modes:

  • Vector: fast, 0 LLM calls - pure vector similarity
  • Hybrid: thorough, 3-4 LLM calls - query synthesis + HyDE + RRF
  • Auto: smart routing based on query complexity

Vector Mode

When to use: Simple, precise queries with clear keywords.

How it works:
  • Single vector search using query embedding
  • No query synthesis or HyDE
  • Fastest mode (0 LLM calls)
  • Best for exact matches
# CLI
cems search "Python version" --mode vector

# API
curl -X POST http://localhost:8765/api/memory/search \
  -H "Authorization: Bearer $CEMS_API_KEY" \
  -d '{"query": "Python version", "mode": "vector"}'

Hybrid Mode

When to use: Complex, multi-domain, or preference queries.

How it works:
  • Query synthesis: expands query into 2-5 search terms
  • HyDE: generates hypothetical answer for better matching
  • Vector + full-text search combined via RRF
  • LLM re-ranking for relevance
  • Best for recall and semantic understanding
# CLI
cems search "recommend video editing resources" --mode hybrid

# API
curl -X POST http://localhost:8765/api/memory/search \
  -H "Authorization: Bearer $CEMS_API_KEY" \
  -d '{"query": "recommend video editing resources", "mode": "hybrid"}'
Query Types That Benefit:
Questions asking for recommendations based on user preferences.
# From retrieval.py:119
preference_signals = [
    "recommend", "suggest", "advice",
    "resources", "tools", "accessories",
    "what should", "would you",
    "based on my", "given my",
]
Examples:
  • “recommend video editing resources”
  • “what tools should I use for Python backend?”
  • “suggest a cocktail based on my preferences”
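Matching these signals is a simple substring check. A minimal sketch (the `is_preference_query` helper name is illustrative, not CEMS's actual function; the signal list mirrors the one above):

```python
preference_signals = [
    "recommend", "suggest", "advice",
    "resources", "tools", "accessories",
    "what should", "would you",
    "based on my", "given my",
]

def is_preference_query(query: str) -> bool:
    # Case-insensitive substring match against any known preference signal
    q = query.lower()
    return any(signal in q for signal in preference_signals)
```

Each of the example queries above matches at least one signal, so they would all be routed through the preference-aware expansion.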

Auto Mode (Default)

When to use: Let CEMS decide based on query complexity.

How it works:
# From retrieval.py:1129
def route_to_strategy(intent: dict[str, Any]) -> str:
    complexity = intent.get("complexity", "moderate")
    requires_reasoning = intent.get("requires_reasoning", False)
    domains = intent.get("domains", [])

    # Simple queries -> vector
    if complexity == "simple" and not requires_reasoning:
        return "vector"

    # Complex or multi-domain -> hybrid
    if complexity == "complex" or requires_reasoning or len(domains) > 2:
        return "hybrid"

    return "hybrid"  # Default to hybrid

Retrieval Parameters

Core Parameters

  • query (string, required): What to search for
  • limit (integer, default 10): Maximum number of results to return (1-20)
  • max_tokens (integer, default 2000): Token budget for context assembly
  • scope (enum, default "both"): Namespace to search: personal, shared, or both
  • project (string, optional): Project ID (e.g., org/repo) to boost project-scoped memories
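A request that sets all five core parameters at once might look like this (the payload shape follows the parameter list above; the query text and values are illustrative):

```python
import json

# All five core parameters in one search payload
payload = {
    "query": "database migration strategy",  # required
    "limit": 15,                             # 1-20, default 10
    "max_tokens": 3000,                      # default 2000
    "scope": "both",                         # personal, shared, or both
    "project": "myorg/myrepo",               # boosts project-scoped memories
}

# Body for: curl -X POST http://localhost:8765/api/memory/search -d "$BODY"
body = json.dumps(payload)
```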

Advanced Parameters

  • enable_graph (boolean, default true): Include graph traversal for related memories (currently not implemented)
  • enable_query_synthesis (boolean, default true): Use LLM to expand the query for better retrieval. Set to false for vector-only mode.
  • raw (boolean, default false): Debug mode: bypass relevance filtering to see all candidate results

Configuration via Environment

# Vector search weight in hybrid mode
CEMS_HYBRID_VECTOR_WEIGHT=0.4  # 0-1, default 0.4
# Higher = favor vector similarity
# Lower = favor BM25 full-text

# Reranking input limit
CEMS_RERANK_INPUT_LIMIT=40  # Default 40
# How many candidates to send to LLM reranker

# Project scoring
CEMS_ENABLE_PROJECT_PENALTY=true
CEMS_PROJECT_BOOST_FACTOR=1.3    # Same-project boost
CEMS_PROJECT_PENALTY_FACTOR=0.8  # Different-project penalty

Scoring Adjustments

CEMS applies multiple scoring factors to rank results:

1. Priority Boost

Range: 1.0-2.0x
Source: Memory metadata priority field
# From retrieval.py:602
score *= result.metadata.priority  # 1.0 default, up to 2.0 for hot memories
Set priority when storing:
cems add "Critical: Production DB password is in 1Password" \
  --priority 2.0

2. Time Decay

Half-life: 60 days since last access
Formula: 1.0 / (1.0 + days_since_access / 60)
# From retrieval.py:607
now = datetime.now(UTC)
days_since_access = (now - result.metadata.last_accessed).days
time_decay = 1.0 / (1.0 + (days_since_access / 60))
score *= time_decay
Impact:
  • 0 days: 100% score
  • 30 days: 67% score
  • 60 days: 50% score
  • 120 days: 33% score
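The impact numbers above fall directly out of the formula:

```python
def time_decay(days_since_access: float, half_life_days: float = 60.0) -> float:
    # Score multiplier: 1.0 at zero days, 0.5 at the half-life
    return 1.0 / (1.0 + days_since_access / half_life_days)
```

At 30 days this gives 1 / 1.5 ≈ 0.67, matching the ~67% figure in the table.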

3. Pinned Boost

Boost: 1.1x (10%)
Source: Memory metadata pinned field
# From retrieval.py:613
if result.metadata.pinned:
    score *= 1.1
Pin important memories:
cems pin <memory_id>

4. Project-Scoped Scoring

Same project: 1.3x boost
Different project: 0.8x penalty
No project tag: 0.9x penalty
# From retrieval.py:619
if project:
    source_ref = result.metadata.source_ref or ""
    if source_ref.startswith(f"project:{project}"):
        score *= 1.3  # Same project boost
    elif source_ref.startswith("project:"):
        score *= 0.8  # Different project penalty
    else:
        score *= 0.9  # No project tag
Search with project context:
cems search "database migration" --project myorg/myrepo

Query Synthesis

Query synthesis expands your query into multiple search terms for better coverage.

How It Works

# From retrieval.py:166
def synthesize_query(
    query: str,
    client: OpenRouterClient,
    is_preference: bool = False,
    profile_context: list[str] | None = None,
) -> list[str]:
For preference queries:
  • Generates 4-5 search terms
  • Bridges semantic gap between question (“recommend X”) and answer (“I use Y”)
  • Uses profile context (actual user memories) for better expansion
For temporal queries:
  • Generates 3-4 search terms
  • Focuses on date-related terms and chronological ordering
  • Searches for BOTH events in “which first/last” questions
For general queries:
  • Generates 2-3 search terms
  • Stays within same domain (no over-generalization)
  • Prefers specific technical terms over generic words

Example Expansion

Query: "recommend video editing resources"

Synthesized:
1. "I use video editing software"
2. "Adobe Premiere Pro"
3. "Final Cut Pro"
4. "my favorite video editor"
5. "I work with video production"

HyDE (Hypothetical Document Embeddings)

HyDE generates a hypothetical answer, then searches for memories similar to that answer.

Why HyDE Works

Queries and answers have different phrasing:
  • Query: “recommend video editing resources”
  • Answer: “I use Adobe Premiere Pro for video editing”
HyDE bridges this gap by generating what an ideal answer would look like.

Implementation

# From retrieval.py:803
def generate_hypothetical_memory(
    query: str,
    client: OpenRouterClient,
    is_preference: bool = False,
    profile_context: list[str] | None = None,
) -> str:
For preference queries:
Prompt: Generate a hypothetical memory written FROM THE USER'S PERSPECTIVE.
- Use first-person: "I use...", "I prefer..."
- Mention SPECIFIC products, brands, tools by NAME
- Include context about WHY they like it
For temporal queries:
Prompt: Generate a hypothetical memory with specific dates/time references.
- Include dates: "On March 15th..."
- Mention sequence: "First X happened, then Y"
- Include duration: "3 days before..."

RRF Fusion (Reciprocal Rank Fusion)

Combines results from multiple retrievers (vector + full-text + query expansions).

Formula

# From retrieval.py:933
rrf_score = sum(weight_i / (k + rank_i)) for each retriever i

# k = 60 (standard in literature)
# weight_i = list weight (original query gets 2x vs expansions)

QMD Enhancements

List Weights:
  • Original query: 2.0x weight
  • Synthesized queries: 1.0x weight
  • Protects precision (original query results matter more)
Top-Rank Bonus:
  • Rank 1: +0.05 bonus (per list)
  • Ranks 2-3: +0.02 bonus (per list)
  • Protects top hits from getting buried
# From retrieval.py:936
bonus = bonus_r1 if rank == 1 else (bonus_r23 if rank <= 3 else 0.0)
rrf_scores[result.memory_id] += base + bonus
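Weighted RRF with the top-rank bonus can be sketched end to end (this version operates on bare memory IDs for clarity; CEMS's implementation works on full result objects):

```python
def rrf_fuse(
    ranked_lists: list[list[str]],  # memory IDs per retriever, best first
    weights: list[float],           # 2.0 for the original query, 1.0 for expansions
    k: int = 60,
    bonus_r1: float = 0.05,
    bonus_r23: float = 0.02,
) -> dict[str, float]:
    rrf_scores: dict[str, float] = {}
    for results, weight in zip(ranked_lists, weights):
        for rank, memory_id in enumerate(results, start=1):
            base = weight / (k + rank)
            bonus = bonus_r1 if rank == 1 else (bonus_r23 if rank <= 3 else 0.0)
            rrf_scores[memory_id] = rrf_scores.get(memory_id, 0.0) + base + bonus
    return rrf_scores
```

A memory ranked first in the original-query list (weight 2.0) collects 2/61 + 0.05 from that list alone; appearing in several lists adds up, which is the behavior RRF rewards.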

Score Blending

# From retrieval.py:966
# Normalize RRF to 0-1, then blend with vector score
norm_rrf = (rrf_score - min_rrf) / rrf_range
result.score = 0.5 * norm_rrf + 0.5 * result.score
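That blend step can be sketched as follows (min-max normalization over the RRF scores as in the snippet above; the guard against a zero range is an assumption, not confirmed from the source):

```python
def blend_scores(
    rrf_scores: dict[str, float],     # memory_id -> raw RRF score
    vector_scores: dict[str, float],  # memory_id -> vector similarity (0-1)
) -> dict[str, float]:
    min_rrf, max_rrf = min(rrf_scores.values()), max(rrf_scores.values())
    rrf_range = (max_rrf - min_rrf) or 1.0  # assumption: avoid division by zero
    return {
        memory_id: 0.5 * ((score - min_rrf) / rrf_range)
        + 0.5 * vector_scores.get(memory_id, 0.0)
        for memory_id, score in rrf_scores.items()
    }
```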

Token-Budgeted Assembly

Selects results within token budget to fit in LLM context.

Standard Assembly

# From retrieval.py:446
def assemble_context(
    results: list[SearchResult],
    max_tokens: int = 2000,
) -> tuple[list[SearchResult], int]:
Greedy selection:
  • Sort by score (descending)
  • Add results until the token budget is exhausted
  • Don't stop at the first result that doesn't fit - smaller subsequent results may still fit
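The greedy pass can be sketched as follows (token counts are assumed to be precomputed per result; this is not CEMS's actual assembly code):

```python
def assemble_greedy(
    results: list[tuple[str, float, int]],  # (text, score, token_count)
    max_tokens: int = 2000,
) -> tuple[list[str], int]:
    selected: list[str] = []
    used = 0
    # Scan in score order; skip anything that doesn't fit, but keep going
    # so that smaller, lower-scored results can still use the remaining budget.
    for text, _score, tokens in sorted(results, key=lambda r: -r[1]):
        if used + tokens <= max_tokens:
            selected.append(text)
            used += tokens
    return selected, used
```

With a 2000-token budget and results of 1800, 500, and 150 tokens (in score order), the 500-token result is skipped but the 150-token one still fits.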

Diverse Assembly (for aggregation queries)

# From retrieval.py:286
def assemble_context_diverse(
    results: list[SearchResult],
    max_tokens: int = 2000,
    mmr_lambda: float = 0.6,
) -> tuple[list[SearchResult], int]:
MMR (Maximal Marginal Relevance):
MMR = λ * relevance - (1-λ) * max_similarity_to_selected
Why MMR:
  • Aggregation queries need results from MULTIPLE sessions
  • Greedy selection can pick many similar results from same session
  • MMR balances relevance (60%) and diversity (40%)
Two-phase selection:
  1. Phase 1: Take top result from each session using MMR
  2. Phase 2: Fill remaining budget using MMR across all results
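A minimal MMR selector under these rules (pairwise similarities are passed in explicitly here; CEMS would compute them from embeddings, and the two-phase per-session logic is omitted for brevity):

```python
def mmr_select(
    relevance: dict[str, float],               # memory_id -> relevance score
    similarity: dict[tuple[str, str], float],  # pairwise similarity (symmetric)
    budget: int,                               # number of results to pick
    mmr_lambda: float = 0.6,
) -> list[str]:
    def sim(a: str, b: str) -> float:
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    selected: list[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < budget:
        # MMR = lambda * relevance - (1 - lambda) * max similarity to selected
        def mmr(cid: str) -> float:
            max_sim = max((sim(cid, s) for s in selected), default=0.0)
            return mmr_lambda * relevance[cid] - (1 - mmr_lambda) * max_sim
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate results from the same session (similarity 0.95) and one distinct result, MMR picks the distinct result second even though its relevance is lower - exactly the behavior aggregation queries need.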

Tuning Examples

# Use hybrid mode with higher token budget
cems search "recommend video editing resources" \
  --mode hybrid \
  --max-tokens 4000 \
  --limit 20
# Use vector mode (skips query synthesis and HyDE)
cems search "Python version" \
  --mode vector \
  --limit 5
# Favor recency: the time decay half-life is 60 days by default.
# It is hardcoded in retrieval.py:608 and cannot currently be
# configured via an environment variable.
Alternatively, pin important memories:
cems pin <memory_id>
# Use raw mode to see all candidates before filtering
cems search "troublesome query" --raw

# Check vector vs full-text weights
export CEMS_HYBRID_VECTOR_WEIGHT=0.7  # Favor vector over BM25
cems search "troublesome query"

Next Steps

  • MCP Integration: use CEMS with MCP-compatible agents
  • Troubleshooting: debug search and retrieval issues
