Search memories using CEMS’s multi-stage retrieval pipeline with query understanding, HyDE, RRF fusion, and LLM re-ranking.

Endpoint

POST /api/memory/search
Authentication: Required (Bearer token)

Request body

query
string
required
The search query. Maximum 2000 characters; longer queries are truncated automatically.
limit
integer
default:10
Maximum number of results to return
scope
string
default:"both"
Search scope:
  • personal - Only your personal memories
  • shared - Only team-shared memories
  • both - Search both scopes
max_tokens
integer
default:4000
Token budget for assembled results. Results are selected greedily within this budget.
mode
string
default:"vector"
Retrieval mode:
  • auto - Smart routing based on query analysis
  • vector - Fast vector search (0 LLM calls)
  • hybrid - Full pipeline with HyDE + RRF + re-ranking (3-4 LLM calls)
enable_query_synthesis
boolean
default:false
Enable LLM-powered query expansion into 2-5 search terms
enable_hyde
boolean
default:false
Enable HyDE (Hypothetical Document Embeddings) for better semantic matching
enable_rerank
boolean
default:true
Enable LLM re-ranking of results for improved relevance
enable_graph
boolean
default:true
Enable graph traversal for related memories
project
string
Project identifier for the project-scoped boost, in the format org/repo. Memories whose source_ref matches this project get a 1.3x score boost.
raw
boolean
default:false
Debug mode: bypass filtering and return all candidates
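The project boost can be pictured as a score multiplier applied to matching candidates before they are re-sorted. The sketch below is a minimal illustration, assuming a candidate matches when its source_ref equals the `org/repo` identifier or is a path inside it. The `Candidate` type and the helper names are hypothetical, and the real matching rule (and where in the pipeline it runs) may differ.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional

# Multiplier stated in the `project` parameter description.
PROJECT_BOOST = 1.3


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record holding only the fields this sketch needs."""
    memory_id: str
    score: float
    source_ref: Optional[str] = None


def matches_project(source_ref: Optional[str], project: str) -> bool:
    # Assumption: a match is the exact `org/repo` or a path inside that repo.
    if not source_ref:
        return False
    return source_ref == project or source_ref.startswith(project + "/")


def apply_project_boost(
    candidates: Iterable[Candidate], project: Optional[str]
) -> List[Candidate]:
    """Multiply matching scores by PROJECT_BOOST and re-sort, highest score first."""
    items = list(candidates)
    if not project:
        return items  # no project given: scores and order are left untouched
    boosted = [
        replace(c, score=c.score * PROJECT_BOOST)
        if matches_project(c.source_ref, project)
        else c
        for c in items
    ]
    return sorted(boosted, key=lambda c: c.score, reverse=True)
```

With `project` set to `acme/api-server`, a candidate scored 0.70 from that repo (0.70 × 1.3 = 0.91) overtakes an unrelated candidate scored 0.80.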

Response

success
boolean
Whether the search succeeded
results
object[]
Array of search results
count
integer
Number of results returned
mode
string
Retrieval mode used (vector, hybrid, or unified)
tokens_used
integer
Total tokens in the assembled results
queries_used
string[]
Queries used in the search; includes the synthesized queries when query synthesis is enabled
total_candidates
integer
Total candidates retrieved before filtering
filtered_count
integer
Number of candidates filtered out
intent
object
Query intent analysis (when available)
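A client can map these fields onto typed records. This is a minimal sketch, assuming each entry of `results` carries the fields shown in the example response (memory_id, content, category, score, tags, timestamp). The class and function names are hypothetical, and `intent` defaults to `None` because it is only returned when available.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SearchResult:
    """One entry of `results`; fields follow the example response (assumed, not exhaustive)."""
    memory_id: str
    content: str
    score: float
    category: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    timestamp: Optional[str] = None


@dataclass
class SearchResponse:
    success: bool
    results: List[SearchResult]
    count: int
    mode: str
    tokens_used: int
    queries_used: List[str]
    total_candidates: int
    filtered_count: int
    intent: Optional[Dict[str, Any]] = None  # documented as present only "when available"


def parse_search_response(payload: Dict[str, Any]) -> SearchResponse:
    """Convert a decoded JSON body from POST /api/memory/search into typed records."""
    results = [
        SearchResult(
            memory_id=r["memory_id"],
            content=r["content"],
            score=float(r["score"]),
            category=r.get("category"),
            tags=list(r.get("tags", [])),
            timestamp=r.get("timestamp"),
        )
        for r in payload.get("results", [])
    ]
    return SearchResponse(
        success=bool(payload["success"]),
        results=results,
        count=int(payload.get("count", len(results))),
        mode=str(payload.get("mode", "")),
        tokens_used=int(payload.get("tokens_used", 0)),
        queries_used=list(payload.get("queries_used", [])),
        total_candidates=int(payload.get("total_candidates", 0)),
        filtered_count=int(payload.get("filtered_count", 0)),
        intent=payload.get("intent"),
    )
```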

Examples

curl -X POST https://your-cems-server.com/api/memory/search \
  -H "Authorization: Bearer $CEMS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are my TypeScript preferences?",
    "limit": 5
  }'
Response:
{
  "success": true,
  "results": [
    {
      "memory_id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "I prefer using TypeScript for all new backend services",
      "category": "preferences",
      "score": 0.89,
      "tags": ["typescript", "backend"],
      "timestamp": "2024-02-28T10:30:00Z"
    }
  ],
  "count": 1,
  "mode": "vector",
  "tokens_used": 15,
  "queries_used": ["What are my TypeScript preferences?"],
  "total_candidates": 10,
  "filtered_count": 9
}
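The first example can also be sent from Python using only the standard library. The helper below is a hypothetical sketch: it builds the same POST request as the curl command (bearer token plus JSON body) without sending it, and `https://your-cems-server.com` remains the placeholder host used in these examples.

```python
import json
import os
import urllib.request


def build_search_request(
    base_url: str, api_key: str, query: str, **options
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/memory/search request.

    Extra keyword arguments become additional body fields, e.g. limit=5 or mode="hybrid".
    """
    body = {"query": query, **options}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/memory/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Equivalent of the first curl example (built here, not sent).
req = build_search_request(
    "https://your-cems-server.com",
    os.environ.get("CEMS_API_KEY", ""),
    "What are my TypeScript preferences?",
    limit=5,
)
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the body with `json.load`. Any request-body field listed above (for example `mode`, `enable_hyde`, or `project`) can be passed as an extra keyword argument.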

Hybrid search with HyDE

curl -X POST https://your-cems-server.com/api/memory/search \
  -H "Authorization: Bearer $CEMS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do we handle authentication?",
    "mode": "hybrid",
    "enable_hyde": true,
    "enable_query_synthesis": true,
    "project": "acme/api-server"
  }'
This uses the full pipeline:
  1. Query understanding and intent detection
  2. Query synthesis (expands to 2-5 queries)
  3. HyDE (generates hypothetical answer)
  4. Multi-query retrieval
  5. RRF fusion
  6. LLM re-ranking
  7. Project-scoped boost for acme/api-server
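Step 5 merges the ranked lists returned for each synthesized query (and the HyDE query) into one ranking. Below is a minimal sketch of standard Reciprocal Rank Fusion, where each memory's fused score is the sum of 1/(k + rank) over every list it appears in. The constant k = 60 is the value commonly used for RRF and is an assumption here; CEMS may use a different constant or weight the lists differently, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(
    ranked_lists: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of memory IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first; ranks start at 1.
    Returns (memory_id, fused_score) pairs, highest score first.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            # Every appearance contributes 1 / (k + rank); better ranks add more.
            scores[memory_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because RRF uses only ranks, it needs no normalization across queries whose raw similarity scores sit on different scales.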

Retrieval pipeline stages

The enhanced retrieval pipeline (retrieve_for_inference) implements 9 stages:
  1. Query understanding: analyzes intent, domains, and entities using an LLM
  2. Query synthesis: the LLM expands the query into 2-5 search terms for better coverage
  3. HyDE: generates a hypothetical ideal answer for better semantic matching
  4. Candidate retrieval: vector search plus graph traversal across multiple queries
  5. RRF fusion: Reciprocal Rank Fusion combines the multi-query results
  6. LLM re-ranking: scores relevance using LLM judgment
  7. Relevance filtering: removes results below a relevance threshold
  8. Unified scoring: applies time decay, priority boost, and project-scoped scoring
  9. Token-budgeted assembly: selects results greedily within the token budget
See Search Pipeline for detailed explanations.
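Stage 9 can be sketched as a greedy pass over results in descending score order. This illustration rests on assumptions: token counts are approximated at about four characters per token (the server's tokenizer may count differently), and a result that would overflow the budget is skipped rather than ending the scan, so shorter lower-scored results can still fit. `assemble_within_budget` and `approx_tokens` are hypothetical names; the defaults mirror the documented `max_tokens` (4000) and `limit` (10).

```python
from typing import Callable, List, Sequence, Tuple

Result = Tuple[str, float]  # (content, score)


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token, rounded up, minimum 1.
    return max(1, (len(text) + 3) // 4)


def assemble_within_budget(
    results: Sequence[Result],
    max_tokens: int = 4000,
    limit: int = 10,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> Tuple[List[Result], int]:
    """Greedily pick the highest-scoring results that fit in max_tokens.

    Returns the selected results and the total tokens they use.
    """
    selected: List[Result] = []
    used = 0
    for content, score in sorted(results, key=lambda r: r[1], reverse=True):
        if len(selected) >= limit:
            break
        cost = count_tokens(content)
        if used + cost > max_tokens:
            continue  # skip this one; a shorter, lower-scored result may still fit
        selected.append((content, score))
        used += cost
    return selected, used
```

Skipping instead of stopping trades strict score order for better use of the budget; stopping at the first overflow is the simpler alternative if strict order matters more.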

Search modes comparison

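| Mode   | LLM calls | Speed    | Best for                                        |
|--------|-----------|----------|-------------------------------------------------|
| vector | 0         | Fastest  | Simple queries, known terms (default)           |
| auto   | 0-4       | Varies   | General use; routing is based on query analysis |
| hybrid | 3-4       | Thorough | Complex queries, preferences, temporal questions |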

Error responses

Status codes:
  • 400 - Bad request (missing query)
  • 401 - Unauthorized (invalid or missing API key)
  • 500 - Internal server error
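A client can turn these status codes into descriptive exceptions. This is a minimal standard-library sketch; `CemsSearchError`, `send_search`, and the `ERROR_HINTS` table are hypothetical names. The format of the server's error body is not documented here, so it is passed through as raw text.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict

# Status codes documented for POST /api/memory/search.
ERROR_HINTS = {
    400: "Bad request (missing query)",
    401: "Unauthorized (invalid or missing API key)",
    500: "Internal server error",
}


class CemsSearchError(Exception):
    """Raised when the search endpoint answers with an HTTP error status."""

    def __init__(self, status: int, detail: str = "") -> None:
        hint = ERROR_HINTS.get(status, "Unexpected HTTP status")
        message = f"HTTP {status}: {hint}"
        if detail:
            message += f" - {detail}"
        super().__init__(message)
        self.status = status
        self.detail = detail


def send_search(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send a prepared search request and return the decoded JSON body."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # The error body's format is not documented, so keep it as plain text.
        detail = err.read().decode("utf-8", errors="replace")
        raise CemsSearchError(err.code, detail) from err
```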

Implementation reference

See src/cems/api/handlers/memory.py:272-386 for the complete implementation.
