Endpoint
Request body
The search query. Maximum 2000 characters; longer queries are truncated automatically.
Maximum number of results to return
Search scope:
- personal - Only your personal memories
- shared - Only team-shared memories
- both - Search both scopes
Token budget for assembled results. Results are selected greedily within this budget.
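As a rough illustration of greedy selection within a token budget, the sketch below picks candidates in descending score order and keeps each one that still fits. The `Candidate` type, the `select_within_budget` helper, and the choice to skip oversized items rather than stop at the first one are assumptions made for illustration, not the service's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content: str
    score: float
    tokens: int  # hypothetical precomputed token count for this result


def select_within_budget(candidates, max_tokens):
    """Greedily keep the highest-scoring candidates that fit in max_tokens."""
    selected = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if used + cand.tokens > max_tokens:
            # Assumption: skip items that do not fit and keep trying smaller ones.
            continue
        selected.append(cand)
        used += cand.tokens
    return selected, used


candidates = [
    Candidate("a", 0.9, 600),
    Candidate("b", 0.8, 500),
    Candidate("c", 0.7, 300),
]
selected, used = select_within_budget(candidates, max_tokens=1000)
```

With a budget of 1000 tokens, this sketch keeps "a" and "c" (900 tokens) and skips "b", which would overflow the budget.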
Retrieval mode:
- auto - Smart routing based on query analysis
- vector - Fast vector search (0 LLM calls)
- hybrid - Full pipeline with HyDE + RRF + re-ranking (3-4 LLM calls)
Enable LLM-powered query expansion into 2-5 search terms
Enable HyDE (Hypothetical Document Embeddings) for better semantic matching
Enable LLM re-ranking of results for improved relevance
Enable graph traversal for related memories
Project identifier for scoped boost. Format: org/repo. Memories with matching source_ref get a 1.3x score boost.
Debug mode: bypass filtering and return all candidates
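The project-scoped boost described above can be pictured with a minimal sketch. The `apply_project_boost` helper, the dictionary shape of a result, and the rule that a "match" means the org/repo string appears inside source_ref are all assumptions; only the 1.3x multiplier and the source_ref field come from this page.

```python
PROJECT_BOOST = 1.3  # multiplier stated in the parameter description


def apply_project_boost(results, project):
    """Return results re-sorted after boosting those whose source_ref matches."""
    boosted = []
    for result in results:
        score = result["score"]
        # Assumption: a match means source_ref contains the org/repo string.
        if project and project in (result.get("source_ref") or ""):
            score *= PROJECT_BOOST
        boosted.append({**result, "score": score})
    return sorted(boosted, key=lambda r: r["score"], reverse=True)


results = [
    {"id": 1, "score": 0.8, "source_ref": "project:other/repo"},
    {"id": 2, "score": 0.7, "source_ref": "project:acme/api-server"},
]
ranked = apply_project_boost(results, "acme/api-server")
```

In this example the second result moves ahead of the first, because 0.7 × 1.3 exceeds 0.8.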
Response
Whether the search succeeded
Array of search results
Number of results returned
Retrieval mode used (vector, hybrid, or unified)
Total tokens in the assembled results
Array of queries used in the search (shows query synthesis if enabled)
Total candidates retrieved before filtering
Number of candidates filtered out
Query intent analysis (when available)
Examples
Basic search
Hybrid search with HyDE
- Query understanding and intent detection
- Query synthesis (expands to 2-5 queries)
- HyDE (generates hypothetical answer)
- Multi-query retrieval
- RRF fusion
- LLM re-ranking
- Project-scoped boost for acme/api-server
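To make the RRF fusion step in the list above more concrete, here is a minimal sketch of standard Reciprocal Rank Fusion over several ranked lists, such as one list per synthesized query. The `rrf_fuse` helper and the constant k=60 (a commonly used default) are assumptions; this page does not state which constant or weighting the service uses.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs using Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it ranks.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


fused = rrf_fuse([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
])
```

Documents that rank well across several lists rise to the top, so here "b" comes first, followed by "a", "c", and "d".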
Retrieval pipeline stages
The enhanced retrieval pipeline (retrieve_for_inference) implements 9 stages. See Search Pipeline for detailed explanations.
Search modes comparison
| Mode | LLM Calls | Speed | Best for |
|---|---|---|---|
| vector | 0 | Fastest | Simple queries, known terms |
| auto | 0-4 | Smart | General use (default) |
| hybrid | 3-4 | Thorough | Complex queries, preferences, temporal |
Error responses
Status codes:
- 400 - Bad request (missing query)
- 401 - Unauthorized (invalid or missing API key)
- 500 - Internal server error
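A client might translate these status codes into exceptions along the following lines. The `SearchError` class and `check_status` helper are hypothetical names for illustration; only the codes and their meanings are taken from the list above.

```python
class SearchError(Exception):
    """Hypothetical client-side error raised for non-success responses."""


# Meanings copied from the documented status codes.
STATUS_MESSAGES = {
    400: "Bad request (missing query)",
    401: "Unauthorized (invalid or missing API key)",
    500: "Internal server error",
}


def check_status(status_code):
    """Raise SearchError for any status code outside the 2xx range."""
    if 200 <= status_code < 300:
        return
    message = STATUS_MESSAGES.get(status_code, "Unexpected status")
    raise SearchError(f"{status_code}: {message}")
```

Calling `check_status` on the HTTP status before reading the body keeps authentication and validation failures from being mistaken for empty result sets.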
Implementation reference
See src/cems/api/handlers/memory.py:272-386 for the complete implementation.