
Search architecture

Claude-Mem uses an MCP-based search architecture: 4 streamlined tools provide intelligent memory retrieval through a 3-layer progressive disclosure workflow.

Overview

MCP Tools → MCP Protocol → HTTP API → Worker Service → SQLite FTS5 + ChromaDB

MCP tools

4 tools: search, timeline, get_observations, __IMPORTANT

MCP server

Thin wrapper (~312 lines) that translates MCP protocol to HTTP API calls

HTTP API

Fast search operations on Worker Service at port 37777

Hybrid search

SQLite FTS5 for keyword search, ChromaDB for semantic/vector search
Token efficiency: ~10x savings through the 3-layer workflow pattern.

How it works

1. User query

Claude has 4 MCP tools available. When searching memory, it follows the 3-layer workflow:
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])
2. MCP protocol

The MCP server receives a tool call via JSON-RPC over stdio:
{
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "authentication bug",
      "type": "bugfix",
      "limit": 10
    }
  }
}
3. HTTP API call

The MCP server translates the call to an HTTP request:
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);
4. Worker processing

The worker service executes the FTS5 query:
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10
5. Results returned

The worker returns structured data through the MCP server to Claude:
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
6. Claude processes results

Claude reviews the compact index, decides which observations are relevant, and can use timeline for context or get_observations to fetch full details for specific IDs.

The 4 MCP tools

__IMPORTANT — Workflow documentation

Always visible to Claude. Explains the 3-layer workflow pattern and enforces it at the tool level.
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
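As an illustration, the workflow maps onto three sequential tool calls. The sketch below assumes a hypothetical callTool helper that forwards to the MCP tools, and a results array with id fields for readability; the real search tool returns a compact markdown index.

```typescript
// Hypothetical helper signature; any MCP client that can invoke tools would work.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

// Sketch of the 3-layer workflow: index → context → details.
async function threeLayerSearch(callTool: CallTool): Promise<any> {
  // Layer 1: compact index (~50–100 tokens per result)
  const index = await callTool('search', { query: 'authentication bug', type: 'bugfix', limit: 10 });

  // Layer 2: chronological context around the most promising hit
  const anchor = index.results[0].id;
  await callTool('timeline', { anchor, depth_before: 3, depth_after: 3 });

  // Layer 3: full details only for the filtered IDs
  const relevant = index.results.slice(0, 3).map((r: { id: number }) => r.id);
  return callTool('get_observations', { ids: relevant });
}
```

Note that the expensive call comes last and receives only IDs that survived the first two layers.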

search — Search memory index

Step 1 of the workflow. Returns a compact index for filtering.
{
  name: 'search',
  description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true  // Accepts any parameters
  }
}
HTTP endpoint: GET /api/search
| Parameter | Description |
|---|---|
| query | Full-text search query |
| limit | Maximum results (default: 20) |
| type | Filter by observation type |
| project | Filter by project name |
| dateStart, dateEnd | Date range filters |
| offset | Pagination offset |
| orderBy | Sort order |
Returns: Compact index with IDs, titles, dates, types (~50–100 tokens per result).
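As a sketch mirroring the server's handler pattern, the parameters above serialize directly into the query string:

```typescript
// Build the /api/search URL from the parameters listed above.
const args = { query: 'authentication bug', type: 'bugfix', limit: 10, offset: 0 };
const qs = new URLSearchParams(
  Object.entries(args).map(([k, v]) => [k, String(v)]),
);
const searchUrl = `http://localhost:37777/api/search?${qs}`;
// → http://localhost:37777/api/search?query=authentication+bug&type=bugfix&limit=10&offset=0
```

Because the schema accepts any properties, new filters can be forwarded this way without changing the MCP server.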

timeline — Get chronological context

Step 2 of the workflow. Reveals the narrative arc around a specific observation.
{
  name: 'timeline',
  description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true
  }
}
HTTP endpoint: GET /api/timeline
| Parameter | Description |
|---|---|
| anchor | Observation ID to center the timeline around |
| query | Search query to find the anchor automatically |
| depth_before | Observations before the anchor (default: 3) |
| depth_after | Observations after the anchor (default: 3) |
| project | Filter by project name |
Either anchor or query must be provided. Returns: Chronological view of what happened before, during, and after the anchor point.
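Conceptually, the timeline is a slice of the chronological observation stream centered on the anchor. A minimal sketch, assuming observations are already sorted by date (the Obs shape and timelineWindow helper are illustrative, not the actual worker code):

```typescript
interface Obs { id: number; title: string }

// Return depthBefore observations before the anchor, the anchor itself,
// and depthAfter observations after it (clamped at the ends of the stream).
function timelineWindow(stream: Obs[], anchorId: number, depthBefore = 3, depthAfter = 3): Obs[] {
  const i = stream.findIndex(o => o.id === anchorId);
  if (i === -1) return [];
  return stream.slice(Math.max(0, i - depthBefore), i + depthAfter + 1);
}
```

With the defaults of 3 before and 3 after, a timeline call returns at most seven observations.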

get_observations — Fetch full details

Step 3 of the workflow. Fetches complete data only for IDs pre-filtered in steps 1–2.
{
  name: 'get_observations',
  description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
  inputSchema: {
    type: 'object',
    properties: {
      ids: {
        type: 'array',
        items: { type: 'number' },
        description: 'Array of observation IDs to fetch (required)'
      }
    },
    required: ['ids'],
    additionalProperties: true
  }
}
HTTP endpoint: POST /api/observations/batch
{
  "ids": [123, 456, 789],
  "orderBy": "date_desc",
  "project": "my-app"
}
Returns: Complete observation details (~500–1,000 tokens per observation).
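As a sketch, the MCP server might assemble the batch request like this (the endpoint and body shape come from the example above):

```typescript
// Assemble the POST request for /api/observations/batch.
const payload = { ids: [123, 456, 789], orderBy: 'date_desc', project: 'my-app' };
const batchRequest = {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
};
// await fetch('http://localhost:37777/api/observations/batch', batchRequest);
```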

MCP server implementation

Location: plugin/scripts/mcp-server.cjs

The MCP server is a thin wrapper — it contains no business logic. Its sole job is protocol translation from MCP JSON-RPC to HTTP API calls.

Key characteristics:
  • ~312 lines of code (reduced from ~2,718 lines in the previous implementation)
  • Single source of truth: the Worker HTTP API
  • Simple schemas with additionalProperties: true
Handler pattern:
{
  name: 'search',
  handler: async (args: any) => {
    const endpoint = '/api/search';
    const searchParams = new URLSearchParams();

    for (const [key, value] of Object.entries(args)) {
      searchParams.append(key, String(value));
    }

    const url = `http://localhost:37777${endpoint}?${searchParams}`;
    const response = await fetch(url);
    return await response.json();
  }
}

Hybrid search approach

FTS5 keyword search (SQLite)

SQLite FTS5 virtual tables provide fast full-text keyword matching:
-- observations_fts covers: title, subtitle, narrative, text, facts, concepts
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
FTS5 supports phrase matching, boolean operators, and column-scoped queries. Typical query latency is sub-10ms.
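A sketch of how the worker might assemble that statement from optional filters (buildSearchQuery is a hypothetical helper, not the actual worker code). User input only ever appears in the bound parameters, never in the SQL text:

```typescript
type Filters = { type?: string; dateStart?: string; dateEnd?: string; limit?: number; offset?: number };

// Build a parameterized FTS5 statement from the optional filters.
function buildSearchQuery(match: string, f: Filters = {}) {
  const where = ['observations_fts MATCH ?'];
  const params: (string | number)[] = [match];
  if (f.type) { where.push('type = ?'); params.push(f.type); }
  if (f.dateStart) { where.push('date >= ?'); params.push(f.dateStart); }
  if (f.dateEnd) { where.push('date <= ?'); params.push(f.dateEnd); }
  params.push(f.limit ?? 20, f.offset ?? 0);
  const sql = `SELECT * FROM observations_fts WHERE ${where.join(' AND ')} ORDER BY rank LIMIT ? OFFSET ?`;
  return { sql, params };
}
```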

Semantic search (ChromaDB)

ChromaDB provides vector embeddings for semantic similarity search — finding conceptually related observations even when exact keywords don’t match. The ChromaSync service (src/services/sync/ChromaSync.ts) manages synchronization between SQLite and ChromaDB.
ChromaDB is optional. When unavailable, search falls back to FTS5 keyword search and SQL LIKE queries.

Search routing

The SessionSearch service (src/services/sqlite/SessionSearch.ts) coordinates search routing:
  • Vector search via ChromaDB is the primary search mechanism
  • FTS5 is maintained for backward compatibility; tables are kept synchronized via triggers
  • Structured filters (type, project, date) are applied as SQL predicates regardless of search mode
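The routing above can be sketched as a simple primary/fallback chain (the signatures here are assumed for illustration, not the actual SessionSearch API):

```typescript
type SearchFn = (query: string) => Promise<number[]>;

// Try semantic search first; fall back to FTS5 keyword search on failure.
async function routeSearch(query: string, vector: SearchFn, fts: SearchFn): Promise<number[]> {
  try {
    return await vector(query);  // primary: ChromaDB vector search
  } catch {
    return await fts(query);     // fallback: SQLite FTS5
  }
}
```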

The 3-layer progressive disclosure pattern

Design philosophy

Progressive disclosure is a core architectural principle: reveal information at the level of detail actually needed, on demand.

Token efficiency

Traditional RAG

Fetch 20 observations upfront: 10,000–20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: ~18,000 tokens on irrelevant context

3-layer workflow

Step 1: search (20 results) = ~1,000–2,000 tokens
Step 2: filter to 3 relevant IDs
Step 3: get_observations (3 IDs) = ~1,500–3,000 tokens
Total: 2,500–5,000 tokens (50–75% savings)
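Using midpoint figures from the comparison above, the arithmetic works out as follows (the per-item token counts are the document's estimates, not measurements):

```typescript
// Traditional RAG: 20 full observations at ~750 tokens each.
const traditional = 20 * 750;              // 15000 tokens

// 3-layer workflow: one index pass (~1500 tokens) plus 3 full observations.
const layered = 1500 + 3 * 750;            // 3750 tokens

const savings = 1 - layered / traditional; // 0.75 → 75% saved at the midpoint
```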

Structural enforcement

The 3-layer pattern is enforced by tool design, not just instructions:
  • You cannot fetch full details without first getting IDs from search
  • You cannot search without seeing the workflow reminder in __IMPORTANT
  • timeline provides a middle ground between index and full details
Before: Progressive disclosure was something Claude had to remember. After: Progressive disclosure is structurally impossible to bypass.

Architecture evolution

Previous implementation (9 tools)

Approach: 9 MCP tools with detailed parameter schemas
Token cost: ~2,500 tokens in tool definitions per session
Tools:
  • search_observations — Full-text search
  • find_by_type — Filter by type
  • find_by_file — Filter by file
  • find_by_concept — Filter by concept
  • get_recent_context — Recent sessions
  • get_observation — Fetch single observation
  • get_session — Fetch session
  • get_prompt — Fetch prompt
  • help — API documentation
Problems: overlapping operations, complex parameter schemas, no built-in workflow guidance, high token cost at session start.
Code size: ~2,718 lines in mcp-server.ts
Current implementation (4 tools)

Approach: 4 MCP tools following the 3-layer workflow
Tools:
  1. __IMPORTANT — Workflow guidance (always visible)
  2. search — Step 1 (index)
  3. timeline — Step 2 (context)
  4. get_observations — Step 3 (details)
Benefits: progressive disclosure is built into tool design, no overlapping operations, simple additionalProperties: true schemas, clear workflow pattern.
Code size: ~312 lines in mcp-server.ts (88% reduction)
Earlier versions (v5.4.0–v5.5.0) used a skill-based search approach:
  • Required separate SKILL.md and operations/ files
  • HTTP API called directly via curl from skill instructions
  • Progressive disclosure through skill loading (loaded on-demand)
  • Token savings: ~2,250 tokens per session vs the old MCP approach
Migration: Skill-based search was removed in favor of the streamlined MCP architecture, which provides native MCP protocol integration, cleaner architecture, and works with both Claude Desktop and Claude Code.

Configuration

The MCP server is configured automatically when the plugin is installed; no manual setup is required. Both Claude Desktop and Claude Code use the same 4 MCP tools, and the architecture behaves identically in each client.

Security

FTS5 injection prevention

All search queries are escaped before FTS5 processing:
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}
The test suite covers 332 injection attack patterns: special characters, SQL keywords, quote escaping, and boolean operators.
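For illustration, doubling the quotes turns a would-be string terminator into a literal character (the function is repeated from above so the snippet is self-contained):

```typescript
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}

// A quote-injection attempt becomes a harmless literal inside the FTS5 string.
const escaped = escapeFTS5Query('auth" OR rank');
// → 'auth"" OR rank'
```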

MCP protocol security

  • Stdio transport: No network exposure for the MCP protocol
  • Local-only HTTP: Worker API is bound to localhost:37777
  • No authentication: Local development only, no external network access

Performance

| Aspect | Detail |
|---|---|
| FTS5 query latency | Sub-10ms for typical queries |
| MCP overhead | Minimal — simple protocol translation only |
| Pagination | Efficient with offset / limit |
| Batching | get_observations accepts multiple IDs in a single call |

Troubleshooting

If the MCP tools are not available:
  1. Verify the MCP server path in your configuration
  2. Check that the worker service is running:
    curl http://localhost:37777/health
  3. Restart Claude Desktop or Claude Code

To manage the worker service:
npm run worker:status    # Check status
npm run worker:restart   # Restart worker
npm run worker:logs      # View logs

If search returns no results:
  1. Test the API directly:
    curl "http://localhost:37777/api/search?query=test"
  2. Verify the database exists:
    ls ~/.claude-mem/claude-mem.db
  3. Confirm observations exist:
    curl "http://localhost:37777/api/stats"
