
Search architecture

Claude-Mem uses an MCP-based search architecture: 4 streamlined tools provide intelligent memory retrieval through a 3-layer progressive disclosure workflow.

Overview

MCP Tools → MCP Protocol → HTTP API → Worker Service → SQLite FTS5 + ChromaDB

MCP tools

4 tools: search, timeline, get_observations, __IMPORTANT

MCP server

Thin wrapper (~312 lines) that translates MCP protocol to HTTP API calls

HTTP API

Fast search operations on Worker Service at port 37777

Hybrid search

SQLite FTS5 for keyword search, ChromaDB for semantic/vector search
Token efficiency: ~10x savings through the 3-layer workflow pattern.

How it works

1. User query

Claude has 4 MCP tools available. When searching memory, it follows the 3-layer workflow:
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])
2. MCP protocol

The MCP server receives a tool call via JSON-RPC over stdio:
{
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "authentication bug",
      "type": "bugfix",
      "limit": 10
    }
  }
}
3. HTTP API call

The MCP server translates the call to an HTTP request:
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);
4. Worker processing

The worker service executes the FTS5 query:
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10
5. Results returned

The worker returns structured data through the MCP server to Claude:
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
6. Claude processes results

Claude reviews the compact index, decides which observations are relevant, and can use timeline for context or get_observations to fetch full details for specific IDs.

The 4 MCP tools

__IMPORTANT — Workflow documentation

Always visible to Claude. Explains the 3-layer workflow pattern and enforces it at the tool level.
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
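As an illustration, the workflow maps onto three sequential tool calls. The sketch below assumes a hypothetical callTool helper that forwards to the MCP tools, and a results array with id fields for readability; the real search tool returns a compact markdown index.

```typescript
// Hypothetical helper signature; any MCP client that can invoke tools would work.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

// Sketch of the 3-layer workflow: index → context → details.
async function threeLayerSearch(callTool: CallTool): Promise<any> {
  // Layer 1: compact index (~50–100 tokens per result)
  const index = await callTool('search', { query: 'authentication bug', type: 'bugfix', limit: 10 });

  // Layer 2: chronological context around the most promising hit
  const anchor = index.results[0].id;
  await callTool('timeline', { anchor, depth_before: 3, depth_after: 3 });

  // Layer 3: full details only for the filtered IDs
  const relevant = index.results.slice(0, 3).map((r: { id: number }) => r.id);
  return callTool('get_observations', { ids: relevant });
}
```

Note that the expensive call comes last and receives only IDs that survived the first two layers.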

search — Search memory index

Step 1 of the workflow. Returns a compact index for filtering.
{
  name: 'search',
  description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true  // Accepts any parameters
  }
}
HTTP endpoint: GET /api/search
| Parameter | Description |
|---|---|
| query | Full-text search query |
| limit | Maximum results (default: 20) |
| type | Filter by observation type |
| project | Filter by project name |
| dateStart, dateEnd | Date range filters |
| offset | Pagination offset |
| orderBy | Sort order |
Returns: Compact index with IDs, titles, dates, types (~50–100 tokens per result).
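As a sketch mirroring the server's handler pattern, the parameters above serialize directly into the query string:

```typescript
// Build the /api/search URL from the parameters listed above.
const args = { query: 'authentication bug', type: 'bugfix', limit: 10, offset: 0 };
const qs = new URLSearchParams(
  Object.entries(args).map(([k, v]) => [k, String(v)]),
);
const searchUrl = `http://localhost:37777/api/search?${qs}`;
// → http://localhost:37777/api/search?query=authentication+bug&type=bugfix&limit=10&offset=0
```

Because the schema accepts any properties, new filters can be forwarded this way without changing the MCP server.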

timeline — Get chronological context

Step 2 of the workflow. Reveals the narrative arc around a specific observation.
{
  name: 'timeline',
  description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true
  }
}
HTTP endpoint: GET /api/timeline
| Parameter | Description |
|---|---|
| anchor | Observation ID to center the timeline around |
| query | Search query to find the anchor automatically |
| depth_before | Observations before the anchor (default: 3) |
| depth_after | Observations after the anchor (default: 3) |
| project | Filter by project name |
Either anchor or query must be provided. Returns: Chronological view of what happened before, during, and after the anchor point.
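Conceptually, the timeline is a slice of the chronological observation stream centered on the anchor. A minimal sketch, assuming observations are already sorted by date (the Obs shape and timelineWindow helper are illustrative, not the actual worker code):

```typescript
interface Obs { id: number; title: string }

// Return depthBefore observations before the anchor, the anchor itself,
// and depthAfter observations after it (clamped at the ends of the stream).
function timelineWindow(stream: Obs[], anchorId: number, depthBefore = 3, depthAfter = 3): Obs[] {
  const i = stream.findIndex(o => o.id === anchorId);
  if (i === -1) return [];
  return stream.slice(Math.max(0, i - depthBefore), i + depthAfter + 1);
}
```

With the defaults of 3 before and 3 after, a timeline call returns at most seven observations.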

get_observations — Fetch full details

Step 3 of the workflow. Fetches complete data only for IDs pre-filtered in steps 1–2.
{
  name: 'get_observations',
  description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
  inputSchema: {
    type: 'object',
    properties: {
      ids: {
        type: 'array',
        items: { type: 'number' },
        description: 'Array of observation IDs to fetch (required)'
      }
    },
    required: ['ids'],
    additionalProperties: true
  }
}
HTTP endpoint: POST /api/observations/batch
{
  "ids": [123, 456, 789],
  "orderBy": "date_desc",
  "project": "my-app"
}
Returns: Complete observation details (~500–1,000 tokens per observation).
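As a sketch, the MCP server might assemble the batch request like this (the endpoint and body shape come from the example above):

```typescript
// Assemble the POST request for /api/observations/batch.
const payload = { ids: [123, 456, 789], orderBy: 'date_desc', project: 'my-app' };
const batchRequest = {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
};
// await fetch('http://localhost:37777/api/observations/batch', batchRequest);
```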

MCP server implementation

Location: plugin/scripts/mcp-server.cjs

The MCP server is a thin wrapper — it contains no business logic. Its sole job is protocol translation from MCP JSON-RPC to HTTP API calls.

Key characteristics:
  • ~312 lines of code (reduced from ~2,718 lines in the previous implementation)
  • Single source of truth: the Worker HTTP API
  • Simple schemas with additionalProperties: true
Handler pattern:
{
  name: 'search',
  handler: async (args: any) => {
    const endpoint = '/api/search';
    const searchParams = new URLSearchParams();

    for (const [key, value] of Object.entries(args)) {
      searchParams.append(key, String(value));
    }

    const url = `http://localhost:37777${endpoint}?${searchParams}`;
    const response = await fetch(url);
    return await response.json();
  }
}

Hybrid search approach

FTS5 keyword search (SQLite)

SQLite FTS5 virtual tables provide fast full-text keyword matching:
-- observations_fts covers: title, subtitle, narrative, text, facts, concepts
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
FTS5 supports phrase matching, boolean operators, and column-scoped queries. Typical query latency is sub-10ms.
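A sketch of how the worker might assemble that statement from optional filters (buildSearchQuery is a hypothetical helper, not the actual worker code). User input only ever appears in the bound parameters, never in the SQL text:

```typescript
type Filters = { type?: string; dateStart?: string; dateEnd?: string; limit?: number; offset?: number };

// Build a parameterized FTS5 statement from the optional filters.
function buildSearchQuery(match: string, f: Filters = {}) {
  const where = ['observations_fts MATCH ?'];
  const params: (string | number)[] = [match];
  if (f.type) { where.push('type = ?'); params.push(f.type); }
  if (f.dateStart) { where.push('date >= ?'); params.push(f.dateStart); }
  if (f.dateEnd) { where.push('date <= ?'); params.push(f.dateEnd); }
  params.push(f.limit ?? 20, f.offset ?? 0);
  const sql = `SELECT * FROM observations_fts WHERE ${where.join(' AND ')} ORDER BY rank LIMIT ? OFFSET ?`;
  return { sql, params };
}
```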

Semantic search (ChromaDB)

ChromaDB provides vector embeddings for semantic similarity search — finding conceptually related observations even when exact keywords don’t match. The ChromaSync service (src/services/sync/ChromaSync.ts) manages synchronization between SQLite and ChromaDB.
ChromaDB is optional. When unavailable, search falls back to FTS5 keyword search and SQL LIKE queries.

Search routing

The SessionSearch service (src/services/sqlite/SessionSearch.ts) coordinates search routing:
  • Vector search via ChromaDB is the primary search mechanism
  • FTS5 is maintained for backward compatibility; tables are kept synchronized via triggers
  • Structured filters (type, project, date) are applied as SQL predicates regardless of search mode
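The routing above can be sketched as a simple primary/fallback chain (the signatures here are assumed for illustration, not the actual SessionSearch API):

```typescript
type SearchFn = (query: string) => Promise<number[]>;

// Try semantic search first; fall back to FTS5 keyword search on failure.
async function routeSearch(query: string, vector: SearchFn, fts: SearchFn): Promise<number[]> {
  try {
    return await vector(query);  // primary: ChromaDB vector search
  } catch {
    return await fts(query);     // fallback: SQLite FTS5
  }
}
```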

The 3-layer progressive disclosure pattern

Design philosophy

Progressive disclosure is a core architectural principle: reveal information at the level of detail actually needed, on demand.

Token efficiency

Traditional RAG

Fetch 20 observations upfront: 10,000–20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: ~18,000 tokens on irrelevant context

3-layer workflow

Step 1: search (20 results) = ~1,000–2,000 tokens
Step 2: filter to 3 relevant IDs
Step 3: get_observations (3 IDs) = ~1,500–3,000 tokens
Total: 2,500–5,000 tokens (50–75% savings)
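Using midpoint figures from the comparison above, the arithmetic works out as follows (the per-item token counts are the document's estimates, not measurements):

```typescript
// Traditional RAG: 20 full observations at ~750 tokens each.
const traditional = 20 * 750;              // 15000 tokens

// 3-layer workflow: one index pass (~1500 tokens) plus 3 full observations.
const layered = 1500 + 3 * 750;            // 3750 tokens

const savings = 1 - layered / traditional; // 0.75 → 75% saved at the midpoint
```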

Structural enforcement

The 3-layer pattern is enforced by tool design, not just instructions:
  • You cannot fetch full details without first getting IDs from search
  • You cannot search without seeing the workflow reminder in __IMPORTANT
  • timeline provides a middle ground between index and full details
Before: Progressive disclosure was something Claude had to remember. After: Progressive disclosure is structurally impossible to bypass.

Architecture evolution

Previous implementation (9 tools)

Approach: 9 MCP tools with detailed parameter schemas
Token cost: ~2,500 tokens in tool definitions per session
Tools:
  • search_observations — Full-text search
  • find_by_type — Filter by type
  • find_by_file — Filter by file
  • find_by_concept — Filter by concept
  • get_recent_context — Recent sessions
  • get_observation — Fetch single observation
  • get_session — Fetch session
  • get_prompt — Fetch prompt
  • help — API documentation
Problems: overlapping operations, complex parameter schemas, no built-in workflow guidance, high token cost at session start.
Code size: ~2,718 lines in mcp-server.ts
Current implementation (4 tools)

Approach: 4 MCP tools following the 3-layer workflow
Tools:
  1. __IMPORTANT — Workflow guidance (always visible)
  2. search — Step 1 (index)
  3. timeline — Step 2 (context)
  4. get_observations — Step 3 (details)
Benefits: progressive disclosure is built into tool design, no overlapping operations, simple additionalProperties: true schemas, clear workflow pattern.
Code size: ~312 lines in mcp-server.ts (88% reduction)
Earlier versions (v5.4.0–v5.5.0) used a skill-based search approach:
  • Required separate SKILL.md and operations/ files
  • HTTP API called directly via curl from skill instructions
  • Progressive disclosure through skill loading (loaded on-demand)
  • Token savings: ~2,250 tokens per session vs the old MCP approach
Migration: Skill-based search was removed in favor of the streamlined MCP architecture, which provides native MCP protocol integration, cleaner architecture, and works with both Claude Desktop and Claude Code.

Configuration

The MCP server is configured automatically when the plugin is installed; no manual setup is required. Both Claude Desktop and Claude Code use the same 4 MCP tools, and the architecture behaves identically in each client.

Security

FTS5 injection prevention

All search queries are escaped before FTS5 processing:
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}
The test suite covers 332 injection attack patterns: special characters, SQL keywords, quote escaping, and boolean operators.
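For illustration, doubling the quotes turns a would-be string terminator into a literal character (the function is repeated from above so the snippet is self-contained):

```typescript
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}

// A quote-injection attempt becomes a harmless literal inside the FTS5 string.
const escaped = escapeFTS5Query('auth" OR rank');
// → 'auth"" OR rank'
```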

MCP protocol security

  • Stdio transport: No network exposure for the MCP protocol
  • Local-only HTTP: Worker API is bound to localhost:37777
  • No authentication: Local development only, no external network access

Performance

| Aspect | Detail |
|---|---|
| FTS5 query latency | Sub-10ms for typical queries |
| MCP overhead | Minimal — simple protocol translation only |
| Pagination | Efficient with offset / limit |
| Batching | get_observations accepts multiple IDs in a single call |

Troubleshooting

If the MCP tools are not available:
  1. Verify the MCP server path in your configuration
  2. Check that the worker service is running:
    curl http://localhost:37777/health
  3. Restart Claude Desktop or Claude Code

To manage the worker service:
npm run worker:status    # Check status
npm run worker:restart   # Restart worker
npm run worker:logs      # View logs

If search returns no results:
  1. Test the API directly:
    curl "http://localhost:37777/api/search?query=test"
  2. Verify the database exists:
    ls ~/.claude-mem/claude-mem.db
  3. Confirm observations exist:
    curl "http://localhost:37777/api/stats"
