Quality Tiers
The summary system has three quality tiers. Each builds on the previous one:

| Tier | Embedding Model | Summaries | Search Quality | Cost |
|---|---|---|---|---|
| Low (default) | all-MiniLM-L6-v2 (384-dim) | None | Baseline | Free |
| Medium | nomic-embed-text-v1.5 (768-dim) | None | Better | Free |
| High | Either model | LLM-generated | Best | API cost |
- Low — embeds raw symbol text (name + kind + signature) with a general-purpose model. No API calls. The all-MiniLM-L6-v2 model is bundled with the npm package (~22 MB).
- Medium — swaps in a higher-quality text embedding model with a longer context window (8,192 tokens vs 256). Fully offline. Downloads ~138 MB on first use.
- High — adds LLM-generated natural-language summaries to either embedding model. Both models are text-based and benefit equally from summaries. This produces the best search results because the LLM distills code meaning into plain English that embedding models handle well.
Semantic search (`semantic.enabled`) is enabled by default with `provider: "local"` using the ONNX runtime. Keep `generateSummaries` disabled until you validate summary quality for your repository.

Upgrading from all-MiniLM to nomic-embed-text-v1.5
To move from Low to Medium tier, change the model and run a full re-index.

nomic-embed-text-v1.5 is downloaded automatically from HuggingFace (~138 MB). To pre-download it, fetch the model into the cache path for your platform:
| Platform | Path |
|---|---|
| Windows | %LOCALAPPDATA%\sdl-mcp\models\nomic-embed-text-v1.5\ |
| macOS | ~/.cache/sdl-mcp/models/nomic-embed-text-v1.5/ |
| Linux | ~/.cache/sdl-mcp/models/nomic-embed-text-v1.5/ |
| Custom | Set semantic.modelCacheDir in config |
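As a sketch, the Low-to-Medium switch might look like the following in the config file. Only `semantic.enabled`, `provider`, and `semantic.modelCacheDir` are confirmed by this document; the `model` field name and the directory value are assumptions for illustration.

```json
{
  "semantic": {
    "enabled": true,
    "provider": "local",
    "model": "nomic-embed-text-v1.5",
    "modelCacheDir": "/data/models"
  }
}
```

After changing the model, run a full re-index: vectors produced by the 384-dim and 768-dim models live in different embedding spaces and are not comparable, so the old index cannot be reused.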
Summary Providers
Summary generation is independent of the embedding provider. You can use local embeddings with an API-based summary provider, or vice versa. Four summary providers are available:

- Anthropic API
- Ollama (Local)
- OpenAI-Compatible Servers
- Mock (Testing / CI)
Uses Claude models via the Anthropic Messages API. Highest quality, no local GPU needed.

Recommended models:
For most repositories, Haiku is the best balance of cost and quality. Each symbol uses roughly 50–150 input tokens.
| Model | Speed | Quality | Pricing |
|---|---|---|---|
| `claude-haiku-4-5-20251001` | Fast | Good (default) | $1.25 per 1M tokens |
| `claude-sonnet-4-20250514` | Medium | Higher | $15 per 1M tokens |
Get an API key
Sign up at console.anthropic.com, go to API Keys, and create a new key (starts with `sk-ant-`).

Set the API key
Set via environment variable (recommended for shared configs), or inline in config via `summaryApiKey` (not recommended for shared configs).
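For example, the environment-variable route might look like this. The key value is a placeholder; only the `ANTHROPIC_API_KEY` variable name and the `sk-ant-` prefix come from this document.

```shell
# Recommended: export the key in your shell profile rather than
# committing it to a shared config file.
export ANTHROPIC_API_KEY="sk-ant-your-key-here"   # placeholder value

# Sanity check: confirm the variable is visible to child processes.
[ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY is set"
```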
Configure
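A hedged sketch of what the Configure step might contain, assuming a JSON config file. `generateSummaries`, `summaryApiKey` resolution, and the model ID are from this document; the `summaryProvider` and `summaryModel` field names and the placement of `generateSummaries` under `semantic` are guesses.

```json
{
  "semantic": {
    "enabled": true,
    "provider": "local",
    "generateSummaries": true
  },
  "summaryProvider": "anthropic",
  "summaryModel": "claude-haiku-4-5-20251001"
}
```

With no `summaryApiKey` in the config, the key is picked up from the `ANTHROPIC_API_KEY` environment variable.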
Key resolution order: `summaryApiKey` in config → `ANTHROPIC_API_KEY` env var. If neither is set, summary generation is skipped and existing cached summaries are preserved.

The highest-quality configuration (nomic embeddings + Anthropic summaries) requires `ANTHROPIC_API_KEY` and downloads the ~138 MB embedding model on first run.

Tuning Batch Processing
Summary generation processes symbols in parallel batches. Adjust these settings based on your provider’s rate limits and hardware:

| Setting | Default | Range | Description |
|---|---|---|---|
| `summaryBatchSize` | 20 | 1–50 | Symbols processed per batch |
| `summaryMaxConcurrency` | 5 | 1–20 | Batches running in parallel |
- Lower `summaryMaxConcurrency` to 3 if you hit rate limits on a free-tier key.
- For Ollama on CPU — set `summaryMaxConcurrency: 1` and `summaryBatchSize: 10` to avoid overwhelming your machine.
- For Ollama on GPU — defaults are fine. Increase `summaryMaxConcurrency` to 8–10 if your GPU has headroom.
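Putting the tuning advice into config form, here is a sketch of the conservative Ollama-on-CPU settings from the tips above. `summaryBatchSize` and `summaryMaxConcurrency` are documented; their placement at the top level of the config is an assumption.

```json
{
  "summaryBatchSize": 10,
  "summaryMaxConcurrency": 1
}
```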
Verifying Summaries After Indexing
After indexing with summaries enabled, verify that summaries appear in symbol cards. The `summary` field should contain a natural-language description instead of a heuristic placeholder. The index output also reports summary stats.
Check `summarySource` on cards to see how each summary was produced:
| `summarySource` | `summaryQuality` | Meaning |
|---|---|---|
| `"jsdoc"` | 1.0 | Extracted from JSDoc / doc comment |
| `"llm"` | 0.8 | LLM-generated (Claude Haiku, Ollama) |
| `"nn-direct:<id>"` | 0.6 | Transferred from a similar symbol (similarity ≥ 0.85) |
| `"nn-adapted:<id>"` | 0.5 | Adapted from a similar symbol (similarity 0.70–0.85) |
| `"heuristic-typed"` | 0.4 | Pattern-matched from name + param types |
| `"heuristic-fallback"` | 0.3 | Pattern-matched from name + kind only |