SDL-MCP is a high-performance codebase indexing and context retrieval server. It indexes a repository into a searchable symbol graph and serves precisely-scoped context to AI coding agents through a controlled escalation path.

Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Node.js v24+ / TypeScript 5.9+ (strict, ESM) |
| Database | LadybugDB (embedded graph database, single-file storage, Kuzu engine) |
| MCP SDK | @modelcontextprotocol/sdk ^1.27.1 |
| Transports | stdio (CLI agents), HTTP/SSE (network clients) |
| AST parsing | tree-sitter 0.25.0 + language grammars (0.23.x–0.25.x) |
| Native addon | Rust via napi-rs (optional, multi-threaded pass-1) |
| Embeddings | ONNX Runtime (MiniLM 384-dim, nomic-embed-text-v1.5 768-dim) |
| Validation | Zod schemas for all tool payloads and responses |

Architectural Pattern

SDL-MCP follows a hexagonal / ports-and-adapters design. Each module has a clear role and no cross-layer mutations:
  • Indexer produces pure domain objects (symbols, edges) — owns all writes
  • Graph reads from the DB to build slices — no mutations
  • Retrieval (src/retrieval/) orchestrates hybrid search (FTS + vector, combined with RRF) and provides the start-node discovery engine for slice.build and symbol.search
  • Delta reads version pairs and computes diffs on demand — no mutations
  • Code reads file content and applies policy gating — no mutations
  • DB owns all persistence (queries and mutations are separated by module)
                    ┌──────────────────────────────────┐
                    │         MCP Tool Layer           │
                    │  (tools.ts, server.ts, coord.ts) │
                    └────────────┬─────────────────────┘
                                 │
        ┌────────────────────────┼────────────────────────┐
        │                        │                        │
┌───────▼───────┐        ┌───────▼───────┐        ┌───────▼───────┐
│    Indexer    │        │     Graph     │        │     Code      │
│ (write path)  │        │ (read path)   │        │ (read path)   │
│               │        │               │        │               │
│ Pass-1 + 2    │        │ Slice build   │        │ Skeleton      │
│ Clusters      │        │ Beam search   │        │ Hot-path      │
│ Processes     │        │ Spillover     │        │ Windows       │
│ Summaries     │        │ Card cache    │        │ Gate/policy   │
└───────┬───────┘        └───────┬───────┘        └───────┬───────┘
        │                        │                        │
        └────────────────────────┼────────────────────────┘
                                 │
                    ┌────────────▼─────────────────────┐
                    │       LadybugDB (Graph DB)       │
                    │  Symbols, Edges, Files, Repos,   │
                    │  Clusters, Processes, Versions,  │
                    │  Embeddings, Summaries,          │
                    │  Feedback, FileSummaries,        │
                    │  Memories                        │
                    │  FTS + Vector Indexes            │
                    └──────────────────────────────────┘
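The read/write split above can be expressed with narrow TypeScript interfaces. The sketch below is hypothetical: GraphReadPort, GraphWritePort, SymbolRecord, and InMemoryGraphStore are illustrative names, not SDL-MCP's actual types. It only shows how handing a read-path module a read-only port turns "no mutations" into a compile-time guarantee.

```typescript
// Hypothetical domain record; SDL-MCP's real symbol types are richer.
interface SymbolRecord {
  symbolId: string;
  name: string;
  fileId: string;
}

// Read-only port: the only capability a read-path module (Graph, Delta, Code) would receive.
interface GraphReadPort {
  getSymbol(symbolId: string): SymbolRecord | undefined;
  listSymbolsInFile(fileId: string): SymbolRecord[];
}

// Write port: only the indexer would hold a reference of this type.
interface GraphWritePort {
  upsertSymbol(symbol: SymbolRecord): void;
}

// One adapter can implement both ports; each caller sees only the port it is given.
class InMemoryGraphStore implements GraphReadPort, GraphWritePort {
  private readonly symbols = new Map<string, SymbolRecord>();

  upsertSymbol(symbol: SymbolRecord): void {
    this.symbols.set(symbol.symbolId, { ...symbol });
  }

  getSymbol(symbolId: string): SymbolRecord | undefined {
    return this.symbols.get(symbolId);
  }

  listSymbolsInFile(fileId: string): SymbolRecord[] {
    return Array.from(this.symbols.values()).filter((s) => s.fileId === fileId);
  }
}

// The indexer receives the write port...
function indexFile(writer: GraphWritePort, symbols: SymbolRecord[]): void {
  for (const s of symbols) writer.upsertSymbol(s);
}

// ...while a slice builder receives only the read port, so calling
// reader.upsertSymbol(...) here would not compile.
function buildSlice(reader: GraphReadPort, fileId: string): string[] {
  return reader.listSymbolsInFile(fileId).map((s) => s.name);
}
```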

System Architecture

The full system from MCP clients through to the indexer pipeline:
┌──────────────────────────────────────────────────────────────────────┐
│                             MCP Clients                              │
│  Claude Code · Claude Desktop · Cursor · Windsurf · Codex · Gemini   │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │ stdio / HTTP
┌──────────────────────────────────▼───────────────────────────────────┐
│                             Tool Gateway                             │
│  4 namespace-scoped tools (sdl.query, sdl.code, sdl.repo, sdl.agent) │
│  ← Thin JSON schemas → Double Zod validation → Handler dispatch      │
└─────┬────────────┬──────────┬──────────┬──────────┬────────────┬─────┘
      │            │          │          │          │            │
┌─────▼────┐  ┌────▼────┐  ┌──▼───┐  ┌───▼───┐  ┌───▼────┐  ┌────▼────┐
│ Symbols  │  │ Slices  │  │ Code │  │ Delta │  │ Agent  │  │ Memory  │
│ search   │  │ build   │  │ gate │  │ diff  │  │ orch.  │  │ store   │
│ getCard  │  │ refresh │  │ skel │  │ blast │  │ feedbk │  │ query   │
│ getCards │  │ spill.  │  │ hot  │  │ risk  │  │ chain  │  │ surface │
└─────┬────┘  └────┬────┘  └──┬───┘  └───┬───┘  └───┬────┘  └────┬────┘
      │            │          │          │          │            │
┌─────▼────────────▼──────────▼──────────▼──────────▼────────────▼─────┐
│                            Policy Engine                             │
│  Proof-of-need gating · Token budgets · Audit logging                │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
┌──────────────────────────────────▼───────────────────────────────────┐
│                          LadybugDB (Graph)                           │
│  Symbols · Edges · Files · Versions · Clusters ·                     │
│  Processes · Memories · Metrics                                      │
└──────────────────────────────────▲───────────────────────────────────┘
                                   │
┌──────────────────────────────────┴───────────────────────────────────┐
│                           Indexer Pipeline                           │
│  ┌────────────────┐    ┌───────────────────────────┐                 │
│  │ Rust (napi-rs) │ or │ Tree-sitter (TS fallback) │                 │
│  │ default engine │    │ 11 language grammars      │                 │
│  └────────────────┘    └───────────────────────────┘                 │
│  Pass 1: Symbols + Imports + Calls                                   │
│  Pass 2: Cross-file call resolution                                  │
│  Semantic: Embeddings + LLM summaries                                │
└──────────────────────────────────────────────────────────────────────┘

Startup Sequence

src/main.ts initializes the system in a strict sequential order. The DB must be ready before tools register, and tools must be registered before the transport accepts connections.
  1. Load config: loadConfig() reads the config file and validates it with Zod schemas.
  2. Open database: initGraphDb() opens or creates the LadybugDB file at the configured path.
  3. Register repos: ensureConfiguredReposRegistered() bootstraps configured repositories into the graph.
  4. Start live index coordinator: getDefaultLiveIndexCoordinator() initializes the singleton overlay service for real-time buffer indexing.
  5. Register tools: registerTools(server, services) wires up the discovery and info tools plus the flat, gateway, and/or code-mode tools.
  6. Set up file watchers: setupFileWatchers() starts chokidar watchers for incremental re-indexing on file changes.
  7. Register shutdown handlers: ShutdownManager.register(callbacks) registers graceful cleanup callbacks.
  8. Start server: server.start() begins accepting MCP requests over the configured transport.
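The ordering constraint can be sketched as a loop that awaits each step before starting the next. The step names below mirror the list above, but the bodies are hypothetical stubs that do nothing; the real functions in src/main.ts take arguments and do real work. In this sketch, a failing step (for example, a database that cannot open) stops the sequence before tools are registered or the server starts.

```typescript
type Step = { name: string; run: () => Promise<void> };

// Runs steps strictly one after another and records each completed step.
// A rejection propagates immediately, so later steps never run.
async function runStartup(steps: Step[], log: string[]): Promise<void> {
  for (const step of steps) {
    await step.run();
    log.push(step.name);
  }
}

// Hypothetical no-op stubs standing in for the real bootstrap functions.
function stub(name: string): Step {
  return { name, run: async () => {} };
}

const startupSteps: Step[] = [
  stub("loadConfig"),
  stub("initGraphDb"),
  stub("ensureConfiguredReposRegistered"),
  stub("getDefaultLiveIndexCoordinator"),
  stub("registerTools"),
  stub("setupFileWatchers"),
  stub("ShutdownManager.register"),
  stub("server.start"),
];
```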

Indexing Pipeline

Indexing is triggered by sdl-mcp index (CLI) or sdl.index.refresh (MCP tool) and runs in two passes followed by a finalization stage.

Pass 1: Local Extraction

Pass 1 works per file and is fully parallelizable. Each file is parsed by the selected engine and produces symbols, imports, calls, and SHA-256 fingerprints.
The native Rust addon (native/src/extract/) is loaded via napi-rs. It mirrors all TypeScript adapters and processes files on multiple threads. Select it explicitly with indexing.engine: "rust" in config.
Supported languages — 11 language adapters (src/indexer/adapter/) covering 12 languages:
| Adapter | Languages |
|---|---|
| typescript.ts | TypeScript and JavaScript (shared) |
| python.ts | Python |
| go.ts | Go |
| java.ts | Java |
| rust.ts | Rust |
| csharp.ts | C# |
| c.ts | C |
| cpp.ts | C++ |
| php.ts | PHP |
| kotlin.ts | Kotlin |
| shell.ts | Shell |
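A SHA-256 fingerprint per file lets an indexer tell whether a file's content changed since the previous run. The sketch below uses Node's built-in node:crypto module; the fingerprint and changedFiles helpers, and the idea of skipping unchanged files this way, are illustrative assumptions rather than SDL-MCP's actual code.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a file's contents.
function fingerprint(content: string | Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical helper: returns the files whose fingerprint differs from the one
// recorded on the previous run, including files that were never seen before.
function changedFiles(
  files: Map<string, string>, // relPath -> current content
  previous: Map<string, string>, // relPath -> previous fingerprint
): string[] {
  const changed: string[] = [];
  files.forEach((content, relPath) => {
    if (previous.get(relPath) !== fingerprint(content)) changed.push(relPath);
  });
  return changed;
}
```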

Pass 2: Cross-File Call Resolution

This pass runs sequentially and works across files. It resolves raw call identifiers (e.g., "getUserById") to specific symbol IDs using the pass-2 resolver registry (src/indexer/pass2/registry.ts). There are 11 language-specific resolvers, and every resolver:
  • Builds a repo-wide index (namespace, module, package, or directory-scoped)
  • Follows import/use/include/source chains to resolve call targets
  • Handles language-specific patterns (generics, traits, templates, extensions, header pairs)
  • Assigns stratified confidence scores (same-file 0.93 → imports 0.9 → same-scope 0.88–0.92 → fallback 0.45–0.78)
Each resolved edge includes:
{
  "targetSymbolId": "abc123",
  "confidence": 0.92,
  "strategy": "import-alias",
  "provenance": "getUserById → import {getUserById} → src/db/users.ts::getUserById"
}
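Stratified scoring can be sketched as a resolver that tries the most specific strategy first and stops at the first match. Everything here is hypothetical: the CallSiteContext shape, the strategy names, and the exact scores, which are picked from the documented ranges (0.88 is the low end of the 0.88–0.92 same-scope range, and 0.6 is an arbitrary point inside the 0.45–0.78 fallback range). Real resolvers also handle aliases, generics, and other language-specific patterns that this sketch ignores.

```typescript
interface ResolvedEdge {
  targetSymbolId: string;
  confidence: number;
  strategy: string;
  provenance: string;
}

// Hypothetical per-file context assembled from pass-1 output.
interface CallSiteContext {
  filePath: string;
  scope: string; // e.g. package, module, or directory
  localSymbols: Map<string, string>; // name -> symbolId in this file
  imports: Map<string, { from: string; symbolId: string }>; // local name -> import target
  scopeSymbols: Map<string, string>; // name -> symbolId elsewhere in the same scope
  repoSymbols: Map<string, string[]>; // name -> every symbolId in the repo with that name
}

// Tries strategies from most to least specific.
function resolveCall(callee: string, ctx: CallSiteContext): ResolvedEdge | undefined {
  const local = ctx.localSymbols.get(callee);
  if (local) {
    return { targetSymbolId: local, confidence: 0.93, strategy: "same-file",
      provenance: `${callee} → ${ctx.filePath}::${callee}` };
  }
  const imported = ctx.imports.get(callee);
  if (imported) {
    // Assumes the imported name equals the local name (no alias).
    return { targetSymbolId: imported.symbolId, confidence: 0.9, strategy: "import",
      provenance: `${callee} → import {${callee}} → ${imported.from}::${callee}` };
  }
  const scoped = ctx.scopeSymbols.get(callee);
  if (scoped) {
    return { targetSymbolId: scoped, confidence: 0.88, strategy: "same-scope",
      provenance: `${callee} → scope ${ctx.scope}` };
  }
  const candidates = ctx.repoSymbols.get(callee) ?? [];
  if (candidates.length === 1) {
    // A unique repo-wide name match is the weakest evidence.
    return { targetSymbolId: candidates[0]!, confidence: 0.6, strategy: "fallback",
      provenance: `${callee} → unique repo-wide name match` };
  }
  return undefined; // unknown or ambiguous: leave the call unresolved
}
```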

Finalization

After both passes complete:
  1. Cluster detection: the Label Propagation Algorithm (Rust addon or TS fallback) groups highly coupled symbols into named clusters.
  2. Process tracing: call-chain analysis identifies each symbol’s role (entry, intermediate, or exit) within named processes.
  3. Embedding generation: ONNX models produce vector embeddings for each symbol, enabling semantic search.
  4. LLM summaries (optional): the configured provider (Anthropic, Ollama, or mock) generates 1–3 sentence descriptions per symbol. Summaries are cached with content-addressed hashing.
  5. Version bump: a new ledger version is recorded in the graph, enabling delta computation and blast radius analysis.
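Label propagation is short enough to sketch. This is a generic textbook variant, assuming an undirected adjacency list and deterministic tie-breaking (the lexicographically smallest label wins); it is not SDL-MCP's Rust or TS implementation, and it omits edge weights and cluster naming.

```typescript
// Undirected graph as adjacency lists: node -> neighbours.
type Graph = Map<string, string[]>;

// Each node starts in its own cluster, then repeatedly adopts the label that is
// most common among its neighbours, until no label changes.
function labelPropagation(graph: Graph, maxIterations = 20): Map<string, string> {
  const labels = new Map<string, string>();
  graph.forEach((_, node) => labels.set(node, node));
  const order = Array.from(graph.keys()).sort(); // fixed order keeps results deterministic

  for (let iter = 0; iter < maxIterations; iter++) {
    let changed = false;
    for (const node of order) {
      const neighbours = graph.get(node) ?? [];
      if (neighbours.length === 0) continue; // isolated symbols keep their own label

      const counts = new Map<string, number>();
      for (const n of neighbours) {
        const label = labels.get(n) ?? n;
        counts.set(label, (counts.get(label) ?? 0) + 1);
      }
      // Most frequent neighbour label; ties go to the smallest label.
      let best = "";
      let bestCount = -1;
      counts.forEach((count, label) => {
        if (count > bestCount || (count === bestCount && label < best)) {
          best = label;
          bestCount = count;
        }
      });
      if (labels.get(node) !== best) {
        labels.set(node, best);
        changed = true;
      }
    }
    if (!changed) break;
  }
  return labels;
}
```

Two disconnected triangles, for example, converge to two labels: one shared by the nodes of each triangle.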

Graph DB Storage

LadybugDB

SDL-MCP uses LadybugDB (Kuzu engine, npm alias kuzu) as its sole persistence layer. The database is stored as a single file on disk (.lbug extension). DB path resolution (in priority order):
  1. SDL_GRAPH_DB_PATH environment variable (or legacy SDL_DB_PATH)
  2. graphDatabase.path in config
  3. Default: <configDir>/sdl-mcp-graph.lbug
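The resolution order amounts to a chain of fallbacks. In this minimal sketch, resolveGraphDbPath and the GraphDbConfig shape are hypothetical; the environment variable names, the graphDatabase.path key, and the default file name come from the list above. The sketch assumes the environment and config are passed in as plain objects, so it can be called as resolveGraphDbPath(process.env, config, configDir).

```typescript
import * as path from "node:path";

// Hypothetical subset of the config file relevant to the DB location.
interface GraphDbConfig {
  graphDatabase?: { path?: string };
}

function resolveGraphDbPath(
  env: Record<string, string | undefined>,
  config: GraphDbConfig,
  configDir: string,
): string {
  // 1. Environment variable, with the legacy name as a fallback.
  const fromEnv = env["SDL_GRAPH_DB_PATH"] || env["SDL_DB_PATH"];
  if (fromEnv) return fromEnv;
  // 2. Explicit path in config.
  const fromConfig = config.graphDatabase?.path;
  if (fromConfig) return fromConfig;
  // 3. Default file in the config directory.
  return path.join(configDir, "sdl-mcp-graph.lbug");
}
```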
Connection pool: The pool separates read and write connections to maximize throughput while preventing graph corruption:
| Connection type | Count | Access pattern |
|---|---|---|
| Read connections | 4 (configurable 1–8) | Round-robin, concurrent |
| Write connection | 1 (serialized) | Queued via ConcurrencyLimiter(1) |
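A minimal sketch of the read/write split, assuming a generic connection type C. ReadWritePool and its methods are hypothetical. Here writes are serialized by chaining each write onto the previous one, which has the same effect as a limiter with a concurrency of 1 but is a swapped-in technique, not the project's ConcurrencyLimiter.

```typescript
class ReadWritePool<C> {
  private readonly readers: C[];
  private readonly writer: C;
  private next = 0;
  // Tail of the write chain; it never rejects, so one failed write
  // does not block the writes queued after it.
  private writeTail: Promise<unknown> = Promise.resolve();

  constructor(readers: C[], writer: C) {
    if (readers.length < 1 || readers.length > 8) {
      throw new RangeError("read pool size must be between 1 and 8");
    }
    this.readers = readers;
    this.writer = writer;
  }

  // Round-robin: each call hands out the next read connection in turn.
  acquireReader(): C {
    const conn = this.readers[this.next]!;
    this.next = (this.next + 1) % this.readers.length;
    return conn;
  }

  // Serialized writes: each task starts only after the previous write settles.
  write<T>(task: (conn: C) => Promise<T>): Promise<T> {
    const run = this.writeTail.then(() => task(this.writer));
    this.writeTail = run.catch(() => undefined);
    return run;
  }
}
```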

Graph Schema

Node tables:
| Node table | Key fields |
|---|---|
| Repo | repoId, rootPath, configJson, createdAt |
| File | fileId, repoId, relPath, byteSize, contentHash |
| Symbol | symbolId, repoId, fileId, kind, name, exported, signatureJson, summary, summaryQuality, summarySource, etag, embeddingMiniLM, embeddingNomic |
| Version | versionId, repoId, timestamp, indexedAt |
| Cluster | clusterId, label, memberCount, searchText |
| Process | processId, label, repoId, searchText |
| FileSummary | fileId, repoId, summary, searchText, embeddingMiniLM, embeddingNomic |
| SummaryCache | symbolId, summary, provider, model, cardHash, costUsd |
| SliceHandle | handle, createdAt, expiresAt, minVersion, maxVersion |
| AgentFeedback | feedbackId, repoId, taskText, taskType, searchText, embeddingMiniLM, embeddingNomic |
Edge tables:
| Edge table | From → To | Key fields |
|---|---|---|
| CALLS | Symbol → Symbol | confidence, resolverStrategy, provenance |
| IMPORTS | Symbol → Symbol | importKind, alias |
| DEFINED_IN | Symbol → File | |
| BELONGS_TO | File → Repo | |
| BELONGS_TO_CLUSTER | Symbol → Cluster | membershipScore |
| PARTICIPATES_IN | Symbol → Process | stepOrder, role |

Tool Gateway Dispatch

All MCP tools flow through a single dispatch path in src/server.ts. The tool surface depends on configuration:
| Mode | Tool count | Description |
|---|---|---|
| Flat mode | 33 tools | 31 flat tools + sdl.action.search + sdl.info |
| Gateway-only mode | 6 tools | 4 gateway tools + sdl.action.search + sdl.info |
| Gateway + legacy mode | 37 tools | 4 gateway + 31 legacy flat + sdl.action.search + sdl.info |
| Code Mode | +2 tools | Adds sdl.manual and sdl.chain to any of the above |
The 4 gateway tools are namespace-scoped: sdl.query, sdl.code, sdl.repo, sdl.agent. Each accepts an action discriminator field and routes to the same handlers as flat mode. Dispatch flow:
  1. Normalization: before Zod validation, requests pass through a shared normalization layer that accepts camelCase fields and common aliases (repo_id, root_path, symbol_id, from_version, etc.).
  2. Zod validation: the normalized payload is validated against the tool’s Zod schema. Validation failures immediately return an MCP response with isError set.
  3. Concurrency gate: the ConcurrencyLimiter allows at most 8 concurrent handlers. Requests beyond the limit wait in a queue with a 30-second timeout.
  4. Handler execution: the tool-specific handler runs and returns a result, optionally attaching _rawContext hints (file IDs or raw token counts) as sideband metadata.
  5. Post-processing: the post-processor computes _tokenUsage from _rawContext (SDL tokens vs. the raw-file equivalent, plus a savings percentage), strips internal fields, and calls logToolCall() for telemetry.
  6. Response: the result is wrapped in the MCP content format and returned to the client.
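The normalization, routing, and token-accounting steps can be sketched in a few functions. All names here are hypothetical: the alias table is a sample built from the aliases listed above, the `namespace.action` handler keys are an invented convention, and the savings formula is an assumption about how the savings percentage is computed. Zod validation and the concurrency gate are left out.

```typescript
type Args = Record<string, unknown>;
type Handler = (args: Args) => unknown;

// Sample of snake_case aliases accepted before validation.
const ALIASES: Record<string, string> = {
  repo_id: "repoId",
  root_path: "rootPath",
  symbol_id: "symbolId",
  from_version: "fromVersion",
};

function normalizeArgs(raw: Args): Args {
  const out: Args = {};
  for (const key of Object.keys(raw)) {
    out[ALIASES[key] ?? key] = raw[key];
  }
  return out;
}

// A namespace-scoped gateway tool routes on its `action` discriminator to the
// same handler a flat tool would call (hypothetical "namespace.action" keys).
function dispatchGateway(handlers: Record<string, Handler>, namespace: string, raw: Args): unknown {
  const { action, ...rest } = normalizeArgs(raw);
  const handler = handlers[`${namespace}.${String(action)}`];
  if (typeof action !== "string" || !handler) {
    return { isError: true, message: `Unknown action for ${namespace}: ${String(action)}` };
  }
  return handler(rest);
}

// Savings relative to returning raw files, as a whole-number percentage.
function tokenSavingsPercent(sdlTokens: number, rawTokens: number): number {
  if (rawTokens <= 0) return 0;
  return Math.round(((rawTokens - sdlTokens) / rawTokens) * 100);
}
```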

Policy Engine

The policy engine gates raw code access (Rung 4 of the Iris Gate Ladder). Agents must provide a reason, identifiers they expect to find, and an expected line count within configured limits. Priority-ordered rules:
| Priority | Rule |
|---|---|
| 100 | Hard caps (180 lines maximum) |
| 90 | Identifiers required |
| 80 | Budget enforcement |
| 10 | Break-glass override (with audit trail) |
Denied requests include an actionable nextBestAction and the required fields for the suggested alternative:
{
  "error": {
    "message": "Window exceeds 180 line limit",
    "code": "POLICY_ERROR",
    "nextBestAction": "requestSkeleton",
    "requiredFieldsForNext": { "symbolId": "sym-1", "repoId": "repo-1" }
  }
}
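A minimal sketch of priority-ordered evaluation, assuming rules run from highest to lowest priority and the first denial wins. The request fields (including the remainingLineBudget field), the rule messages other than the 180-line one, and the rule functions are hypothetical. The sketch omits the break-glass override and always suggests requestSkeleton as the next action.

```typescript
interface CodeWindowRequest {
  repoId: string;
  symbolId: string;
  reason: string;
  identifiersToFind: string[];
  expectedLines: number;
  remainingLineBudget: number; // hypothetical: lines left in the caller's budget
}

interface PolicyDenial {
  message: string;
  code: "POLICY_ERROR";
  nextBestAction: string;
  requiredFieldsForNext: Record<string, string>;
}

interface Rule {
  priority: number;
  check: (req: CodeWindowRequest) => string | null; // denial message, or null to pass
}

const MAX_WINDOW_LINES = 180;

const rules: Rule[] = [
  { priority: 80, check: (r) => (r.expectedLines > r.remainingLineBudget ? "Line budget exhausted" : null) },
  { priority: 100, check: (r) => (r.expectedLines > MAX_WINDOW_LINES ? `Window exceeds ${MAX_WINDOW_LINES} line limit` : null) },
  { priority: 90, check: (r) => (r.identifiersToFind.length === 0 ? "At least one identifier is required" : null) },
];

function evaluate(req: CodeWindowRequest): PolicyDenial | null {
  // Highest priority first, so the hard cap is reported before softer rules.
  const ordered = [...rules].sort((a, b) => b.priority - a.priority);
  for (const rule of ordered) {
    const message = rule.check(req);
    if (message !== null) {
      return {
        message,
        code: "POLICY_ERROR",
        nextBestAction: "requestSkeleton",
        requiredFieldsForNext: { symbolId: req.symbolId, repoId: req.repoId },
      };
    }
  }
  return null; // allowed
}
```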

Transport Layer

The stdio transport is a single-session transport for CLI agents (Claude Code, Cursor, Windsurf, etc.). One MCPServer instance handles all requests for the lifetime of the process. Start it with:
sdl-mcp serve --stdio

Concurrency Control

| Limiter | Scope | Max | Timeout | Purpose |
|---|---|---|---|---|
| Tool dispatch | Per-server | 8 concurrent | 30 s queue | Prevents handler starvation |
| DB write connection | Global | 1 (serialized) | | Graph integrity |
| DB read pool | Global | 4 connections | | Concurrent multi-session reads |
| Session manager | Global | 8 sessions | 5 min idle | Resource limits |
| Summary batch | Per-index | 5 concurrent | | API rate limiting |
All limiters use the generic ConcurrencyLimiter (src/util/concurrency.ts) — a queue-based limiter reused throughout the system.
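A queue-based limiter of this kind fits in a small class. The sketch below is not the code in src/util/concurrency.ts; the method names, the error message, and the queuedCount/activeCount getters are assumptions. It shows the two behaviours the table relies on: at most `max` tasks run at once, and a queued task is rejected if it waits longer than the queue timeout (30 seconds by default here, matching the tool-dispatch row).

```typescript
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly max: number;
  private readonly queueTimeoutMs: number;

  constructor(max: number, queueTimeoutMs = 30_000) {
    this.max = max;
    this.queueTimeoutMs = queueTimeoutMs;
  }

  get activeCount(): number { return this.active; }
  get queuedCount(): number { return this.waiters.length; }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.active < this.max) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve, reject) => {
      const waiter = () => {
        clearTimeout(timer);
        this.active++;
        resolve();
      };
      // Drop the waiter and fail if no slot frees up in time.
      const timer = setTimeout(() => {
        const i = this.waiters.indexOf(waiter);
        if (i !== -1) this.waiters.splice(i, 1);
        reject(new Error(`Queue timeout after ${this.queueTimeoutMs} ms`));
      }, this.queueTimeoutMs);
      this.waiters.push(waiter);
    });
  }

  private release(): void {
    this.active--;
    const next = this.waiters.shift(); // FIFO: hand the freed slot to the oldest waiter
    if (next) next();
  }
}
```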

Semantic Engine

Three subsystems enhance code intelligence beyond structural analysis: embeddings, LLM summaries, and pass-2 call resolution.

Embeddings

Two ONNX text models are available. Embedding quality improves when LLM summaries are also enabled.
| Model | Dimensions | Size | Notes |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384-dim | ~22 MB | Bundled, zero-setup, general-purpose baseline |
| nomic-embed-text-v1.5 | 768-dim | ~138 MB | Downloaded on first use, higher quality, 8192-token context |
Retrieval blends lexical (FTS) and vector similarity scores using Reciprocal Rank Fusion (RRF).
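Reciprocal Rank Fusion scores each item by summing 1/(k + rank) over every ranked list it appears in, so items that rank well in both the FTS and the vector results rise to the top. A minimal sketch, assuming 1-based ranks and the commonly used constant k = 60; the constant SDL-MCP actually uses is not stated here, and the function name is hypothetical.

```typescript
// Fuse several ranked lists of IDs (best first) into one ranking.
function reciprocalRankFusion(lists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}
```

For example, fusing ["a", "b", "c"] (FTS) with ["b", "c", "d"] (vector) ranks b first, because it scores in both lists. The tie-break on id only keeps the output deterministic.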

LLM Summaries

1–3 sentence semantic descriptions are generated per symbol. Three providers are supported: Anthropic, Ollama, and mock. The Anthropic provider uses the Anthropic API and requires ANTHROPIC_API_KEY. Summaries are cached with content-addressed hashing, so unchanged symbols are never re-summarized.
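Content-addressed caching derives the cache key from the content being summarized, so an unchanged symbol maps to the same key and reuses its stored summary. A minimal sketch, assuming the key is a SHA-256 over the provider, model, and symbol text; the SummaryCache class and the Summarize callback here are hypothetical, in-memory stand-ins (the SummaryCache table in the graph schema stores a cardHash alongside provider and model, which suggests a similar scheme).

```typescript
import { createHash } from "node:crypto";

type Summarize = (text: string) => Promise<string>;

class SummaryCache {
  private readonly entries = new Map<string, string>();
  calls = 0; // number of times the provider was actually invoked

  private static key(provider: string, model: string, text: string): string {
    return createHash("sha256").update(`${provider}\n${model}\n${text}`).digest("hex");
  }

  async getOrCreate(provider: string, model: string, text: string, summarize: Summarize): Promise<string> {
    const key = SummaryCache.key(provider, model, text);
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit; // unchanged content: no provider call
    this.calls++;
    const summary = await summarize(text);
    this.entries.set(key, summary);
    return summary;
  }
}
```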

Pass-2 Call Resolution

11 language-specific resolvers trace import chains and resolve raw call identifiers to symbol IDs with confidence scores (0.0–1.0). See the Semantic Engine deep dive for full details.

Graceful Degradation

SDL-MCP degrades gracefully when optional components are unavailable:
| Component unavailable | Fallback behavior |
|---|---|
| Rust native indexer | Falls back to tree-sitter TypeScript engine |
| ONNX runtime | Falls back to mock embeddings (text-only search) |
| LLM API | Skips summary generation; uses heuristic descriptions |
| Live index overlay | Reads from persisted DB only |
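Each row follows the same pattern: try the optional component, and use the baseline if it is unavailable. A minimal sketch of that pattern; loadWithFallback, the Engine shape, and the loader callbacks are hypothetical, and the real code decides per component how it detects unavailability.

```typescript
interface Engine {
  name: string;
}

// Try the preferred (optional) loader; on any failure, report why and use the fallback.
function loadWithFallback<T>(
  preferred: () => T,
  fallback: () => T,
  onFallback: (reason: unknown) => void = () => {},
): T {
  try {
    return preferred();
  } catch (err) {
    onFallback(err);
    return fallback();
  }
}
```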

Source Directory Map

src/
├── main.ts                    Server entry point + bootstrap
├── server.ts                  MCPServer class + tool dispatch
├── cli/
│   ├── commands/              CLI commands (13: init, doctor, info, index, serve,
│   │                            version, export, import, pull, benchmark:ci,
│   │                            summary, health, tool)
│   └── transport/             stdio + HTTP transport setup
├── config/
│   └── types.ts               Zod config schemas
├── db/
│   ├── initGraphDb.ts         DB path resolution + initialization
│   ├── ladybug-schema.ts      Idempotent Cypher DDL
│   └── ladybug-*.ts           Per-domain query modules
├── domain/
│   ├── types.ts               Canonical domain types (SymbolCard, GraphSlice, etc.)
│   └── errors.ts              Typed error hierarchy
├── indexer/
│   ├── indexer.ts             Main indexing orchestrator
│   ├── adapter/               Language adapters (11 adapters, 12 languages)
│   ├── pass2/                 Cross-file resolvers (11 resolvers)
│   ├── import-resolution/     Import chain analysis
│   ├── embeddings.ts          ONNX embedding pipeline
│   ├── summary-generator.ts   LLM summary providers
│   └── watcher.ts             File system monitoring (chokidar)
├── graph/
│   └── slice/                 Beam search, serializer, start-node resolver
├── delta/
│   ├── diff.ts                Version diff computation
│   └── blastRadius.ts         Impact analysis
├── code/
│   ├── skeleton.ts            Deterministic code outline
│   ├── hotpath.ts             Identifier-filtered excerpts
│   ├── gate.ts                Proof-of-need gating
│   └── windows.ts             Raw code extraction
├── code-mode/
│   ├── chain-*.ts             Multi-step tool chaining (sdl.chain)
│   ├── manual-generator.ts    Self-documentation (sdl.manual)
│   ├── action-catalog.ts      Action discovery (sdl.action.search)
│   └── ladder-validator.ts    Context ladder validation
├── gateway/
│   ├── router.ts              Namespace-scoped tool routing
│   ├── thin-schemas.ts        Compact gateway schemas
│   └── compact-schema.ts      Schema size optimization
├── policy/
│   └── engine.ts              Rule-based decision engine
├── live-index/
│   ├── overlay-store.ts       In-memory draft storage
│   ├── coordinator.ts         Parse queue + reconciliation
│   ├── checkpoint-service.ts  Persist drafts to DB
│   └── idle-monitor.ts        Auto-checkpoint on idle
├── memory/
│   ├── surface.ts             Auto-surface memories in slices
│   └── file-sync.ts           .sdl-memory/ file read/write/scan
├── runtime/
│   ├── executor.ts            Sandboxed code execution
│   └── runtimes.ts            Runtime definitions (node, python, shell, ...)
└── mcp/
    ├── tools/                 Handler implementations
    ├── errors.ts              Error-to-MCP response conversion
    ├── telemetry.ts           Tool call logging
    ├── token-usage.ts         Sideband token accounting
    ├── session-manager.ts     Multi-session lifecycle
    └── dispatch-limiter.ts    Concurrency gate (singleton)
