Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Node.js v24+ / TypeScript 5.9+ (strict, ESM) |
| Database | LadybugDB (embedded graph database, single-file storage, Kuzu engine) |
| MCP SDK | @modelcontextprotocol/sdk ^1.27.1 |
| Transports | stdio (CLI agents), HTTP/SSE (network clients) |
| AST parsing | tree-sitter 0.25.0 + language grammars (0.23.x–0.25.x) |
| Native addon | Rust via napi-rs (optional, multi-threaded pass-1) |
| Embeddings | ONNX Runtime (MiniLM 384-dim, nomic-embed-text-v1.5 768-dim) |
| Validation | Zod schemas for all tool payloads and responses |
Architectural Pattern
SDL-MCP follows a hexagonal / ports-and-adapters design. Each module has a clear role and no cross-layer mutations:
- Indexer produces pure domain objects (symbols, edges); owns all writes
- Graph reads from the DB to build slices; no mutations
- Retrieval (src/retrieval/) orchestrates hybrid search (FTS + vector + RRF fusion) and provides the start-node discovery engine for slice.build and symbol.search
- Delta reads version pairs and computes diffs on demand; no mutations
- Code reads file content and applies policy gating; no mutations
- DB owns all persistence (queries and mutations are separated by module)
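The RRF fusion step named in the Retrieval role can be illustrated with a minimal sketch of reciprocal rank fusion. The function name rrfFuse, the symbol IDs, and the constant k = 60 are assumptions for illustration; this section does not state the parameters SDL-MCP actually uses.

```typescript
// Reciprocal rank fusion: every ranked list contributes 1 / (k + rank) per item.
// k = 60 is a commonly used default, assumed here rather than taken from SDL-MCP.
function rrfFuse(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical symbol IDs returned by the full-text and vector retrievers.
const ftsHits = ["sym:a", "sym:b", "sym:c"];
const vectorHits = ["sym:b", "sym:d", "sym:a"];
const fused = rrfFuse([ftsHits, vectorHits]);
console.log(fused.map(([id]) => id));
```

In this example sym:b ranks first because it sits near the top of both lists, while items found by only one retriever fall to the bottom.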
System Architecture
The full system runs from MCP clients through to the indexer pipeline.
Startup Sequence
src/main.ts initializes the system in a strict sequential order. The DB must be ready before tools register, and tools must be registered before the transport accepts connections.
Register repos
ensureConfiguredReposRegistered() bootstraps configured repositories into the graph.
Start live index coordinator
getDefaultLiveIndexCoordinator() initializes the singleton overlay service for real-time buffer indexing.
Register tools
registerTools(server, services) wires discovery and info tools plus flat, gateway, and/or code-mode tools.
Set up file watchers
setupFileWatchers() starts chokidar watchers for incremental re-indexing on file changes.
Register shutdown handlers
ShutdownManager.register(callbacks) registers graceful cleanup callbacks.
Indexing Pipeline
Indexing is triggered by sdl-mcp index (CLI) or sdl.index.refresh (MCP tool) and runs in two passes followed by a finalization stage.
Pass 1: Local Extraction
Per-file, fully parallelizable. Each file is parsed by the selected engine and produces symbols, imports, calls, and SHA-256 fingerprints.
Rust engine (default)
The native Rust addon (native/src/extract/) runs via napi-rs. It mirrors all TypeScript adapters at near-native speed using multi-threaded processing. Select it explicitly with indexing.engine: "rust" in config.
TypeScript engine (fallback)
Tree-sitter adapters (src/indexer/adapter/) covering 12 languages:
| Adapter | Languages |
|---|---|
| typescript.ts | TypeScript and JavaScript (shared) |
| python.ts | Python |
| go.ts | Go |
| java.ts | Java |
| rust.ts | Rust |
| csharp.ts | C# |
| c.ts | C |
| cpp.ts | C++ |
| php.ts | PHP |
| kotlin.ts | Kotlin |
| shell.ts | Shell |
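Pass 1 emits SHA-256 fingerprints alongside the extracted symbols. The sketch below shows one way such a fingerprint can be computed with Node's built-in crypto module and used to skip unchanged input. The helper names fingerprint and hasChanged are hypothetical, and what exactly SDL-MCP hashes (whole files, symbol bodies, or both) is not specified here.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a piece of source text.
// Identical content always yields the same fingerprint.
function fingerprint(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical helper: re-extraction is needed only when the fingerprint differs
// from the one recorded during the previous index run.
function hasChanged(previous: string | undefined, content: string): boolean {
  return previous !== fingerprint(content);
}

const stored = fingerprint("export const x = 1;\n");
console.log(hasChanged(stored, "export const x = 1;\n")); // unchanged content
console.log(hasChanged(stored, "export const x = 2;\n")); // edited content
```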
Pass 2: Cross-File Call Resolution
Sequential, cross-file. Resolves raw call identifiers (e.g., "getUserById") to specific symbol IDs using the pass-2 resolver registry (src/indexer/pass2/registry.ts).
There are 11 language-specific resolvers. Every resolver:
- Builds a repo-wide index (namespace, module, package, or directory-scoped)
- Follows import/use/include/source chains to resolve call targets
- Handles language-specific patterns (generics, traits, templates, extensions, header pairs)
- Assigns stratified confidence scores (same-file 0.93 → imports 0.9 → same-scope 0.88–0.92 → fallback 0.45–0.78)
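The stratified confidence scores in the list above can be pictured as a lookup from resolution strategy to a score range. This is a sketch only: the strategy labels and the interpolation inside a range are assumptions, and only the numeric ranges come from the text.

```typescript
// Strategy labels are hypothetical identifiers for the tiers named above.
type Strategy = "same-file" | "import" | "same-scope" | "fallback";

// Score ranges copied from the text; single-valued tiers have min === max.
const CONFIDENCE: Record<Strategy, { min: number; max: number }> = {
  "same-file": { min: 0.93, max: 0.93 },
  import: { min: 0.9, max: 0.9 },
  "same-scope": { min: 0.88, max: 0.92 },
  fallback: { min: 0.45, max: 0.78 },
};

// quality in [0, 1] picks a point inside the tier's range.
// Linear interpolation is an assumed scheme, not a documented one.
function confidenceFor(strategy: Strategy, quality = 1): number {
  const { min, max } = CONFIDENCE[strategy];
  const q = Math.min(1, Math.max(0, quality));
  return min + (max - min) * q;
}

console.log(confidenceFor("same-file"));
console.log(confidenceFor("fallback", 0.5));
```

Keeping the tiers in one table makes it easy for a resolver to record both the score and the strategy that produced it, which matches the confidence and resolverStrategy fields on CALLS edges.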
Finalization
After both passes complete:
Cluster detection
Label Propagation Algorithm (Rust addon or TS fallback) groups highly-coupled symbols into named clusters.
Process tracing
Call-chain analysis identifies each symbol’s role — entry, intermediate, or exit — within named processes.
Embedding generation
ONNX models produce vector embeddings for each symbol, enabling semantic search.
LLM summaries (optional)
Generates 1–3 sentence descriptions per symbol via the configured provider (Anthropic, Ollama, or mock). Cached with content-addressed hashing.
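Content-addressed caching of summaries can be sketched as follows. The SymbolCard shape, the in-memory Map, and the summarize callback are assumptions for illustration; the idea shown is only that the cache key is a hash of the symbol's content, so an unchanged symbol never triggers a second provider call.

```typescript
import { createHash } from "node:crypto";

// Hypothetical card shape: fields whose change should invalidate a summary.
interface SymbolCard {
  name: string;
  kind: string;
  signature: string;
  body: string;
}

// Hash a fixed-order serialization so the key depends only on content.
function cardHash(card: SymbolCard): string {
  const canonical = JSON.stringify([card.name, card.kind, card.signature, card.body]);
  return createHash("sha256").update(canonical).digest("hex");
}

// In-memory stand-in for a persistent summary cache.
const cache = new Map<string, string>();

// summarize stands in for the provider call and runs only on a cache miss.
function getSummary(card: SymbolCard, summarize: (c: SymbolCard) => string): string {
  const key = cardHash(card);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const summary = summarize(card);
  cache.set(key, summary);
  return summary;
}
```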
Graph DB Storage
LadybugDB
SDL-MCP uses LadybugDB (Kuzu engine, npm alias kuzu) as its sole persistence layer. The database is stored as a single file on disk (.lbug extension).
DB path resolution (in priority order):
- SDL_GRAPH_DB_PATH environment variable (or legacy SDL_DB_PATH)
- graphDatabase.path in config
- Default: <configDir>/sdl-mcp-graph.lbug
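The priority order above amounts to a short fallback chain. The sketch below assumes that SDL_GRAPH_DB_PATH wins over the legacy SDL_DB_PATH when both are set and that empty values are ignored; the function name resolveGraphDbPath and its parameters are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Returns the first non-empty candidate in priority order.
function resolveGraphDbPath(
  env: Env,
  configPath: string | undefined, // value of graphDatabase.path, if present
  configDir: string,
): string {
  return (
    env["SDL_GRAPH_DB_PATH"] || // 1. current environment variable
    env["SDL_DB_PATH"] || // 1b. legacy name (assumed to rank just below the current one)
    configPath || // 2. graphDatabase.path from config
    `${configDir}/sdl-mcp-graph.lbug` // 3. default next to the config file
  );
}

console.log(resolveGraphDbPath({}, undefined, "/home/dev/.sdl"));
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the resolution logic easy to test.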
| Connection type | Count | Access pattern |
|---|---|---|
| Read connections | 4 (configurable 1–8) | Round-robin, concurrent |
| Write connection | 1 (serialized) | Queued via ConcurrencyLimiter(1) |
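The access pattern in this table (round-robin reads, one serialized writer) can be sketched with a small generic pool. The class below is illustrative only: it serializes writes by chaining promises, which is a simpler stand-in for the ConcurrencyLimiter(1) queue named in the table, and the ConnectionPool name and its methods are hypothetical.

```typescript
class ConnectionPool<C> {
  private readonly readers: C[];
  private readonly writer: C;
  private nextReader = 0;
  private writeChain: Promise<unknown> = Promise.resolve();

  constructor(readers: C[], writer: C) {
    if (readers.length < 1 || readers.length > 8) {
      throw new RangeError("read pool size must be between 1 and 8");
    }
    this.readers = readers;
    this.writer = writer;
  }

  // Hand out read connections in rotation so concurrent reads spread evenly.
  acquireRead(): C {
    const conn = this.readers[this.nextReader] as C;
    this.nextReader = (this.nextReader + 1) % this.readers.length;
    return conn;
  }

  // Each write starts only after the previous one has settled.
  write<T>(job: (conn: C) => Promise<T>): Promise<T> {
    const result = this.writeChain.then(() => job(this.writer));
    this.writeChain = result.catch(() => undefined); // a failed write must not block later ones
    return result;
  }
}

// Strings stand in for real connection objects.
const pool = new ConnectionPool(["r0", "r1", "r2", "r3"], "w");
console.log(pool.acquireRead(), pool.acquireRead());
```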
Graph Schema
Node tables:
| Node Table | Key Fields |
|---|---|
| Repo | repoId, rootPath, configJson, createdAt |
| File | fileId, repoId, relPath, byteSize, contentHash |
| Symbol | symbolId, repoId, fileId, kind, name, exported, signatureJson, summary, summaryQuality, summarySource, etag, embeddingMiniLM, embeddingNomic |
| Version | versionId, repoId, timestamp, indexedAt |
| Cluster | clusterId, label, memberCount, searchText |
| Process | processId, label, repoId, searchText |
| FileSummary | fileId, repoId, summary, searchText, embeddingMiniLM, embeddingNomic |
| SummaryCache | symbolId, summary, provider, model, cardHash, costUsd |
| SliceHandle | handle, createdAt, expiresAt, minVersion, maxVersion |
| AgentFeedback | feedbackId, repoId, taskText, taskType, searchText, embeddingMiniLM, embeddingNomic |
Edge tables:
| Edge Table | From → To | Key Fields |
|---|---|---|
| CALLS | Symbol → Symbol | confidence, resolverStrategy, provenance |
| IMPORTS | Symbol → Symbol | importKind, alias |
| DEFINED_IN | Symbol → File | — |
| BELONGS_TO | File → Repo | — |
| BELONGS_TO_CLUSTER | Symbol → Cluster | membershipScore |
| PARTICIPATES_IN | Symbol → Process | stepOrder, role |
Tool Gateway Dispatch
All MCP tools flow through a single dispatch path in src/server.ts. The tool surface depends on configuration:
| Mode | Tool count | Description |
|---|---|---|
| Flat mode | 33 tools | 31 flat tools + sdl.action.search + sdl.info |
| Gateway-only mode | 6 tools | 4 gateway tools + sdl.action.search + sdl.info |
| Gateway + legacy mode | 37 tools | 4 gateway + 31 legacy flat + sdl.action.search + sdl.info |
| Code Mode | +2 tools | Adds sdl.manual and sdl.chain to any of the above |
The four gateway tools are sdl.query, sdl.code, sdl.repo, and sdl.agent. Each accepts an action discriminator field and routes to the same handlers as flat mode.
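Routing on an action discriminator can be sketched as a lookup into a shared handler table. Everything named below (the handler table, the action strings, and dispatchGateway) is hypothetical; the sketch only illustrates how a gateway tool and a flat tool can end up in the same handler.

```typescript
type Handler = (payload: Record<string, unknown>) => unknown;

// Hypothetical flat handlers keyed by action name.
// The real action names are not listed in this section.
const flatHandlers: Record<string, Handler> = {
  "symbol.search": (p) => ({ tool: "symbol.search", query: p["query"] }),
  "slice.build": (p) => ({ tool: "slice.build", repoId: p["repoId"] }),
};

// A gateway tool reads the action field and forwards the remaining fields
// to the same handler a flat-mode call would reach.
function dispatchGateway(request: { action: string } & Record<string, unknown>): unknown {
  const { action, ...payload } = request;
  const handler = flatHandlers[action];
  if (!handler) throw new Error(`Unknown action: ${action}`);
  return handler(payload);
}

console.log(dispatchGateway({ action: "symbol.search", query: "parseConfig" }));
```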
Dispatch flow:
Normalization
Before Zod validation, requests pass through a shared normalization layer that accepts camelCase fields and common aliases (repo_id, root_path, symbol_id, from_version, etc.).
Zod validation
The normalized payload is validated against the tool’s Zod schema. Validation failures return an isError MCP response immediately.
Concurrency gate
The ConcurrencyLimiter allows a maximum of 8 concurrent handlers with a 30-second queue timeout. Requests beyond the limit are queued.
Handler execution
Tool-specific handler logic runs and returns a result, optionally attaching _rawContext hints (file IDs or raw token counts) as sideband metadata.
Post-processing
The post-processor computes _tokenUsage from _rawContext (SDL tokens vs. raw-file equivalent, savings percentage), strips internal fields, and calls logToolCall() for telemetry.
Policy Engine
The policy engine gates raw code access (Rung 4 of the Iris Gate Ladder). Agents must provide a reason, identifiers they expect to find, and an expected line count within configured limits. Priority-ordered rules:
| Priority | Rule |
|---|---|
| 100 | Hard caps (180 lines maximum) |
| 90 | Identifiers required |
| 80 | Budget enforcement |
| 10 | Break-glass override (with audit trail) |
Denied requests return nextBestAction and the required fields for the suggested alternative.
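Priority-ordered evaluation of the rules in the table can be sketched as below. The request and decision shapes, the budget value, and the nextBestAction string are all assumptions; the break-glass override is left out because its exact semantics are not described in this section.

```typescript
interface CodeRequest {
  reason: string;
  identifiers: string[];
  expectedLines: number;
}

interface Decision {
  allowed: boolean;
  rule?: string;
  nextBestAction?: string;
}

interface Rule {
  priority: number;
  name: string;
  denies: (request: CodeRequest) => boolean;
}

const MAX_LINES = 180; // hard cap from the table
const BUDGET_LINES = 120; // hypothetical configured budget

// Declared out of order on purpose: evaluation sorts by priority.
const rules: Rule[] = [
  { priority: 80, name: "budget", denies: (r) => r.expectedLines > BUDGET_LINES },
  { priority: 100, name: "hard-cap", denies: (r) => r.expectedLines > MAX_LINES },
  { priority: 90, name: "identifiers-required", denies: (r) => r.identifiers.length === 0 },
];

// The highest-priority rule that denies the request decides the outcome.
function evaluate(request: CodeRequest): Decision {
  const ordered = rules.slice().sort((a, b) => b.priority - a.priority);
  for (const rule of ordered) {
    if (rule.denies(request)) {
      // "requestSkeleton" is a placeholder value for the suggested alternative.
      return { allowed: false, rule: rule.name, nextBestAction: "requestSkeleton" };
    }
  }
  return { allowed: true };
}

console.log(evaluate({ reason: "debug null check", identifiers: [], expectedLines: 40 }));
```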
Transport Layer
- stdio (default): single-session transport for CLI agents (Claude Code, Cursor, Windsurf, etc.). One MCPServer instance handles all requests for the lifetime of the process.
- HTTP/SSE: transport for network clients.
Concurrency Control
| Limiter | Scope | Max | Timeout | Purpose |
|---|---|---|---|---|
| Tool dispatch | Per-server | 8 concurrent | 30s queue | Prevents handler starvation |
| DB write conn | Global | 1 (serialized) | — | Graph integrity |
| DB read pool | Global | 4 connections | — | Concurrent multi-session reads |
| Session manager | Global | 8 sessions | 5 min idle | Resource limits |
| Summary batch | Per-index | 5 concurrent | — | API rate limiting |
ConcurrencyLimiter (src/util/concurrency.ts) — a queue-based limiter reused throughout the system.
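A queue-based limiter of this kind can be sketched in a few dozen lines. This is not the code in src/util/concurrency.ts; the class body, method names, and error message below are assumptions that illustrate the idea of a bounded number of running tasks plus a queue with a timeout.

```typescript
interface Waiter {
  start: () => void;
  timer: ReturnType<typeof setTimeout>;
}

class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Waiter[] = [];
  private readonly max: number;
  private readonly queueTimeoutMs: number;

  constructor(max: number, queueTimeoutMs = 30_000) {
    this.max = max;
    this.queueTimeoutMs = queueTimeoutMs;
  }

  get activeCount(): number { return this.active; }
  get queuedCount(): number { return this.waiting.length; }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active++;
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            this.active--;
            this.startNext(); // free slot: wake the oldest waiter
          });
      };
      if (this.active < this.max) {
        start();
        return;
      }
      // No free slot: queue the task and reject it if it waits too long.
      const waiter: Waiter = {
        start,
        timer: setTimeout(() => {
          const index = this.waiting.indexOf(waiter);
          if (index >= 0) this.waiting.splice(index, 1);
          reject(new Error("Timed out waiting for a free slot"));
        }, this.queueTimeoutMs),
      };
      this.waiting.push(waiter);
    });
  }

  private startNext(): void {
    const waiter = this.waiting.shift();
    if (waiter) {
      clearTimeout(waiter.timer);
      waiter.start();
    }
  }
}
```

With this shape, new ConcurrencyLimiter(8) would correspond to the tool-dispatch row in the table and new ConcurrencyLimiter(1) to the serialized write connection.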
Semantic Engine
Three subsystems enhance code intelligence beyond structural analysis.
Embedding Search
Two ONNX text models are available. Quality improves when LLM summaries are also enabled.
| Model | Dimensions | Size | Notes |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384-dim | ~22 MB | Bundled, zero-setup, general-purpose baseline |
| nomic-embed-text-v1.5 | 768-dim | ~138 MB | Downloaded on first use, higher quality, 8192-token context |
LLM Summaries
1–3 sentence semantic descriptions generated per symbol. Three providers are supported:
- Anthropic: uses the Anthropic API. Requires ANTHROPIC_API_KEY. Cached with content-addressed hashing, so unchanged symbols are never re-summarized.
- Ollama / OpenAI-compatible
- Mock
Pass-2 Call Resolution
Eleven language-specific resolvers trace import chains and resolve raw call identifiers to symbol IDs with confidence scores (0.0–1.0). See the Semantic Engine deep dive for full details.
Graceful Degradation
SDL-MCP degrades gracefully when optional components are unavailable:
| Component unavailable | Fallback behavior |
|---|---|
| Rust native indexer | Falls back to tree-sitter TypeScript engine |
| ONNX runtime | Falls back to mock embeddings (text-only search) |
| LLM API | Skips summary generation; uses heuristic descriptions |
| Live index overlay | Reads from persisted DB only |
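The fallback behavior in this table follows a common pattern: try the preferred component and move to the next option when loading fails. The helper below is a generic sketch of that pattern; firstAvailable and the loader names are hypothetical and do not describe how SDL-MCP detects a missing component.

```typescript
interface Loader<T> {
  name: string;
  load: () => T;
}

// Try each loader in order and return the first one that succeeds.
function firstAvailable<T>(loaders: Loader<T>[]): { name: string; value: T } {
  const failures: string[] = [];
  for (const { name, load } of loaders) {
    try {
      return { name, value: load() };
    } catch (err) {
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`No option available (${failures.join("; ")})`);
}

// Hypothetical loaders: the native addon is missing, so the TypeScript engine is chosen.
const engine = firstAvailable<string>([
  { name: "rust", load: () => { throw new Error("native addon not installed"); } },
  { name: "typescript", load: () => "tree-sitter engine" },
]);
console.log(engine.name);
```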