GitNexus builds a complete knowledge graph of your codebase using KuzuDB, an embedded graph database. Files, symbols, clusters (communities), and execution flows (processes) are stored as nodes, and the relationships between them are stored as edges, each carrying a confidence score.

Graph Schema

The knowledge graph consists of nodes representing code entities and edges representing relationships between them.

Node Types

GitNexus creates several types of nodes during indexing:

Code Structure Nodes

File

Represents a source file in the repository

Folder

Directory in the file tree

Module/Package

Language-specific module or package

Symbol Nodes

Function

Top-level function definitions

Class

Class definitions (OOP languages)

Method

Methods within classes

Interface

TypeScript/Java interfaces, Go interfaces, etc.

Enum

Enumeration types

Variable

Module-level variables and constants

Multi-Language Nodes

Language-specific node types include:
  • Struct (C, C++, Go, Rust)
  • Trait (Rust)
  • Impl (Rust implementation blocks)
  • Namespace (C++, C#)
  • Typedef/TypeAlias (C, C++, TypeScript)
  • Macro (C, C++, Rust)
  • Decorator/Annotation (Python, TypeScript, Java)
  • Constructor (Java, C++, Swift)
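For tooling that consumes query results, the node types above can be modelled as a TypeScript union. This is a minimal sketch: where the docs give two names (Module/Package, Typedef/TypeAlias, Decorator/Annotation) it picks one, and the exact label strings, the id format, and the GraphNode shape are assumptions. Only name, filePath, and startLine appear in the query examples on this page.

```typescript
// ASSUMPTION: label strings GitNexus actually stores may differ.
type StructureLabel = "File" | "Folder" | "Module";
type SymbolLabel =
  | "Function" | "Class" | "Method" | "Interface" | "Enum" | "Variable"
  // multi-language node types
  | "Struct" | "Trait" | "Impl" | "Namespace" | "TypeAlias"
  | "Macro" | "Decorator" | "Constructor";
type AnalysisLabel = "Community" | "Process";
type NodeLabel = StructureLabel | SymbolLabel | AnalysisLabel;

// Hypothetical node shape; name/filePath/startLine mirror the Cypher examples.
interface GraphNode {
  id: string;
  label: NodeLabel;
  name: string;
  filePath?: string; // analysis nodes have no single source file
  startLine?: number;
}

type NodeCategory = "structure" | "symbol" | "analysis";

// Map a label to the node-type group it belongs to in this section.
function categoryOf(label: NodeLabel): NodeCategory {
  switch (label) {
    case "File":
    case "Folder":
    case "Module":
      return "structure";
    case "Community":
    case "Process":
      return "analysis";
    default:
      return "symbol";
  }
}

const example: GraphNode = {
  id: "src/auth/validate.ts:validateUser", // hypothetical id format
  label: "Function",
  name: "validateUser",
  filePath: "src/auth/validate.ts",
  startLine: 12,
};
```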

Analysis Nodes

These are created during the clustering and process detection phases:

Community

Groups of densely connected symbols (symbols that frequently call or reference each other), detected with the Leiden community detection algorithm. Each community represents a functional area of the codebase.
Properties:
  • heuristicLabel - Auto-generated name based on folder patterns
  • cohesion - Internal edge density score (0-1)
  • symbolCount - Number of symbols in this community
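One way to derive these three properties from a community's members is sketched below. The formulas are assumptions, not GitNexus's implementation: cohesion is taken as the share of edges touching the community that stay inside it, and heuristicLabel as the most common parent folder of the members' files. describeCommunity and its input types are hypothetical.

```typescript
// Hypothetical inputs: members of one community and edges between symbols.
interface MemberSymbol {
  id: string;
  filePath: string; // forward-slash separated, relative to the repo root
}

interface Edge {
  from: string;
  to: string;
}

interface CommunityProps {
  heuristicLabel: string;
  cohesion: number; // 0-1
  symbolCount: number;
}

function describeCommunity(members: MemberSymbol[], edges: Edge[]): CommunityProps {
  const ids = new Set(members.map((m) => m.id));

  // ASSUMPTION: cohesion = internal edges / edges with at least one end inside.
  let internal = 0;
  let boundary = 0;
  for (const e of edges) {
    const fromIn = ids.has(e.from);
    const toIn = ids.has(e.to);
    if (fromIn && toIn) internal++;
    else if (fromIn || toIn) boundary++;
  }
  const touching = internal + boundary;
  const cohesion = touching === 0 ? 0 : internal / touching;

  // ASSUMPTION: label = most frequent parent folder of the members' files.
  const folderCounts = new Map<string, number>();
  let heuristicLabel = "";
  let bestCount = 0;
  for (const m of members) {
    const parts = m.filePath.split("/");
    const folder = parts.length > 1 ? parts[parts.length - 2] : "(root)";
    const count = (folderCounts.get(folder) ?? 0) + 1;
    folderCounts.set(folder, count);
    if (count > bestCount) {
      heuristicLabel = folder;
      bestCount = count;
    }
  }

  return { heuristicLabel, cohesion, symbolCount: members.length };
}
```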

Process

Execution flows traced from entry points through call chains. Each process represents how a feature executes through the codebase.
Properties:
  • processType - intra_community or cross_community
  • stepCount - Number of steps in the trace
  • communities - Community IDs touched by this process
  • entryPointId - Starting symbol
  • terminalId - Final symbol in the chain
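Given an ordered trace of symbol IDs, the Process properties follow directly. In this sketch, summarizeTrace and the communityOf lookup (standing in for MEMBER_OF edges) are hypothetical, and a process is assumed to be cross_community whenever it touches more than one community.

```typescript
type ProcessType = "intra_community" | "cross_community";

interface ProcessSummary {
  processType: ProcessType;
  stepCount: number;
  communities: string[]; // community IDs, in order of first appearance
  entryPointId: string;
  terminalId: string;
}

// steps: symbol IDs in call order, entry point first (hypothetical input shape).
// communityOf: hypothetical lookup standing in for MEMBER_OF edges.
function summarizeTrace(
  steps: string[],
  communityOf: (symbolId: string) => string | undefined,
): ProcessSummary {
  if (steps.length === 0) throw new Error("a trace needs at least one step");

  const communities: string[] = [];
  for (const step of steps) {
    const c = communityOf(step);
    if (c !== undefined && communities.indexOf(c) === -1) communities.push(c);
  }

  return {
    // ASSUMPTION: touching more than one community makes a flow cross-community.
    processType: communities.length > 1 ? "cross_community" : "intra_community",
    stepCount: steps.length,
    communities,
    entryPointId: steps[0],
    terminalId: steps[steps.length - 1],
  };
}
```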

Edge Types

Relationships between nodes are stored as CodeRelation edges with a type property:
Edge Type         Description                              Example
CONTAINS          File/folder containment                  Folder → File
DEFINES           File defines a symbol                    File → Function
CALLS             Function calls another function          loginHandler → validateUser
IMPORTS           File imports from another file           auth.ts → utils.ts
EXTENDS           Class inheritance                        AdminUser → BaseUser
IMPLEMENTS        Interface implementation                 UserService → IService
MEMBER_OF         Symbol belongs to a community            validateUser → Authentication community
STEP_IN_PROCESS   Symbol is a step in an execution flow    validateUser → LoginFlow process (step 2)
All edges include confidence and reason properties for traceability. See Confidence Scoring below.
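For code that post-processes exported edges, a typed model of CodeRelation and a caller lookup (the in-memory analogue of a "find all callers" Cypher query) might look like this. The from/to field names and the callersOf helper are assumptions; type, confidence, and reason come from the table and text above.

```typescript
// Edge types from the table above.
type RelType =
  | "CONTAINS"
  | "DEFINES"
  | "CALLS"
  | "IMPORTS"
  | "EXTENDS"
  | "IMPLEMENTS"
  | "MEMBER_OF"
  | "STEP_IN_PROCESS";

interface CodeRelation {
  from: string; // hypothetical: source node ID
  to: string; // hypothetical: target node ID
  type: RelType;
  confidence: number; // 0.0-1.0
  reason: string; // e.g. "import-resolved", "fuzzy-global"
}

// Callers of targetId via CALLS edges at or above minConfidence, most certain first.
function callersOf(edges: CodeRelation[], targetId: string, minConfidence = 0): CodeRelation[] {
  return edges
    .filter((e) => e.type === "CALLS" && e.to === targetId && e.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}
```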

Confidence Scoring

GitNexus assigns a confidence score (0.0-1.0) to every relationship to indicate how certainly its target was resolved. Impact analysis and process tracing rely on these scores to skip unreliable edges.

Confidence Levels

Same-file references and exact import-resolved calls
// Same file - confidence: 1.0
function validateUser() { /* ... */ }
function login() {
  validateUser(); // ← caller and callee in same file
}
Reason: same-file or import-resolved
Import-resolved across files with direct import statements
// auth.ts
import { validateUser } from './validate';
validateUser(); // ← confidence: 0.85
Reason: import-resolved
Fuzzy name matching when imports can’t be resolved
Used as a fallback when Tree-sitter can’t extract import details or the import path is ambiguous.
Reason: fuzzy-global
Very uncertain fuzzy matches
Common symbol names that appear in many files (e.g., render, init, process).
Reason: fuzzy-global-low-confidence
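The four tiers form a fallback cascade. Below is a minimal sketch, assuming a name-to-files definition index and per-call import information (both hypothetical), that returns the documented reason string for a call site. The cut-off for "appears in many files" is not documented, so manyFilesThreshold is a placeholder, and the sketch maps only to reasons, not to numeric scores.

```typescript
type ResolutionReason =
  | "same-file"
  | "import-resolved"
  | "fuzzy-global"
  | "fuzzy-global-low-confidence";

// Hypothetical index: symbol name -> files that define a symbol with that name.
type DefinitionIndex = Map<string, string[]>;

interface CallSite {
  calleeName: string;
  filePath: string; // file containing the call
  importedFrom: Map<string, string>; // imported name -> resolved source file, if known
}

// Returns the reason for the best available match, or undefined when no
// definition with that name exists (this sketch then creates no edge).
function classifyCall(
  call: CallSite,
  defs: DefinitionIndex,
  manyFilesThreshold = 3, // PLACEHOLDER: the real cut-off is not documented
): ResolutionReason | undefined {
  const files = defs.get(call.calleeName) ?? [];
  if (files.indexOf(call.filePath) !== -1) return "same-file";

  const importedFile = call.importedFrom.get(call.calleeName);
  if (importedFile !== undefined && files.indexOf(importedFile) !== -1) {
    return "import-resolved";
  }

  if (files.length === 0) return undefined;
  // Fallback: match by name only; names defined in many files are least trustworthy.
  return files.length > manyFilesThreshold ? "fuzzy-global-low-confidence" : "fuzzy-global";
}
```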

Confidence Filtering

Process detection filters out edges with confidence < 0.5 to avoid false traces:
process-processor.ts:219
const MIN_TRACE_CONFIDENCE = 0.5;
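The filter applies while walking call chains from an entry point. Below is a sketch of that walk, assuming a flat list of CALLS edges: edges below MIN_TRACE_CONFIDENCE are dropped before traversal, and a symbol is never revisited within one path, so recursive calls cannot loop. tracePaths and CallEdge are hypothetical and are not the code in process-processor.ts.

```typescript
const MIN_TRACE_CONFIDENCE = 0.5;

interface CallEdge {
  from: string;
  to: string;
  confidence: number;
}

// Enumerate call paths from `entry`, up to maxDepth hops.
function tracePaths(entry: string, calls: CallEdge[], maxDepth: number): string[][] {
  // Build an adjacency list, dropping low-confidence edges up front.
  const outgoing = new Map<string, string[]>();
  for (const e of calls) {
    if (e.confidence < MIN_TRACE_CONFIDENCE) continue;
    const targets = outgoing.get(e.from) ?? [];
    targets.push(e.to);
    outgoing.set(e.from, targets);
  }

  const paths: string[][] = [];
  const walk = (path: string[]): void => {
    const last = path[path.length - 1];
    // Skip symbols already on this path so recursion cannot loop forever.
    const next = (outgoing.get(last) ?? []).filter((n) => path.indexOf(n) === -1);
    const hops = path.length - 1;
    if (next.length === 0 || hops >= maxDepth) {
      paths.push(path); // terminal reached or depth limit hit
      return;
    }
    for (const n of next) walk(path.concat(n));
  };

  walk([entry]);
  return paths;
}
```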
Impact analysis tools accept a minConfidence parameter:
impact({target: "UserService", minConfidence: 0.8})
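Conceptually, minConfidence prunes the graph before the impact traversal. Below is a minimal sketch, assuming impact means "everything that transitively depends on the target" and that dependencies are given as from/to edges: impactedBy runs a breadth-first search backwards from the target and records each dependent's distance. The helper and its return shape are hypothetical; the real impact tool's output is not described here.

```typescript
interface Dependency {
  from: string; // the dependent (e.g. caller or importer)
  to: string; // what it depends on
  confidence: number;
}

// Map of dependent -> hop distance from target, using only edges >= minConfidence.
function impactedBy(target: string, deps: Dependency[], minConfidence: number): Map<string, number> {
  // Reverse adjacency: for each node, who depends on it.
  const dependents = new Map<string, string[]>();
  for (const d of deps) {
    if (d.confidence < minConfidence) continue;
    const list = dependents.get(d.to) ?? [];
    list.push(d.from);
    dependents.set(d.to, list);
  }

  const depth = new Map<string, number>();
  const seen = new Set<string>([target]);
  let frontier = [target];
  let level = 0;
  while (frontier.length > 0) {
    level++;
    const next: string[] = [];
    for (const node of frontier) {
      for (const dep of dependents.get(node) ?? []) {
        if (seen.has(dep)) continue;
        seen.add(dep);
        depth.set(dep, level);
        next.push(dep);
      }
    }
    frontier = next;
  }
  return depth;
}
```

Raising minConfidence can only shrink the result, since every edge kept at a higher threshold is also kept at a lower one.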

Example Cypher Queries

The MCP cypher tool and CLI allow direct graph queries. Here are common patterns:

Find All Callers of a Function

MATCH (caller)-[r:CodeRelation {type: 'CALLS'}]->(fn:Function {name: 'validateUser'})
WHERE r.confidence > 0.8
RETURN caller.name, caller.filePath, r.confidence
ORDER BY r.confidence DESC

Find Functions in Authentication Community

MATCH (c:Community {heuristicLabel: 'Authentication'})<-[:CodeRelation {type: 'MEMBER_OF'}]-(fn)
WHERE fn.label IN ['Function', 'Method']
RETURN fn.name, fn.filePath, fn.startLine

Trace a Call Chain (Up to 3 Hops)

MATCH path = (entry:Function {name: 'handleLogin'})
  -[:CodeRelation {type: 'CALLS'}*1..3]->(terminal)
WHERE ALL(r IN relationships(path) WHERE r.confidence > 0.5)
RETURN [node IN nodes(path) | node.name] AS trace
LIMIT 10

Find Cross-Community Processes

MATCH (p:Process {processType: 'cross_community'})
RETURN p.name, p.stepCount, p.communities
ORDER BY p.stepCount DESC

Find High-Confidence Imports

MATCH (source:File)-[r:CodeRelation {type: 'IMPORTS'}]->(target:File)
WHERE r.confidence = 1.0
RETURN source.filePath, target.filePath
LIMIT 20

Graph Statistics

You can query graph statistics to understand codebase size:
// Count nodes by type
MATCH (n)
RETURN n.label AS nodeType, COUNT(n) AS count
ORDER BY count DESC

// Count relationships by type
MATCH ()-[r:CodeRelation]->()
RETURN r.type AS relType, COUNT(r) AS count
ORDER BY count DESC

// Find largest communities
MATCH (c:Community)
RETURN c.heuristicLabel, c.symbolCount, c.cohesion
ORDER BY c.symbolCount DESC
LIMIT 10
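The same group-and-count can be done client-side on exported rows, which is handy for quick checks in tests. countBy is a hypothetical helper that returns [key, count] pairs sorted by count, descending, mirroring the ORDER BY count DESC clauses in these queries.

```typescript
// Group rows by key and count them, sorted by count descending.
function countBy<T>(rows: T[], key: (row: T) => string): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const k = key(row);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```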

Storage and Performance

GitNexus uses KuzuDB for storage:
  • CLI: Native KuzuDB bindings (fast, persistent)
  • Web UI: KuzuDB WASM (in-memory, per-session)
The graph is stored in .gitnexus/ inside your repository (gitignored by default). A global registry at ~/.gitnexus/registry.json tracks all indexed repos.
Multi-repo support: The MCP server uses a connection pool to serve multiple indexed repos from a single server instance. Connections are opened lazily and evicted after 5 minutes of inactivity.
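The lazy-open, idle-evict behaviour can be sketched as a small generic pool. Assumptions: open and close stand in for whatever opens and closes a KuzuDB connection, the clock is injectable so eviction can be tested without real timers, and evictIdle is expected to be called periodically (for example from setInterval). RepoConnectionPool is hypothetical, not the MCP server's class.

```typescript
const IDLE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, as documented

interface PoolEntry<C> {
  conn: C;
  lastUsed: number;
}

class RepoConnectionPool<C> {
  private readonly entries = new Map<string, PoolEntry<C>>();
  private readonly openConn: (repoPath: string) => C;
  private readonly closeConn: (conn: C) => void;
  private readonly now: () => number;

  constructor(
    openConn: (repoPath: string) => C,
    closeConn: (conn: C) => void,
    now: () => number = () => Date.now(),
  ) {
    this.openConn = openConn;
    this.closeConn = closeConn;
    this.now = now;
  }

  // Open lazily on first use; afterwards reuse and refresh the idle timer.
  acquire(repoPath: string): C {
    const existing = this.entries.get(repoPath);
    if (existing) {
      existing.lastUsed = this.now();
      return existing.conn;
    }
    const conn = this.openConn(repoPath);
    this.entries.set(repoPath, { conn, lastUsed: this.now() });
    return conn;
  }

  // Close and drop connections idle for at least IDLE_TIMEOUT_MS.
  evictIdle(): string[] {
    const evicted: string[] = [];
    const t = this.now();
    this.entries.forEach((entry, repo) => {
      if (t - entry.lastUsed >= IDLE_TIMEOUT_MS) {
        this.closeConn(entry.conn);
        evicted.push(repo);
      }
    });
    for (const repo of evicted) this.entries.delete(repo);
    return evicted;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because every acquire refreshes lastUsed, a repo that keeps receiving queries is never evicted.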

Next Steps

Indexing Pipeline

Learn how the graph is built in 6 phases

Hybrid Search

Understand BM25 + semantic search with RRF
