Knowledge Graph Structure

GitNexus builds a knowledge graph of your codebase using KuzuDB, an embedded graph database. Every symbol, relationship, cluster, and execution flow is stored as a node or an edge, and every edge carries a confidence score.

Graph Schema

The knowledge graph consists of nodes representing code entities and edges representing relationships between them.

Node Types

GitNexus creates several types of nodes during indexing:

Code Structure Nodes

File

Represents a source file in the repository

Folder

Directory in the file tree

Module/Package

Language-specific module or package

Symbol Nodes

Function

Top-level function definitions

Class

Class definitions (OOP languages)

Method

Methods within classes

Interface

TypeScript/Java interfaces, Go interfaces, etc.

Enum

Enumeration types

Variable

Module-level variables and constants

Multi-Language Nodes

Language-specific node types include:

Struct (C, C++, Go, Rust)
Trait (Rust)
Impl (Rust implementation blocks)
Namespace (C++, C#)
Typedef/TypeAlias (C, C++, TypeScript)
Macro (C, C++, Rust)
Decorator/Annotation (Python, TypeScript, Java)
Constructor (Java, C++, Swift)

Analysis Nodes

These are created during the clustering and process detection phases:

Community

Groups of symbols that work together frequently, detected via the Leiden algorithm. Represents functional areas of the codebase.

Properties:

heuristicLabel - Auto-generated name based on folder patterns
cohesion - Internal edge density score (0-1)
symbolCount - Number of symbols in this community
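
To make the cohesion property more concrete, here is a minimal sketch of one possible reading of "internal edge density": of all edges that touch a community member, the fraction that stay inside the community. The Edge shape and the cohesion helper are hypothetical names for illustration, and the formula GitNexus actually uses may differ.

```typescript
// Hypothetical edge shape for illustration; not the GitNexus schema.
interface Edge {
  from: string;
  to: string;
}

// Assumed definition: internal edges divided by all edges that touch
// at least one member of the community.
function cohesion(members: Set<string>, edges: Edge[]): number {
  let touching = 0;
  let internal = 0;
  for (const e of edges) {
    const fromIn = members.has(e.from);
    const toIn = members.has(e.to);
    if (fromIn || toIn) touching++;
    if (fromIn && toIn) internal++;
  }
  // A community with no edges scores 0 instead of NaN.
  return touching === 0 ? 0 : internal / touching;
}

const authMembers = new Set(["login", "validateUser", "hashPassword"]);
const authEdges: Edge[] = [
  { from: "login", to: "validateUser" },
  { from: "validateUser", to: "hashPassword" },
  { from: "login", to: "renderPage" }, // leaves the community
  { from: "renderPage", to: "formatDate" }, // does not touch it at all
];
console.log(cohesion(authMembers, authEdges));
```

Under this reading, a score near 1 means almost every edge that touches the community stays inside it.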

Process

Execution flows traced from entry points through call chains. Represents how features execute through the codebase.

Properties:

processType - intra_community or cross_community
stepCount - Number of steps in the trace
communities - Community IDs touched by this process
entryPointId - Starting symbol
terminalId - Final symbol in the chain
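
The properties above can be pictured as a small record derived from an ordered list of steps. The sketch below is an assumption-laden illustration, not the GitNexus implementation: the Step shape and buildProcess helper are hypothetical, and it assumes a process is cross_community whenever its steps touch more than one community.

```typescript
type ProcessType = "intra_community" | "cross_community";

// Field names follow the property list above; the TypeScript types are assumptions.
interface ProcessNode {
  processType: ProcessType;
  stepCount: number;
  communities: string[];
  entryPointId: string;
  terminalId: string;
}

// Hypothetical step record: a symbol ID plus the community it belongs to.
interface Step {
  symbolId: string;
  communityId: string;
}

function buildProcess(steps: Step[]): ProcessNode {
  if (steps.length === 0) {
    throw new Error("a process needs at least one step");
  }
  // Distinct community IDs, in first-seen order.
  const communities = Array.from(new Set(steps.map((s) => s.communityId)));
  return {
    processType: communities.length > 1 ? "cross_community" : "intra_community",
    stepCount: steps.length,
    communities,
    entryPointId: steps[0].symbolId,
    terminalId: steps[steps.length - 1].symbolId,
  };
}

console.log(
  buildProcess([
    { symbolId: "loginHandler", communityId: "Authentication" },
    { symbolId: "validateUser", communityId: "Authentication" },
    { symbolId: "saveSession", communityId: "Database" },
  ])
);
```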

Edge Types

Relationships between nodes are stored as CodeRelation edges with a type property:

Edge Type	Description	Example
`CONTAINS`	File/folder containment	`Folder` → `File`
`DEFINES`	File defines a symbol	`File` → `Function`
`CALLS`	Function calls another function	`loginHandler` → `validateUser`
`IMPORTS`	File imports from another file	`auth.ts` → `utils.ts`
`EXTENDS`	Class inheritance	`AdminUser` → `BaseUser`
`IMPLEMENTS`	Interface implementation	`UserService` → `IService`
`MEMBER_OF`	Symbol belongs to a community	`validateUser` → `Authentication` community
`STEP_IN_PROCESS`	Symbol is a step in an execution flow	`validateUser` → `LoginFlow` process (step 2)

All edges include confidence and reason properties for traceability. See Confidence Scoring below.
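
Because every relationship is a CodeRelation edge distinguished only by its type property, client code can model edges as a single record type and filter on type. The following sketch assumes an in-memory shape; the from and to field names, the targetsOf helper, and the sample confidence and reason values are illustrative, not taken from the GitNexus schema.

```typescript
// Edge types from the table above.
type RelationType =
  | "CONTAINS"
  | "DEFINES"
  | "CALLS"
  | "IMPORTS"
  | "EXTENDS"
  | "IMPLEMENTS"
  | "MEMBER_OF"
  | "STEP_IN_PROCESS";

// Hypothetical in-memory edge; `from` and `to` are illustrative field names.
interface CodeRelation {
  from: string;
  to: string;
  type: RelationType;
  confidence: number; // 0.0-1.0
  reason: string; // e.g. "same-file", "import-resolved"
}

// Targets reached from `source` over edges of one relationship type.
function targetsOf(source: string, type: RelationType, edges: CodeRelation[]): string[] {
  return edges
    .filter((e) => e.from === source && e.type === type)
    .map((e) => e.to);
}

// Sample data; the confidence and reason values are made up for the example.
const sampleRelations: CodeRelation[] = [
  { from: "loginHandler", to: "validateUser", type: "CALLS", confidence: 1.0, reason: "same-file" },
  { from: "auth.ts", to: "utils.ts", type: "IMPORTS", confidence: 0.85, reason: "import-resolved" },
];
console.log(targetsOf("loginHandler", "CALLS", sampleRelations));
```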

Confidence Scoring

GitNexus assigns a confidence score (0.0-1.0) to every relationship to indicate resolution certainty. This is critical for impact analysis and process tracing.

Confidence Levels

1.0 - Certain

Same-file references and exact import-resolved calls

// Same file - confidence: 1.0
function validateUser() { ... }
function login() {
  validateUser(); // ← caller and callee in same file
}

Reason: same-file or import-resolved

0.85 - High

Import-resolved across files with direct import statements

// auth.ts
import { validateUser } from './validate';
validateUser(); // ← confidence: 0.85

Reason: import-resolved

0.5 - Medium

Fuzzy name matching when imports can’t be resolved

Used as a fallback when Tree-sitter can’t extract import details or the import path is ambiguous.

Reason: fuzzy-global

0.3 - Low

Very uncertain fuzzy matches

Common symbol names that appear in many files (e.g., render, init, process).

Reason: fuzzy-global-low-confidence
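
When consuming query results, it can help to map a raw score back to one of the four levels above. The helper below is a sketch only: confidenceLevel is a hypothetical name, and treating each documented value (1.0, 0.85, 0.5) as the lower bound of its tier is an assumption made for this example.

```typescript
type ConfidenceLevel = "certain" | "high" | "medium" | "low";

// Assumption: each documented value is the lower bound of its tier.
function confidenceLevel(score: number): ConfidenceLevel {
  // The negated range check also rejects NaN.
  if (!(score >= 0 && score <= 1)) {
    throw new RangeError("confidence must be between 0.0 and 1.0");
  }
  if (score >= 1.0) return "certain";
  if (score >= 0.85) return "high";
  if (score >= 0.5) return "medium";
  return "low";
}

console.log(confidenceLevel(0.85)); // high
console.log(confidenceLevel(0.3)); // low
```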

Confidence Filtering

Process detection filters out edges with confidence < 0.5 to avoid false traces:

process-processor.ts:219

const MIN_TRACE_CONFIDENCE = 0.5;
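
To show the effect of this threshold, here is a minimal sketch of a trace that ignores low-confidence edges. It is not the code in process-processor.ts: the CallEdge shape, the traceFrom helper, and the maxSteps limit are hypothetical. It only assumes what the text states, namely that edges with confidence below 0.5 are dropped, so an edge at exactly 0.5 is kept.

```typescript
const MIN_TRACE_CONFIDENCE = 0.5;

// Hypothetical call edge for illustration.
interface CallEdge {
  from: string;
  to: string;
  confidence: number;
}

// Depth-first enumeration of call chains starting at `entry`.
// A chain ends when no usable outgoing call remains, when it reaches
// maxSteps symbols, or when the only next hops would revisit a symbol.
function traceFrom(entry: string, edges: CallEdge[], maxSteps = 10): string[][] {
  const usable = edges.filter((e) => e.confidence >= MIN_TRACE_CONFIDENCE);
  const traces: string[][] = [];

  function walk(path: string[]): void {
    const current = path[path.length - 1];
    const next = usable.filter(
      (e) => e.from === current && path.indexOf(e.to) === -1
    );
    if (next.length === 0 || path.length >= maxSteps) {
      traces.push(path);
      return;
    }
    for (const e of next) {
      walk(path.concat(e.to));
    }
  }

  walk([entry]);
  return traces;
}

const sampleCalls: CallEdge[] = [
  { from: "handleLogin", to: "validateUser", confidence: 1.0 },
  { from: "validateUser", to: "hashPassword", confidence: 0.85 },
  { from: "handleLogin", to: "render", confidence: 0.3 }, // dropped: below threshold
];
console.log(traceFrom("handleLogin", sampleCalls));
```

In this sample the 0.3 edge to render never appears in a trace, which is the kind of false trace the threshold is meant to prevent.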

Impact analysis tools accept a minConfidence parameter:

impact({target: "UserService", minConfidence: 0.8})
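
The internals of the impact tool are not shown here, but the role of minConfidence can be sketched as a breadth-first walk over incoming edges that skips anything below the threshold. The Relation shape, the upstreamOf helper, and its default threshold are hypothetical and serve only to illustrate the idea.

```typescript
// Hypothetical edge and result shapes for illustration.
interface Relation {
  from: string;
  to: string;
  confidence: number;
}

interface Affected {
  symbol: string;
  depth: number; // 1 = direct dependent
}

// Breadth-first walk over incoming edges: which symbols depend on `target`,
// directly or transitively, using only edges at or above minConfidence.
function upstreamOf(target: string, edges: Relation[], minConfidence = 0.5): Affected[] {
  const seen = new Set<string>([target]);
  const result: Affected[] = [];
  let frontier = [target];
  let depth = 0;

  while (frontier.length > 0) {
    depth++;
    const nextFrontier: string[] = [];
    for (const e of edges) {
      if (e.confidence < minConfidence) continue;
      if (frontier.indexOf(e.to) === -1 || seen.has(e.from)) continue;
      seen.add(e.from);
      result.push({ symbol: e.from, depth });
      nextFrontier.push(e.from);
    }
    frontier = nextFrontier;
  }
  return result;
}

const sampleDeps: Relation[] = [
  { from: "AuthController", to: "UserService", confidence: 1.0 },
  { from: "ReportJob", to: "UserService", confidence: 0.5 },
  { from: "Router", to: "AuthController", confidence: 0.85 },
];
console.log(upstreamOf("UserService", sampleDeps, 0.8));
```

Raising the threshold trades recall for precision: with 0.8, the fuzzy-matched ReportJob edge is ignored, so the result contains only dependents reached through well-resolved edges.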

Example Cypher Queries

The MCP cypher tool and CLI allow direct graph queries. Here are common patterns:

Find All Callers of a Function

MATCH (caller)-[r:CodeRelation {type: 'CALLS'}]->(fn:Function {name: 'validateUser'})
WHERE r.confidence > 0.8
RETURN caller.name, caller.filePath, r.confidence
ORDER BY r.confidence DESC

Find Functions in Authentication Community

MATCH (c:Community {heuristicLabel: 'Authentication'})<-[:CodeRelation {type: 'MEMBER_OF'}]-(fn)
WHERE fn.label IN ['Function', 'Method']
RETURN fn.name, fn.filePath, fn.startLine

Trace a Call Chain (3 Hops)

MATCH path = (entry:Function {name: 'handleLogin'})
  -[:CodeRelation {type: 'CALLS'}*1..3]->(terminal)
WHERE ALL(r IN relationships(path) WHERE r.confidence > 0.5)
RETURN [node IN nodes(path) | node.name] AS trace
LIMIT 10

Find Cross-Community Processes

MATCH (p:Process {processType: 'cross_community'})
RETURN p.name, p.stepCount, p.communities
ORDER BY p.stepCount DESC

Find High-Confidence Imports

MATCH (source:File)-[r:CodeRelation {type: 'IMPORTS'}]->(target:File)
WHERE r.confidence = 1.0
RETURN source.filePath, target.filePath
LIMIT 20

Graph Statistics

You can query graph statistics to understand codebase size:

// Count nodes by type
MATCH (n)
RETURN n.label AS nodeType, COUNT(n) AS count
ORDER BY count DESC

// Count relationships by type
MATCH ()-[r:CodeRelation]->()
RETURN r.type AS relType, COUNT(r) AS count
ORDER BY count DESC

// Find largest communities
MATCH (c:Community)
RETURN c.heuristicLabel, c.symbolCount, c.cohesion
ORDER BY c.symbolCount DESC
LIMIT 10

Storage and Performance

GitNexus uses KuzuDB for storage:

CLI: Native KuzuDB bindings (fast, persistent)
Web UI: KuzuDB WASM (in-memory, per-session)

The graph is stored in .gitnexus/ inside your repository (gitignored by default). A global registry at ~/.gitnexus/registry.json tracks all indexed repos.

Multi-repo support: The MCP server uses a connection pool to serve multiple indexed repos from a single server instance. Connections are opened lazily and evicted after 5 minutes of inactivity.
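
The lazy-open and idle-eviction behavior described above can be sketched as a small generic pool. This is an illustration under stated assumptions, not the MCP server's code: LazyPool, acquire, and evictIdle are hypothetical names, the open and close callbacks stand in for whatever actually opens a repo's database, and the clock is injectable so eviction can be exercised without waiting.

```typescript
// Idle timeout from the text above: 5 minutes.
const IDLE_TIMEOUT_MS = 5 * 60 * 1000;

interface PoolEntry<T> {
  conn: T;
  lastUsed: number;
}

// Hypothetical pool keyed by repo name.
class LazyPool<T> {
  private entries = new Map<string, PoolEntry<T>>();
  private open: (repo: string) => T;
  private close: (conn: T) => void;
  private now: () => number;

  constructor(
    open: (repo: string) => T,
    close: (conn: T) => void,
    now: () => number = () => Date.now()
  ) {
    this.open = open;
    this.close = close;
    this.now = now;
  }

  // Open on first use; afterwards reuse the connection and refresh its idle timer.
  acquire(repo: string): T {
    let entry = this.entries.get(repo);
    if (!entry) {
      entry = { conn: this.open(repo), lastUsed: this.now() };
      this.entries.set(repo, entry);
    } else {
      entry.lastUsed = this.now();
    }
    return entry.conn;
  }

  // Close and drop every connection that has been idle for the full timeout.
  evictIdle(): string[] {
    const current = this.now();
    const evicted: string[] = [];
    this.entries.forEach((entry, repo) => {
      if (current - entry.lastUsed >= IDLE_TIMEOUT_MS) {
        this.close(entry.conn);
        evicted.push(repo);
      }
    });
    evicted.forEach((repo) => this.entries.delete(repo));
    return evicted;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In a long-running server, evictIdle could be called from a periodic timer; a repo that is queried again after eviction is simply reopened on the next acquire.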
