Supported Languages

GitNexus currently supports 11 programming languages with full Tree-sitter AST parsing:
  • TypeScript (including TSX/React)
  • JavaScript
  • Python
  • Java
  • C
  • C++
  • C#
  • Go
  • Rust
  • PHP
  • Swift (optional dependency)

Tree-sitter Parsing

What is Tree-sitter?

GitNexus uses Tree-sitter to parse source code into Abstract Syntax Trees (ASTs). Tree-sitter provides:
  • Fast, incremental parsing - Parses small codebases in seconds and large ones in minutes
  • Error-tolerant - Can parse incomplete or syntactically incorrect code
  • Language-agnostic - Consistent parsing interface across all languages
  • Native performance - Written in C with Node.js bindings

How GitNexus Uses Tree-sitter

The indexing pipeline extracts code structure through Tree-sitter:
  1. Symbol Extraction - Identifies functions, classes, methods, interfaces, and other language constructs
  2. Location Tracking - Records start/end line numbers for each symbol
  3. Call Graph Building - Detects function calls and method invocations
  4. Import Resolution - Maps import statements to their targets
  5. Inheritance Analysis - Extracts class hierarchies and interface implementations
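
To make steps 2 and 3 concrete, here is a minimal sketch of how recorded line ranges could be used to attribute a call site to its caller. The `SymbolRecord` and `CallSite` shapes and both function names are hypothetical, chosen for illustration; they are not GitNexus's actual data model.

```typescript
// Hypothetical record shapes; the real index schema may differ.
interface SymbolRecord {
  name: string;
  kind: "function" | "class" | "method" | "interface";
  startLine: number;
  endLine: number;
}

interface CallSite {
  callee: string;
  line: number;
}

// Finds the innermost symbol whose line range contains the given line.
function enclosingSymbol(symbols: SymbolRecord[], line: number): SymbolRecord | undefined {
  let best: SymbolRecord | undefined;
  for (const s of symbols) {
    if (s.startLine <= line && line <= s.endLine) {
      // Prefer the smallest enclosing range (e.g. a method over its class).
      if (best === undefined || s.endLine - s.startLine < best.endLine - best.startLine) {
        best = s;
      }
    }
  }
  return best;
}

// Builds caller -> callees edges by attributing each call site
// to the symbol that encloses it.
function buildCallEdges(symbols: SymbolRecord[], calls: CallSite[]): Map<string, string[]> {
  const edges = new Map<string, string[]>();
  for (const call of calls) {
    const caller = enclosingSymbol(symbols, call.line);
    if (caller === undefined) continue; // top-level call, not inside any symbol
    const callees = edges.get(caller.name) ?? [];
    callees.push(call.callee);
    edges.set(caller.name, callees);
  }
  return edges;
}
```

In this sketch, a call on a line inside both a class and one of its methods is attributed to the method, because the method has the narrower line range.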

CLI vs Web: Native vs WASM

| Environment | Parser Implementation | Performance |
| --- | --- | --- |
| CLI (Node.js) | Native Tree-sitter bindings (tree-sitter npm package) | ⚡ Full native speed |
| Web (Browser) | Tree-sitter WASM modules (tree-sitter-wasm) | 🐢 ~2-3x slower than native |

The CLI uses native C bindings for maximum parsing speed, while the web UI uses WebAssembly builds that run entirely in the browser.

Language-Aware Resolution

GitNexus uses language-specific resolution logic to map symbols accurately.

Import Resolution

Each language has custom import resolution:
  • JavaScript/TypeScript - Handles ES modules, CommonJS, path aliases, and package.json resolution
  • Python - Resolves relative imports, absolute imports, and package imports
  • Java - Understands package structure and fully-qualified names
  • Go - Resolves module paths and package imports
  • C/C++ - Handles #include directives with header search paths

Call Resolution

Function call resolution accounts for language semantics:
  • Method calls - Resolves through class hierarchies and interfaces
  • Namespaced calls - Handles qualified names (e.g., package.Class.method())
  • Dynamic calls - Best-effort resolution for runtime dispatch
  • Generic functions - Tracks template/generic instantiations
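
Resolving a method call through a class hierarchy can be sketched as a walk up the parent chain. The `ClassInfo` shape and `resolveMethod` function are hypothetical and heavily simplified: real resolution also has to consider interfaces, overloads, and language-specific dispatch rules.

```typescript
// Hypothetical, simplified class table entry: the class's own methods
// and an optional single parent class.
interface ClassInfo {
  parent?: string;
  methods: string[];
}

// Walks up the inheritance chain and returns "Class.method" for the
// first class that defines the method, or undefined if none does.
function resolveMethod(
  classes: Map<string, ClassInfo>,
  className: string,
  method: string,
): string | undefined {
  const visited = new Set<string>(); // guards against cyclic parent links
  let current: string | undefined = className;
  while (current !== undefined && !visited.has(current)) {
    visited.add(current);
    const info: ClassInfo | undefined = classes.get(current);
    if (info === undefined) return undefined; // class is outside the indexed code
    if (info.methods.includes(method)) return `${current}.${method}`;
    current = info.parent;
  }
  return undefined;
}
```

A call that cannot be resolved this way (for example, one dispatched at runtime) would fall under the best-effort category above.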

Symbol Scope

GitNexus tracks symbol visibility:
  • Exported symbols - Functions/classes marked as public or exported
  • Private symbols - Internal implementation details
  • Scoped symbols - Nested functions, closures, and local definitions

Language Support Roadmap

Coming Soon

Potential future language support:
  • Ruby - Community requested
  • Kotlin - JVM interop
  • Scala - Functional JVM language
  • Dart/Flutter - Mobile development
  • Elixir - Functional programming

How Languages Are Added

Adding a new language requires:
  1. Tree-sitter grammar - Must have a stable Tree-sitter parser
  2. Query patterns - AST queries to extract symbols (functions, classes, etc.)
  3. Import resolution - Language-specific import/module logic
  4. Call resolution - Function call and method invocation patterns
  5. Testing - Validation against real-world codebases

Parser Configuration

File Extensions

GitNexus automatically detects languages by file extension:
  • .ts, .tsx → TypeScript
  • .js, .jsx → JavaScript
  • .py → Python
  • .java → Java
  • .c, .h → C
  • .cpp, .cc, .cxx, .hpp → C++
  • .cs → C#
  • .go → Go
  • .rs → Rust
  • .php → PHP
  • .swift → Swift
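
The mapping above amounts to a lookup table keyed by extension. The sketch below shows one way to express it; `EXTENSION_TO_LANGUAGE` and `detectLanguage` are hypothetical names, and the lowercase normalization is an assumption rather than documented behavior.

```typescript
// Lookup table built from the extension list above; the real
// implementation may be organized differently.
const EXTENSION_TO_LANGUAGE: Record<string, string> = {
  ".ts": "TypeScript",
  ".tsx": "TypeScript",
  ".js": "JavaScript",
  ".jsx": "JavaScript",
  ".py": "Python",
  ".java": "Java",
  ".c": "C",
  ".h": "C",
  ".cpp": "C++",
  ".cc": "C++",
  ".cxx": "C++",
  ".hpp": "C++",
  ".cs": "C#",
  ".go": "Go",
  ".rs": "Rust",
  ".php": "PHP",
  ".swift": "Swift",
};

// Returns the language for a file path, or undefined when the
// extension is not in the table.
function detectLanguage(filePath: string): string | undefined {
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile like ".gitignore"
  // Assumption: extensions are matched case-insensitively.
  return EXTENSION_TO_LANGUAGE[fileName.slice(dot).toLowerCase()];
}
```

Note that, per the list, `.h` files map to C, so C++ headers that use `.h` rather than `.hpp` would be treated as C.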

Excluded Files

GitNexus automatically skips:
  • node_modules/
  • .git/
  • dist/, build/, out/
  • .next/, .nuxt/
  • vendor/ (PHP, Go)
  • target/ (Rust)
  • Binary and media files
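
A directory filter along these lines can be sketched as a check on each path segment. The names `EXCLUDED_DIRECTORIES` and `isExcludedPath` are hypothetical, and how GitNexus actually matches paths is not documented here; binary and media file detection is not covered.

```typescript
// Directory names taken from the list above.
const EXCLUDED_DIRECTORIES = new Set<string>([
  "node_modules",
  ".git",
  "dist",
  "build",
  "out",
  ".next",
  ".nuxt",
  "vendor",
  "target",
]);

// Returns true when any directory component of a repository-relative
// path is in the excluded set.
function isExcludedPath(relativePath: string): boolean {
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  // The last segment is the file name itself, so only inspect directories.
  return segments.slice(0, -1).some((dir) => EXCLUDED_DIRECTORIES.has(dir));
}
```

Because every directory segment is checked, nested output folders such as `packages/api/dist/` are skipped as well as top-level ones.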

Performance Characteristics

Parsing Speed

| Codebase Size | Files | Parse Time (CLI) |
| --- | --- | --- |
| Small | ~100 files | 1-3 seconds |
| Medium | ~1,000 files | 10-30 seconds |
| Large | ~10,000 files | 2-5 minutes |
| Huge (Linux kernel) | ~50,000+ files | 10-15 minutes |

Memory Usage

GitNexus uses a chunked parsing strategy to keep memory bounded:
  • Chunk size: 20MB of source code per batch
  • Concurrent workers: CPU count - 1 (max 8 workers)
  • Peak memory: ~200-400MB per chunk during parsing
  • AST cache: Limited to 50 trees (sequential) or chunk size (parallel)
The CLI automatically allocates an 8GB heap for large repositories.
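
The batching and worker limits above can be sketched as follows. All names are hypothetical, and several details are assumptions: that 20MB means 20 × 1024 × 1024 bytes, that batches are filled greedily in file order, and that at least one worker is always used.

```typescript
interface SourceFile {
  path: string;
  sizeBytes: number;
}

const CHUNK_BYTE_BUDGET = 20 * 1024 * 1024; // 20MB per batch (binary megabytes assumed)
const MAX_WORKERS = 8;

// Worker count: CPU count - 1, capped at 8.
// Assumption: never fewer than 1, so single-core machines still parse.
function workerCount(cpuCount: number): number {
  return Math.max(1, Math.min(MAX_WORKERS, cpuCount - 1));
}

// Greedily groups files into batches whose combined size stays within
// the budget. A single file larger than the budget gets its own batch.
function chunkBySize(files: SourceFile[], budget: number = CHUNK_BYTE_BUDGET): SourceFile[][] {
  const chunks: SourceFile[][] = [];
  let current: SourceFile[] = [];
  let currentBytes = 0;
  for (const file of files) {
    if (current.length > 0 && currentBytes + file.sizeBytes > budget) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += file.sizeBytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Bounding each batch by bytes rather than by file count keeps peak memory roughly proportional to the chunk size, regardless of how many small files a repository contains.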

Troubleshooting

“Language not supported”

If you see this error:
  1. Check that your file extension is recognized
  2. Ensure the file is parseable (not minified or obfuscated)
  3. Verify it’s not in an excluded directory

Parsing Errors

Tree-sitter is error-tolerant, but some files may fail:
  • Minified code - Not supported (exclude *.min.js)
  • Generated code - May produce noisy results
  • Syntax errors - Partial parsing may succeed

Swift Not Available

tree-sitter-swift is an optional dependency. If it fails to build during installation, install GitNexus without optional dependencies:

```bash
npm install --no-optional gitnexus
```

GitNexus will work for all other languages.