RepoRAGX: Chat with GitHub repositories

RepoRAGX is a CLI tool that lets you chat with a GitHub repository using Retrieval-Augmented Generation (RAG). Load a codebase, ask questions in natural language, and get answers grounded in the code that was retrieved for your question.

Quick start

Get up and running in 5 minutes

Installation

Set up your environment and dependencies

How it works

Learn how RepoRAGX works under the hood

API reference

Explore the Python API

How it works

RepoRAGX uses a two-pipeline architecture to enable intelligent code search and question answering:

Pipeline 1: Data ingestion

GitHub Repo → Load Files → Chunk Code → Generate Embeddings → Store in ChromaDB

1. Load

Fetches all files from a GitHub repository using LangChain’s GithubFileLoader, automatically filtering out binaries, images, and folders like node_modules/, .git/, and venv/

2. Chunk

Splits code using LangChain’s RecursiveCharacterTextSplitter with language-aware splitting. Supports 25+ programming languages including Python, JavaScript, TypeScript, Java, Go, Rust, and more

3. Embed

Generates vector embeddings using Sentence Transformers’ all-MiniLM-L6-v2 model, converting code into 384-dimensional vectors

4. Store

Persists embeddings in a local ChromaDB vector database with cosine similarity indexing for fast retrieval
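
The filtering done in the Load step can be sketched as a small path predicate. This is a minimal illustration, not RepoRAGX's actual code: the `EXCLUDED_DIRS` and `BINARY_EXTENSIONS` sets and the `should_index` name are assumptions made for this example, and the real exclusion lists may differ.

```python
from pathlib import PurePosixPath

# Assumed exclusion lists for illustration; the lists RepoRAGX uses may differ.
EXCLUDED_DIRS = {"node_modules", ".git", "venv"}
BINARY_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".pdf", ".zip", ".exe", ".so"}


def should_index(file_path: str) -> bool:
    """Return True if a repository path looks like indexable source text."""
    path = PurePosixPath(file_path)
    # Skip anything that lives under an excluded directory
    # (only directory components are checked, not the file name itself).
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    # Skip files whose extension marks them as binary or image content.
    return path.suffix.lower() not in BINARY_EXTENSIONS


if __name__ == "__main__":
    for candidate in ["src/main.py", "node_modules/lodash/index.js", "docs/logo.png"]:
        print(candidate, should_index(candidate))
```

A predicate of this shape should be usable as the `file_filter` callable that LangChain's GithubFileLoader accepts, so unwanted paths are skipped before their contents are downloaded.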

Pipeline 2: RAG retrieval

User Query → Generate Query Embedding → Cosine Similarity Search → Retrieve Top-K Chunks → LLM Generates Answer

1. Embed query

Converts your question into a vector using the same Sentence Transformers model

2. Retrieve

ChromaDB performs cosine similarity search and returns the top-K most relevant code chunks with file path metadata

3. Answer

Sends the retrieved code chunks (with their file paths) together with your question to Groq’s llama-3.3-70b-versatile model, which generates a context-aware answer
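
To make the Retrieve and Answer steps concrete, here is a minimal pure-Python sketch under stated assumptions. It ranks chunks by brute-force cosine similarity, whereas ChromaDB uses an approximate nearest-neighbor index. The chunk dictionary layout, the `top_k` and `build_prompt` names, and the prompt wording are all hypothetical, not taken from RepoRAGX.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[dict], k: int = 3) -> list[dict]:
    """Return the k chunks most similar to the query.

    Each chunk is assumed to be a dict with 'path', 'text', and 'embedding' keys.
    """
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Assemble an LLM prompt that labels every chunk with its file path."""
    context = "\n\n".join(f"File: {c['path']}\n{c['text']}" for c in retrieved)
    return (
        "Answer the question using only the code context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Keeping the file path next to each chunk in the prompt is what lets the model name a source file in its answer, as in the example session.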

Key features

Language-aware chunking

Intelligently splits code based on language syntax for 25+ programming languages

Local vector storage

All embeddings stored locally in ChromaDB at ~/.RepoRAGX/vector_store

Fast similarity search

Cosine similarity indexing enables sub-second retrieval times

Context-aware answers

Powered by Groq’s Llama 3.3 70B model for accurate, detailed responses
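
The idea behind language-aware chunking can be illustrated with a simplified recursive splitter. This is a sketch of the general technique, not LangChain's RecursiveCharacterTextSplitter: the `SEPARATORS` table is a shortened, hypothetical stand-in for the per-language separator lists, and chunk overlap is left out for brevity.

```python
# Hypothetical, shortened separator table; real per-language lists are longer.
SEPARATORS = {
    "python": ["\nclass ", "\ndef ", "\n\n", "\n", " "],
    "javascript": ["\nfunction ", "\nconst ", "\n\n", "\n", " "],
}


def split_text(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Recursively split text so every chunk has at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return split_text(text, rest, chunk_size)

    # Keep the separator at the start of each following piece so that
    # e.g. "\ndef " stays attached to the function it introduces.
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + p for p in parts[1:]]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(current) + len(piece) <= chunk_size:
            current += piece  # piece still fits: keep growing the current chunk
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) > chunk_size:
            # A single piece is too large: recurse with finer separators.
            chunks.extend(split_text(piece, rest, chunk_size))
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks
```

Because coarse separators such as class and function boundaries are tried first, a chunk tends to hold a whole definition and is only cut at blank lines, single lines, or spaces when a definition is too long to fit.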

Use cases

  • Onboarding: Quickly understand a new codebase without reading every file
  • Code archaeology: Find where specific features are implemented
  • Documentation: Ask questions about APIs, architecture, and design patterns
  • Debugging: Locate error handling, authentication logic, or configuration files
  • Learning: Explore open-source projects by asking natural language questions

What you’ll need

GitHub token

Personal access token with read access to repository contents (Contents: Read-only on a fine-grained token)

Groq API key

Free API key from console.groq.com
RepoRAGX runs entirely locally except for calls to the GitHub API (to fetch repository files) and the Groq API (for LLM inference). All embeddings and the vector store stay on your machine.

Example session

$ python -m src.main

/**
 *    __________                    __________    _____    ____________  ___
 *    \______   \ ____ ______   ____\______   \  /  _  \  /  _____/\   \/  /
 *     |       _// __ \\____ \ /  _ \|       _/ /  /_\  \/   \  ___ \     / 
 *     |    |   \  ___/|  |_> >  <_> )    |   \/    |    \    \_\  \/     \ 
 *     |____|_  /\___  >   __/ \____/|____|_  /\____|__  /\______  /___/\  \
 *            \/     \/|__|                 \/         \/        \/      \_/
 */

Chat with your github repository

GitHub Personal Access Token: ********
Groq API Key: ********
Model Name (default: llama-3.3-70b-versatile): 
Repo (owner/repo): facebook/react
Branch (default: main): main

Initilizing github loader.....
Fetching files from github....
Loaded 847 files from github!
Splitting documents into chunks...
chunking completed
Generating embeddings for 2341 texts...
Adding 2341 documents to vector store...
Successfully added 2341 documents to vector store

Ask anything ('exit' to quit): Where is the useState hook implemented?
> The useState hook is implemented in packages/react/src/ReactHooks.js...

Ask anything ('exit' to quit): exit

Next steps

Install RepoRAGX

Set up Python 3.12, create a virtual environment, and install dependencies

Quick start guide

Run your first query in under 5 minutes
