RepoRAGX: Chat with GitHub repositories
RepoRAGX is a powerful CLI tool that lets you chat with any GitHub repository using Retrieval-Augmented Generation (RAG). Load a codebase, ask questions in natural language, and get context-aware answers powered by AI.
Quick start
Get up and running in 5 minutes
Installation
Set up your environment and dependencies
How it works
Learn how RepoRAGX works under the hood
API reference
Explore the Python API
How it works
RepoRAGX uses a two-pipeline architecture to enable intelligent code search and question answering:
Pipeline 1: Data ingestion
Load
Fetches all files from a GitHub repository using LangChain's GithubFileLoader, automatically filtering out binaries, images, and folders such as node_modules/, .git/, and venv/.
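As a rough sketch, the filtering described above can be expressed as a path predicate (GithubFileLoader accepts a `file_filter` callable of this shape; the directory and extension lists below are illustrative, not RepoRAGX's actual configuration):

```python
# Hypothetical ingestion filter: skip dependency/VCS folders and binary
# file types. The exact lists RepoRAGX uses are not shown in these docs.
SKIP_DIRS = {"node_modules", ".git", "venv"}
SKIP_EXTENSIONS = {".png", ".jpg", ".gif", ".ico", ".pdf", ".zip", ".exe"}

def keep_file(path: str) -> bool:
    """Return True if a repository file should be ingested."""
    parts = path.split("/")
    # Reject anything inside a skipped directory.
    if any(part in SKIP_DIRS for part in parts[:-1]):
        return False
    # Reject binary/image extensions.
    name = parts[-1]
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    return ext not in SKIP_EXTENSIONS
```

A predicate like this would be passed as `file_filter=keep_file` when constructing the loader.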
Chunk
Splits code using LangChain's RecursiveCharacterTextSplitter with language-aware splitting. Supports 25+ programming languages, including Python, JavaScript, TypeScript, Java, Go, Rust, and more.
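A heavily simplified sketch of what recursive, language-aware splitting does (the real RecursiveCharacterTextSplitter also merges small adjacent pieces and uses a separate separator list per language; the separators below are a Python-flavoured example):

```python
def recursive_split(text,
                    separators=("\nclass ", "\ndef ", "\n\n", "\n", " "),
                    chunk_size=200):
    """Toy recursive character splitting: try the highest-priority
    separator first, then recurse on oversized pieces."""
    if len(text) <= chunk_size:
        return [text]
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = text.split(sep)
            # Re-attach the separator so joining the chunks restores the text.
            parts = [pieces[0]] + [sep + p for p in pieces[1:]]
            chunks = []
            for part in parts:
                # Recurse with lower-priority separators only, so we never
                # re-split on the separator we just re-attached.
                chunks.extend(recursive_split(part, separators[i + 1:], chunk_size))
            return chunks
    # No separator applies: fall back to a hard character cut.
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]
```

Because separators like `"\ndef "` and `"\nclass "` are tried first, chunk boundaries tend to fall at function and class definitions rather than mid-statement, which is what makes the chunks useful retrieval units.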
Embed
Generates vector embeddings using Sentence Transformers' all-MiniLM-L6-v2 model, converting code into 384-dimensional vectors.
Pipeline 2: RAG retrieval
Retrieve
ChromaDB performs a cosine similarity search and returns the top-K most relevant code chunks with file path metadata.
Generate
The question and the retrieved chunks are sent to Groq's Llama 3.3 70B model, which produces the final, context-aware answer.
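The Retrieve step reduces to nearest-neighbour search over the stored vectors. A toy sketch with 3-dimensional vectors standing in for the real 384-dimensional embeddings (ChromaDB answers the same query through an index rather than the linear scan shown here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """store: list of (embedding, metadata) pairs.
    Returns the metadata of the k chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    return [meta for _, meta in ranked[:k]]

# Toy vector store; real entries carry file-path metadata just like this.
store = [
    ([1.0, 0.0, 0.0], {"path": "auth/login.py"}),
    ([0.0, 1.0, 0.0], {"path": "db/models.py"}),
    ([0.9, 0.1, 0.0], {"path": "auth/tokens.py"}),
]
```

Querying with the embedding of an auth-related question returns the two auth chunks first, each with its file path attached for citation in the answer.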
Key features
Language-aware chunking
Intelligently splits code based on language syntax for 25+ programming languages
Local vector storage
All embeddings stored locally in ChromaDB at ~/.RepoRAGX/vector_store.
Fast similarity search
Cosine similarity indexing enables sub-second retrieval times
Context-aware answers
Powered by Groq’s Llama 3.3 70B model for accurate, detailed responses
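"Context-aware" here means the retrieved chunks are stitched into the prompt sent to the model. A hypothetical prompt builder (RepoRAGX's actual prompt template is not shown in these docs, so the format below is illustrative):

```python
def build_prompt(question, chunks):
    """Assemble retrieved chunks and the user's question into a grounded
    prompt. Keys 'path' and 'text' mirror the chunk metadata described
    above; the wording of the instructions is an assumption."""
    context = "\n\n".join(
        f"# File: {c['path']}\n{c['text']}" for c in chunks
    )
    return (
        "Answer the question using only the code context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Grounding the model in retrieved chunks, rather than asking it cold, is what lets the answer cite concrete files instead of guessing.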
Use cases
- Onboarding: Quickly understand a new codebase without reading every file
- Code archaeology: Find where specific features are implemented
- Documentation: Ask questions about APIs, architecture, and design patterns
- Debugging: Locate error handling, authentication logic, or configuration files
- Learning: Explore open-source projects by asking natural language questions
What you’ll need
GitHub token
Personal access token with contents:read permission.
Groq API key
Free API key from console.groq.com
RepoRAGX runs entirely locally except for two network calls: the GitHub API (to fetch repository files) and the Groq API (for LLM inference). All embeddings and the vector store stay on your machine.
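A typical way to supply both credentials is via environment variables. The variable names below are assumptions, not confirmed RepoRAGX configuration keys; check the installation guide for the names the tool actually reads:

```shell
# Hypothetical environment setup -- variable names are illustrative.
export GITHUB_TOKEN="ghp_your_token_here"   # PAT with contents:read
export GROQ_API_KEY="gsk_your_key_here"     # from console.groq.com
```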
Example session
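A hypothetical session might look like the following. The command names and output are invented for illustration and are not the actual RepoRAGX CLI interface; see the quick start guide for real commands:

```
# Hypothetical transcript -- commands and output are illustrative only.
$ reporagx load octocat/Hello-World
Embedded repository into ~/.RepoRAGX/vector_store

$ reporagx chat
> Where is the authentication logic implemented?
Based on the retrieved chunks, the authentication logic lives in ...
```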
Next steps
Install RepoRAGX
Set up Python 3.12, create a virtual environment, and install dependencies
Quick start guide
Run your first query in under 5 minutes