Starting the application
RepoRAGX runs as an interactive CLI tool that you start from your terminal.
Welcome banner
When you start RepoRAGX, you’ll see the ASCII art banner.
Interactive setup flow
After the banner, RepoRAGX prompts you for the information it needs to connect to your repository and start the RAG pipeline.
Enter GitHub token
The first prompt asks for your GitHub Personal Access Token. Your input is hidden for security. The token needs content:read permission to access repository files. Get yours at github.com/settings/tokens.
Enter Groq API key
Next, you’ll be prompted for your Groq API key (also hidden). Get your API key from console.groq.com/keys.
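A hidden prompt of this kind can be sketched with Python's standard getpass module. This is only an illustration, not RepoRAGX's actual code: the function name prompt_secret, the prompt labels, and the retry-on-empty behavior are all assumptions.

```python
import getpass
from typing import Callable


def prompt_secret(label: str, reader: Callable[[str], str] = getpass.getpass) -> str:
    """Ask for a secret without echoing it; re-ask until something is entered.

    `reader` defaults to getpass.getpass, which hides typed characters.
    It is injectable so the function can be exercised without a terminal.
    """
    while True:
        value = reader(f"{label}: ").strip()
        if value:
            return value
        # Hypothetical behavior: an empty answer is rejected and asked again.
        print(f"{label} cannot be empty, please try again.")
```

In a terminal, calling prompt_secret("GitHub token") and then prompt_secret("Groq API key") would collect both values without showing them on screen.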
Select LLM model
Choose which Groq model to use for answering questions. Press Enter to use the default llama-3.3-70b-versatile model, or specify a different model supported by Groq.
Specify repository
Enter the repository you want to chat with. Use the format owner/repo (e.g., facebook/react, vercel/next.js).
Loading and indexing process
Once you’ve provided all inputs, RepoRAGX starts the data ingestion pipeline. The indexing process downloads all repository files (excluding binaries and common folders like node_modules/), chunks them using language-aware splitting, generates embeddings, and stores them in ChromaDB.
What gets loaded
RepoRAGX filters out files that aren’t useful for code understanding:
- Excluded file types: images (.png, .jpg, .svg), archives (.zip, .tar), binaries (.exe, .dll), media files (.mp3, .mp4), and more
- Excluded folders: node_modules/, .git/, dist/, build/, __pycache__/, venv/, .venv/
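The two filters above could be applied to each repository path roughly as follows. This is a sketch under stated assumptions, not the tool's real implementation: the name should_index is hypothetical, the extension set contains only the examples named above (the real list includes more), and matching excluded folders at any depth and extensions case-insensitively are guesses about the intended behavior.

```python
from pathlib import PurePosixPath

# Only the examples listed above; the actual tool excludes additional types.
EXCLUDED_EXTENSIONS = {
    ".png", ".jpg", ".svg",   # images
    ".zip", ".tar",           # archives
    ".exe", ".dll",           # binaries
    ".mp3", ".mp4",           # media
}
EXCLUDED_FOLDERS = {
    "node_modules", ".git", "dist", "build", "__pycache__", "venv", ".venv",
}


def should_index(repo_path: str) -> bool:
    """Return True if a repository-relative path passes both filters."""
    path = PurePosixPath(repo_path)
    # Assumption: a file is skipped if ANY parent directory is excluded.
    if any(part in EXCLUDED_FOLDERS for part in path.parts[:-1]):
        return False
    # Assumption: extensions are compared case-insensitively.
    return path.suffix.lower() not in EXCLUDED_EXTENSIONS
```

With these rules, src/index.js would be kept, while node_modules/react/index.js and docs/logo.PNG would be skipped.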
Storage location
Vector embeddings are persisted locally.
Ready to chat
Once indexing completes, you’ll see the main query prompt.
Exiting the application
To exit RepoRAGX at any time, type exit at the query prompt.
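The exit check can be pictured as a small read-and-answer loop. The sketch below is illustrative only: run_query_loop and the answer callback are hypothetical names, and treating exit case-insensitively and skipping blank lines are assumptions rather than documented behavior.

```python
from typing import Callable, Iterable, List


def run_query_loop(lines: Iterable[str], answer: Callable[[str], str]) -> List[str]:
    """Answer each question in turn until the user types 'exit'."""
    responses: List[str] = []
    for raw in lines:
        question = raw.strip()
        # Assumption: 'exit' is matched regardless of case or padding.
        if question.lower() == "exit":
            break
        # Assumption: blank input is ignored rather than sent to the model.
        if not question:
            continue
        responses.append(answer(question))
    return responses
```

Here lines stands in for whatever the query prompt reads from the user, and answer stands in for the RAG pipeline call.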