Quick start
This guide will walk you through setting up API keys, loading your first repository, and asking questions.

Before you begin: Make sure you’ve completed the installation steps.
Step 1: Get your API keys
RepoRAGX requires two API keys to function:

GitHub personal access token
- Go to github.com/settings/tokens
- Click “Generate new token” → “Generate new token (classic)”
- Give it a descriptive name like “RepoRAGX”
- Select the `repo` scope (or just `public_repo` if you only need public repositories)
- Click “Generate token” and copy the token immediately
Groq API key
- Go to console.groq.com
- Sign up for a free account (no credit card required)
- Navigate to console.groq.com/keys
- Click “Create API Key”
- Give it a name and copy the API key
Step 2: Run RepoRAGX
With your virtual environment activated, start RepoRAGX:

Step 3: Configure your session
Enter your credentials and repository details when prompted.

Model selection
You can use any model supported by Groq. Popular options:
- `llama-3.3-70b-versatile` (default, best quality)
- `llama-3.1-70b-versatile` (alternative)
- `mixtral-8x7b-32768` (faster, good for smaller queries)
Repository format
Use the format `owner/repo`:
- ✅ `facebook/react`
- ✅ `microsoft/vscode`
- ✅ `AnmolTutejaGitHub/RepoRAGX`
- ❌ `https://github.com/facebook/react` (don’t include the full URL)
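The accepted format can be checked with a small validator. This is an illustrative sketch, not part of RepoRAGX, and the regex only approximates GitHub’s naming rules:

```python
import re

# Hypothetical helper (not RepoRAGX code): checks that a repository spec
# looks like "owner/repo" rather than a full URL.
REPO_SPEC = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/[A-Za-z0-9._-]+$")

def is_valid_repo_spec(spec: str) -> bool:
    return REPO_SPEC.fullmatch(spec) is not None
```

A full URL fails the check because `https:` cannot match the owner segment.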
Branch selection
The default branch is `main`. You can specify any branch name:
- `main`
- `master`
- `develop`
- `feature/new-auth`
Step 4: Load the repository
RepoRAGX will now fetch and process the repository.

First run: The embedding model (`all-MiniLM-L6-v2`) will be downloaded automatically (~90MB). This only happens once.

What’s happening behind the scenes
File loading
GitHubCodeBaseLoader fetches all files from the repository, filtering out:
- Binary files (`.png`, `.jpg`, `.exe`, etc.)
- Dependencies (`node_modules/`, `venv/`, `.git/`)
- Generated files (`.pyc`, `.class`, `.min.js`)
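The filtering step can be pictured as a path predicate. The extension and directory sets below are illustrative examples drawn from the list above, not the loader’s exact configuration:

```python
# Illustrative filter in the spirit of GitHubCodeBaseLoader's skip rules.
SKIP_EXTENSIONS = {".png", ".jpg", ".exe", ".pyc", ".class", ".min.js"}
SKIP_DIRS = {"node_modules", "venv", ".git"}

def should_index(path: str) -> bool:
    """Return True if a repository file should be chunked and embedded."""
    parts = path.split("/")
    # Skip anything inside a dependency or VCS directory.
    if any(part in SKIP_DIRS for part in parts[:-1]):
        return False
    # Skip binary and generated files by extension.
    name = parts[-1]
    return not any(name.endswith(ext) for ext in SKIP_EXTENSIONS)
```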
src/rag/github_codebase_loader.py:3-24

Text chunking
TextSplitter uses language-aware splitting with RecursiveCharacterTextSplitter:
- Chunk size: 1000 characters
- Chunk overlap: 200 characters
- Supports 25+ languages (Python, JavaScript, Java, Go, Rust, etc.)
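The effect of chunk size and overlap can be sketched in plain Python. This is a simplification: the real splitter (LangChain’s RecursiveCharacterTextSplitter) breaks on language-aware separators rather than fixed offsets:

```python
# Simplified fixed-offset chunking with overlap, for illustration only.
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    step = chunk_size - chunk_overlap  # 800 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, consecutive chunks share 200 characters, so a definition that straddles a boundary still appears intact in at least one chunk.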
src/rag/text_splitter.py:3-50

Embedding generation
EmbeddingManager generates 384-dimensional vectors using Sentence Transformers.

Step 5: Ask questions
Once loading is complete, you can start asking questions. For each question, RepoRAGX will:
- Convert your query to a vector embedding
- Search ChromaDB for the top 5 most similar code chunks
- Send the retrieved context to Groq’s LLM
- Return a context-aware answer
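The retrieval part of this loop can be sketched as cosine similarity over pre-computed vectors. Everything below is illustrative: RepoRAGX’s vectors are 384-dimensional (3-dimensional here for readability) and the search is handled by ChromaDB, not hand-rolled:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, indexed_chunks, k=5):
    """Return the k (chunk, score) pairs most similar to the query vector."""
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```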
Example queries
Sample output
Understanding retrieval
When you ask a question, RepoRAGX performs similarity search.

Advanced configuration
Adjusting retrieval parameters
You can modify retrieval settings in src/rag/rag_retriever.py:7:
- `top_k`: Number of chunks to retrieve (default: 5)
- `score_threshold`: Minimum similarity score (0.0 = no filtering)
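How the two parameters interact can be shown with a toy post-filter; the function below is illustrative and is not RepoRAGX’s retriever:

```python
# Hypothetical illustration of top_k and score_threshold (not RepoRAGX code).
def filter_results(scored_chunks, top_k=5, score_threshold=0.0):
    """Keep the top_k highest-scoring chunks, then drop any below the threshold."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [(chunk, score) for chunk, score in ranked[:top_k] if score >= score_threshold]
```

Raising `score_threshold` trades recall for precision: fewer but more relevant chunks reach the LLM.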
Changing chunk size
Modify text splitting in src/rag/text_splitter.py:53:
- `chunk_size`: Maximum characters per chunk
- `chunk_overlap`: Overlapping characters between chunks
Using different models
Groq supports multiple models; see src/rag/groq_llm.py:9-26 for the LLM configuration.
Persistent storage
RepoRAGX stores vector embeddings locally.

Troubleshooting
Authentication errors
Rate limit errors
No documents found
If a query returns no relevant results, try:
- Rephrasing your question to be more specific
- Using keywords that appear in the codebase
- Lowering `score_threshold` in src/rag/rag_retriever.py:7
Model download fails
Model download fails
Exit the session
To exit RepoRAGX, type `exit` at the prompt.
Next steps
- How it works: Deep dive into RepoRAGX’s internals
- API reference: Explore the Python API
- Configuration: Customize API keys and models
- Examples: See real-world usage examples