Default model
RepoRAGX uses llama-3.3-70b-versatile as the default model (src/main.py:29). This model provides an excellent balance of:

- Fast inference speed through Groq's optimized infrastructure
- Strong code understanding and reasoning capabilities
- Support for large context windows
- High-quality answer generation
Specifying a custom model
You can specify a different model when starting RepoRAGX.

Interactive prompt

When running the application, you'll be prompted for a model name:

- Press Enter to use the default model
- Type a model name to use a different model (e.g., mixtral-8x7b-32768)
The model selection happens at startup (src/main.py:29) and cannot be changed during a chat session.
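The startup selection can be sketched as a small helper. This is an illustrative sketch, not the actual code in src/main.py; the function name choose_model is hypothetical.

```python
DEFAULT_MODEL = "llama-3.3-70b-versatile"

def choose_model(user_input: str, default: str = DEFAULT_MODEL) -> str:
    """Return the default model if the prompt is left blank,
    otherwise the name the user typed (whitespace stripped)."""
    name = user_input.strip()
    return name if name else default

# Pressing Enter keeps the default; typing a name switches
# models for the whole session:
choose_model("")                    # default model
choose_model("mixtral-8x7b-32768")  # custom model
```

Because the choice is made once at startup, switching models later requires restarting the application (see "Changing models mid-session" below).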
Available Groq models
Groq supports multiple open-source LLMs. Popular options include:

Llama models
| Model Name | Context Length | Best For |
|---|---|---|
| llama-3.3-70b-versatile | 128K tokens | Default - general purpose, excellent code understanding |
| llama-3.1-70b-versatile | 128K tokens | Previous generation, still highly capable |
| llama-3.1-8b-instant | 128K tokens | Faster responses, lower resource usage |
Mixtral models
| Model Name | Context Length | Best For |
|---|---|---|
| mixtral-8x7b-32768 | 32K tokens | Fast inference, good for shorter contexts |
Other models
| Model Name | Context Length | Best For |
|---|---|---|
| gemma2-9b-it | 8K tokens | Efficient, good for simple queries |
Model configuration
The Groq LLM is initialized with specific parameters (src/rag/groq_llm.py:9-26).

Configuration parameters

model_name (required)

The identifier of the Groq model to use. This is the only parameter you can customize via the CLI prompt.

temperature (default: 0.1)

Controls response randomness:

- 0.1 (current default) - More deterministic, consistent answers
- 0.5 - Balanced creativity and consistency
- 1.0 - More creative but less predictable responses
Lower temperatures are recommended for code-related queries where accuracy is critical.
max_tokens (default: 1024)
Maximum number of tokens in the generated response:

- 1024 (current default) - Suitable for most code questions
- 2048 - Longer, more detailed explanations
- 4096 - Very comprehensive answers
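The parameters above map directly onto a Groq chat completion request. The sketch below assembles them into the keyword arguments the Groq Python client expects; build_completion_kwargs is a hypothetical helper, not the actual code in src/rag/groq_llm.py.

```python
def build_completion_kwargs(model_name: str,
                            prompt: str,
                            temperature: float = 0.1,
                            max_tokens: int = 1024) -> dict:
    """Assemble keyword arguments for a Groq chat completion call."""
    return {
        "model": model_name,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# With the official client, this would be sent as:
#   from groq import Groq
#   client = Groq()  # reads GROQ_API_KEY from the environment
#   resp = client.chat.completions.create(
#       **build_completion_kwargs("llama-3.3-70b-versatile", "..."))
```

Raising max_tokens only lifts the cap on the response length; it does not change the model's context window.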
How the model is used
RepoRAGX uses the selected model in a RAG pipeline.

Context retrieval
ChromaDB performs cosine similarity search to find the top-K most relevant code chunks (default: 5)
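ChromaDB handles this search internally; the mechanism can be illustrated with plain Python. The toy 2-D vectors below stand in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=5):
    """Return indices of the k chunk embeddings most similar to the query."""
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]

# Chunk 1 points the same way as the query, chunk 2 is close,
# chunk 0 is orthogonal:
top_k([1, 0], [[0, 1], [1, 0], [1, 1]], k=2)  # -> [1, 2]
```

In the real pipeline the query and chunk vectors come from the embedding model, and ChromaDB performs this ranking over the persisted index.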
Prompt construction
Retrieved code chunks are formatted with their file paths and combined with your question into a single prompt.
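A minimal sketch of that formatting step, assuming each retrieved chunk carries its file path and text (build_prompt and the chunk dict layout are illustrative, not the actual implementation):

```python
def build_prompt(question: str, chunks: list) -> str:
    """Format retrieved chunks with their file paths, then append the question."""
    context = "\n\n".join(
        f"File: {c['path']}\n{c['text']}" for c in chunks
    )
    return (
        "Answer the question using only the code context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

chunks = [{"path": "src/main.py", "text": "def main(): ..."}]
print(build_prompt("What does main do?", chunks))
```

Including the file path with each chunk lets the model cite where an answer comes from.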
Performance considerations
Model size vs. speed
- Larger models (70B parameters): Higher quality answers, slightly slower
- Smaller models (8B parameters): Faster responses, may be less accurate for complex code
Context window size
Different models support different context lengths:

- 128K tokens: Can handle very large codebases and multiple file contexts
- 32K tokens: Sufficient for most use cases
- 8K tokens: May truncate context for large retrievals
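A quick way to estimate whether a retrieval fits is the common rule of thumb of roughly 4 characters per token for English text and code. This heuristic and the fits_context helper are assumptions for illustration, not part of RepoRAGX.

```python
def fits_context(text: str, context_tokens: int, chars_per_token: int = 4) -> bool:
    """Rough fit check: ~4 characters per token is a common heuristic."""
    return len(text) / chars_per_token <= context_tokens

# ~1K estimated tokens fits easily in an 8K window:
fits_context("x" * 4000, 8_000)    # True
# A huge retrieval would not fit even in a 128K window:
fits_context("x" * 4_000_000, 128_000)  # False
```

For precise counts you would use the model's own tokenizer, but this estimate is usually enough to anticipate truncation.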
Changing models mid-session
To use a different model:

- Exit the current session by typing exit
- Restart the application with python -m src.main
- Enter your desired model name when prompted
The vector store persists between sessions, so you won’t need to re-index the repository.