Overview
The RAGEngine class is the core component that orchestrates retrieval-augmented generation. It manages the LLM interface, conversation history, and prompt generation.
Initialization Parameters
The RAGEngine accepts the following parameters in its `__init__` method:
The retriever instance that handles semantic search over the LeetCode solution database.
The URL endpoint for the Ollama API server. Change this if Ollama is running on a different host or port.
The default model used for general mode. This model handles standard queries and code generation.
Qwen2.5-coder is optimized for code-related tasks and provides fast responses.
The model used when the engine is in reasoning mode. This model provides step-by-step thinking for complex problems.
The initial generation mode. Valid values are:
- `"general"`: Uses the default model for standard responses
- `"reasoning"`: Uses the reasoning model for step-by-step problem solving
The mode can also be changed at runtime with `rag_engine.set_mode("reasoning")`.

Controls randomness in generation. Lower values (0.0-0.5) produce more focused and deterministic outputs, while higher values (0.5-1.0) increase creativity and variation. Recommended ranges:
- Code generation: 0.2-0.4
- Explanations: 0.4-0.6
- Creative variations: 0.6-0.8
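The ranges above can be captured in a small helper. This is only an illustrative sketch; the engine itself takes a single temperature value, and the preset names and midpoint values chosen here are assumptions, not part of the API.

```python
# Suggested temperature per task type, using midpoints of the recommended
# ranges above. These names and values are illustrative, not the engine's API.
TEMPERATURE_PRESETS = {
    "code": 0.3,         # code generation: 0.2-0.4
    "explanation": 0.5,  # explanations: 0.4-0.6
    "creative": 0.7,     # creative variations: 0.6-0.8
}

def pick_temperature(task: str) -> float:
    """Return a recommended temperature, defaulting to the code preset."""
    return TEMPERATURE_PRESETS.get(task, 0.3)
```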
Nucleus sampling parameter. The model considers only the most probable tokens whose cumulative probability is at least `top_p`.

Minimum confidence score for retrieved solutions. Solutions with scores below this threshold are filtered out. The engine automatically adjusts retrieval if no solutions meet the threshold.
Penalizes repeated tokens to reduce redundancy. Values > 1.0 discourage repetition.
- 1.0: No penalty
- 1.1: Light penalty (default, recommended)
- 1.2-1.5: Stronger penalty (use if responses are too repetitive)
Number of CPU threads used for model inference. Adjust based on your hardware.
Maximum number of conversation turns to keep in memory. The engine maintains the last `max_history` query-response pairs for context-aware generation. Increasing this value provides more context but increases prompt length and latency.

Example Configurations
Default Configuration
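A sketch of the default configuration, using a stand-in dataclass so the example runs on its own. The parameter names are assumptions inferred from the descriptions above, not the engine's confirmed signature; the model names and the 1.1 repetition penalty come from this document, while the Ollama URL and the remaining numeric defaults are illustrative guesses.

```python
from dataclasses import dataclass

# Minimal stand-in for the real engine's settings. All field names below are
# assumptions inferred from this document's parameter descriptions.
@dataclass
class RAGEngineConfig:
    ollama_url: str = "http://localhost:11434"  # assumed default endpoint
    default_model: str = "qwen2.5-coder"        # general mode (per this doc)
    reasoning_model: str = "deepseek-r1"        # reasoning mode (per this doc)
    mode: str = "general"                       # initial generation mode
    temperature: float = 0.3                    # illustrative value
    top_p: float = 0.9                          # illustrative value
    score_threshold: float = 0.5                # illustrative value
    repeat_penalty: float = 1.1                 # light penalty (doc's default)
    num_threads: int = 4                        # illustrative value
    max_history: int = 5                        # illustrative value

config = RAGEngineConfig()  # default configuration
```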
High-Performance Configuration
Optimized for speed on powerful hardware.

Reasoning-Focused Configuration

Optimized for complex problem-solving.

Low-Resource Configuration

For systems with limited CPU/memory.

Runtime Methods
Changing Mode
Stopping Generation
Clearing History
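The three runtime methods above can be sketched together as follows. `set_mode` appears earlier in this document; `stop_generation` and `clear_history` are assumed method names, and the internal state shown is an illustration, not the engine's actual implementation.

```python
from collections import deque

# Sketch of the runtime control surface described above. set_mode is shown
# earlier in this document; the other names and internals are assumptions.
class RAGEngineRuntime:
    def __init__(self, max_history: int = 5):
        self.mode = "general"
        self.stop_requested = False
        self.history = deque(maxlen=max_history)  # (query, response) pairs

    def set_mode(self, mode: str) -> None:
        """Switch between 'general' and 'reasoning' generation modes."""
        if mode not in ("general", "reasoning"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def stop_generation(self) -> None:
        """Signal the streaming loop to stop emitting tokens."""
        self.stop_requested = True

    def clear_history(self) -> None:
        """Drop all stored conversation turns."""
        self.history.clear()
```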
Key Features
Exact Match Optimization
The engine builds a hash map for O(1) exact title matching, bypassing semantic search when the query matches a known problem title.
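The idea can be sketched as below. The function names and the normalization (lowercasing and trimming whitespace) are illustrative assumptions; only the hash-map-for-exact-titles technique comes from the document.

```python
# Build a hash map from normalized problem title to its stored solution,
# giving O(1) lookups that bypass semantic search on exact matches.
def build_title_index(solutions: list[dict]) -> dict[str, dict]:
    return {s["title"].strip().lower(): s for s in solutions}

def lookup_exact(index: dict[str, dict], query: str):
    """Return the stored solution on an exact (normalized) title match, else None."""
    return index.get(query.strip().lower())
```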
Adaptive Retrieval
If no solutions meet the confidence threshold, the engine automatically expands the search with relaxed parameters.
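One way to sketch this fallback is below. The retriever interface, the halved threshold, and the doubled candidate pool are all assumptions about what "relaxed parameters" means; the document only states that the search is retried with relaxed parameters.

```python
# If no result clears the confidence threshold, retry with a relaxed
# threshold and a wider candidate pool (specific relaxation is assumed).
def adaptive_retrieve(search, query, threshold=0.5, top_k=3):
    results = [r for r in search(query, top_k) if r["score"] >= threshold]
    if not results:
        results = [r for r in search(query, top_k * 2) if r["score"] >= threshold / 2]
    return results
```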
Conversation Memory
Maintains conversation history to provide context-aware responses across multiple turns.
Reasoning Filtering
Automatically filters out internal `<think>` blocks from DeepSeek-R1 responses, showing only the final answer.
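The filtering can be sketched with a regular expression; this is an assumption about how the stripping might be implemented, not the engine's actual code.

```python
import re

# Remove <think>...</think> blocks (DeepSeek-R1's internal reasoning) so only
# the final answer remains. DOTALL lets the pattern span newlines; the
# non-greedy .*? stops at the first closing tag.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_reasoning(text: str) -> str:
    return THINK_RE.sub("", text).strip()
```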