Overview
The `RAGEngine` class is the core component of Quest that orchestrates the retrieval-augmented generation (RAG) process. It integrates the `LeetCodeRetriever` for semantic search, conversation history management, and the Ollama API for response generation.
Class Definition
Constructor Parameters
- Instance of `LeetCodeRetriever` for semantic search of LeetCode solutions
- URL endpoint for the Ollama API server
- `model_name`: default model to use for general-mode responses
- `reasoning_model`: model to use for reasoning-mode responses with advanced problem-solving
- Initial mode for the engine; either `"general"` or `"reasoning"`
- Temperature: controls randomness in response generation; lower values make output more deterministic
- Nucleus sampling (top-p) parameter for response diversity
- `min_confidence`: minimum confidence score for filtering retrieved solutions
- Repeat penalty: penalty for repeating tokens in generated responses
- Number of threads for Ollama inference
- `max_history`: maximum number of conversation turns to retain in history
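A hedged sketch of the constructor these parameters describe — parameter names and default values other than the two model names (which appear under Mode Comparison) are assumptions, not taken from the Quest source:

```python
class RAGEngine:
    """Sketch of the constructor described above; names/defaults are illustrative."""

    def __init__(self, retriever, ollama_url="http://localhost:11434",
                 model_name="qwen2.5-coder:1.5b",
                 reasoning_model="deepseek-r1:7b",
                 mode="general", temperature=0.2, top_p=0.9,
                 min_confidence=0.5, repeat_penalty=1.1,
                 num_threads=4, max_history=5):
        # Validate the mode up front, mirroring set_mode's contract.
        if mode not in ("general", "reasoning"):
            raise ValueError('mode must be "general" or "reasoning"')
        self.retriever = retriever
        self.ollama_url = ollama_url
        self.model_name = model_name
        self.reasoning_model = reasoning_model
        self.mode = mode
        self.temperature = temperature
        self.top_p = top_p
        self.min_confidence = min_confidence
        self.repeat_penalty = repeat_penalty
        self.num_threads = num_threads
        self.max_history = max_history
        self.stop_generation = False  # cooperative stop flag (see stop/reset)
```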
Methods
answer_question
Answer a user query using the enhanced RAG pipeline with retrieval, filtering, and generation.

- The user's question or problem description
- Number of solutions to retrieve from the vector database
- Minimum confidence score threshold for filtering retrieved solutions; solutions below this threshold are excluded

Generated answer with solution details. Returns either:
- `"Exact Match Solution:\n{solution}"` if an exact title match is found
- `"Generated Solution:\n{response}"` if generated from retrieved context
- An error message if generation fails
- First checks for an exact title match using the normalized query
- If no exact match, retrieves the top-k similar solutions
- Filters solutions by the confidence threshold
- If results are insufficient, recursively retries with increased k and a lowered threshold
- Adds the query and response to conversation history
- In reasoning mode, filters `<think>` blocks out of the response
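The retrieve-filter-retry steps above can be sketched as follows; the retriever's `search` signature, the result shape, and the relaxation factors are assumptions for illustration, not Quest's actual API:

```python
def retrieve_with_fallback(retriever, query, k=3, min_confidence=0.5,
                           max_retries=2):
    """Retrieve top-k solutions, relaxing parameters if too few pass
    the confidence filter (mirrors the recursive retry described above)."""
    results = retriever.search(query, k=k)  # assumed: [(solution, score), ...]
    kept = [sol for sol, score in results if score >= min_confidence]
    if not kept and max_retries > 0:
        # Widen the search and lower the bar, then try again.
        return retrieve_with_fallback(retriever, query, k=k * 2,
                                      min_confidence=min_confidence * 0.8,
                                      max_retries=max_retries - 1)
    return kept
```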
set_mode
Switch between general and reasoning modes. Reasoning mode uses a specialized model for deeper analysis.

- Mode to set; must be either `"general"` or `"reasoning"`
- Raises `ValueError` if the mode is not `"general"` or `"reasoning"`
stop
Stop the ongoing generation process immediately. Returns the partial response generated so far.

- Sets the internal `stop_generation` flag to `True`
- The current streaming response terminates gracefully
- The partial response is returned from `answer_question()`
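The cooperative-stop behavior can be sketched as a streaming loop that checks the flag between chunks; the function and variable names here are assumptions for illustration:

```python
def stream_response(chunks, engine):
    """Accumulate streamed chunks, stopping gracefully if the engine's
    stop_generation flag is raised mid-stream; returns the partial text."""
    parts = []
    for chunk in chunks:
        if engine.stop_generation:
            break  # terminate gracefully, keeping what was generated so far
        parts.append(chunk)
    return "".join(parts)
```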
reset
Reset the stop flag to allow new generation requests. Automatically called at the start of `answer_question()`.

- Sets the `stop_generation` flag to `False`
- Enables new generation requests after a previous stop
generate_enhanced_prompt
Generate a structured prompt incorporating retrieved context and conversation history.

- User's question or problem description
- List of Solution objects retrieved from the vector database

Returns a formatted prompt string combining conversation history, query, context, and mode-specific instructions.

- Retrieves conversation history via `conversation_history.get_context()`
- Selects a prompt template based on the current mode (general or reasoning)
- Combines history, query, context, and instructions into the enhanced prompt
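A sketch of the assembly step; the template wording below is illustrative only (Quest's actual templates live in PromptTemplates):

```python
def build_prompt(history, query, solutions, mode="general"):
    """Combine history, query, retrieved context, and mode-specific
    instructions into one prompt string (wording is illustrative)."""
    context = "\n\n".join(solutions)
    instructions = ("Think step by step before answering."
                    if mode == "reasoning"
                    else "Answer concisely with code.")
    return (f"Conversation so far:\n{history}\n\n"
            f"Question: {query}\n\n"
            f"Relevant solutions:\n{context}\n\n"
            f"{instructions}")
```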
call_ollama
Send a prompt to the Ollama API with streaming, error handling, and retry logic.

- Complete prompt to send to the Ollama model

Returns the generated text response from the model, or an error message if API calls fail after retries.

- Retries up to 3 times on failure
- Selects the model based on the current mode (`reasoning_model` or `model_name`)
- Streams the response in real time
- Respects the `stop_generation` flag for early termination and returns a partial response if stopped mid-generation
- Logs warnings for non-200 status codes
- Retries on exceptions
- Returns `"Error generating response after multiple attempts."` after 3 failed attempts
filter_reasoning_response
Filter out the `<think>` reasoning block from DeepSeek model responses.

- Raw response from the reasoning model

Returns the response with `<think>` blocks removed, or the original response if no `<think>` block is found.

- Detects `<think>...</think>` blocks in the response
- Extracts and returns the content after the `</think>` tag
- Preserves the original response if no thinking block exists
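The filtering step can be sketched as follows; the tag-splitting approach is an assumption, and the actual implementation may differ:

```python
def filter_reasoning_response(response):
    """Return the text after the closing </think> tag, or the original
    response unchanged if the model produced no thinking block."""
    if "</think>" in response:
        return response.split("</think>", 1)[1].lstrip()
    return response
```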
Internal Methods
_build_exact_match_map
Builds a hash map for O(1) exact title lookup. Returns a dictionary mapping normalized titles to Solution objects.

_normalize_title

Normalizes a title for case-insensitive exact matching.

Attributes
- Instance of the retriever for semantic search
- `conversation_history`: manages conversation context with the `max_history` limit
- Hash map for exact title matching (normalized title → Solution)
- `stop_generation`: flag to control generation termination
Usage Example
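Because this page does not give Quest's import paths, the hedged sketch below defines a minimal stand-in engine inline so the snippet runs; with Quest installed you would import `RAGEngine` and `LeetCodeRetriever` and call the same methods:

```python
class MiniEngine:
    """Stand-in exposing the RAGEngine surface documented above."""
    def __init__(self, retriever, mode="general"):
        self.retriever, self.mode = retriever, mode
        self.stop_generation = False
    def set_mode(self, mode):
        if mode not in ("general", "reasoning"):
            raise ValueError('mode must be "general" or "reasoning"')
        self.mode = mode
    def answer_question(self, query, k=3):
        hits = self.retriever.search(query, k=k)
        return f"Generated Solution:\n{hits[0]}" if hits else "No results."
    def stop(self):
        self.stop_generation = True

class MiniRetriever:
    """Stand-in for LeetCodeRetriever's search interface."""
    def search(self, query, k=3):
        return [f"# solution sketch for: {query}"]

engine = MiniEngine(MiniRetriever())   # general mode by default
engine.set_mode("reasoning")           # switch models for deeper analysis
print(engine.answer_question("Two Sum", k=3))
```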
Mode Comparison
General Mode
- Uses `model_name` (default: `qwen2.5-coder:1.5b`)
- Faster responses
- Concise answers with code
- Best for straightforward queries
Reasoning Mode
- Uses `reasoning_model` (default: `deepseek-r1:7b`)
- Detailed step-by-step analysis
- Filters `<think>` blocks
- Best for complex problem-solving
Error Handling
The engine implements comprehensive error handling:

- API Failures: Retries up to 3 times with exponential backoff
- Low Confidence: Automatically adjusts k and min_confidence
- No Results: Returns helpful error messages
- Generation Errors: Logs details and returns user-friendly messages
See Also
- LeetCodeRetriever - Semantic search component
- ConversationHistory - Conversation management
- PromptTemplates - Prompt formatting