
Overview

The RAGEngine class is the core component that orchestrates retrieval-augmented generation. It manages the LLM interface, conversation history, and prompt generation.

Initialization Parameters

The RAGEngine accepts the following parameters in its __init__ method:
retriever (LeetCodeRetriever, required)
The retriever instance that handles semantic search over the LeetCode solution database.

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever
retriever = LeetCodeRetriever()
ollama_url (string, default: "http://localhost:11434/api/generate")
The URL endpoint for the Ollama API server. Change this if Ollama is running on a different host or port.

rag_engine = RAGEngine(
    retriever=retriever,
    ollama_url="http://192.168.1.100:11434/api/generate"
)
model_name (string, default: "qwen2.5-coder:1.5b")
The default model used in general mode; it handles standard queries and code generation. Qwen2.5-Coder is optimized for code-related tasks and provides fast responses.
reasoning_model (string, default: "deepseek-r1:7b")
The model used when the engine is in reasoning mode; it provides step-by-step thinking for complex problems. DeepSeek-R1 emits explicit reasoning steps in <think> tags, which are automatically filtered from the final response.
mode (string, default: "general")
The initial generation mode. Valid values are:
  • "general": Uses the default model for standard responses
  • "reasoning": Uses the reasoning model for step-by-step problem solving
You can change the mode at runtime with rag_engine.set_mode("reasoning").
temperature (float, default: 0.4)
Controls randomness in generation. Lower values (0.0-0.5) produce more focused, deterministic outputs, while higher values (0.5-1.0) increase creativity and variation. Recommended ranges:
  • Code generation: 0.2-0.4
  • Explanations: 0.4-0.6
  • Creative variations: 0.6-0.8
top_p (float, default: 0.9)
Nucleus sampling parameter. The model samples only from the smallest set of most probable tokens whose cumulative probability reaches top_p. Setting this too low (< 0.5) can make responses repetitive; the default of 0.9 provides good diversity.
confidence_threshold (float, default: 0.7)
Minimum confidence score for retrieved solutions; solutions scoring below this threshold are filtered out. The engine automatically adjusts retrieval if no solutions meet the threshold.
repeat_penalty (float, default: 1.1)
Penalizes repeated tokens to reduce redundancy. Values > 1.0 discourage repetition.
  • 1.0: No penalty
  • 1.1: Light penalty (default, recommended)
  • 1.2-1.5: Stronger penalty (use if responses are too repetitive)
num_thread (int, default: 8)
Number of CPU threads used for model inference. Adjust based on your hardware: on systems with fewer cores, reduce this to 4; on high-performance systems, increase to 12-16 for faster generation.
max_history (int, default: 3)
Maximum number of conversation turns kept in memory. The engine retains the last max_history query-response pairs for context-aware generation. Increasing this value provides more context but also increases prompt length and latency.
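A bounded history of this kind is naturally expressed with a fixed-size deque, which discards the oldest turn automatically. A sketch under the assumption that history entries are (query, response) pairs:

```python
from collections import deque

MAX_HISTORY = 3  # mirrors the default

# Each entry is a (query, response) pair; once max_history is exceeded,
# the deque silently drops the oldest turn.
conversation_history = deque(maxlen=MAX_HISTORY)

for i in range(5):
    conversation_history.append((f"question {i}", f"answer {i}"))

# Only the last 3 turns remain: questions 2, 3, and 4
```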

Example Configurations

Default Configuration

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever
from rag_engine import RAGEngine

retriever = LeetCodeRetriever()
rag_engine = RAGEngine(retriever)

High-Performance Configuration

Optimized for speed on powerful hardware:
rag_engine = RAGEngine(
    retriever=retriever,
    model_name="qwen2.5-coder:1.5b",
    temperature=0.3,
    top_p=0.85,
    num_thread=16,
    max_history=2
)

Reasoning-Focused Configuration

Optimized for complex problem-solving:
rag_engine = RAGEngine(
    retriever=retriever,
    mode="reasoning",
    reasoning_model="deepseek-r1:7b",
    temperature=0.5,
    top_p=0.9,
    max_history=5,
    confidence_threshold=0.6
)

Low-Resource Configuration

For systems with limited CPU/memory:
rag_engine = RAGEngine(
    retriever=retriever,
    model_name="qwen2.5-coder:1.5b",
    temperature=0.4,
    num_thread=4,
    max_history=2
)

Runtime Methods

Changing Mode

rag_engine.set_mode("reasoning")  # Switch to reasoning mode
rag_engine.set_mode("general")     # Switch back to general mode

Stopping Generation

rag_engine.stop()  # Stop ongoing generation

Clearing History

rag_engine.conversation_history.clear()  # Clear all conversation history

Key Features

Exact Match Optimization

The engine builds a hash map for O(1) exact title matching, bypassing semantic search when the query matches a known problem title.

Adaptive Retrieval

If no solutions meet the confidence threshold, the engine automatically expands the search with relaxed parameters.
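One way to realize this two-pass strategy is a strict pass followed by a relaxed retry with a wider candidate pool and lower score bar. The function below is a sketch of the pattern, not the engine's actual implementation; the parameter names are assumptions:

```python
def retrieve_adaptive(search, query, threshold=0.7, relaxed_threshold=0.5,
                      k=3, expanded_k=10):
    """Retry retrieval with relaxed parameters when nothing clears the threshold."""
    hits = [r for r in search(query, k) if r["score"] >= threshold]
    if hits:
        return hits
    # Relaxed pass: widen the candidate pool and lower the score bar.
    return [r for r in search(query, expanded_k) if r["score"] >= relaxed_threshold]

# Toy search backend whose best score falls below the strict threshold
def toy_search(query, k):
    return [{"title": "Two Sum", "score": 0.6}][:k]

results = retrieve_adaptive(toy_search, "two sum variant")  # relaxed pass fires
```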

Conversation Memory

Maintains conversation history to provide context-aware responses across multiple turns.
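Prior turns typically get rendered into a context block that is prepended to the new prompt. A minimal sketch, assuming history entries are (query, response) pairs and a simple User/Assistant transcript format:

```python
def format_history(history):
    """Render prior turns into a context block for the next prompt."""
    lines = []
    for query, response in history:
        lines.append(f"User: {query}")
        lines.append(f"Assistant: {response}")
    return "\n".join(lines)

history = [("What is a heap?", "A tree-based priority structure.")]
context = format_history(history)
prompt = f"{context}\nUser: How do I pop the minimum?"
```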

Reasoning Filtering

Automatically filters out internal <think> blocks from DeepSeek-R1 responses, showing only the final answer.
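Such filtering amounts to stripping any <think>...</think> spans from the raw model output. A sketch of the technique using a regular expression (the names here are illustrative, not the engine's):

```python
import re

# DOTALL lets the pattern span multi-line reasoning blocks.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks emitted by DeepSeek-R1."""
    return THINK_BLOCK.sub("", text).strip()

raw = "<think>Check edge cases first.</think>Use a hash map for O(n) lookup."
print(strip_reasoning(raw))  # → Use a hash map for O(n) lookup.
```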

Configuration Tips

For Code Generation: Use temperature=0.3, mode="general", and model_name="qwen2.5-coder:1.5b".
For Problem Explanations: Use temperature=0.5, mode="reasoning", and max_history=5.
Avoid setting temperature above 0.8 for code generation, as it can produce syntactically incorrect code.
