Quickstart

This guide will walk you through setting up Ollama, pulling the required models, and running your first query with Quest.

Install Ollama

Quest uses Ollama to run language models locally. Install Ollama for your platform:
# Download and install from official website
curl -fsSL https://ollama.ai/install.sh | sh

# Or download the installer from:
# https://ollama.ai/download/mac
Ollama runs as a local server on http://localhost:11434. Quest communicates with this API to generate responses.
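Before pulling models, you can confirm the server is reachable. A minimal sketch using only the standard library (Ollama's /api/tags endpoint lists locally installed models; any valid JSON reply means the server is up):

```python
import json
import urllib.error
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # /api/tags lists locally available models; a JSON reply means the server is running.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)
            return True
    except (urllib.error.URLError, ValueError):
        return False

print("Ollama reachable:", ollama_is_up())
```

If this prints `False`, start Ollama (see Troubleshooting below) before continuing.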

Pull required models

Quest uses two different models depending on the query mode:

1. Pull qwen2.5-coder:1.5b

This is the default model for general queries and explanations:
ollama pull qwen2.5-coder:1.5b
Model specs:
  • Size: ~1.5B parameters
  • Purpose: Code generation and explanation
  • Speed: Fast inference (typically < 15 seconds per query)
This model is optimized for coding problems and generates concise, accurate solutions.

2. Pull deepseek-r1:7b

This model is used for complex reasoning tasks:
ollama pull deepseek-r1:7b
Model specs:
  • Size: ~7B parameters
  • Purpose: Step-by-step reasoning for complex problems
  • Speed: Slower but more thorough (typically < 4 minutes)
The reasoning model generates <think> blocks that Quest automatically filters out to provide clean answers. You can also use the smaller deepseek-r1:1.5b variant by configuring the reasoning_model parameter.
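Quest's actual filtering lives inside its RAG engine; the idea can be sketched as a regex that drops the reasoning trace before the answer is shown:

```python
import re

# Matches a <think>...</think> block (non-greedy, across newlines) plus trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> reasoning traces from a model response."""
    return THINK_BLOCK.sub("", text).strip()

raw = "<think>Let me reason step by step...</think>Use a hash map for O(n) lookup."
print(strip_think_blocks(raw))  # Use a hash map for O(n) lookup.
```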

3. Verify models are installed

Check that both models are available:
ollama list
You should see both qwen2.5-coder:1.5b and deepseek-r1:7b in the output.

Start the Flask app

Now you’re ready to run Quest:
cd Quest
python app.py
You should see output like:
 * Serving Flask app 'app'
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server.
 * Running on http://127.0.0.1:5000
INFO:root:RAG Engine initialized successfully.
INFO:root:Retriever initialized successfully.
The first time you run Quest, it may take a few seconds to load the sentence transformer model (all-MiniLM-L6-v2) into memory.

Make your first query

With the Flask app running, you can interact with Quest in two ways:

Using the web interface

Open your browser and navigate to:
http://127.0.0.1:5000
You’ll see the Quest interface where you can:
  • Enter queries in the search box
  • Switch between “General” and “Reasoning” modes
  • View conversation history
  • Clear history when starting a new topic

Using the API directly

You can also query Quest programmatically:
import requests

# Make a query
response = requests.post(
    "http://127.0.0.1:5000/search",
    json={
        "query": "Explain the Two Sum problem",
        "mode": "general"
    }
)

result = response.json()
print(result["response"])

Example queries

Try these example queries to see Quest in action:
When you query an exact problem title, Quest retrieves it instantly from the hash map:
{
  "query": "Two Sum",
  "mode": "general"
}
Response includes the complete solution with metadata.
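Why exact-title queries are instant can be sketched as a dictionary lookup. The names below are hypothetical; Quest's real index is built by its retriever from the dataset:

```python
# Hypothetical in-memory index keyed by normalized problem title.
solutions_by_title = {
    "two sum": {"difficulty": "Easy", "topics": ["Array", "Hash Table"]},
    "coin change": {"difficulty": "Medium", "topics": ["Dynamic Programming"]},
}

def exact_match(query: str):
    """O(1) lookup: normalize the query and probe the title hash map."""
    return solutions_by_title.get(query.strip().lower())

print(exact_match("Two Sum"))
```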
For general questions, Quest retrieves similar problems and generates explanations:
{
  "query": "Explain dynamic programming with an example",
  "mode": "general"
}
Quest finds relevant DP problems and generates a detailed explanation.
For harder problems, switch to reasoning mode:
{
  "query": "How do I optimize a recursive solution with memoization?",
  "mode": "reasoning"
}
The deepseek-r1 model provides step-by-step reasoning.
Quest maintains conversation history (configurable, default 3 queries):
// First query
{"query": "What is the Two Sum problem?", "mode": "general"}

// Follow-up (uses context from first query)
{"query": "Can you show me the hash map approach?", "mode": "general"}
The second response incorporates context from the first query.
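The rolling window can be sketched with a bounded deque (a simplification of Quest's history handling, using its default of 3):

```python
from collections import deque

# maxlen=3 mirrors Quest's default max_history: older entries are evicted automatically.
history = deque(maxlen=3)

for query in ["What is the Two Sum problem?",
              "Can you show me the hash map approach?",
              "What is its time complexity?",
              "Now explain Three Sum"]:
    history.append(query)

print(list(history))  # only the 3 most recent queries survive
```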

Understanding the response

Quest responses include:
  • Exact Match Solution - If query exactly matches a problem title
  • Generated Solution - For general queries with retrieved context
  • Relevant code snippets and explanations
  • Problem metadata (difficulty, topics, companies)
Example response structure:
Generated Solution:
The Two Sum problem asks you to find two numbers in an array that add up to a specific target...

**Approach:**
1. Use a hash map to store numbers and their indices
2. For each number, check if target - number exists in the map
3. Return indices when found

**Implementation:**
[code snippet]

**Complexity:**
- Time: O(n)
- Space: O(n)
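The hash-map approach outlined in the example response, as runnable Python:

```python
def two_sum(nums, target):
    """Return indices of the two numbers summing to target, in one pass with a hash map."""
    seen = {}  # value -> index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

Each element is visited once and each lookup is O(1), giving the O(n) time and O(n) space listed above.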

API endpoints

The Flask app exposes several endpoints:
Endpoint       | Method | Description
-------------- | ------ | -----------
/              | GET    | Render the web interface
/search        | POST   | Submit a query (see examples above)
/set_mode      | POST   | Switch between general and reasoning modes
/get_history   | GET    | Retrieve conversation history
/clear_history | POST   | Clear conversation history
/stop          | POST   | Stop ongoing generation
For detailed API documentation, see the API endpoints and Core Components pages.

Using the Python API directly

You can also use Quest without the Flask app:
from src.DSAAssistant.components.retriever2 import LeetCodeRetriever
from rag_engine3 import RAGEngine

# Initialize components
retriever = LeetCodeRetriever()
rag_engine = RAGEngine(
    retriever=retriever,
    max_history=3  # Keep last 3 interactions
)

# Set mode (general or reasoning)
rag_engine.set_mode("general")

# Query the engine
response = rag_engine.answer_question(
    query="Explain the concept of dynamic programming",
    k=5,  # Retrieve top 5 similar problems
    min_confidence=0.6  # Minimum similarity threshold
)

print(response)

Configuration options

The RAGEngine constructor accepts these parameters:
rag_engine = RAGEngine(
    retriever=retriever,
    ollama_url="http://localhost:11434/api/generate",
    model_name="qwen2.5-coder:1.5b",      # General mode model
    reasoning_model="deepseek-r1:7b",     # Reasoning mode model
    mode="general",                        # Default mode
    temperature=0.4,                       # Lower = more focused
    top_p=0.9,                            # Nucleus sampling
    confidence_threshold=0.7,              # Retrieval threshold
    repeat_penalty=1.1,                    # Reduce repetition
    num_thread=8,                          # CPU threads for inference
    max_history=3                          # Conversation memory
)
Adjust temperature (0.1-1.0) to control response creativity. Lower values make responses more deterministic.

Advanced usage

Metadata filtering

Filter solutions by company, difficulty, or topics:
retriever = LeetCodeRetriever()

# Find all medium difficulty problems from Amazon about BFS
filtered = retriever.filter_by_metadata(
    companies=["Amazon"],
    difficulty="Medium",
    topics=["BFS"]
)

for solution in filtered:
    print(f"Title: {solution.title}")
    print(f"Topics: {solution.topics}")

Custom HNSW parameters

Tune retrieval speed vs accuracy:
retriever = LeetCodeRetriever(
    ef_search=32  # Default: 32. Higher = more accurate but slower
)
From retriever2.py:47:
  • ef_search=16 - Faster, less accurate
  • ef_search=32 - Balanced (default)
  • ef_search=64 - Slower, more accurate

Troubleshooting

Ollama connection errors

If requests to http://localhost:11434 fail, Ollama isn't running. Start it with:
ollama serve
Or on Windows/macOS, launch the Ollama application.
Model not found

If Ollama reports a missing model, pull it first:
ollama pull qwen2.5-coder:1.5b
ollama pull deepseek-r1:7b
Queries are slow

Several factors affect speed:
  • First query - slow while models load into memory
  • Large k value - reduce k in search (try k=3 instead of k=5)
  • Reasoning mode - inherently slower; switch to general mode for faster responses
  • CPU threads - increase the num_thread parameter if you have more cores
No relevant results

If Quest can't find a good answer, try:
  • Lowering the min_confidence threshold (default 0.6)
  • Rephrasing your query to be more specific
  • Checking that the problem exists in the dataset (1800+ LeetCode problems)
  • Using exact problem titles for instant matches
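To see why lowering min_confidence helps, note that the threshold gates retrieved matches on a similarity score. Assuming cosine similarity (the usual metric for all-MiniLM-L6-v2 embeddings; the real retriever's metric may differ), the check amounts to:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def passes_confidence(query_vec, doc_vec, min_confidence=0.6):
    # A retrieved problem is kept only if its similarity clears the threshold.
    return cosine_similarity(query_vec, doc_vec) >= min_confidence

print(passes_confidence([1.0, 0.0], [0.8, 0.6]))  # similarity 0.8 >= 0.6 -> True
```

A borderline match scoring 0.55 is dropped at the default 0.6 but kept at 0.5, which is why relaxing the threshold can surface more (if noisier) results.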

Next steps

Now that you’ve run your first query, explore more features:

  • API Reference - detailed documentation of all API endpoints and classes
  • Configuration - advanced configuration options
  • Core Concepts - how Quest's components work together
  • Guides - how to use Quest effectively