Quickstart
This guide will walk you through setting up Ollama, pulling the required models, and running your first query with Quest.

Install Ollama
Quest uses Ollama to run language models locally. Install Ollama for your platform; downloads and instructions are available from the official Ollama site. Ollama runs as a local server on http://localhost:11434, and Quest communicates with this API to generate responses.

Pull required models
Quest uses two different models depending on the query mode:

Pull qwen2.5-coder:1.5b
This is the default model for general queries and explanations; pull it with `ollama pull qwen2.5-coder:1.5b`.

Model specs:
- Size: ~1.5B parameters
- Purpose: Code generation and explanation
- Speed: Fast inference (typically < 15 seconds per query)
Pull deepseek-r1:7b
This model is used for complex reasoning tasks; pull it with `ollama pull deepseek-r1:7b`.

Model specs:
- Size: ~7B parameters
- Purpose: Step-by-step reasoning for complex problems
- Speed: Slower but more thorough (typically < 4 minutes)
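The reasoning model wraps its chain of thought in <think>…</think> tags before the final answer. A minimal sketch of stripping such blocks (the function name and regex are my own illustration, not Quest's code):

```python
import re

def strip_think_blocks(text: str) -> str:
    # Hypothetical helper: drop <think>...</think> reasoning traces
    # and keep only the final answer text.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>Compare DP and memoization...</think>Use bottom-up DP here."
print(strip_think_blocks(raw))  # -> Use bottom-up DP here.
```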
The reasoning model generates <think> blocks that Quest automatically filters out to provide clean answers. You can also use the smaller deepseek-r1:1.5b variant by configuring the reasoning_model parameter.

Start the Flask app
Now you’re ready to run Quest.

Make your first query
With the Flask app running, you can interact with Quest in two ways:

Using the web interface
Open your browser and navigate to the address the Flask app prints on startup. From the web interface you can:

- Enter queries in the search box
- Switch between “General” and “Reasoning” modes
- View conversation history
- Clear history when starting a new topic
Using the API directly
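Quest's /search endpoint accepts POSTed JSON. A hypothetical sketch from Python (the port, 5000, and the JSON field names are assumptions, not confirmed by this guide; check the /search handler for the exact schema):

```python
import json
import urllib.request

QUEST_URL = "http://localhost:5000/search"  # assumed default Flask port

def build_payload(query: str, mode: str = "general") -> bytes:
    # Field names here are assumptions, not Quest's confirmed schema.
    return json.dumps({"query": query, "mode": mode}).encode()

def search(query: str, mode: str = "general") -> str:
    req = urllib.request.Request(
        QUEST_URL,
        data=build_payload(query, mode),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# search("Two Sum")  # requires the Flask app to be running locally
```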
You can also query Quest programmatically.

Example queries
Try these example queries to see Quest in action:

Exact problem match
When you query an exact problem title, Quest retrieves it instantly from the hash map. The response includes the complete solution with metadata.
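The exact-match path is an O(1) dictionary lookup. A toy sketch with invented records (not Quest's actual data or field names):

```python
# Toy hash map of solutions keyed by normalized title; the records are
# invented for illustration, not Quest's actual data.
solutions = {
    "two sum": {"difficulty": "Easy", "topics": ["Array", "Hash Table"]},
    "coin change": {"difficulty": "Medium", "topics": ["Dynamic Programming"]},
}

def exact_match(query: str):
    # Normalizing the title makes the O(1) lookup case-insensitive.
    return solutions.get(query.strip().lower())

print(exact_match("Two Sum"))
```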
Conceptual explanation
For general questions, Quest retrieves similar problems and generates explanations. Ask about a concept such as dynamic programming, and Quest finds relevant DP problems and generates a detailed explanation.
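The retrieval idea (rank stored problems by similarity to the query) can be illustrated with a toy bag-of-words model; Quest actually uses vector embeddings with HNSW, and the data below is invented:

```python
import math
from collections import Counter

# Toy corpus; Quest's real index holds 1800+ LeetCode problems.
docs = {
    "Climbing Stairs": "dynamic programming count ways to climb steps",
    "Two Sum": "array hash map find pair summing to target",
}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    # Return the stored problem most similar to the query.
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(docs[d].lower().split())))

print(retrieve("how does dynamic programming work"))
```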
Complex reasoning
For harder problems, switch to reasoning mode; the deepseek-r1 model provides step-by-step reasoning.
Follow-up questions
Quest maintains conversation history (configurable, default 3 queries). Ask a follow-up after an initial query, and the second response incorporates context from the first.
Understanding the response
Quest responses include:

- Exact Match Solution - If query exactly matches a problem title
- Generated Solution - For general queries with retrieved context
- Relevant code snippets and explanations
- Problem metadata (difficulty, topics, companies)
API endpoints
The Flask app exposes several endpoints:

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Render the web interface |
| /search | POST | Submit a query (see examples above) |
| /set_mode | POST | Switch between general/reasoning mode |
| /get_history | GET | Retrieve conversation history |
| /clear_history | POST | Clear conversation history |
| /stop | POST | Stop ongoing generation |
For detailed API documentation, see the API endpoints and Core Components pages.
Using the Python API directly
You can also use Quest without the Flask app.

Configuration options
The RAGEngine constructor accepts a number of configuration parameters.
Advanced usage
Metadata filtering
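A toy sketch of metadata filtering over invented records (the field names and the filter_by helper are my own illustration; Quest's real schema and filter API may differ):

```python
# Invented sample records, not Quest's actual data.
problems = [
    {"title": "Two Sum", "difficulty": "Easy", "companies": ["Google"]},
    {"title": "LRU Cache", "difficulty": "Medium", "companies": ["Amazon"]},
    {"title": "Word Ladder", "difficulty": "Hard", "companies": ["Amazon"]},
]

def filter_by(items, difficulty=None, company=None):
    # Keep only records matching every supplied criterion.
    out = []
    for p in items:
        if difficulty and p["difficulty"] != difficulty:
            continue
        if company and company not in p["companies"]:
            continue
        out.append(p)
    return out

print([p["title"] for p in filter_by(problems, company="Amazon")])
```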
Filter solutions by company, difficulty, or topics.

Custom HNSW parameters
Tune retrieval speed vs accuracy (see retriever2.py:47):

- ef_search=16 - Faster, less accurate
- ef_search=32 - Balanced (default)
- ef_search=64 - Slower, more accurate
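The trade-off can be illustrated with a deliberately crude stand-in for HNSW: a linear scan over a cheap, imperfect candidate ordering (everything here is invented for illustration; real HNSW performs a graph traversal, not a scan):

```python
# The searcher examines only the first ef entries of a coarse ordering,
# so a larger ef costs more comparisons but can find a closer neighbour.
target = 37.4
points = [(i, abs(i - target)) for i in range(100)]     # (id, true distance)
coarse_order = sorted(points, key=lambda p: p[0] % 10)  # cheap, imperfect order

def toy_search(ef: int) -> int:
    examined = coarse_order[:ef]  # candidate budget = ef
    return min(examined, key=lambda p: p[1])[0]

print(toy_search(16), toy_search(64), toy_search(100))
```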
Troubleshooting
Connection refused to localhost:11434
This means Ollama isn’t running. Start it with `ollama serve`, or on Windows/macOS launch the Ollama application.
Model not found error
You need to pull the model first, e.g. `ollama pull deepseek-r1:7b`.
Slow response times
Several factors affect speed:

- First query - Slow as models load into memory
- Large k value - Reduce k in search (try k=3 instead of k=5)
- Reasoning mode - Inherently slower; switch to general for faster responses
- CPU threads - Increase the num_thread parameter if you have more cores
No relevant results found
Try:

- Lower the min_confidence threshold (default 0.6)
- Rephrase your query to be more specific
- Check that the problem exists in the dataset (1800+ LeetCode problems)
- Use exact problem titles for instant matches
Next steps
Now that you’ve run your first query, explore more features:

API Reference
Detailed documentation of all API endpoints and classes
Configuration
Learn about advanced configuration options
Core Concepts
Understand how Quest’s components work together
Guides
Learn how to use Quest effectively