The Quest web interface is built with Flask, providing a simple REST API for interacting with the RAG engine.

Application Structure

The Flask application is defined in app.py and provides a web interface and REST API endpoints.

Initialization

from flask import Flask, render_template, request, Response, jsonify
from src.DSAAssistant.components.retriever2 import LeetCodeRetriever, Solution
from rag_engine3 import RAGEngine
import logging
import time

# Initialize Flask app
app = Flask(__name__)

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize RAG Engine
retriever = LeetCodeRetriever()
rag_engine = RAGEngine(retriever, max_history=3)
The application initializes:
  1. Flask app instance - The main application
  2. Logging - INFO level logging for request tracking
  3. LeetCodeRetriever - Loads the HNSW index and metadata
  4. RAGEngine - Initialized with max_history=3 to retain the last three conversation exchanges

Running the Application

Development Mode

Run the Flask application in development mode:
python app.py
The application starts on http://localhost:5000 by default.
The application runs with debug=False by default, which is the correct setting for production. Set debug=True during development to get auto-reload and detailed tracebacks.

Production Mode

For production deployments, use a WSGI server like Gunicorn:
gunicorn app:app --bind 0.0.0.0:5000 --workers 4
The RAG engine processes queries sequentially. Using multiple workers may lead to race conditions. Consider using a single worker or implementing proper request queuing.
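Within a single worker, concurrent threads can be serialized with a process-wide lock so the engine's mutable conversation history never interleaves. This is a minimal sketch: DummyEngine and answer_safely are illustrative names standing in for the real RAGEngine, not part of the codebase.

```python
import threading

# Process-wide lock: only one request at a time may touch the
# stateful RAG engine, so its conversation history never interleaves.
engine_lock = threading.Lock()

class DummyEngine:
    """Illustrative stand-in for RAGEngine (which keeps mutable history)."""
    def __init__(self):
        self.history = []

    def answer_question(self, query):
        self.history.append(query)
        return f"answer to {query!r}"

engine = DummyEngine()

def answer_safely(query):
    # Flask's threaded mode may call this concurrently; the lock
    # queues those calls instead of letting them race.
    with engine_lock:
        return engine.answer_question(query)
```

Each endpoint would call answer_safely instead of touching the engine directly; requests queue behind the lock rather than corrupting shared state.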

Configuration

Application Settings

if __name__ == '__main__':
    app.run(debug=False)
You can configure the Flask app with additional parameters:
app.run(
    host='0.0.0.0',  # Listen on all interfaces
    port=5000,       # Port number
    debug=False,     # Debug mode
    threaded=True    # Enable threading
)

Environment Variables

While the application doesn’t use environment variables by default, you can add them for configuration:
import os

OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434/api/generate')
MAX_HISTORY = int(os.getenv('MAX_HISTORY', '3'))

rag_engine = RAGEngine(
    retriever, 
    ollama_url=OLLAMA_URL,
    max_history=MAX_HISTORY
)

Request Logging

The application logs all search requests with timing information:
# Log the start time
start_time = time.time()

# Set the mode (general or reasoning)
rag_engine.set_mode(mode)
logger.info(f"Mode set to: {mode}")

# Get response from RAG engine
response = rag_engine.answer_question(query)

# Log the time taken
logger.info(f"Response generated in {time.time() - start_time:.2f} seconds")
Example log output:
INFO:__main__:Mode set to: general
INFO:__main__:Response generated in 15.32 seconds
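The start/stop pattern above can be wrapped in a small context manager so every endpoint logs timing the same way. This is a sketch; log_timing is a hypothetical helper, not something defined in app.py.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@contextmanager
def log_timing(label):
    # Record the start time, run the wrapped block, then log the
    # elapsed seconds -- even if the block raises.
    start = time.time()
    try:
        yield
    finally:
        logger.info(f"{label} in {time.time() - start:.2f} seconds")
```

An endpoint would then use `with log_timing("Response generated"): response = rag_engine.answer_question(query)` and get the same log line as above without repeating the arithmetic.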

Error Handling

All endpoints include try-except blocks for error handling:
try:
    # Set the mode (general or reasoning)
    rag_engine.set_mode(mode)
    logger.info(f"Mode set to: {mode}")

    # Get response from RAG engine
    response = rag_engine.answer_question(query)

    # Return the response as JSON
    return jsonify({"response": response})

except Exception as e:
    logger.error(f"Error processing query: {e}")
    return jsonify({"error": "An error occurred while processing your request."}), 500
Errors are:
  • Logged with the error message
  • Returned to the client with HTTP 500 status
  • Reported with a generic message (the full stack trace is never exposed to the client)
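The per-endpoint try/except blocks can also be complemented with Flask's errorhandler hook, so any uncaught exception produces the same JSON payload. This is a sketch assuming the same app instance; handle_internal_error is an illustrative name.

```python
import logging

from flask import Flask, jsonify

logger = logging.getLogger(__name__)
app = Flask(__name__)

@app.errorhandler(500)
def handle_internal_error(error):
    # With debug off, any unhandled exception in a route lands here.
    logger.error(f"Unhandled error: {error}")
    return jsonify({"error": "An error occurred while processing your request."}), 500
```

This centralizes the fallback response while individual endpoints keep their more specific handling.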

Template Rendering

The main route renders an HTML template:
@app.route('/')
def index():
    """Render the main index page."""
    return render_template('index.html')
The template is located at templates/index.html and provides the web interface for making queries.

CORS Configuration

If you need to enable CORS for external API access, install flask-cors:
pip install flask-cors
Then configure CORS in your application:
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # Enable CORS for all routes
Or configure it for specific routes:
from flask_cors import cross_origin

@app.route('/search', methods=['POST'])
@cross_origin()
def search():
    # ... endpoint implementation

Deployment Considerations

Dependencies

Ensure all dependencies are installed:
pip install -r requirements.txt
Key dependencies:
  • flask>=2.1.0 - Web framework
  • gunicorn - Production WSGI server (if using)
  • flask-caching - Caching support (installed but not configured)
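Since flask-caching ships in requirements but is unconfigured, a lightweight stand-in for caching repeated queries is functools.lru_cache. A sketch with an illustrative stub engine; note that memoizing answers bypasses conversation history, so this only suits stateless, single-turn lookups.

```python
from functools import lru_cache

class StubEngine:
    """Illustrative stand-in for RAGEngine; counts how often it runs."""
    def __init__(self):
        self.calls = 0

    def answer_question(self, query):
        self.calls += 1
        return f"answer to {query}"

engine = StubEngine()

@lru_cache(maxsize=128)
def cached_answer(query: str) -> str:
    # Identical query strings hit the cache instead of the engine.
    return engine.answer_question(query)
```

Repeating a query returns the cached answer instantly instead of paying the ~15-second generation cost again.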

Performance

The initial request may take longer as the RAG engine loads the HNSW index and embeddings model. Subsequent requests will be faster.
Approximate timings:
  • HNSW index: ~1-2 seconds
  • Sentence transformer model: ~2-3 seconds
  • First query: ~15-20 seconds (general mode)
  • Subsequent queries: ~15 seconds (general mode)
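One way to hide the first-query cost from users is a warm-up call at startup. A sketch: warm_up is a hypothetical helper and the throwaway query is arbitrary.

```python
import logging
import time

logger = logging.getLogger(__name__)

def warm_up(engine, query="What is a hash map?"):
    # Run one throwaway query so the index and model are fully loaded
    # before the first real user arrives; return the elapsed seconds.
    start = time.time()
    engine.answer_question(query)
    elapsed = time.time() - start
    logger.info(f"Warm-up finished in {elapsed:.2f} seconds")
    return elapsed
```

Calling warm_up(rag_engine) once before app.run() shifts the slow first query into startup time.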

Memory Usage

The application keeps several large objects in memory:
  • HNSW index (~50-100 MB)
  • Sentence transformer model (~100 MB)
  • Metadata for 1800+ solutions (~10-20 MB)
  • Conversation history (scales with usage)
Plan for roughly 500 MB to 1 GB of RAM for the application.

Ollama Dependency

The application requires Ollama to be running on the same machine or a remote server:
# Default Ollama URL
http://localhost:11434/api/generate
Ensure Ollama is running before starting the Flask app:
ollama serve
And that the required models are available:
ollama pull qwen2.5-coder:1.5b
ollama pull deepseek-r1:7b
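Before starting the Flask app, a quick reachability probe can fail fast when Ollama is down. A sketch: ollama_is_up is an illustrative helper, relying on the Ollama server answering a plain GET on its root URL when alive.

```python
import urllib.error
import urllib.request

def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    # A 200 from the server root means the Ollama daemon is listening;
    # a connection error or timeout means it is not.
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling this at startup lets the app log a clear error (or refuse to boot) instead of failing on the first user query.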
