API endpoints

Quest exposes a REST API, built with Flask, for interacting with the RAG engine programmatically. Request bodies and responses are JSON, except for GET /, which serves the HTML web interface.

Base URL

When running locally:

http://localhost:5000

Endpoints

GET /

Renders the main web interface.

Response

Returns an HTML page rendered from the index.html template.

POST /search

Submit a query to the RAG engine and get a solution.

query

string

required

The coding problem or question to search for

mode

string

default: "general"

Inference mode: general or reasoning

Response

response

string

The generated solution or explanation

Error Response

error

string

Error message if the request fails

Example Request

curl

curl -X POST http://localhost:5000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Two Sum problem",
    "mode": "general"
  }'

Python

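import requests

response = requests.post(
    "http://localhost:5000/search",
    json={
        "query": "Two Sum problem",
        "mode": "general"
    },
    timeout=300,  # reasoning mode can take several minutes
)

data = response.json()
if "error" in data:
    print("Request failed:", data["error"])
else:
    print(data["response"])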

Example Response

{
  "response": "Generated Solution:\n[Solution content here]"
}
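The requests example above assumes the call succeeds. A slightly more defensive client can be sketched with only the standard library (urllib), so it needs no third-party package. The helper names build_search_body, parse_search_reply, and search are illustrative, not part of Quest, and the 300-second timeout is an assumption chosen to cover reasoning mode.

```python
import json
import urllib.error
import urllib.request

# Modes documented for POST /search.
VALID_MODES = ("general", "reasoning")


def build_search_body(query: str, mode: str = "general") -> bytes:
    """Encode the JSON body for POST /search, rejecting bad input before any network call."""
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return json.dumps({"query": query, "mode": mode}).encode("utf-8")


def parse_search_reply(status: int, raw: bytes) -> str:
    """Return the generated text, or raise RuntimeError carrying the server's error message."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except ValueError:
        payload = {}  # body was not JSON (for example an HTML error page)
    if not isinstance(payload, dict):
        payload = {}
    if status >= 400 or "error" in payload:
        raise RuntimeError(f"/search failed ({status}): {payload.get('error', 'no error message')}")
    return payload["response"]


def search(query: str, mode: str = "general",
           base_url: str = "http://localhost:5000", timeout: float = 300.0) -> str:
    """POST a query to /search and return the generated solution text."""
    request = urllib.request.Request(
        f"{base_url}/search",
        data=build_search_body(query, mode),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # Generous timeout: reasoning mode can take 3-4 minutes to answer.
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return parse_search_reply(reply.status, reply.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the JSON error body is still readable from the exception.
        return parse_search_reply(err.code, err.read())
```

With the server running, search("Two Sum problem", mode="reasoning") returns the same text as data["response"] in the requests example, but invalid input fails locally and server errors surface as exceptions carrying the error message.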

POST /stop

Stop the currently running generation process.

Response

message

string

Confirmation message

Example Request

curl

curl -X POST http://localhost:5000/stop

Python

import requests

response = requests.post("http://localhost:5000/stop")
print(response.json())

Example Response

{
  "message": "Streaming stopped"
}
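As the examples suggest, POST /search returns only after generation finishes, so /stop has to be sent from a separate thread or process while the search request is still open. The sketch below shows one way to do that with a client-side deadline. search_with_deadline, its parameters, and the 10-second grace period are illustrative assumptions; what /search returns after it has been stopped is not specified on this page, so the sketch passes through whatever comes back.

```python
import threading
from typing import Any, Callable, Dict, Optional


def search_with_deadline(search_fn: Callable[[], str],
                         stop_fn: Callable[[], object],
                         deadline_s: float,
                         grace_s: float = 10.0) -> Optional[str]:
    """Run a blocking /search call in a worker thread and call /stop if it overruns.

    Returns whatever search_fn returned, or None if it still had not returned
    grace_s seconds after stop_fn was called.
    """
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["text"] = search_fn()
        except Exception as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_s)
    if thread.is_alive():
        stop_fn()              # e.g. POST /stop
        thread.join(grace_s)   # give /search time to return after being stopped
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("text")
```

With the requests library, search_fn could wrap the POST /search call from the earlier example and stop_fn could be a function that posts to http://localhost:5000/stop.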

POST /clear_history

Clear the conversation history stored in the memory buffer.

Response

message

string

Confirmation message

Example Request

curl

curl -X POST http://localhost:5000/clear_history

Python

import requests

response = requests.post("http://localhost:5000/clear_history")
print(response.json())

Example Response

{
  "message": "Conversation history cleared"
}
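When a batch of unrelated problems is sent one after another, clearing the history before each query keeps earlier turns out of later answers, assuming the memory buffer is fed back into generation. The sketch below does that through a small transport function so the loop can be exercised without a server. make_post, answer_independently, and the Post type are hypothetical names, and the urllib transport is one possible implementation.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

# A transport takes an endpoint path and an optional JSON body and returns the decoded JSON reply.
Post = Callable[[str, Optional[dict]], dict]


def make_post(base_url: str = "http://localhost:5000", timeout: float = 300.0) -> Post:
    """Build a urllib-based transport for the POST endpoints."""
    def post(path: str, body: Optional[dict]) -> dict:
        data = None if body is None else json.dumps(body).encode("utf-8")
        headers = {} if body is None else {"Content-Type": "application/json"}
        request = urllib.request.Request(base_url + path, data=data, headers=headers, method="POST")
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return json.loads(reply.read().decode("utf-8"))
    return post


def answer_independently(post: Post, queries: List[str], mode: str = "general") -> Dict[str, str]:
    """Answer each query right after clearing the conversation history."""
    answers: Dict[str, str] = {}
    for query in queries:
        post("/clear_history", None)  # drop earlier turns from the memory buffer
        answers[query] = post("/search", {"query": query, "mode": mode})["response"]
    return answers
```

Against a running server this would be called as answer_independently(make_post(), ["Two Sum problem", "Valid Parentheses"]).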

GET /get_history

Retrieve the current conversation history.

Response

history

string

Formatted conversation history with previous queries and responses

Example Request

curl

curl http://localhost:5000/get_history

Python

import requests

response = requests.get("http://localhost:5000/get_history")
print(response.json())

Example Response

{
  "history": "User: What is dynamic programming?\nSystem: Dynamic programming is...\n"
}
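The history comes back as a single formatted string. If individual turns are needed, a small parser can split it, assuming every turn starts with a "User: " or "System: " prefix as in the example response and that lines without a prefix continue the previous turn. parse_history is a hypothetical helper, not part of the API; a response line that itself begins with "User: " would be misread as a new turn.

```python
from typing import List, Tuple

# Speaker prefixes seen in the example response above.
SPEAKERS = ("User", "System")


def parse_history(history: str) -> List[Tuple[str, str]]:
    """Split the formatted history string into (speaker, text) turns."""
    turns: List[Tuple[str, List[str]]] = []
    for line in history.splitlines():
        speaker, sep, rest = line.partition(": ")
        if sep and speaker in SPEAKERS:
            turns.append((speaker, [rest]))   # a new turn starts here
        elif turns:
            turns[-1][1].append(line)         # continuation of a multi-line message
        # lines before the first recognized prefix are ignored
    return [(speaker, "\n".join(lines).rstrip()) for speaker, lines in turns]
```

Applied to the example response, parse_history returns [("User", "What is dynamic programming?"), ("System", "Dynamic programming is...")].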

POST /set_mode

Change the inference mode between general and reasoning.

mode

string

required

Must be either general or reasoning

Response

message

string

Confirmation message with the new mode

Error Response

error

string

Error message if mode is invalid

Example Request

curl

curl -X POST http://localhost:5000/set_mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "reasoning"}'

Python

import requests

response = requests.post(
    "http://localhost:5000/set_mode",
    json={"mode": "reasoning"}
)
print(response.json())

Example Response

{
  "message": "Mode set to: reasoning"
}

Example Error Response

{
  "error": "Mode must be 'general' or 'reasoning'."
}
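Because only two values are accepted, a client can validate the mode before calling the server and skip a round trip for a typo. The sketch uses the standard library; normalize_mode and set_mode are hypothetical helper names, the 10-second timeout is an assumption, and the trimming and lower-casing are a client-side convenience rather than documented server behavior.

```python
import json
import urllib.error
import urllib.request

ALLOWED_MODES = {"general", "reasoning"}


def normalize_mode(mode: str) -> str:
    """Trim and lower-case a mode name, and reject anything the server would refuse."""
    cleaned = mode.strip().lower()
    if cleaned not in ALLOWED_MODES:
        raise ValueError("Mode must be 'general' or 'reasoning'.")
    return cleaned


def set_mode(mode: str, base_url: str = "http://localhost:5000") -> str:
    """Switch the inference mode and return the server's confirmation message."""
    body = json.dumps({"mode": normalize_mode(mode)}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/set_mode", data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as reply:
            payload = json.loads(reply.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The error body should carry the same "error" field shown in the example above.
        try:
            message = json.loads(err.read().decode("utf-8")).get("error", str(err))
        except ValueError:
            message = str(err)
        raise ValueError(message) from err
    if "error" in payload:
        raise ValueError(payload["error"])
    return payload["message"]
```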

Error Handling

Any endpoint may return one of these HTTP error codes:

400 Bad Request - Invalid parameters or missing required fields
500 Internal Server Error - Server-side error during processing

Error responses include a JSON object with an error field containing a descriptive message.
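Since every error body has the same shape, one helper can turn a status code and raw body into a typed exception. The sketch below is an assumption about how a client might organize this; the exception class names and check_response are hypothetical, and non-JSON bodies (for example a default HTML error page) fall back to a generic message.

```python
import json
from typing import Any, Dict


class QuestAPIError(Exception):
    """Base class for errors reported by the API (hypothetical name)."""


class BadRequestError(QuestAPIError):
    """400 Bad Request: invalid parameters or a missing required field."""


class ServerError(QuestAPIError):
    """500 Internal Server Error: failure while processing the request."""


def check_response(status: int, raw: bytes) -> Dict[str, Any]:
    """Return the decoded JSON body of a successful reply, or raise a typed error."""
    try:
        payload = json.loads(raw.decode("utf-8")) if raw else {}
    except ValueError:
        payload = {}  # non-JSON body, e.g. a default HTML error page
    if not isinstance(payload, dict):
        payload = {}
    if status < 400:
        return payload
    message = payload.get("error", f"HTTP {status} with no error message")
    if status == 400:
        raise BadRequestError(message)
    if status >= 500:
        raise ServerError(message)
    raise QuestAPIError(message)
```

Separating 400 from 500 lets a caller fix its input on a 400 but treat a 500 as a server-side problem. Whether a 500 is worth retrying is left open, since a retried reasoning query can cost several more minutes.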

Rate Limiting

Rate limiting is not currently implemented. The RAG engine processes queries sequentially, so a request may have to wait for earlier queries to finish.

The /search endpoint response time varies based on the inference mode and query complexity. General mode typically responds in 15-20 seconds, while reasoning mode may take 3-4 minutes.
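Taken together, sequential processing and these latencies determine how long a client should be prepared to wait. As a rough estimate that treats the upper ends of the typical times above as worst cases: with two reasoning queries and one general query already ahead of you, your own general query could take about 2 × 240 + 20 + 20 = 520 seconds. The hypothetical suggested_timeout helper below turns that estimate into a client timeout with some headroom. The API does not expose queue length, so queued_ahead is something you would have to know or guess.

```python
from typing import Iterable

# Upper ends of the typical latencies quoted above, in seconds (not guarantees).
TYPICAL_MAX_SECONDS = {"general": 20, "reasoning": 4 * 60}


def suggested_timeout(mode: str, queued_ahead: Iterable[str] = (), headroom: float = 1.5) -> float:
    """Estimate a client timeout for /search when queries are served one at a time.

    A request waits for every query ahead of it before its own generation starts,
    so the estimate is the sum of those latencies plus its own, scaled by headroom.
    """
    waiting = sum(TYPICAL_MAX_SECONDS[m] for m in queued_ahead)
    return (waiting + TYPICAL_MAX_SECONDS[mode]) * headroom
```

For the scenario above, suggested_timeout("general", ["reasoning", "reasoning", "general"]) gives 520 × 1.5 = 780 seconds.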
