Quest exposes a REST API through Flask that lets you interact with the RAG engine programmatically. Apart from GET /, which returns the HTML web interface, all endpoints accept and return JSON.

Base URL

When running locally:
http://localhost:5000

Endpoints

GET /

Renders the main web interface.
Response
  • Returns an HTML page (the index.html template)

POST /search

Submit a query to the RAG engine and get a solution.
Request Body
  • query (string, required): The coding problem or question to search for
  • mode (string, default: "general"): Inference mode, either general or reasoning
Response
  • response (string): The generated solution or explanation
Error Response
  • error (string): Error message if the request fails
Example Request
curl
curl -X POST http://localhost:5000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Two Sum problem",
    "mode": "general"
  }'
Python
import requests

response = requests.post(
    "http://localhost:5000/search",
    json={
        "query": "Two Sum problem",
        "mode": "general"
    }
)

data = response.json()
print(data["response"])
Example Response
{
  "response": "Generated Solution:\n[Solution content here]"
}

POST /stop

Stop the currently running generation process.
Response
  • message (string): Confirmation message
Example Request
curl
curl -X POST http://localhost:5000/stop
Python
import requests

response = requests.post("http://localhost:5000/stop")
print(response.json())
Example Response
{
  "message": "Streaming stopped"
}

POST /clear_history

Clear the conversation history stored in the memory buffer.
Response
  • message (string): Confirmation message
Example Request
curl
curl -X POST http://localhost:5000/clear_history
Python
import requests

response = requests.post("http://localhost:5000/clear_history")
print(response.json())
Example Response
{
  "message": "Conversation history cleared"
}

GET /get_history

Retrieve the current conversation history.
Response
  • history (string): Formatted conversation history with previous queries and responses
Example Request
curl
curl http://localhost:5000/get_history
Python
import requests

response = requests.get("http://localhost:5000/get_history")
print(response.json())
Example Response
{
  "history": "User: What is dynamic programming?\nSystem: Dynamic programming is...\n"
}
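
Because history arrives as one formatted string, a client may want to split it back into turns. The sketch below assumes the "User: " / "System: " line prefixes seen in the example response; the actual format may differ, and parse_history is a hypothetical helper, not part of Quest. Lines without a recognised prefix are treated as continuations of the previous field.

```python
def parse_history(history: str) -> list[dict[str, str]]:
    """Split a history string into {"user": ..., "system": ...} turns.

    Assumes each turn starts with a "User: " line followed by a
    "System: " line, as in the example response (an assumption).
    """
    turns: list[dict[str, str]] = []
    current = None  # which field of the latest turn is being filled
    for line in history.splitlines():
        if line.startswith("User: "):
            turns.append({"user": line[len("User: "):], "system": ""})
            current = "user"
        elif line.startswith("System: ") and turns:
            turns[-1]["system"] = line[len("System: "):]
            current = "system"
        elif turns and current:
            # Continuation of a multi-line query or answer.
            turns[-1][current] += "\n" + line
    return turns
```

For example, parse_history(response.json()["history"]) would yield one dictionary per query/response pair.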

POST /set_mode

Change the inference mode between general and reasoning.
Request Body
  • mode (string, required): Must be either general or reasoning
Response
  • message (string): Confirmation message with the new mode
Error Response
  • error (string): Error message if mode is invalid
Example Request
curl
curl -X POST http://localhost:5000/set_mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "reasoning"}'
Python
import requests

response = requests.post(
    "http://localhost:5000/set_mode",
    json={"mode": "reasoning"}
)
print(response.json())
Example Response
{
  "message": "Mode set to: reasoning"
}
Error Response
{
  "error": "Mode must be 'general' or 'reasoning'."
}
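
Since only two mode values are accepted, a client can reject a bad value before making the request at all. This is a minimal sketch; set_mode_payload and VALID_MODES are hypothetical names introduced here for illustration.

```python
# The two modes documented for /set_mode.
VALID_MODES = ("general", "reasoning")


def set_mode_payload(mode: str) -> dict:
    """Return the JSON body for POST /set_mode, or raise on a bad mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"mode": mode}
```

The result can be passed directly as the json argument, for example requests.post("http://localhost:5000/set_mode", json=set_mode_payload("reasoning")).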

Error Handling

All endpoints may return HTTP error codes:
  • 400 Bad Request - Invalid parameters or missing required fields
  • 500 Internal Server Error - Server-side error during processing
Error responses include a JSON object with an error field containing a descriptive message.
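
One way to handle these errors uniformly on the client is to funnel every response through a small check. The sketch below assumes error bodies are JSON objects with an error field, as described above; QuestAPIError and unwrap are hypothetical names, not part of Quest.

```python
class QuestAPIError(Exception):
    """Raised when the API reports a failure (hypothetical client-side type)."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


def unwrap(status_code: int, body: dict) -> dict:
    """Return the body on success; raise QuestAPIError otherwise."""
    # Treat any 4xx/5xx status, or any body carrying an "error" field,
    # as a failure.
    if status_code >= 400 or "error" in body:
        raise QuestAPIError(status_code, body.get("error", "unknown error"))
    return body
```

With requests, this could be called as data = unwrap(response.status_code, response.json()).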

Rate Limiting

There is currently no rate limiting implemented. Queries are processed sequentially by the RAG engine.
The /search endpoint response time varies based on the inference mode and query complexity. General mode typically responds in 15-20 seconds, while reasoning mode may take 3-4 minutes.
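
Given these response times, clients will likely want a per-mode request timeout. The sketch below builds the keyword arguments for requests.post(); the timeout values (60 and 300 seconds) are assumptions chosen to leave headroom over the typical times quoted above, and search_kwargs is a hypothetical helper.

```python
BASE_URL = "http://localhost:5000"

# Assumed timeouts in seconds, not values defined by Quest.
TIMEOUTS = {"general": 60, "reasoning": 300}


def search_kwargs(query: str, mode: str = "general") -> dict:
    """Build keyword arguments for requests.post() against /search."""
    return {
        "url": f"{BASE_URL}/search",
        "json": {"query": query, "mode": mode},
        # Fall back to the longest timeout for an unrecognised mode.
        "timeout": TIMEOUTS.get(mode, max(TIMEOUTS.values())),
    }
```

With requests installed, requests.post(**search_kwargs("Two Sum problem", "reasoning")) sends the query and raises requests.exceptions.Timeout if the server does not respond in time.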
