Base URL
When running locally:
Endpoints
GET /
Renders the main web interface.
Response:
- Returns an HTML page (the index.html template).
POST /search
Submit a query to the RAG engine and get a solution.
Parameters:
- The coding problem or question to search for.
- Inference mode: general or reasoning.
Response:
- The generated solution or explanation.
- An error message if the request fails.
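A minimal Python sketch of calling this endpoint with only the standard library. It assumes the server reads a JSON body and that the two parameters are named query and mode; both field names and the http://localhost:5000 base URL are hypothetical placeholders, so adjust them to match your server.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address your server runs on.
BASE_URL = "http://localhost:5000"
VALID_MODES = ("general", "reasoning")


def build_search_request(query: str, mode: str = "general") -> urllib.request.Request:
    """Build (without sending) a POST /search request.

    Assumption: the server reads a JSON body with the hypothetical
    field names "query" and "mode".
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = json.dumps({"query": query, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, mode: str = "general", timeout: float = 300.0) -> dict:
    """Send POST /search and return the decoded JSON response."""
    request = build_search_request(query, mode)
    # Reasoning mode can take several minutes, hence the generous default timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, search("How do I reverse a linked list in Python?", mode="reasoning") blocks until the engine finishes and returns the decoded reply; which key holds the solution depends on the server.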
POST /stop
Stop the currently running generation process.
Response:
- A confirmation message.
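Because a /search call only returns once generation finishes, the client that sent it is typically still waiting, so /stop has to come from a separate thread, process, or terminal. A minimal sketch, assuming the endpoint needs no request body and replies with JSON; the base URL is a hypothetical placeholder.

```python
import json
import threading
import urllib.request

# Hypothetical base URL; replace with the address your server runs on.
BASE_URL = "http://localhost:5000"


def build_stop_request() -> urllib.request.Request:
    """Build a POST /stop request (assumed to need no body)."""
    return urllib.request.Request(f"{BASE_URL}/stop", method="POST")


def stop_generation(timeout: float = 10.0) -> dict:
    """Ask the server to stop the current generation; return the decoded JSON reply."""
    with urllib.request.urlopen(build_stop_request(), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def stop_in_background() -> threading.Thread:
    """Send /stop from a daemon thread, e.g. while another thread waits on /search."""
    worker = threading.Thread(target=stop_generation, daemon=True)
    worker.start()
    return worker
```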
POST /clear_history
Clear the conversation history stored in the memory buffer.
Response:
- A confirmation message.
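A minimal sketch that clears the history and reports success as a boolean, assuming a bodiless POST; the base URL is a hypothetical placeholder. Connection failures are deliberately left to propagate so they are not mistaken for a refused clear.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:5000"  # hypothetical; replace with your server's address


def build_clear_history_request() -> urllib.request.Request:
    """Build a POST /clear_history request (assumed to need no body)."""
    return urllib.request.Request(f"{BASE_URL}/clear_history", method="POST")


def clear_history(timeout: float = 10.0) -> bool:
    """Return True if the server acknowledged the request with a 2xx status."""
    try:
        with urllib.request.urlopen(build_clear_history_request(), timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # 4xx/5xx responses raise HTTPError; treat them as a failed clear.
        # Connection errors (URLError) are not caught and propagate to the caller.
        return False
```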
GET /get_history
Retrieve the current conversation history.
Response:
- The formatted conversation history, with previous queries and responses.
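A minimal sketch for fetching the history. The exact shape of the formatted history is not specified, so this helper returns the decoded JSON when the body parses as JSON and the raw text otherwise; the base URL is a hypothetical placeholder.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # hypothetical; replace with your server's address


def build_history_request() -> urllib.request.Request:
    """Build a GET /get_history request."""
    return urllib.request.Request(f"{BASE_URL}/get_history", method="GET")


def decode_history(body: bytes):
    """Decode a history payload: parsed JSON when possible, otherwise plain text."""
    text = body.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def get_history(timeout: float = 10.0):
    """Fetch the current conversation history from the server."""
    with urllib.request.urlopen(build_history_request(), timeout=timeout) as response:
        return decode_history(response.read())
```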
POST /set_mode
Change the inference mode between general and reasoning.
Parameters:
- Mode: must be either general or reasoning.
Response:
- A confirmation message with the new mode.
- An error message if the mode is invalid.
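A minimal sketch that checks the mode on the client before sending it, so a typo fails fast locally rather than coming back as a 400 error. It assumes a JSON body with a hypothetical field named mode, and the base URL is a hypothetical placeholder; lowercasing the input is a client-side convenience, not documented server behavior.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # hypothetical; replace with your server's address
VALID_MODES = ("general", "reasoning")


def build_set_mode_request(mode: str) -> urllib.request.Request:
    """Build a POST /set_mode request after validating the mode locally.

    Assumption: the server reads a JSON body with a hypothetical "mode" field.
    """
    # Normalize so that inputs like " Reasoning " are sent as "reasoning".
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return urllib.request.Request(
        f"{BASE_URL}/set_mode",
        data=json.dumps({"mode": normalized}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_mode(mode: str, timeout: float = 10.0) -> dict:
    """Switch the server's inference mode and return the decoded JSON reply."""
    with urllib.request.urlopen(build_set_mode_request(mode), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```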
Error Handling
All endpoints may return the following HTTP error codes:
- 400 Bad Request: invalid parameters or missing required fields.
- 500 Internal Server Error: a server-side error occurred during processing.
Error responses include an error field containing a descriptive message.
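With Python's urllib, 400 and 500 responses surface as urllib.error.HTTPError. A minimal sketch for turning one into a readable message: it reads the error field when the body is JSON (the JSON encoding is an assumption) and otherwise falls back to the raw body or the HTTP reason phrase.

```python
import json
import urllib.error


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Return "HTTP <code>: <message>" for a failed request.

    Prefers the "error" field of a JSON body; falls back to the raw body,
    then to the HTTP reason phrase when the body is empty.
    """
    raw = err.read().decode("utf-8", errors="replace")
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict) and "error" in payload:
        detail = str(payload["error"])
    else:
        detail = raw.strip() or err.reason
    return f"HTTP {err.code}: {detail}"
```

In a client, wrap each urlopen call in try/except urllib.error.HTTPError and log describe_http_error(exc) instead of the bare exception.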
Rate Limiting
There is currently no rate limiting. Queries are processed sequentially by the RAG engine.
The /search endpoint's response time varies with the inference mode and query complexity: general mode typically responds in 15-20 seconds, while reasoning mode may take 3-4 minutes.
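Since nothing throttles clients but queries run one at a time, the practical concern on the client side is latency. A minimal sketch that derives client-side timeouts from the typical upper bounds quoted above; the 1.5x headroom factor and summing per-query timeouts for back-to-back queries are assumptions, not server settings.

```python
from typing import Iterable

# Typical upper-bound /search latencies from the figures above, in seconds.
TYPICAL_LATENCY_S = {"general": 20.0, "reasoning": 4 * 60.0}


def timeout_for(mode: str, headroom: float = 1.5) -> float:
    """Client-side timeout for one /search call: typical upper bound times headroom.

    The 1.5x headroom is an assumed safety margin, not a server setting.
    """
    if mode not in TYPICAL_LATENCY_S:
        raise ValueError(f"unknown mode {mode!r}")
    return TYPICAL_LATENCY_S[mode] * headroom


def sequential_budget(modes: Iterable[str], headroom: float = 1.5) -> float:
    """Total wait budget for queries that the engine will run one after another."""
    return sum(timeout_for(mode, headroom) for mode in modes)
```

For example, a script that sends one general and one reasoning query back to back gets timeouts of 30 and 360 seconds, for a total budget of 390 seconds.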