The Chat endpoint is the primary interface for LLM chat completions. It accepts a list of messages plus model configuration, routes the request to the appropriate provider, and returns a standardized response.
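As a rough illustration of the request shape described above, the sketch below builds a JSON body with messages and model configuration. The field names (`model`, `messages`, `temperature`) and the model identifier are assumptions for illustration, not confirmed by this document.

```python
import json

# Hypothetical request body for the Chat endpoint.
# Field names and the model identifier are illustrative assumptions.
payload = {
    "model": "gemini-1.5-pro",   # provider-specific model name (assumed)
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,          # example model configuration (assumed)
}

body = json.dumps(payload)
print(body)
```

The serialized body would then be POSTed to the endpoint with the caller's API key attached for rate limiting.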
This endpoint is rate-limited using a token bucket algorithm backed by Redis, with limits enforced per API key. When a caller exceeds its limit, the endpoint returns HTTP 429 (Too Many Requests).
- Capacity: configurable via `RATE_LIMITER_CAPACITY`
- Refill rate: configurable via `RATE_LIMITER_REFILL_RATE`
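The token bucket behavior described above can be sketched in-memory as follows. This is a simplified, single-process illustration, not the Redis-backed implementation: the real limiter keeps bucket state in Redis keyed by API key, while `capacity` and `refill_rate` here stand in for `RATE_LIMITER_CAPACITY` and `RATE_LIMITER_REFILL_RATE`.

```python
import time

class TokenBucket:
    """In-memory sketch of a token bucket rate limiter (assumed behavior)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should respond with HTTP 429

bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(6)]
print(results)  # the first five requests pass; the sixth is rejected
```

Once the bucket is empty, a caller must wait for tokens to refill (here, one per second) before further requests succeed.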
Example response:

```json
{
  "id": "abc123-def456-789",
  "provider": "gemini",
  "content": "The capital of France is Paris. It is located in the north-central part of the country and is known for its rich history, culture, and iconic landmarks like the Eiffel Tower.",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```