Endpoint
Authentication
- Authorization header: Bearer token authentication
- api_key header: API key for authentication
Request Headers
- Content-Type: Must be application/json

Request Body
model: Model identifier. Supports:
- gemini-3-flash - Fast responses
- gemini-3-pro-high - High quality reasoning
- gemini-3-pro-low - Cost-efficient
- claude-sonnet-4-6 - Latest Claude Sonnet
- claude-sonnet-4-6-thinking - With extended thinking
- Custom model mappings from your configuration
messages: Array of message objects forming the conversation
stream: Enable streaming responses via Server-Sent Events (SSE)
max_tokens: Maximum tokens to generate in the response
temperature: Sampling temperature (0.0 to 2.0). Higher values make output more random.
top_p: Nucleus sampling parameter (0.0 to 1.0)
tools: Available tools for function calling
tool_choice: Controls tool usage: auto, none, or a specific tool selection
thinking: Extended thinking configuration for compatible models
Response Format
Non-Streaming Response
id: Unique identifier for this completion
object: Object type, always chat.completion
created: Unix timestamp of creation
model: Model used for generation
choices: Array of completion choices
usage: Token usage statistics
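Putting those fields together, a non-streaming response body looks roughly like this (all values are illustrative, not taken from a real response):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16 }
}
```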
Example: Basic Chat
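A minimal sketch using only the standard library. The base URL, port, and API key below are placeholders, not values from this documentation; adjust them to wherever your Antigravity Manager instance is listening.

```python
import json
import urllib.request

# Assumed local address and OpenAI-style path -- adjust for your deployment.
BASE_URL = "http://localhost:8000/v1/chat/completions"
API_KEY = "sk-your-key"  # placeholder


def build_payload(model: str, user_message: str) -> dict:
    """Assemble a minimal request body for the endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(model: str, user_message: str) -> str:
    """Send one non-streaming chat request and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(model, user_message)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("gemini-3-flash", "Hello!"))
```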
Example: With Streaming
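With stream set to true, the response arrives as SSE lines of the form `data: {...}` terminated by `data: [DONE]`. The sketch below parses that framing by hand; the base URL and key are again placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed address
API_KEY = "sk-your-key"  # placeholder


def parse_sse_line(line: bytes):
    """Return the JSON chunk from one SSE 'data:' line, or None for
    blank lines and the terminating 'data: [DONE]' sentinel."""
    text = line.decode().strip()
    if not text.startswith("data:"):
        return None
    data = text[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)


def stream_chat(model: str, user_message: str) -> None:
    """Print the reply incrementally as delta chunks arrive."""
    payload = {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": user_message}],
    }
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            chunk = parse_sse_line(line)
            if chunk:
                print(chunk["choices"][0]["delta"].get("content", ""),
                      end="", flush=True)
```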
Example: Python SDK
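Because the endpoint is OpenAI-compatible, the official openai SDK (`pip install openai`) can be pointed at it by overriding the base URL. The URL and key below are assumptions for your local deployment.

```python
def completion_kwargs(model: str, prompt: str, **extra) -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **extra,
    }


if __name__ == "__main__":
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed proxy address
        api_key="sk-your-key",                # placeholder
    )
    response = client.chat.completions.create(
        **completion_kwargs("claude-sonnet-4-6", "Hello!", temperature=0.7)
    )
    print(response.choices[0].message.content)
```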
Example: Multi-Modal (Image)
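Images can be sent inline as base64 data URLs inside a message's content array. The helper below follows the OpenAI `image_url` content-part convention; whether the backend expects exactly this schema is an assumption.

```python
import base64


def image_message(prompt: str, image_bytes: bytes,
                  mime: str = "image/png") -> dict:
    """Build a user message carrying text plus an inline base64 image.
    Content-part shape follows the OpenAI convention (assumed here)."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }
```

The returned dict drops into the messages array of any of the requests above.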
Model Routing
Antigravity Manager automatically routes models to the appropriate backend:
- Gemini models → Google AI API via internal v1 protocol
- Claude models → Anthropic API via model mapping
- Custom mappings → Configure in Model Router settings
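The routing rules above can be sketched as prefix matching with custom mappings taking precedence. This is an illustrative sketch only; the manager's real Model Router logic and backend identifiers are not documented here.

```python
def route_backend(model: str, custom_mappings: dict = None) -> str:
    """Illustrative prefix-based routing: custom mappings first,
    then model-name prefixes (backend names are assumptions)."""
    mappings = custom_mappings or {}
    if model in mappings:
        return mappings[model]
    if model.startswith("gemini-"):
        return "google-ai"
    if model.startswith("claude-"):
        return "anthropic"
    raise ValueError(f"no backend mapping for model {model!r}")
```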
Features
- Auto-conversion: Non-streaming requests are automatically converted to streaming for better quota management
- Session affinity: Maintains account consistency for multi-turn conversations
- Smart retry: Automatic account rotation on failures (429, 401 errors)
- Tool calling: Full support for function calling with automatic MCP integration
- Multi-modal: Supports images, audio, and documents in messages
Error Responses
Errors follow OpenAI format:
- 400 - Invalid request format
- 401 - Authentication failed
- 429 - Rate limit exceeded (triggers auto-retry)
- 503 - No available accounts
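A client can map those status codes to actions as sketched below. The OpenAI-style error body shape (`{"error": {"message": ...}}`) is assumed, so the parser falls back to the raw body if that shape doesn't match.

```python
import json


def classify_error(status: int, body: str) -> str:
    """Suggest a client action for each status code listed above.
    Error body shape is assumed OpenAI-style; falls back to raw text."""
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        message = body
    if status == 400:
        return f"fix request: {message}"
    if status == 401:
        return f"check credentials: {message}"
    if status in (429, 503):
        return f"retry with backoff: {message}"
    return f"unexpected {status}: {message}"
```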