Endpoint
Authentication
- Authorization header: Bearer token authentication
- api_key header: API key for authentication
Request Headers
- Content-Type: Must be application/json

Request Body
model: Model identifier. Supports:
- gemini-3-flash - Fast responses
- gemini-3-pro-high - High quality reasoning
- gemini-3-pro-low - Cost-efficient
- claude-sonnet-4-6 - Latest Claude Sonnet
- claude-sonnet-4-6-thinking - With extended thinking
- Custom model mappings from your configuration
messages: Array of message objects forming the conversation
stream: Enable streaming responses via Server-Sent Events (SSE)
max_tokens: Maximum tokens to generate in the response
temperature: Sampling temperature (0.0 to 2.0). Higher values make output more random.
top_p: Nucleus sampling parameter (0.0 to 1.0)
tools: Available tools for function calling
tool_choice: Controls tool usage: auto, none, or a specific tool selection
thinking: Extended thinking configuration for compatible models
Response Format
Non-Streaming Response
id: Unique identifier for this completion
object: Object type, always chat.completion
created: Unix timestamp of creation
model: Model used for generation
choices: Array of completion choices
usage: Token usage statistics
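Putting those fields together, a non-streaming response body looks roughly like this (all values are illustrative, not taken from a real response):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16 }
}
```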
Example: Basic Chat
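A minimal sketch using only the standard library. The base URL, port, and API key below are placeholders, not values from this documentation; adjust them to wherever your Antigravity Manager instance is listening.

```python
import json
import urllib.request

# Assumed local address and OpenAI-style path -- adjust for your deployment.
BASE_URL = "http://localhost:8000/v1/chat/completions"
API_KEY = "sk-your-key"  # placeholder


def build_payload(model: str, user_message: str) -> dict:
    """Assemble a minimal request body for the endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(model: str, user_message: str) -> str:
    """Send one non-streaming chat request and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(model, user_message)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("gemini-3-flash", "Hello!"))
```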
Example: With Streaming
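With stream set to true, the response arrives as SSE lines of the form `data: {...}` terminated by `data: [DONE]`. The sketch below parses that framing by hand; the base URL and key are again placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed address
API_KEY = "sk-your-key"  # placeholder


def parse_sse_line(line: bytes):
    """Return the JSON chunk from one SSE 'data:' line, or None for
    blank lines and the terminating 'data: [DONE]' sentinel."""
    text = line.decode().strip()
    if not text.startswith("data:"):
        return None
    data = text[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)


def stream_chat(model: str, user_message: str) -> None:
    """Print the reply incrementally as delta chunks arrive."""
    payload = {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": user_message}],
    }
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            chunk = parse_sse_line(line)
            if chunk:
                print(chunk["choices"][0]["delta"].get("content", ""),
                      end="", flush=True)
```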
Example: Python SDK
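Because the endpoint is OpenAI-compatible, the official openai SDK (`pip install openai`) can be pointed at it by overriding the base URL. The URL and key below are assumptions for your local deployment.

```python
def completion_kwargs(model: str, prompt: str, **extra) -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **extra,
    }


if __name__ == "__main__":
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed proxy address
        api_key="sk-your-key",                # placeholder
    )
    response = client.chat.completions.create(
        **completion_kwargs("claude-sonnet-4-6", "Hello!", temperature=0.7)
    )
    print(response.choices[0].message.content)
```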
Example: Multi-Modal (Image)
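Images can be sent inline as base64 data URLs inside a message's content array. The helper below follows the OpenAI `image_url` content-part convention; whether the backend expects exactly this schema is an assumption.

```python
import base64


def image_message(prompt: str, image_bytes: bytes,
                  mime: str = "image/png") -> dict:
    """Build a user message carrying text plus an inline base64 image.
    Content-part shape follows the OpenAI convention (assumed here)."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }
```

The returned dict drops into the messages array of any of the requests above.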
Model Routing
Antigravity Manager automatically routes models to the appropriate backend:
- Gemini models → Google AI API via internal v1 protocol
- Claude models → Anthropic API via model mapping
- Custom mappings → Configure in Model Router settings
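The routing rules above can be sketched as prefix matching with custom mappings taking precedence. This is an illustrative sketch only; the manager's real Model Router logic and backend identifiers are not documented here.

```python
def route_backend(model: str, custom_mappings: dict = None) -> str:
    """Illustrative prefix-based routing: custom mappings first,
    then model-name prefixes (backend names are assumptions)."""
    mappings = custom_mappings or {}
    if model in mappings:
        return mappings[model]
    if model.startswith("gemini-"):
        return "google-ai"
    if model.startswith("claude-"):
        return "anthropic"
    raise ValueError(f"no backend mapping for model {model!r}")
```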
Features
- Auto-conversion: Non-streaming requests are automatically converted to streaming for better quota management
- Session affinity: Maintains account consistency for multi-turn conversations
- Smart retry: Automatic account rotation on failures (429, 401 errors)
- Tool calling: Full support for function calling with automatic MCP integration
- Multi-modal: Supports images, audio, and documents in messages
Error Responses
Errors follow OpenAI format:
- 400 - Invalid request format
- 401 - Authentication failed
- 429 - Rate limit exceeded (triggers auto-retry)
- 503 - No available accounts
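A client can map those status codes to actions as sketched below. The OpenAI-style error body shape (`{"error": {"message": ...}}`) is assumed, so the parser falls back to the raw body if that shape doesn't match.

```python
import json


def classify_error(status: int, body: str) -> str:
    """Suggest a client action for each status code listed above.
    Error body shape is assumed OpenAI-style; falls back to raw text."""
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        message = body
    if status == 400:
        return f"fix request: {message}"
    if status == 401:
        return f"check credentials: {message}"
    if status in (429, 503):
        return f"retry with backoff: {message}"
    return f"unexpected {status}: {message}"
```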