Creates a message using Anthropic’s Messages API format. The gateway automatically converts the request to OpenAI format and routes to the appropriate provider.

Endpoint

POST https://api.llmgateway.io/v1/messages

Authentication

Requires authentication via a Bearer token in the Authorization header or an x-api-key header. See Authentication.

Request Body

model
string
required
The model to use for completion. Example: "claude-3-5-sonnet-20241022"
messages
array
required
Array of message objects in the conversation. Each message can have:
  • role (string): "user", "assistant", "tool", or "function"
  • content (string | array): Message content or array of content blocks
  • tool_call_id (string, optional): For tool role messages
  • name (string, optional): Message sender name
  • tool_calls (array, optional): Tool calls made by assistant (OpenAI format)
  • function_call (object, optional): Function call (legacy OpenAI format)
Content blocks can be:
  • Text: {"type": "text", "text": "...", "cache_control": {...}}
  • Image: {"type": "image", "source": {"type": "base64", "media_type": "...", "data": "..."}}
  • Tool use: {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
  • Tool result: {"type": "tool_result", "tool_use_id": "...", "content": "...", "is_error": false}
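For illustration, a multi-turn messages array combining these block types might look like the following (the tool_use id and tool name are hypothetical):

```python
import json

# A multi-turn conversation: plain text from the user, an assistant
# tool_use block, and the matching user tool_result block.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_01",
                "name": "get_weather",
                "input": {"location": "Paris"},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_01",  # must match the tool_use id above
                "content": "18°C, partly cloudy",
                "is_error": False,
            }
        ],
    },
]

print(json.dumps(messages, indent=2))
```

Note that tool_result blocks are sent back in a user message, with tool_use_id linking each result to the tool_use block it answers.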
max_tokens
number
required
Maximum number of tokens to generate. Example: 1024
system
string | array
System prompt to provide context. Can be:
  • String: Simple text prompt
  • Array: Array of text blocks with cache control
Example:
"You are a helpful assistant."
Or:
[
  {
    "type": "text",
    "text": "You are a helpful assistant.",
    "cache_control": {"type": "ephemeral"}
  }
]
temperature
number
Sampling temperature between 0 and 1. Example: 0.7
tools
array
Available tools for the model to use. Each tool has:
  • name (string): Tool name
  • description (string): Tool description
  • input_schema (object): JSON schema for tool parameters
Example:
[{
  "name": "get_weather",
  "description": "Get weather information",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {"type": "string"}
    },
    "required": ["location"]
  }
}]
stream
boolean
default:false
Whether to stream the response as Server-Sent Events. Example: false

Response

id
string
Unique identifier for the message.
type
string
Object type, always "message".
role
string
Message role, always "assistant".
model
string
The model used for completion.
content
array
Array of content blocks. Each block can be:
  • Text: {"type": "text", "text": "..."}
  • Tool use: {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
stop_reason
string
Why the model stopped generating. Values:
  • "end_turn" - Natural stop
  • "max_tokens" - Reached token limit
  • "stop_sequence" - Hit stop sequence
  • "tool_use" - Model wants to use a tool
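Clients typically branch on stop_reason before reading content; a minimal dispatch sketch (the helper name and return strings are illustrative, not part of the API):

```python
def handle_stop_reason(response: dict) -> str:
    """Map a response's stop_reason to a suggested follow-up action."""
    reason = response.get("stop_reason")
    if reason == "tool_use":
        return "run the requested tools and send back tool_result blocks"
    if reason == "max_tokens":
        return "raise max_tokens or continue the conversation"
    if reason == "stop_sequence":
        return "generation hit a configured stop sequence"
    # "end_turn" (or anything else): nothing further to do
    return "conversation turn finished normally"

print(handle_stop_reason({"stop_reason": "tool_use"}))
```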
stop_sequence
string | null
The stop sequence that ended generation, if any.
usage
object
Token usage information. Contains:
  • input_tokens (number): Tokens in the input
  • output_tokens (number): Tokens in the output
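To track consumption across a session, the two counters can simply be summed per response; a small accumulator sketch (helper name is illustrative):

```python
def add_usage(total: dict, usage: dict) -> dict:
    # Accumulate input/output token counts across multiple responses.
    return {
        "input_tokens": total["input_tokens"] + usage["input_tokens"],
        "output_tokens": total["output_tokens"] + usage["output_tokens"],
    }

total = {"input_tokens": 0, "output_tokens": 0}
for usage in [{"input_tokens": 12, "output_tokens": 18},
              {"input_tokens": 30, "output_tokens": 45}]:
    total = add_usage(total, usage)

print(total)  # {'input_tokens': 42, 'output_tokens': 63}
```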

Examples

Basic Message

curl https://api.llmgateway.io/v1/messages \
  -H "Authorization: Bearer $LLMGATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

With System Prompt

import requests
import os

response = requests.post(
    "https://api.llmgateway.io/v1/messages",
    headers={
        "Authorization": f"Bearer {os.environ.get('LLMGATEWAY_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "system": "You are a helpful coding assistant.",
        "messages": [
            {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
        ]
    }
)

data = response.json()
print(data["content"][0]["text"])

With Tool Calls

import requests
import os
import json

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

response = requests.post(
    "https://api.llmgateway.io/v1/messages",
    headers={
        "Authorization": f"Bearer {os.environ.get('LLMGATEWAY_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "tools": tools,
        "messages": [
            {"role": "user", "content": "What's the weather in San Francisco?"}
        ]
    }
)

data = response.json()

if data["stop_reason"] == "tool_use":
    for block in data["content"]:
        if block["type"] == "tool_use":
            print(f"Tool: {block['name']}")
            print(f"Input: {json.dumps(block['input'], indent=2)}")

Streaming Response

import requests
import os
import json

response = requests.post(
    "https://api.llmgateway.io/v1/messages",
    headers={
        "Authorization": f"Bearer {os.environ.get('LLMGATEWAY_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "stream": True,
        "messages": [
            {"role": "user", "content": "Tell me a short story."}
        ]
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('event: '):
            event_type = line_str[7:]
        elif line_str.startswith('data: '):
            data = json.loads(line_str[6:])
            
            if data.get('type') == 'content_block_delta':
                delta = data.get('delta', {})
                if delta.get('type') == 'text_delta':
                    print(delta.get('text', ''), end='', flush=True)

Multimodal with Images

import requests
import os
import base64

# Read and encode image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = requests.post(
    "https://api.llmgateway.io/v1/messages",
    headers={
        "Authorization": f"Bearer {os.environ.get('LLMGATEWAY_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_data
                        }
                    }
                ]
            }
        ]
    }
)

data = response.json()
print(data["content"][0]["text"])

Response Example

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-5-sonnet-20241022",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm Claude, an AI assistant. How can I help you today?"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 18
  }
}

Streaming Events

When stream: true, the response is a stream of Server-Sent Events:

message_start

{
  "type": "message_start",
  "message": {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-20241022",
    "content": [],
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {"input_tokens": 12, "output_tokens": 0}
  }
}

content_block_start

{
  "type": "content_block_start",
  "index": 0,
  "content_block": {"type": "text", "text": ""}
}

content_block_delta

{
  "type": "content_block_delta",
  "index": 0,
  "delta": {"type": "text_delta", "text": "Hello"}
}

content_block_stop

{
  "type": "content_block_stop",
  "index": 0
}

message_delta

{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {"input_tokens": 0, "output_tokens": 18}
}

message_stop

{
  "type": "message_stop"
}
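Putting the events together, the assistant's full text is the concatenation of every text_delta payload. A minimal accumulator over parsed event dicts like those shown above (the sample events are illustrative):

```python
def collect_text(events: list) -> str:
    # Concatenate text_delta payloads from content_block_delta events.
    parts = []
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)

events = [
    {"type": "message_start", "message": {"id": "msg_abc123"}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": " there"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]

print(collect_text(events))  # Hello there
```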

Error Responses

Invalid Request

{
  "error": true,
  "status": 400,
  "message": "Invalid request format: messages: Required"
}

Missing max_tokens

{
  "error": true,
  "status": 400,
  "message": "Invalid request format: max_tokens: Required"
}
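Since error bodies carry error: true plus a status and message, a small guard can normalize failures before reading content; a sketch (the exception class and helper name are hypothetical):

```python
class GatewayError(Exception):
    """Raised when the gateway returns an error body."""

def raise_for_gateway_error(body: dict) -> dict:
    # Error bodies have error=true, an HTTP status, and a message.
    if body.get("error") is True:
        raise GatewayError(f"{body.get('status')}: {body.get('message')}")
    return body

try:
    raise_for_gateway_error({
        "error": True,
        "status": 400,
        "message": "Invalid request format: max_tokens: Required",
    })
except GatewayError as exc:
    print(exc)  # 400: Invalid request format: max_tokens: Required
```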

Format Conversions

The gateway automatically converts between Anthropic and OpenAI formats:

Message Roles

  • tool → tool (OpenAI tool message)
  • function → tool (legacy OpenAI function)
  • assistant with tool_use → assistant with tool_calls
  • user with tool_result → tool messages

Content Blocks

  • Text blocks → Simple string or text content part
  • Image blocks → image_url content parts with data URLs
  • Tool use blocks → tool_calls array
  • Tool result blocks → tool role messages
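The image-block mapping, for example, amounts to folding media_type and data into a data URL. A sketch of that conversion, not the gateway's actual code:

```python
def image_block_to_openai(block: dict) -> dict:
    # Anthropic base64 image block -> OpenAI image_url content part.
    source = block["source"]
    data_url = f"data:{source['media_type']};base64,{source['data']}"
    return {"type": "image_url", "image_url": {"url": data_url}}

part = image_block_to_openai({
    "type": "image",
    "source": {"type": "base64", "media_type": "image/jpeg", "data": "AAAA"},
})
print(part["image_url"]["url"])  # data:image/jpeg;base64,AAAA
```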

System Prompt

  • String system → system role message in OpenAI
  • Array system → Concatenated text in system message
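The system conversion can be sketched as passing strings through and joining the text of each block when an array is given (the separator here is an assumption; the gateway's actual joining behavior may differ):

```python
def normalize_system(system) -> str:
    # A string passes through unchanged; an array of text blocks
    # is concatenated into a single system message.
    if isinstance(system, str):
        return system
    return "\n".join(block["text"] for block in system)

print(normalize_system([
    {"type": "text", "text": "You are a helpful assistant.",
     "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "Answer concisely."},
]))
```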

Notes

  • The gateway converts Anthropic format to OpenAI format internally
  • All OpenAI-compatible models can be used through this endpoint
  • The response is converted back to Anthropic format
  • Streaming uses Anthropic’s event format
  • Cache control is preserved during conversion
  • Tool calls are bidirectionally converted between formats
