
Anthropic SDK Integration

LLM Gateway supports the Anthropic SDK in two ways:
  1. OpenAI-Compatible Format: Use the OpenAI SDK format (recommended)
  2. Native Anthropic Format: Use the native /v1/messages endpoint
OpenAI-Compatible Format (Recommended)

The easiest way to use LLM Gateway with Anthropic models is through the OpenAI SDK format:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key="your-llmgateway-api-key"
)

response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
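Streaming also works through the OpenAI-compatible format by passing stream=True; a minimal sketch using the same client setup as above:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key="your-llmgateway-api-key"
)

stream = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,  # receive incremental chunks instead of one full response
)

for chunk in stream:
    # each chunk carries a delta with the next piece of text (may be None)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```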

Native Anthropic Format

You can also use LLM Gateway’s native Anthropic /v1/messages endpoint with the official Anthropic SDK:

Installation

pip install anthropic

Basic Usage

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.llmgateway.io/v1",
    api_key="your-llmgateway-api-key"
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)

print(message.content[0].text)

Streaming with Native Format

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.llmgateway.io/v1",
    api_key="your-llmgateway-api-key"
)

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Tool Use (Function Calling)

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.llmgateway.io/v1",
    api_key="your-llmgateway-api-key"
)

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in San Francisco?"}
    ]
)

print(message.content)
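When the model decides to call a tool, the response content includes a tool_use block; your code executes the tool and sends the result back in a tool_result block on the next request. A minimal sketch of that round trip (blocks shown as plain dicts for clarity; get_weather_impl is a hypothetical local implementation):

```python
def get_weather_impl(location: str) -> str:
    # hypothetical stand-in for a real weather lookup
    return f"Sunny, 18°C in {location}"

def tool_result_blocks(content_blocks):
    """Run each tool_use block from a response and build the matching
    tool_result blocks to send back on the next messages.create() call."""
    results = []
    for block in content_blocks:
        if block["type"] == "tool_use" and block["name"] == "get_weather":
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must echo the id from tool_use
                "content": get_weather_impl(**block["input"]),
            })
    return results
```

The assistant's tool_use content and these tool_result blocks are then appended to messages (as an assistant turn and a user turn, respectively) before calling messages.create() again.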

Before and After Comparison

Migrating existing Anthropic SDK code to LLM Gateway requires changing only the client's base_url and API key; every messages.create() call stays the same. Before (calling Anthropic directly):

Python

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-..."  # Anthropic API key
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

Node.js

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
    apiKey: 'sk-ant-...'  // Anthropic API key
});

const message = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Hello!' }]
});
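After: pointing the SDK at LLM Gateway only changes the client construction (Python shown; the Node.js change is analogous):

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.llmgateway.io/v1",  # route through LLM Gateway
    api_key="your-llmgateway-api-key"         # gateway key instead of sk-ant-...
)

# the request itself is unchanged
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
```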

Extended Thinking (Reasoning)

Claude 3.7 Sonnet supports extended thinking for complex reasoning tasks:
message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
    thinking={
        "type": "enabled",
        "budget_tokens": 4000  # Allocate tokens for reasoning
    },
    messages=[
        {"role": "user", "content": "Solve this complex problem..."}
    ]
)

# Access reasoning process
for block in message.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)
    elif block.type == "text":
        print("Response:", block.text)

Prompt Caching

Anthropic’s prompt caching is automatically supported:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Hello!"}]
)

# Check cache usage in response
print(f"Cache read tokens: {message.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {message.usage.cache_creation_input_tokens}")

Model Selection

When using the native Anthropic format, use Anthropic’s model names:
# Claude 3.7 Sonnet (latest)
model="claude-3-7-sonnet-20250219"

# Claude 3.5 Sonnet
model="claude-3-5-sonnet-20241022"

# Claude 3 Opus
model="claude-3-opus-20240229"

# Claude 3.5 Haiku
model="claude-3-5-haiku-20241022"
With OpenAI-compatible format, you can use automatic routing:
# Auto-route to best Anthropic model
model="anthropic/claude-3-5-sonnet-20241022"

# Or use LLM Gateway's unified naming
model="gpt-5"  # May route to Claude depending on availability

Comparison: OpenAI vs Native Format

| Feature           | OpenAI Format               | Native Anthropic Format |
| ----------------- | --------------------------- | ----------------------- |
| Endpoint          | /v1/chat/completions        | /v1/messages            |
| SDK               | OpenAI SDK                  | Anthropic SDK           |
| Response Format   | OpenAI-compatible           | Anthropic native        |
| Streaming         | ✅ Supported                | ✅ Supported            |
| Tool Use          | ✅ Supported                | ✅ Supported            |
| Prompt Caching    | ✅ Automatic                | ✅ Full control         |
| Extended Thinking | Via reasoning_effort        | Via thinking parameter  |
| Multi-provider    | ✅ Works with all providers | ❌ Anthropic only       |

Caveats and Limitations

  • System Messages: In native format, system messages use a separate system parameter, not the messages array
  • Max Tokens: max_tokens is required in native format but optional in OpenAI format
  • Response Structure: Native format returns Anthropic’s response structure with different field names
  • Provider Lock-in: Native format only works with Anthropic models; OpenAI format supports all providers
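The first caveat is the easiest to trip over when porting code; the same system prompt is expressed differently in each format:

```python
# OpenAI-compatible format: the system prompt is a message in the array
openai_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Native Anthropic format: the system prompt is a separate top-level
# parameter, and the messages array holds only user/assistant turns
native_request = {
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```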
