Overview

LiteLLM provides comprehensive support for Anthropic’s Claude models, including advanced features like prompt caching, computer use, web search, and extended thinking.

Quick Start

Step 1: Install LiteLLM

pip install litellm

Step 2: Set API Key

export ANTHROPIC_API_KEY="sk-ant-..."

Step 3: Make Your First Call

from litellm import completion

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello Claude!"}]
)
print(response.choices[0].message.content)

Supported Models

Claude Sonnet 4 is the latest generation, with extended thinking and advanced reasoning.
# Claude Sonnet 4 - latest model with reasoning
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)

# With extended thinking (reasoning)
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Complex analysis task..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Tokens allocated for the thinking phase
    }
)

Authentication

export ANTHROPIC_API_KEY="sk-ant-..."
from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}]
)

Extended Thinking (Reasoning)

Claude Sonnet 4 and Claude 3.7 Sonnet support extended thinking for complex reasoning tasks:
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this math problem: ..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Tokens allocated for thinking
    }
)

# LiteLLM surfaces the thinking output alongside the final answer
print(f"Thinking: {response.choices[0].message.reasoning_content}")
print(f"Response: {response.choices[0].message.content}")

Prompt Caching

Save costs by caching frequently used context:
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an expert in...",  # Long system prompt
                    "cache_control": {"type": "ephemeral"}  # Cache this
                }
            ]
        },
        {"role": "user", "content": "Question 1"}
    ]
)

# Subsequent requests reuse cached context (5-minute TTL)
response2 = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        # Same cached system message
        {"role": "system", "content": [{
            "type": "text",
            "text": "You are an expert in...",
            "cache_control": {"type": "ephemeral"}
        }]},
        {"role": "user", "content": "Question 2"}  # Only this is new
    ]
)
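Since the cached system message must be byte-identical across requests, it helps to build it in one place. A small sketch (the `cached_system` helper name is ours, not a LiteLLM API):

```python
def cached_system(text: str) -> dict:
    # Wrap a long system prompt with ephemeral cache_control (5-minute TTL)
    return {
        "role": "system",
        "content": [{
            "type": "text",
            "text": text,
            "cache_control": {"type": "ephemeral"}
        }],
    }

messages = [
    cached_system("You are an expert in..."),  # reused verbatim across requests
    {"role": "user", "content": "Question 1"},
]
# response = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)
```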

Computer Use

Claude can interact with computers through screenshots and commands:
tools = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 1
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": "Click on the search button and type 'hello'"
    }],
    tools=tools
)

import json

# Claude returns tool calls describing computer actions
for tool_call in response.choices[0].message.tool_calls or []:
    action = json.loads(tool_call.function.arguments)
    print(f"Action: {action.get('action')}")
    # Actions include: key, type, mouse_move, left_click, screenshot, etc.
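Your client code is responsible for executing each action and returning the result (typically a screenshot) to Claude. A minimal dispatcher sketch, with stub handlers standing in for a real screenshot/keyboard/mouse automation layer:

```python
# Minimal dispatcher for computer-use actions. The return strings are
# placeholders; a real implementation would drive the OS and send a
# screenshot or tool result back to Claude.
def handle_action(action: dict) -> str:
    kind = action.get("action")
    if kind == "screenshot":
        return "screenshot captured"
    if kind == "left_click":
        x, y = action["coordinate"]
        return f"clicked at ({x}, {y})"
    if kind == "type":
        return f"typed {action['text']!r}"
    return f"unsupported action: {kind}"
```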
Web Search

Claude can search the web for current information:
# Enable the server-side web search tool
tools = [{
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # Limit the number of search queries
    "user_location": {
        "type": "approximate",  # optionally add city, region, country, timezone
        "city": "San Francisco"
    }
}]

response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{
        "role": "user",
        "content": "What are the latest developments in AI this week?"
    }],
    tools=tools
)

# Claude searches automatically and cites sources in the response text
print(response.choices[0].message.content)

Function Calling

Claude supports sophisticated tool use:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
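To complete the round trip, execute the function yourself and send the result back as a `tool` message, then call `completion()` again. A sketch (the `toolu_abc123` id and `tool_result_message` helper are illustrative):

```python
import json

def tool_result_message(tool_call_id: str, result) -> dict:
    # OpenAI-format tool result message; LiteLLM translates it for Anthropic
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}

# After Claude requests get_weather(location="Paris"):
args = json.loads('{"location": "Paris"}')       # tool_call.function.arguments
weather = {"location": args["location"], "temp_c": 18}  # call your real function here
followup = tool_result_message("toolu_abc123", weather)
# Append the assistant message (with its tool_calls) and `followup` to
# `messages`, then call completion() again for Claude's final answer.
```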

Vision (Multimodal)

Claude models support image analysis:
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)
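For local images, you can pass a base64 data URL instead of an HTTP URL. A small sketch (the `to_data_url` helper name is ours):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    # Encode raw image bytes as a data URL accepted by the image_url field
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# with open("photo.jpg", "rb") as f:
#     url = to_data_url(f.read())
# ...then use {"type": "image_url", "image_url": {"url": url}} as above
```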

Streaming

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
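If you also need the complete text after streaming, accumulate the deltas as they arrive (a small helper we add here, not a LiteLLM API):

```python
def collect_stream(chunks) -> str:
    # Concatenate streamed content deltas into the complete response text
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)

# full_text = collect_stream(completion(..., stream=True))
```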

Streaming with Thinking

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this problem..."}],
    thinking={"type": "enabled", "budget_tokens": 5000},
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    
    # Handle thinking content (LiteLLM surfaces it as reasoning_content)
    if getattr(delta, 'reasoning_content', None):
        print(f"[Thinking] {delta.reasoning_content}", end="")
    
    # Handle regular content
    if delta.content:
        print(delta.content, end="", flush=True)

JSON Mode

# JSON object mode
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": "Extract: John is 30, lives in NYC, likes pizza"
    }],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
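If you need the output to match a specific shape, LiteLLM also accepts a `json_schema` response format (support varies by model and LiteLLM version; the schema name and fields below are illustrative):

```python
# JSON Schema mode: constrain the output shape. Verify json_schema support
# for Anthropic models in your LiteLLM release before relying on it.
person_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "city": {"type": "string"},
            },
            "required": ["name", "age", "city"],
        },
    },
}
# response = completion(model="anthropic/claude-3-5-sonnet-20240620",
#                       messages=[...], response_format=person_format)
```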

Batch Processing

Process requests asynchronously in batches:
from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    custom_llm_provider="anthropic",
    input_file_id="file-abc123",
    endpoint="/v1/messages"
)

print(f"Batch ID: {batch.id}")

# Retrieve results
batch_result = retrieve_batch(
    custom_llm_provider="anthropic",
    batch_id=batch.id
)
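The `input_file_id` refers to an uploaded JSONL file with one request per line. A sketch of building one (the request shape follows the OpenAI batch format that LiteLLM forwards; double-check field names against your LiteLLM version):

```python
import json

# One request per line; custom_id lets you match results back to inputs
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/messages",
        "body": {
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize document A", "Summarize document B"])
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
# Upload the file (e.g. via litellm.create_file with purpose="batch") and
# pass the returned file id as input_file_id
```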

Advanced Parameters

System Messages

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

Temperature and Top P

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Be creative"}],
    temperature=1.0,  # 0.0 to 1.0
    top_p=0.9,
    top_k=50
)

Stop Sequences

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"]  # Stop at these sequences
)

Max Tokens

# Important: Anthropic requires max_tokens to be set
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write an essay"}],
    max_tokens=4096  # Required parameter
)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1024
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit hit")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"API error: {e}")

Cost Tracking

from litellm import completion, completion_cost

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)

# Track costs including cache usage
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Check cache usage
if hasattr(response.usage, 'cache_read_input_tokens'):
    print(f"Cached tokens: {response.usage.cache_read_input_tokens}")
    print(f"New tokens: {response.usage.prompt_tokens}")

Best Practices

Use Prompt Caching

Cache system prompts and long documents to reduce costs by up to 90%.

Set Max Tokens

Always set max_tokens - it’s required by Anthropic’s API.

Use Extended Thinking

Enable thinking for complex reasoning, math, and analysis tasks.

Try Haiku First

Use Claude 3.5 Haiku for simple tasks - it’s fast and cost-effective.

Function Calling

Deep dive into tool use with Claude

Vision

Working with images in Claude

Streaming

Stream responses in real-time

Batching

Process requests in batches
