Overview
LiteLLM provides comprehensive support for Anthropic’s Claude models, including advanced features like prompt caching, computer use, web search, and extended thinking.
Quick Start
Supported Models
- Claude 4
- Claude 3.7
- Claude 3.5
- Claude 3
Latest generation with extended thinking and advanced reasoning.
# Claude 4.6 - Latest model with reasoning
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)

# With extended thinking (reasoning)
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Complex analysis task..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allocate tokens for thinking
    }
)
Advanced Sonnet model with excellent performance.
response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Write detailed analysis..."}],
    max_tokens=4096
)
Popular Sonnet and Haiku models.
# Claude 3.5 Sonnet - Great balance
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Analyze this data..."}]
)

# Claude 3.5 Haiku - Fast and efficient
response = completion(
    model="anthropic/claude-3-5-haiku-20241022",
    messages=[{"role": "user", "content": "Quick task..."}]
)
Previous generation models.
# Claude 3 Opus - Most capable
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Complex reasoning..."}]
)

# Claude 3 Sonnet
response = completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Balanced task..."}]
)

# Claude 3 Haiku - Fast
response = completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Quick query..."}]
)
Authentication
- Environment Variable
- Direct Parameter
export ANTHROPIC_API_KEY="sk-ant-..."
from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}]
)

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="sk-ant-..."
)
Extended Thinking (Reasoning)
Claude 4.6 supports extended thinking for complex reasoning tasks:
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this math problem: ..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Tokens allocated for thinking
    }
)

# LiteLLM surfaces the thinking separately from the final answer
message = response.choices[0].message
if getattr(message, "reasoning_content", None):
    print(f"Thinking: {message.reasoning_content}")
print(f"Response: {message.content}")
Prompt Caching
Save costs by caching frequently used context:
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an expert in...",  # Long system prompt
                    "cache_control": {"type": "ephemeral"}  # Cache this
                }
            ]
        },
        {"role": "user", "content": "Question 1"}
    ]
)

# Subsequent requests reuse cached context (5-minute TTL)
response2 = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        # Same cached system message
        {"role": "system", "content": [{
            "type": "text",
            "text": "You are an expert in...",
            "cache_control": {"type": "ephemeral"}
        }]},
        {"role": "user", "content": "Question 2"}  # Only this is new
    ]
)
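The cache_control markup is verbose to repeat by hand. A small helper (our own convenience function, not part of LiteLLM) can build the cached system message once and reuse it across requests:

```python
def cached_system(text: str) -> dict:
    """Build a system message whose text block carries Anthropic's
    cache_control marker. The helper name is our own; the returned
    message shape matches what the examples above pass to
    litellm.completion()."""
    return {
        "role": "system",
        "content": [{
            "type": "text",
            "text": text,
            "cache_control": {"type": "ephemeral"},
        }],
    }

# Build the cached prefix once, then vary only the user turn
system = cached_system("You are an expert in...")
messages = [system, {"role": "user", "content": "Question 1"}]
```

Reusing the byte-identical prefix is what makes the cache hit; any edit to the cached text starts a fresh cache entry.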
Computer Use
Claude can interact with computers through screenshots and commands:
import json

tools = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 1
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": "Click on the search button and type 'hello'"
    }],
    tools=tools
)

# Claude returns computer actions as tool calls
for tool_call in response.choices[0].message.tool_calls or []:
    action = json.loads(tool_call.function.arguments)
    print(f"Action: {action.get('action')}")
    # Actions: key, type, mouse_move, left_click, etc.
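Acting on those tool calls means mapping each action dict to your own automation layer. The dispatcher below is a hypothetical sketch: the handler bodies are placeholders for a real automation library such as pyautogui, and the return strings only describe what a real handler would do.

```python
def handle_computer_action(action: dict) -> str:
    """Dispatch one computer-use action dict to a local handler.
    The returned strings are stand-ins; a real implementation would
    drive the mouse/keyboard and send a screenshot back to Claude."""
    kind = action.get("action")
    if kind in ("key", "type"):
        return f"{kind}: {action.get('text', '')}"
    if kind == "left_click":
        x, y = action["coordinate"]
        return f"left_click at ({x}, {y})"
    if kind == "screenshot":
        return "screenshot captured"
    return f"unsupported action: {kind}"

# Example: the kind of action dict Claude emits
print(handle_computer_action({"action": "left_click", "coordinate": [640, 360]}))
```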
Web Search
Claude can search the web for current information:
# Enable web search tool
tools = [{
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # Limit search queries
    "user_location": {  # Optional: bias results toward a location
        "type": "approximate",
        "city": "San Francisco",
        "region": "California",
        "country": "US"
    }
}]

response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{
        "role": "user",
        "content": "What are the latest developments in AI this week?"
    }],
    tools=tools
)

# Claude automatically searches and cites sources
print(response.choices[0].message.content)
Function Calling
Claude supports sophisticated tool use:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
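To finish the round trip, execute the requested function yourself and send the result back as a "tool" role message so Claude can compose the final answer. In this sketch, get_weather is a local stub and the follow-up completion() call is outlined in comments:

```python
import json

def get_weather(location: str) -> dict:
    # Local stub -- replace with a real weather API call
    return {"location": location, "temp_c": 21, "conditions": "sunny"}

def run_tool_call(tool_call) -> dict:
    """Execute the function Claude asked for and package the result
    as a 'tool' message for the follow-up request."""
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# Follow-up request (sketch):
# messages.append(response.choices[0].message)   # assistant turn with tool_calls
# messages.append(run_tool_call(tool_call))      # tool result
# final = completion(model=..., messages=messages, tools=tools)
```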
Vision (Multimodal)
Claude models support image analysis:
- Image URL
- Base64 Image
- Multiple Images
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

import base64

with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_data}"
                }
            }
        ]
    }]
)

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these screenshots"},
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
Streaming
from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming with Thinking
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this problem..."}],
    thinking={"type": "enabled", "budget_tokens": 5000},
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    # Handle thinking content (LiteLLM streams it as reasoning_content)
    if getattr(delta, "reasoning_content", None):
        print(f"[Thinking] {delta.reasoning_content}", end="")
    # Handle regular content
    if delta.content:
        print(delta.content, end="", flush=True)
JSON Mode
# JSON object mode
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": "Extract: John is 30, lives in NYC, likes pizza"
    }],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
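JSON mode makes valid JSON very likely, but replies can still arrive wrapped in a code fence or with surrounding prose. A defensive parser (our own helper, not a LiteLLM API) avoids a hard crash on those cases:

```python
import json
import re

def parse_json_reply(text: str) -> dict:
    """Parse a reply that should be a JSON object, tolerating an
    optional ```json fence or stray prose around the payload by
    grabbing the outermost {...} span."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

print(parse_json_reply('{"name": "John", "age": 30, "city": "NYC"}'))
```

Note the greedy match assumes one JSON object per reply, which holds for json_object mode.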
Batch Processing
Process requests asynchronously in batches:
from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    custom_llm_provider="anthropic",
    input_file_id="file-abc123",
    endpoint="/v1/messages"
)
print(f"Batch ID: {batch.id}")

# Retrieve results
batch_result = retrieve_batch(
    custom_llm_provider="anthropic",
    batch_id=batch.id
)
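The input_file_id above points at an uploaded JSONL file with one request per line. Below is a sketch of building that payload; the per-line schema follows the OpenAI-style batch format that LiteLLM's batch APIs mirror, so verify it against your LiteLLM version before relying on it:

```python
import json

def build_batch_input(prompts, model="claude-3-5-sonnet-20240620"):
    """Return JSONL text: one /v1/messages request per prompt,
    each tagged with a custom_id for matching up results later."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/messages",
            "body": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)

# Write to disk, then upload the file to obtain an input_file_id
with open("batch_input.jsonl", "w") as f:
    f.write(build_batch_input(["Summarize A...", "Summarize B..."]))
```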
Advanced Parameters
System Messages
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
Temperature and Top P
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Be creative"}],
    temperature=1.0,  # 0.0 to 1.0
    top_p=0.9,
    top_k=50
)
Stop Sequences
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"]  # Stop at these sequences
)
Max Tokens
# Important: Anthropic requires max_tokens to be set
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write an essay"}],
    max_tokens=4096  # Required parameter
)
Error Handling
from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1024
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit hit")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"API error: {e}")
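Rate limits and transient API errors usually clear on retry. LiteLLM's completion() accepts a num_retries argument; for explicit control, a generic backoff wrapper like this sketch works with any callable (in production, catch only transient errors such as RateLimitError rather than bare Exception):

```python
import random
import time

def with_backoff(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying failures with exponential backoff plus a
    little jitter. Re-raises the error after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

# Usage sketch:
# response = with_backoff(lambda: completion(
#     model="anthropic/claude-3-5-sonnet-20240620",
#     messages=[{"role": "user", "content": "Hello"}],
#     max_tokens=1024,
# ))
```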
Cost Tracking
from litellm import completion, completion_cost

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)

# Track costs including cache usage
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Check cache usage
if hasattr(response.usage, 'cache_read_input_tokens'):
    print(f"Cached tokens: {response.usage.cache_read_input_tokens}")
    print(f"New tokens: {response.usage.prompt_tokens}")
Best Practices
Use Prompt Caching
Cache system prompts and long documents to reduce costs by up to 90%.
Set Max Tokens
Always set max_tokens - it’s required by Anthropic’s API.
Use Extended Thinking
Enable thinking for complex reasoning, math, and analysis tasks.
Try Haiku First
Use Claude 3.5 Haiku for simple tasks - it’s fast and cost-effective.
Related Documentation
Function Calling
Deep dive into tool use with Claude
Vision
Working with images in Claude
Streaming
Stream responses in real-time
Batching
Process requests in batches