
Overview

AWS Bedrock provides access to multiple foundation models including Anthropic Claude, Meta Llama, Mistral AI, Amazon Nova, and more through a single API on AWS infrastructure.

Quick Start

1. Install LiteLLM

pip install litellm

2. Set AWS Credentials

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-east-1"

3. Make Your First Call

from litellm import completion

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello from Bedrock!"}]
)
print(response.choices[0].message.content)

Supported Models

Claude models via Bedrock:
# Claude 3.7 Sonnet
response = completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[{"role": "user", "content": "Complex task..."}]
)

# Claude 3.5 Sonnet
response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Analyze this..."}]
)

# Claude 3.5 Haiku
response = completion(
    model="bedrock/anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": "Quick question..."}]
)

# Claude 3 Opus
response = completion(
    model="bedrock/anthropic.claude-3-opus-20240229-v1:0",
    messages=[{"role": "user", "content": "Deep reasoning..."}]
)

Authentication

export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION_NAME="us-east-1"  # or us-west-2, eu-west-1, etc.
from litellm import completion

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello!"}]
)
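Credentials can also be passed per call instead of via environment variables. A minimal sketch, assuming LiteLLM's aws_access_key_id / aws_secret_access_key / aws_region_name (and optional aws_profile_name) keyword arguments; verify the parameter names against your installed LiteLLM version:

```python
# Per-call AWS auth parameters (names assumed from LiteLLM's Bedrock
# support -- verify against your installed version).
bedrock_auth = {
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "...",
    "aws_region_name": "us-east-1",
    # Or use a named profile from ~/.aws/credentials instead of raw keys:
    # "aws_profile_name": "my-profile",
}

# completion(
#     model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
#     messages=[{"role": "user", "content": "Hello!"}],
#     **bedrock_auth,
# )
print(sorted(bedrock_auth))
```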

Available Regions

Bedrock is available in multiple AWS regions:
Region                      Code            Models
US East (N. Virginia)       us-east-1       All models
US West (Oregon)            us-west-2       All models
Europe (Frankfurt)          eu-central-1    Most models
Europe (Ireland)            eu-west-1       Most models
Asia Pacific (Singapore)    ap-southeast-1  Most models
Asia Pacific (Tokyo)        ap-northeast-1  Most models
# Use specific region
response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    aws_region_name="eu-west-1"
)

Streaming

from litellm import completion

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function Calling

Use tools with Claude on Bedrock:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
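After the model requests a tool, the usual next step is to run the function locally and send the result back as a "tool" message in a second completion() call. A sketch with an illustrative tool call and a stand-in get_weather implementation (the message shape follows the OpenAI-compatible format LiteLLM uses):

```python
import json

# Illustrative values mirroring response.choices[0].message.tool_calls[0].
tool_call_id = "call_123"
tool_arguments = '{"location": "Boston"}'

def get_weather(location: str) -> str:
    # Stand-in for a real weather lookup.
    return f"72F and sunny in {location}"

args = json.loads(tool_arguments)
result = get_weather(**args)

# Append this to the conversation and call completion() again:
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call_id,
    "content": result,
}
print(tool_message["content"])
```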

Vision (Multimodal)

Use vision models like Claude or Llama 3.2 Vision:
# Claude with vision
response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

# Llama 3.2 Vision
response = completion(
    model="bedrock/us.meta.llama3-2-90b-instruct-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this"},
            {"type": "image_url", "image_url": {"url": "..."}}
        ]
    }]
)
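For local files, the image can be inlined as a base64 data URI instead of a hosted URL (LiteLLM accepts data URIs in image_url). A sketch using placeholder bytes; in practice read the bytes from your file:

```python
import base64

# In practice: image_bytes = open("photo.jpg", "rb").read()
image_bytes = b"placeholder-image-bytes"

data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": data_uri}},
    ],
}
print(data_uri[:30])
```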

Embeddings

Generate embeddings using Bedrock:
from litellm import embedding

# Amazon Titan Embeddings
response = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input="Hello world"
)
print(len(response.data[0].embedding))  # 1536 dimensions

# Titan Embeddings V2
response = embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input="Hello world"
)

# Cohere Embeddings
response = embedding(
    model="bedrock/cohere.embed-english-v3",
    input=["Text 1", "Text 2"]
)
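Embeddings are typically compared with cosine similarity. A stdlib-only sketch; the sample vectors here stand in for response.data[i].embedding:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sample vectors standing in for real embedding outputs.
v1 = [0.1, 0.3, -0.2]
v2 = [0.1, 0.25, -0.15]
print(round(cosine_similarity(v1, v2), 3))
```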

Reranking

Rerank documents using Cohere on Bedrock:
from litellm import rerank

response = rerank(
    model="bedrock/cohere.rerank-v3-5:0",
    query="What is machine learning?",
    documents=[
        "Machine learning is a subset of AI",
        "Python is a programming language",
        "Deep learning uses neural networks"
    ]
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score}")
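A common follow-up is to keep only the highest-scoring documents. A sketch with sample scores standing in for response.results:

```python
documents = [
    "Machine learning is a subset of AI",
    "Python is a programming language",
    "Deep learning uses neural networks",
]
# Sample results standing in for response.results.
results = [
    {"index": 0, "relevance_score": 0.92},
    {"index": 1, "relevance_score": 0.08},
    {"index": 2, "relevance_score": 0.61},
]

# Keep the two most relevant documents, best first.
top_2 = sorted(results, key=lambda r: r["relevance_score"], reverse=True)[:2]
for r in top_2:
    print(documents[r["index"]])
```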

Batch Processing

Process requests asynchronously:
from litellm import create_batch, retrieve_batch

batch = create_batch(
    custom_llm_provider="bedrock",
    input_file_id="s3://bucket/input.jsonl",
    endpoint="/invoke",
    completion_window="24h"
)

print(f"Batch ID: {batch.id}")

# Check status
batch_status = retrieve_batch(
    custom_llm_provider="bedrock",
    batch_id=batch.id
)

Converse API vs Invoke API

Bedrock supports two request APIs: the Converse API, a single unified request/response format that works across model families, and the older InvokeModel API, which uses each provider's native request format. LiteLLM routes chat models through the Converse API by default.
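The choice can be expressed in the model string. A sketch; the "invoke/" segment for forcing the InvokeModel API is assumed from recent LiteLLM versions, so verify it against yours:

```python
# Default routing: Converse API.
converse_model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"

# Explicit InvokeModel routing (assumed "invoke/" segment -- check your
# LiteLLM version's Bedrock docs).
invoke_model = "bedrock/invoke/anthropic.claude-3-5-sonnet-20240620-v1:0"

# completion(model=invoke_model, messages=[{"role": "user", "content": "Hi"}])
print(invoke_model)
```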

Cross-Region Inference

Cross-region inference profiles (model IDs prefixed with a geography code such as "us." or "eu.") let Bedrock route requests to any region within that geography for higher throughput and availability:
# Cross-region profile
response = completion(
    model="bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello"}],
    aws_region_name="us-east-1"
)

Guardrails

Apply AWS Bedrock Guardrails:
response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello"}],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",
        "guardrailVersion": "1",
    }
)

Advanced Parameters

Temperature and Sampling

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Be creative"}],
    temperature=0.9,
    top_p=0.95,
    max_tokens=1000
)

System Messages

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    APIError
)

try:
    response = completion(
        model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except AuthenticationError:
    print("AWS credentials invalid")
except RateLimitError:
    print("Bedrock throttling limit hit")
except APIError as e:
    print(f"Bedrock error: {e}")
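Bedrock throttles aggressively under load, so RateLimitError is usually worth retrying with exponential backoff. A minimal stdlib sketch of the delay schedule; in practice, wrap the completion() call in the try/except above and sleep between attempts:

```python
def backoff_delays(max_retries=4, base=1.0, cap=30.0):
    """Exponential backoff schedule (seconds) for retrying throttled calls."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

# Usage sketch:
# for delay in backoff_delays():
#     try:
#         response = completion(...)
#         break
#     except RateLimitError:
#         time.sleep(delay)
print(backoff_delays())
```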

Cost Tracking

from litellm import completion, completion_cost

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello!"}]
)

cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
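The reported cost is just token counts multiplied by per-token prices. A back-of-the-envelope sketch with hypothetical per-million-token prices (look up current Bedrock pricing for real numbers); the token counts stand in for response.usage:

```python
input_price_per_m = 3.00    # USD per 1M input tokens (hypothetical)
output_price_per_m = 15.00  # USD per 1M output tokens (hypothetical)

prompt_tokens, completion_tokens = 12, 85  # e.g. from response.usage
cost = (prompt_tokens * input_price_per_m
        + completion_tokens * output_price_per_m) / 1_000_000
print(f"${cost:.6f}")
```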

Model Access

Before using models, ensure they’re enabled in your AWS account:
  1. Go to AWS Bedrock console
  2. Navigate to “Model access”
  3. Request access for desired models
  4. Wait for approval (usually instant for most models)

Best Practices

Use IAM Roles

When on AWS, use IAM roles instead of access keys for better security.

Choose Right Region

Select a region close to your users for lower latency.

Enable Model Access

Request model access in Bedrock console before use.

Use Converse API

Prefer Converse API for better compatibility across models.

Anthropic

Learn about Claude-specific features

Function Calling

Implement tool use on Bedrock

Embeddings

Generate embeddings on Bedrock

Streaming

Stream responses in real-time
