
Overview

Amazon Bedrock provides access to foundation models from leading AI companies, including Anthropic, Meta, Mistral, Cohere, and Amazon, through a unified API backed by AWS security, compliance, and infrastructure. Two AWS services are involved: bedrock (the control plane, for model management) and bedrock-runtime (the data plane, for inference).

Supported Features

  • ✅ Chat Completions (Converse API)
  • ✅ Streaming
  • ✅ Embeddings
  • ✅ Image Generation (Stable Diffusion, Titan)
  • ✅ Function Calling (via Converse API)
  • ✅ Batch Inference
  • ✅ Model Customization (Fine-tuning)
  • ✅ Guardrails
  • ✅ Multiple Authentication Methods

Quick Start

Basic Configuration

from portkey_ai import Portkey

client = Portkey(
    provider="bedrock",
    aws_access_key_id="AKIA***",
    aws_secret_access_key="***",
    aws_region="us-east-1"
)

response = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {"role": "user", "content": "Explain AWS Bedrock in simple terms"}
    ]
)

print(response.choices[0].message.content)

Available Models

Anthropic Claude

| Model ID | Model | Context | Best For |
|---|---|---|---|
| anthropic.claude-3-5-sonnet-20241022-v2:0 | Claude 3.5 Sonnet | 200K | Most capable |
| anthropic.claude-3-5-haiku-20241022-v1:0 | Claude 3.5 Haiku | 200K | Fast, efficient |
| anthropic.claude-3-opus-20240229-v1:0 | Claude 3 Opus | 200K | Complex tasks |
| anthropic.claude-3-sonnet-20240229-v1:0 | Claude 3 Sonnet | 200K | Balanced |
| anthropic.claude-3-haiku-20240307-v1:0 | Claude 3 Haiku | 200K | Speed |

Meta Llama

| Model ID | Context | Description |
|---|---|---|
| meta.llama3-3-70b-instruct-v1:0 | 128K | Latest Llama 3.3 |
| meta.llama3-1-405b-instruct-v1:0 | 128K | Largest Llama 3.1 |
| meta.llama3-1-70b-instruct-v1:0 | 128K | Efficient Llama 3.1 |
| meta.llama3-1-8b-instruct-v1:0 | 128K | Fast, compact |

Mistral AI

| Model ID | Context | Description |
|---|---|---|
| mistral.mistral-large-2407-v1:0 | 128K | Most capable |
| mistral.mistral-large-2402-v1:0 | 32K | Previous generation |
| mistral.mistral-small-2402-v1:0 | 32K | Cost-effective |

Amazon Titan

| Model ID | Type | Description |
|---|---|---|
| amazon.titan-text-premier-v1:0 | Text | Premier text model |
| amazon.titan-text-express-v1 | Text | Fast generation |
| amazon.titan-embed-text-v2:0 | Embeddings | Text embeddings |
| amazon.titan-image-generator-v2:0 | Image | Image generation |

Cohere

| Model ID | Type | Description |
|---|---|---|
| cohere.command-r-plus-v1:0 | Chat | Most capable |
| cohere.command-r-v1:0 | Chat | Balanced |
| cohere.embed-english-v3 | Embeddings | English embeddings |
| cohere.embed-multilingual-v3 | Embeddings | Multilingual |

AI21 Labs

| Model ID | Description |
|---|---|
| ai21.jamba-1-5-large-v1:0 | Latest Jamba |
| ai21.jamba-1-5-mini-v1:0 | Compact Jamba |

Stability AI

| Model ID | Type | Description |
|---|---|---|
| stability.stable-diffusion-xl-v1 | Image | SDXL 1.0 |
| stability.sd3-large-v1:0 | Image | Stable Diffusion 3 |

Authentication Methods

1. Access Keys (Default)

client = Portkey(
    provider="bedrock",
    aws_access_key_id="AKIA***",
    aws_secret_access_key="***",
    aws_session_token="***",  # Required only when using temporary credentials
    aws_region="us-east-1"
)

2. Assumed Role

client = Portkey(
    provider="bedrock",
    aws_auth_type="assumedRole",
    aws_role_arn="arn:aws:iam::123456789012:role/BedrockRole",
    aws_external_id="external-id",  # Optional
    aws_region="us-east-1"
)

3. IAM Role (EC2, ECS, Lambda)

# Automatically uses instance/container IAM role
client = Portkey(
    provider="bedrock",
    aws_region="us-east-1"
)

4. Environment Variables

export AWS_ACCESS_KEY_ID="AKIA***"
export AWS_SECRET_ACCESS_KEY="***"
export AWS_REGION="us-east-1"
client = Portkey(provider="bedrock")

Advanced Features

Streaming

stream = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Function Calling (Converse API)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
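When the model decides to call a tool, the response carries tool calls that your code must execute and feed back as a follow-up message. The sketch below shows one way to do that with a mocked, OpenAI-style tool call rather than a live response; handle_tool_calls and the stand-in get_weather are illustrative names, not part of the Portkey SDK.

```python
import json

def get_weather(location: str) -> dict:
    # Stand-in implementation; a real tool would call a weather API.
    return {"location": location, "temp_c": 22, "condition": "clear"}

def handle_tool_calls(tool_calls, messages):
    """Dispatch each tool call and append a 'tool' message with the result."""
    registry = {"get_weather": get_weather}
    for call in tool_calls:
        fn_name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])
        result = registry[fn_name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages

# Mocked tool call, shaped like the tool_calls entries in a chat response:
mock_call = {
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
}
messages = handle_tool_calls([mock_call], [])
print(messages[0]["content"])
```

The resulting messages list would then be sent back in a second chat.completions.create call so the model can compose its final answer.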

Embeddings

response = client.embeddings.create(
    model="amazon.titan-embed-text-v2:0",
    input="AWS Bedrock provides access to foundation models"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
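A common next step is comparing embeddings with cosine similarity. A minimal, dependency-free sketch, using toy 3-dimensional vectors in place of real Titan v2 embeddings (which are 1024-dimensional by default):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for response.data[0].embedding values:
v1 = [0.1, 0.3, 0.5]
v2 = [0.2, 0.4, 0.4]
print(round(cosine_similarity(v1, v2), 4))
```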

Image Generation

response = client.images.generate(
    model="stability.sd3-large-v1:0",
    prompt="A serene mountain landscape at sunset",
    size="1024x1024"
)

image_url = response.data[0].url

Batch Inference

Create batch jobs for cost-effective inference:

# Create batch job
response = client.batches.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    input_file_id="s3://my-bucket/input.jsonl",
    output_data_config={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/output/"
        }
    }
)

batch_id = response.id

# Check status
batch = client.batches.retrieve(batch_id)
print(f"Status: {batch.status}")

Cross-Region Inference

Use inference profiles for cross-region routing:

response = client.chat.completions.create(
    model="us.anthropic.claude-3-5-sonnet-20241022-v2:0",  # Inference profile
    messages=[{"role": "user", "content": "Hello"}]
)

Multi-Region Configuration

Load balance across AWS regions:

config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "provider": "bedrock",
            "aws_access_key_id": "AKIA***",
            "aws_secret_access_key": "***",
            "aws_region": "us-east-1",
            "weight": 0.5
        },
        {
            "provider": "bedrock",
            "aws_access_key_id": "AKIA***",
            "aws_secret_access_key": "***",
            "aws_region": "us-west-2",
            "weight": 0.5
        }
    ]
}

client = Portkey().with_options(config=config)

Fallback Configuration

Fallback from Bedrock Claude to Anthropic:

config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "bedrock",
            "aws_access_key_id": "AKIA***",
            "aws_secret_access_key": "***",
            "aws_region": "us-east-1",
            "override_params": {"model": "anthropic.claude-3-5-sonnet-20241022-v2:0"}
        },
        {
            "provider": "anthropic",
            "api_key": "sk-ant-***",
            "override_params": {"model": "claude-3-5-sonnet-20241022"}
        }
    ]
}

client = Portkey().with_options(config=config)

Error Handling

from portkey_ai.exceptions import (
    RateLimitError,
    APIError,
    AuthenticationError
)

try:
    response = client.chat.completions.create(
        model="anthropic.claude-3-5-sonnet-20241022-v2:0",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError as e:
    print(f"AWS credentials error: {e}")
except RateLimitError as e:
    print(f"Rate limit or quota exceeded: {e}")
except APIError as e:
    print(f"Bedrock API error: {e}")

Best Practices

  1. Use IAM roles - More secure than access keys
  2. Enable VPC endpoints - Private connectivity
  3. Request model access - Models require explicit access approval
  4. Use inference profiles - Better availability and routing
  5. Monitor with CloudWatch - Track usage and costs
  6. Set up guardrails - Content filtering and safety
  7. Use batch inference - Cost-effective for large workloads
  8. Implement retry logic - Handle throttling gracefully
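Point 8 above can be sketched as a small backoff helper. This is an illustrative sketch: with_retries and the simulated flaky call are not part of any SDK, and in practice you would retry only on throttling-type exceptions rather than all of them.

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=0.5, retryable=(Exception,)):
    """Retry fn with exponential backoff plus jitter; re-raise after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff (0.5s, 1s, 2s, ...) with random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Simulated flaky call that fails twice before succeeding:
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("ThrottlingException (simulated)")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)
```

The same wrapper can be placed around client.chat.completions.create calls, narrowed to the RateLimitError shown in the error-handling example.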

Model Access

Before using models, request access in the AWS Console:
  1. Go to AWS Bedrock Console
  2. Navigate to Model access
  3. Click Manage model access
  4. Select models and request access
  5. Wait for approval (usually instant)

Models are region-specific. Request access in each region you plan to use.
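Once access is granted, you can confirm which models are visible to your credentials with the AWS CLI (this assumes the CLI is installed and configured for the target region):

```shell
# List model IDs available in a region
aws bedrock list-foundation-models \
    --region us-east-1 \
    --query "modelSummaries[].modelId" \
    --output table

# Filter to a single provider, e.g. Anthropic
aws bedrock list-foundation-models \
    --region us-east-1 \
    --by-provider anthropic \
    --query "modelSummaries[].modelId"
```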

Regional Availability

Bedrock is available in multiple AWS regions:
  • US: us-east-1, us-west-2
  • Europe: eu-central-1, eu-west-1, eu-west-3
  • Asia Pacific: ap-southeast-1, ap-northeast-1, ap-south-1

Model availability varies by region. Check the AWS Bedrock documentation for details.

Pricing

Bedrock pricing includes:
  • On-demand: Pay per request/token
  • Provisioned throughput: Reserved capacity
  • Model customization: Additional costs for fine-tuning

AWS Bedrock Pricing

View detailed Bedrock pricing

Anthropic

Direct Anthropic integration

Load Balancing

Multi-region load balancing

Guardrails

Content filtering

Batch Processing

Batch inference guide
