Overview
Fly.io deploys LiteLLM containers close to your users in 30+ regions worldwide, providing:
- Low latency: edge compute near your users
- Global distribution: automatic multi-region deployment
- Auto-scaling: scale machine counts based on demand
- Private networking: WireGuard-based secure internal networking
Quick Start
Install Fly CLI
# macOS
brew install flyctl
# Linux
curl -L https://fly.io/install.sh | sh
# Windows (PowerShell)
pwsh -Command "iwr https://fly.io/install.ps1 -useb | iex"
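If you script the setup, the three installers above can be selected by OS at runtime. A minimal sketch (assumes Homebrew is present on macOS; the command is only printed here, not executed):

```shell
# Choose the flyctl installer matching the current OS (sketch)
case "$(uname -s)" in
  Darwin) install_cmd="brew install flyctl" ;;
  Linux)  install_cmd="curl -L https://fly.io/install.sh | sh" ;;
  *)      install_cmd="pwsh -Command \"iwr https://fly.io/install.ps1 -useb | iex\"" ;;
esac
echo "Run: $install_cmd"
```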
Clone LiteLLM
git clone https://github.com/BerriAI/litellm.git
cd litellm
Launch Application
flyctl launch --name litellm-proxy
Fly will:
- Detect the Dockerfile
- Create a fly.toml configuration
- Prompt for region selection
- Ask whether to create a PostgreSQL database
Set Environment Variables
# Set master key
flyctl secrets set LITELLM_MASTER_KEY=sk-1234
# Set provider API keys
flyctl secrets set OPENAI_API_KEY=sk-proj-...
flyctl secrets set ANTHROPIC_API_KEY=sk-ant-...
Configuration File
Create fly.toml
The Fly CLI generates a basic config; customize it for LiteLLM:
app = "litellm-proxy"
primary_region = "iad" # Washington, DC
[build]
dockerfile = "Dockerfile"

[http_service]
internal_port = 4000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 2  # HA: always keep 2 instances running
processes = ["app"]

[[services]]
protocol = "tcp"
internal_port = 4000

[[services.ports]]
port = 80
handlers = ["http"]
force_https = true

[[services.ports]]
port = 443
handlers = ["tls", "http"]

[services.concurrency]
type = "connections"
hard_limit = 250
soft_limit = 200

[[services.tcp_checks]]
interval = "15s"
timeout = "10s"
grace_period = "30s"

[[services.http_checks]]
interval = "30s"
timeout = "10s"
grace_period = "40s"
method = "GET"
path = "/health/liveliness"
protocol = "http"

[env]
STORE_MODEL_IN_DB = "True"
PORT = "4000"

[[vm]]
memory = "2gb"
cpu_kind = "shared"
cpus = 2
Multi-Region Deployment
Deploy to multiple regions for global coverage:
app = "litellm-proxy"
primary_region = "iad"
# Regions are chosen at deploy/scale time, not declared in fly.toml
# iad = Washington, DC (primary); lhr = London; fra = Frankfurt; sin = Singapore; syd = Sydney

[http_service]
internal_port = 4000
force_https = true
min_machines_running = 1   # Per region
auto_stop_machines = false # Keep machines running

Deploy, then distribute machines across regions:

flyctl deploy
flyctl scale count iad=1 lhr=1 fra=1 sin=1 syd=1
Fly.io automatically routes each request to the nearest healthy region via BGP Anycast.
Database Setup
Fly PostgreSQL
Create Database Cluster
flyctl postgres create \
--name litellm-db \
--region iad \
--initial-cluster-size 2 \
--vm-size shared-cpu-2x \
--volume-size 10
Options:
- Regions: deploy in 2-3 regions for HA
- VM size: shared-cpu-1x, shared-cpu-2x, performance-1x
- Volume: 10GB minimum, scales as needed
Attach Database to App
flyctl postgres attach litellm-db --app litellm-proxy
This sets DATABASE_URL environment variable automatically.
Verify Connection
flyctl ssh console --app litellm-proxy
# Inside container
psql $DATABASE_URL
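If psql isn't available in the container, you can at least confirm the URL points at the private network using plain shell string operations. The URL below is a placeholder in the shape `flyctl postgres attach` produces, not a real credential:

```shell
# Placeholder URL in the shape `flyctl postgres attach` produces
DATABASE_URL="postgres://litellm:secret@litellm-db.internal:5432/litellm"

hostport="${DATABASE_URL#*@}"   # drop scheme and credentials
hostport="${hostport%%/*}"      # drop the database name
host="${hostport%%:*}"
port="${hostport##*:}"

echo "host=$host port=$port"    # host=litellm-db.internal port=5432
```

The `.internal` suffix confirms traffic stays on Fly's private network.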
Database Replication
For multi-region HA:
# Create primary database
flyctl postgres create --name litellm-db-primary --region iad
# Add read replicas in other regions
flyctl postgres create \
--name litellm-db-replica-lhr \
--region lhr \
--fork-from litellm-db-primary
flyctl postgres create \
--name litellm-db-replica-sin \
--region sin \
--fork-from litellm-db-primary
Configure read replicas:
[env]
DATABASE_URL = "postgres://..."         # Primary (writes)
DATABASE_REPLICA_URL = "postgres://..." # Read replica
Private Networking
Fly.io provides private IPv6 networking via WireGuard:
Connect Services
# Services communicate via internal DNS
litellm-proxy.internal # Your app
litellm-db.internal # PostgreSQL
litellm-redis.internal # Redis (if deployed)
Connection strings:
# PostgreSQL
DATABASE_URL=postgresql://user:password@litellm-db.internal:5432/litellm
# Redis
REDIS_URL=redis://litellm-redis.internal:6379
Connect External Services
External managed services (AWS RDS, etc.) are reached via their own endpoints or a WireGuard peer. For databases running on Fly, `fly proxy` tunnels a local port into your private network, which is handy for connecting from your workstation:
# Tunnel local port 5432 to the Fly Postgres app
flyctl proxy 5432 -a litellm-db
psql postgres://user:password@localhost:5432/litellm
Configuration File Deployment
Method 1: Bake into Image
Add to Dockerfile:
FROM ghcr.io/berriai/litellm:main-stable

# Copy config into image
COPY config.yaml /app/config.yaml

ENTRYPOINT ["litellm"]
CMD ["--config", "/app/config.yaml", "--port", "4000"]
Method 2: Fly Secrets
Store config as secret:
# Create config.yaml
cat > config.yaml << 'EOF'
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
EOF
# Store as secret (base64 encoded)
flyctl secrets set CONFIG_YAML="$(base64 < config.yaml)"
Update entrypoint to decode:
echo $CONFIG_YAML | base64 -d > /app/config.yaml
litellm --config /app/config.yaml --port 4000
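Before wiring this into the entrypoint, it's worth checking locally that the encode/decode round-trip is lossless (this uses GNU coreutils `base64`; on macOS the decode flag may be `-D` on older systems):

```shell
# Write a small config, encode it the way the secret would be set,
# then decode it back and compare byte-for-byte
cat > config.yaml <<'EOF'
model_list:
  - model_name: gpt-4o
EOF

CONFIG_YAML="$(base64 < config.yaml)"
echo "$CONFIG_YAML" | base64 -d > decoded.yaml

diff config.yaml decoded.yaml && echo "round-trip ok"
```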
Method 3: Fly Volumes
Volumes are region-specific and not replicated. Use for local data only.
# Create volume
flyctl volumes create litellm_data --region iad --size 10
# Mount in fly.toml
[[mounts]]
source = "litellm_data"
destination = "/data"
Secrets Management
Set Secrets
# Set individual secrets
flyctl secrets set LITELLM_MASTER_KEY=sk-1234
flyctl secrets set OPENAI_API_KEY=sk-proj-...
flyctl secrets set ANTHROPIC_API_KEY=sk-ant-...
# Set from file
flyctl secrets import < secrets.txt
secrets.txt:
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=your-key
AZURE_API_BASE=https://your-resource.openai.azure.com
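Since `flyctl secrets import` reads one NAME=value pair per line, a quick format check before importing can catch stray spaces or malformed lines early. The regex is a loose sketch of conventional env-var names:

```shell
# Validate that every line of secrets.txt is NAME=value with no stray spaces
cat > secrets.txt <<'EOF'
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-proj-test
EOF

# -v inverts the match, so grep succeeds only if a bad line exists
if grep -Evq '^[A-Z_][A-Z0-9_]*=[^ ]' secrets.txt; then
  echo "invalid line in secrets.txt"
else
  echo "secrets.txt format ok"
fi
```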
View Secrets
# List secret names (values hidden)
flyctl secrets list
# Unset secret
flyctl secrets unset OPENAI_API_KEY
Custom Domains
Add Certificate
flyctl certs create api.yourdomain.com
Get DNS Records
flyctl certs show api.yourdomain.com
Shows required DNS records (CNAME or A/AAAA).
Update DNS
Add the following record at your DNS provider:
- Type: CNAME
- Name: api
- Value: litellm-proxy.fly.dev
- TTL: Auto
Verify
flyctl certs check api.yourdomain.com
Fly provisions the TLS certificate automatically once the DNS record resolves.
Scaling
Autoscaling Configuration
[http_service]
internal_port = 4000
force_https = true

# Autoscaling
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 2
# The machine pool created with `flyctl scale count` caps the maximum

[http_service.concurrency]
type = "requests"
hard_limit = 100
soft_limit = 80
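The soft limit is what drives scaling decisions: Fly's proxy prefers machines below it and can auto-start stopped machines once limits are reached. A back-of-envelope sizing, with an assumed peak of 500 concurrent requests (an illustrative number, not from the config above):

```shell
# Machines needed so steady-state concurrency stays under soft_limit (ceiling division)
soft_limit=80
peak_concurrent=500

machines=$(( (peak_concurrent + soft_limit - 1) / soft_limit ))
echo "machines needed: $machines"   # machines needed: 7
```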
Manual Scaling
# Scale to specific count
flyctl scale count 5
# Scale by region
flyctl scale count iad=2 lhr=2 sin=1
# Scale VM resources
flyctl scale vm shared-cpu-2x --memory 4096
# View current scale
flyctl scale show
Resource Presets
| Preset         | CPUs | Memory    | Use Case          |
|----------------|------|-----------|-------------------|
| shared-cpu-1x  | 1    | 256MB-2GB | Development       |
| shared-cpu-2x  | 2    | 512MB-4GB | Small production  |
| shared-cpu-4x  | 4    | 1GB-8GB   | Medium production |
| performance-1x | 1    | 2GB-8GB   | Dedicated CPU     |
| performance-2x | 2    | 4GB-16GB  | High performance  |
Monitoring and Metrics
Fly Metrics
# View dashboard
flyctl dashboard
# Live monitoring
flyctl status
# Resource usage dashboard
flyctl dashboard metrics
# View logs (streams continuously by default)
flyctl logs
Prometheus Integration
Expose metrics to Fly’s Prometheus:
[[metrics]]
port = 4000
path = "/metrics"
Fly scrapes these into its managed Prometheus, queryable at https://api.fly.io/prometheus/<org-slug>/ with a Fly access token.
External Monitoring
Add observability providers:
# Datadog
flyctl secrets set \
USE_DDTRACE=true \
DD_API_KEY=your-key \
DD_SITE=datadoghq.com
# Langfuse
flyctl secrets set \
LANGFUSE_PUBLIC_KEY=pk-... \
LANGFUSE_SECRET_KEY=sk-... \
LANGFUSE_HOST=https://cloud.langfuse.com
Deployment Strategies
Blue-Green Deployment
# Deploy new version without replacing old
flyctl deploy --strategy bluegreen
# Verify new version
curl https://litellm-proxy.fly.dev/health
# If successful, Fly switches traffic to the new machines automatically
# If there are issues, roll back by redeploying the previous image:
flyctl releases
flyctl deploy --image <previous-image>
Canary Deployment
# Boot a single canary machine before rolling out to the rest
flyctl deploy --strategy canary
# Monitor metrics
flyctl dashboard metrics
# Promote to 100%
flyctl deploy --strategy immediate
Rolling Deployment (Default)
# Replace machines one at a time
flyctl deploy --strategy rolling
High Availability Setup
Multi-Region with Load Balancing
app = "litellm-proxy"
primary_region = "iad"
[http_service]
internal_port = 4000
force_https = true
min_machines_running = 2  # Per region

[[vm]]
memory = "2gb"
cpu_kind = "performance"
cpus = 2

Deploy, then pin per-region machine counts (iad = US East, lhr = Europe, nrt = Asia):

flyctl deploy
flyctl scale count iad=2 lhr=2 nrt=1
Health Checks
[[services.http_checks]]
interval = "30s"
timeout = "10s"
grace_period = "40s"
method = "GET"
path = "/health/liveliness"
protocol = "http"

# Headers
[services.http_checks.headers]
Authorization = "Bearer sk-1234"

[[services.tcp_checks]]
interval = "15s"
timeout = "10s"
grace_period = "30s"
Cost Optimization
Pricing Overview
Fly.io Pricing (Pay-as-you-go):

Compute:
- shared-cpu-1x: $0.0000008/sec (~$2.07/month)
- shared-cpu-2x: $0.0000016/sec (~$4.15/month)

Memory:
- 256MB: $0.0000002/sec (~$0.52/month)
- Per GB: $0.0000008/sec (~$2.07/month)

Bandwidth:
- First 100GB: free
- Over 100GB: $0.02/GB

PostgreSQL:
- shared-cpu-1x + 10GB: ~$5/month
- performance-1x + 50GB: ~$30/month
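The monthly figures follow directly from the per-second rates over a 30-day month, e.g. for shared-cpu-1x:

```shell
# $0.0000008/sec x seconds in a 30-day month
awk 'BEGIN { printf "$%.2f/month\n", 0.0000008 * 60 * 60 * 24 * 30 }'
```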
Free Allowance
Free Tier (Hobby Plan):
- 3 shared-cpu-1x VMs (256MB each)
- Up to 160GB storage
- 100GB outbound bandwidth

Good for: development and testing
Optimization Tips
Use Autoscaling
Stop machines when idle:

auto_stop_machines = true
auto_start_machines = true
Right-Size Resources
Start small, scale up based on metrics:

flyctl scale vm shared-cpu-1x --memory 1024
Use Regional Routing
Deploy only in regions with actual traffic.
Optimize Images
Use multi-stage builds and minimize layers:

FROM cgr.dev/chainguard/wolfi-base AS runtime
# Minimal runtime dependencies
Troubleshooting
Deployment Failures
# Check deployment status
flyctl status
# View logs
flyctl logs
# Common issues:

# 1. Health check failing
#    Error: Health checks failed
#    Solution: Increase grace_period or fix the /health endpoint

# 2. Out of memory
#    Error: OOM Killed
#    Solution: Increase VM memory:
flyctl scale vm shared-cpu-2x --memory 2048

# 3. Port binding error
#    Error: listen tcp :4000: bind: address already in use
#    Solution: Bind LiteLLM to the port from the PORT environment variable
Database Connection Issues
# Test database connectivity
flyctl ssh console
psql $DATABASE_URL
# Check database status
flyctl postgres db list --app litellm-db
# View database logs
flyctl logs --app litellm-db
SSH into Machine
# SSH into running machine
flyctl ssh console
# SSH into a specific machine
flyctl ssh console --machine <machine-id>
# Run command
flyctl ssh console -C "ls -la /app"
Restart Machines
# Restart all machines
flyctl apps restart
# Restart a specific machine
flyctl machine restart <machine-id>
Security Best Practices
Network Security
# Force HTTPS
[http_service]
force_https = true

# Internal-only service: no [[services.ports]] entries means nothing is exposed publicly
[[services]]
internal_port = 4000
protocol = "tcp"
Secrets Rotation
# Update secrets without downtime
flyctl secrets set LITELLM_MASTER_KEY=new-key
# Fly automatically restarts machines with new secrets
Private Networking
# Keep sensitive services internal
# Use .internal DNS for service-to-service communication
DATABASE_URL=postgresql://user:password@db.internal:5432/db
Next Steps
- Monitoring: set up comprehensive observability
- High Availability: multi-region HA deployment patterns
- Security: harden your deployment
- Performance: optimize for global traffic