Overview

Fly.io deploys LiteLLM containers close to your users in 30+ regions worldwide, providing:
  • Low latency: Edge compute near users
  • Global distribution: Automatic multi-region deployment
  • Auto-scaling: Scale based on demand
  • Private networking: WireGuard-based secure networking

Quick Start

1. Install Fly CLI

# macOS
brew install flyctl

# Linux
curl -L https://fly.io/install.sh | sh

# Windows (PowerShell)
pwsh -Command "iwr https://fly.io/install.ps1 -useb | iex"

2. Login to Fly

flyctl auth login

3. Clone LiteLLM

git clone https://github.com/BerriAI/litellm.git
cd litellm

4. Launch Application

flyctl launch --name litellm-proxy
Fly will:
  • Detect Dockerfile
  • Create fly.toml configuration
  • Prompt for region selection
  • Ask to create PostgreSQL database

5. Set Environment Variables

# Set master key
flyctl secrets set LITELLM_MASTER_KEY=sk-1234

# Set provider API keys
flyctl secrets set OPENAI_API_KEY=sk-proj-...
flyctl secrets set ANTHROPIC_API_KEY=sk-ant-...

6. Deploy

flyctl deploy
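Once the deploy finishes, you can confirm the proxy is serving traffic. A minimal check against LiteLLM's liveliness endpoint (assumes the app name `litellm-proxy` from the launch step):

```shell
# Hypothetical app name from the launch step; adjust to yours.
APP="litellm-proxy"

# Build the public liveliness URL for a Fly app.
health_url() {
  printf 'https://%s.fly.dev/health/liveliness' "$1"
}

# -f makes curl exit non-zero on HTTP errors; -s silences progress output.
curl -sf "$(health_url "$APP")" && echo "proxy is up" || echo "proxy not reachable yet"
```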

Configuration File

Create fly.toml

The Fly CLI generates a basic config; customize it for LiteLLM:
fly.toml
app = "litellm-proxy"
primary_region = "iad"  # Washington, DC

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 2  # HA: always 2 instances
  processes = ["app"]

[[services]]
  protocol = "tcp"
  internal_port = 4000

  [[services.ports]]
    port = 80
    handlers = ["http"]
    force_https = true

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]

  [services.concurrency]
    type = "connections"
    hard_limit = 250
    soft_limit = 200

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "10s"
    grace_period = "30s"

  [[services.http_checks]]
    interval = "30s"
    timeout = "10s"
    grace_period = "40s"
    method = "GET"
    path = "/health/liveliness"
    protocol = "http"

[env]
  STORE_MODEL_IN_DB = "True"
  PORT = "4000"

[[vm]]
  memory = "2gb"
  cpu_kind = "shared"
  cpus = 2

Multi-Region Deployment

Deploy to multiple regions for global coverage:
app = "litellm-proxy"
primary_region = "iad"

# Deploy to multiple regions
[regions]
  iad = {}  # Washington, DC (primary)
  lhr = {}  # London
  fra = {}  # Frankfurt
  sin = {}  # Singapore
  syd = {}  # Sydney

[http_service]
  internal_port = 4000
  force_https = true
  min_machines_running = 1  # Per region
  auto_stop_machines = false  # Keep running
Deploy:
flyctl deploy --region iad,lhr,fra,sin,syd
Fly.io routes requests to the nearest region automatically using Anycast DNS.

Database Setup

Fly PostgreSQL

1. Create Database Cluster

flyctl postgres create \
  --name litellm-db \
  --region iad \
  --initial-cluster-size 2 \
  --vm-size shared-cpu-2x \
  --volume-size 10
Options:
  • Regions: Deploy in 2-3 regions for HA
  • VM size: shared-cpu-1x, shared-cpu-2x, performance-1x
  • Volume: 10GB minimum, scales as needed

2. Attach Database to App

flyctl postgres attach litellm-db --app litellm-proxy
This sets the DATABASE_URL environment variable automatically.
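Before redeploying, it can help to sanity-check that the attached URL points at the private `.internal` address rather than a public endpoint. A sketch using shell parameter expansion (the URL below is illustrative; the real one is stored as a secret):

```shell
# Illustrative value; `flyctl postgres attach` sets the real secret.
DATABASE_URL="postgres://litellm:secret@litellm-db.internal:5432/litellm"

host="${DATABASE_URL#*@}"   # drop scheme and credentials
host="${host%%:*}"          # drop port and database name
echo "$host"                # litellm-db.internal
```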

3. Verify Connection

flyctl ssh console --app litellm-proxy

# Inside container
psql $DATABASE_URL

Database Replication

For multi-region HA:
# Create primary database
flyctl postgres create --name litellm-db-primary --region iad

# Add read replicas in other regions
flyctl postgres create \
  --name litellm-db-replica-lhr \
  --region lhr \
  --fork-from litellm-db-primary

flyctl postgres create \
  --name litellm-db-replica-sin \
  --region sin \
  --fork-from litellm-db-primary
Configure read replicas:
[env]
  DATABASE_URL = "postgres://..."  # Primary (writes)
  DATABASE_REPLICA_URL = "postgres://..."  # Read replica

Private Networking

Fly.io provides private IPv6 networking via WireGuard:

Connect Services

# Services communicate via internal DNS
litellm-proxy.internal       # Your app
litellm-db.internal           # PostgreSQL
litellm-redis.internal        # Redis (if deployed)
Connection strings:
# PostgreSQL
DATABASE_URL=postgresql://user:password@litellm-db.internal:5432/litellm

# Redis
REDIS_URL=redis://litellm-redis.internal:6379

Connect External Services

For managed services (AWS RDS, etc.), use Fly Proxy:
# Create proxy to external database
flyctl proxy postgres://user:password@your-external-db-host:5432/litellm

Configuration File Deployment

Method 1: Bake into Image

Add to Dockerfile:
FROM ghcr.io/berriai/litellm:main-stable

# Copy config into image
COPY config.yaml /app/config.yaml

ENTRYPOINT ["litellm"]
CMD ["--config", "/app/config.yaml", "--port", "4000"]

Method 2: Fly Secrets

Store config as secret:
# Create config.yaml
cat > config.yaml << 'EOF'
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
EOF

# Store as secret (base64 encoded)
flyctl secrets set CONFIG_YAML="$(cat config.yaml | base64)"
Update entrypoint to decode:
echo "$CONFIG_YAML" | base64 -d > /app/config.yaml
litellm --config /app/config.yaml --port 4000
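The encode/decode pair is lossless, which you can verify locally before trusting it with a real config (a throwaway sketch using temp files):

```shell
# Write a sample config, round-trip it through base64, and compare.
cat > /tmp/sample-config.yaml << 'EOF'
model_list:
  - model_name: gpt-4o
EOF

ENCODED="$(base64 < /tmp/sample-config.yaml)"
printf '%s\n' "$ENCODED" | base64 -d > /tmp/decoded-config.yaml

# cmp exits 0 only if the files are byte-identical.
cmp /tmp/sample-config.yaml /tmp/decoded-config.yaml && echo "round trip OK"
```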

Method 3: Fly Volumes

Volumes are region-specific and not replicated. Use for local data only.
# Create volume
flyctl volumes create litellm_data --region iad --size 10

# Mount in fly.toml
[[mounts]]
  source = "litellm_data"
  destination = "/data"

Secrets Management

Set Secrets

# Set individual secrets
flyctl secrets set LITELLM_MASTER_KEY=sk-1234
flyctl secrets set OPENAI_API_KEY=sk-proj-...
flyctl secrets set ANTHROPIC_API_KEY=sk-ant-...

# Set from file
flyctl secrets import < secrets.txt
secrets.txt:
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=your-key
AZURE_API_BASE=https://your-resource.openai.azure.com
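`flyctl secrets import` expects one KEY=value pair per line, so a quick format check before importing can catch stray lines (a hedged sketch; adjust the pattern if your keys use lowercase):

```shell
# Sample file for the check; point this at your real secrets.txt.
cat > /tmp/secrets.txt << 'EOF'
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-proj-abc
EOF

# grep -v selects lines that do NOT look like KEY=value.
if grep -qvE '^[A-Z0-9_]+=.+$' /tmp/secrets.txt; then
  echo "malformed lines found"
else
  echo "secrets.txt looks valid"
fi
```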

View Secrets

# List secret names (values hidden)
flyctl secrets list

# Unset secret
flyctl secrets unset OPENAI_API_KEY

Custom Domains

1. Add Certificate

flyctl certs create api.yourdomain.com

2. Get DNS Records

flyctl certs show api.yourdomain.com
Shows required DNS records (CNAME or A/AAAA).

3. Update DNS

Add to your DNS provider:
Type:  CNAME
Name:  api
Value: litellm-proxy.fly.dev
TTL:   Auto

4. Verify

flyctl certs check api.yourdomain.com
Fly automatically provisions an SSL certificate.

Scaling

Autoscaling Configuration

fly.toml
[http_service]
  internal_port = 4000
  force_https = true
  
  # Autoscaling
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 2
  max_machines_running = 10

[http_service.concurrency]
  type = "requests"
  hard_limit = 100
  soft_limit = 80

Manual Scaling

# Scale to specific count
flyctl scale count 5

# Scale by region
flyctl scale count 2 --region iad
flyctl scale count 2 --region lhr

# Scale VM resources
flyctl scale vm shared-cpu-2x --memory 4096

# View current scale
flyctl scale show

Resource Presets

Preset           CPUs   Memory      Use Case
shared-cpu-1x    1      256MB-2GB   Development
shared-cpu-2x    2      512MB-4GB   Small production
shared-cpu-4x    4      1GB-8GB     Medium production
performance-1x   1      2GB-8GB     Dedicated CPU
performance-2x   2      4GB-16GB    High performance

Monitoring and Metrics

Fly Metrics

# View dashboard
flyctl dashboard

# Live monitoring
flyctl status

# Resource usage
flyctl metrics

# View logs
flyctl logs

# Follow logs
flyctl logs -f

Prometheus Integration

Expose metrics to Fly’s Prometheus:
fly.toml
[[metrics]]
  port = 4000
  path = "/metrics"
Access metrics:
flyctl prometheus

External Monitoring

Add observability providers:
# Datadog
flyctl secrets set \
  USE_DDTRACE=true \
  DD_API_KEY=your-key \
  DD_SITE=datadoghq.com

# Langfuse
flyctl secrets set \
  LANGFUSE_PUBLIC_KEY=pk-... \
  LANGFUSE_SECRET_KEY=sk-... \
  LANGFUSE_HOST=https://cloud.langfuse.com

Deployment Strategies

Blue-Green Deployment

# Deploy new version without replacing old
flyctl deploy --strategy bluegreen

# Verify new version
curl https://litellm-proxy.fly.dev/health

# If successful, Fly automatically switches traffic
# If issues, rollback:
flyctl releases rollback

Canary Deployment

# Deploy a single canary machine first
flyctl deploy --strategy canary

# Monitor metrics
flyctl metrics

# Promote to 100%
flyctl deploy --strategy immediate

Rolling Deployment (Default)

# Replace machines one at a time
flyctl deploy --strategy rolling

High Availability Setup

Multi-Region with Load Balancing

fly.toml
app = "litellm-proxy"
primary_region = "iad"

[http_service]
  internal_port = 4000
  force_https = true
  min_machines_running = 2  # Per region

# Deploy to multiple regions
[services.regions]
  iad = { min_machines = 2 }  # US East
  lhr = { min_machines = 2 }  # Europe
  nrt = { min_machines = 1 }  # Asia

[[vm]]
  memory = "2gb"
  cpu_kind = "performance"
  cpus = 2
Deploy:
flyctl deploy --ha

Health Checks

[[services.http_checks]]
  interval = "30s"
  timeout = "10s"
  grace_period = "40s"
  method = "GET"
  path = "/health/liveliness"
  protocol = "http"
  
  # Headers
  [services.http_checks.headers]
    Authorization = "Bearer sk-1234"

[[services.tcp_checks]]
  interval = "15s"
  timeout = "10s"
  grace_period = "30s"

Cost Optimization

Pricing Overview

Fly.io Pricing (Pay-as-you-go):
  Compute:
    shared-cpu-1x: $0.0000008/sec ($2.07/month)
    shared-cpu-2x: $0.0000016/sec ($4.15/month)
  
  Memory:
    256MB: $0.0000002/sec ($0.52/month)
    Per GB: $0.0000008/sec ($2.07/month)
  
  Bandwidth:
    First 100GB: Free
    Over 100GB: $0.02/GB
  
  PostgreSQL:
    shared-cpu-1x + 10GB: ~$5/month
    performance-1x + 50GB: ~$30/month
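As a back-of-envelope example using the per-month figures above (illustrative only; check Fly.io's pricing page for current rates), two always-on shared-cpu-2x machines with 2GB RAM each work out to roughly:

```shell
# 2 machines * (shared-cpu-2x $4.15 + 2GB RAM at $2.07/GB) per month.
awk 'BEGIN {
  machines = 2
  cpu      = 4.15        # shared-cpu-2x, per month
  mem_gb   = 2 * 2.07    # 2GB at $2.07/GB, per month
  printf "~$%.2f/month\n", machines * (cpu + mem_gb)
}'
```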

Free Allowance

Free Tier (Hobby Plan):
  - 3 shared-cpu-1x VMs (256MB each)
  - Up to 160GB storage
  - 100GB outbound bandwidth
  
Good for: Development and testing

Optimization Tips

1. Use Autoscaling

Stop machines when idle:
auto_stop_machines = true
auto_start_machines = true

2. Right-Size Resources

Start small, scale up based on metrics:
flyctl scale vm shared-cpu-1x --memory 1024

3. Use Regional Routing

Deploy only in regions with actual traffic.

4. Optimize Images

Use multi-stage builds, minimize layers:
FROM cgr.dev/chainguard/wolfi-base AS runtime
# Minimal runtime dependencies
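A fuller multi-stage sketch, assuming a pip-installed proxy rather than the official image (paths and base images are illustrative; adapt to your build):

```dockerfile
# Build stage: install Python dependencies into an isolated prefix.
FROM python:3.11-slim AS build
RUN pip install --no-cache-dir --prefix=/install 'litellm[proxy]'

# Runtime stage: copy only the installed packages, keeping the image small.
FROM python:3.11-slim AS runtime
COPY --from=build /install /usr/local
COPY config.yaml /app/config.yaml
CMD ["litellm", "--config", "/app/config.yaml", "--port", "4000"]
```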

Troubleshooting

Deployment Failures

# Check deployment status
flyctl status

# View logs
flyctl logs

# Common issues:

# 1. Health check failing
Error: Health checks failed
Solution: Increase grace_period or fix /health endpoint

# 2. Out of memory
Error: OOM Killed
Solution: Increase VM memory
  flyctl scale vm shared-cpu-2x --memory 2048

# 3. Port binding error
Error: listen tcp :4000: bind: address already in use
Solution: Use PORT environment variable

Database Connection Issues

# Test database connectivity
flyctl ssh console
psql $DATABASE_URL

# Check database status
flyctl postgres db list --app litellm-db

# View database logs
flyctl logs --app litellm-db

SSH into Machine

# SSH into running machine
flyctl ssh console

# SSH into specific machine
flyctl ssh console -s <machine-id>

# Run command
flyctl ssh console -C "ls -la /app"

Restart Machines

# Restart all machines
flyctl apps restart

# Restart specific machine
flyctl machine restart <machine-id>

Security Best Practices

Network Security

fly.toml
# Force HTTPS
[http_service]
  force_https = true

# Internal services only
[[services]]
  internal_port = 4000
  protocol = "tcp"
  
  # No public ports

Secrets Rotation

# Update secrets without downtime
flyctl secrets set LITELLM_MASTER_KEY=new-key

# Fly automatically restarts machines with new secrets

Private Networking

# Keep sensitive services internal
# Use .internal DNS for service-to-service communication
DATABASE_URL=postgresql://user:password@litellm-db.internal:5432/db

Next Steps

Monitoring

Set up comprehensive observability

High Availability

Multi-region HA deployment patterns

Security

Harden your deployment

Performance

Optimize for global traffic
