Overview
Fly.io deploys LiteLLM containers close to your users in 30+ regions worldwide, providing:
- Low latency: edge compute near your users
- Global distribution: automatic multi-region deployment
- Auto-scaling: scale machine counts based on demand
- Private networking: WireGuard-based secure internal networking
Quick Start
Install Fly CLI
# macOS
brew install flyctl
# Linux
curl -L https://fly.io/install.sh | sh
# Windows (PowerShell)
pwsh -Command "iwr https://fly.io/install.ps1 -useb | iex"
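If you script the setup, the three installers above can be selected by OS at runtime. A minimal sketch (assumes Homebrew is present on macOS; the command is only printed here, not executed):

```shell
# Choose the flyctl installer matching the current OS (sketch)
case "$(uname -s)" in
  Darwin) install_cmd="brew install flyctl" ;;
  Linux)  install_cmd="curl -L https://fly.io/install.sh | sh" ;;
  *)      install_cmd="pwsh -Command \"iwr https://fly.io/install.ps1 -useb | iex\"" ;;
esac
echo "Run: $install_cmd"
```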
Clone LiteLLM
git clone https://github.com/BerriAI/litellm.git
cd litellm
Launch Application
flyctl launch --name litellm-proxy
Fly will:
- Detect the Dockerfile
- Create a fly.toml configuration
- Prompt for region selection
- Ask whether to create a PostgreSQL database
Set Environment Variables
# Set master key
flyctl secrets set LITELLM_MASTER_KEY=sk-1234
# Set provider API keys
flyctl secrets set OPENAI_API_KEY=sk-proj-...
flyctl secrets set ANTHROPIC_API_KEY=sk-ant-...
Configuration File
Create fly.toml
The Fly CLI generates a basic config; customize it for LiteLLM:
app = "litellm-proxy"
primary_region = "iad" # Washington, DC
[build]
dockerfile = "Dockerfile"

[http_service]
internal_port = 4000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 2  # HA: always keep 2 instances running
processes = ["app"]

[[services]]
protocol = "tcp"
internal_port = 4000

[[services.ports]]
port = 80
handlers = ["http"]
force_https = true

[[services.ports]]
port = 443
handlers = ["tls", "http"]

[services.concurrency]
type = "connections"
hard_limit = 250
soft_limit = 200

[[services.tcp_checks]]
interval = "15s"
timeout = "10s"
grace_period = "30s"

[[services.http_checks]]
interval = "30s"
timeout = "10s"
grace_period = "40s"
method = "GET"
path = "/health/liveliness"
protocol = "http"

[env]
STORE_MODEL_IN_DB = "True"
PORT = "4000"

[[vm]]
memory = "2gb"
cpu_kind = "shared"
cpus = 2
Multi-Region Deployment
Deploy to multiple regions for global coverage:
app = "litellm-proxy"
primary_region = "iad"
# Regions are chosen at deploy/scale time, not declared in fly.toml
# iad = Washington, DC (primary); lhr = London; fra = Frankfurt; sin = Singapore; syd = Sydney

[http_service]
internal_port = 4000
force_https = true
min_machines_running = 1   # Per region
auto_stop_machines = false # Keep machines running

Deploy, then distribute machines across regions:

flyctl deploy
flyctl scale count iad=1 lhr=1 fra=1 sin=1 syd=1
Fly.io automatically routes each request to the nearest healthy region via BGP Anycast.
Database Setup
Fly PostgreSQL
Create Database Cluster
flyctl postgres create \
--name litellm-db \
--region iad \
--initial-cluster-size 2 \
--vm-size shared-cpu-2x \
--volume-size 10
Options:
- Regions: deploy in 2-3 regions for HA
- VM size: shared-cpu-1x, shared-cpu-2x, performance-1x
- Volume: 10GB minimum, scales as needed
Attach Database to App
flyctl postgres attach litellm-db --app litellm-proxy
This sets DATABASE_URL environment variable automatically.
Verify Connection
flyctl ssh console --app litellm-proxy
# Inside container
psql $DATABASE_URL
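If psql isn't available in the container, you can at least confirm the URL points at the private network using plain shell string operations. The URL below is a placeholder in the shape `flyctl postgres attach` produces, not a real credential:

```shell
# Placeholder URL in the shape `flyctl postgres attach` produces
DATABASE_URL="postgres://litellm:secret@litellm-db.internal:5432/litellm"

hostport="${DATABASE_URL#*@}"   # drop scheme and credentials
hostport="${hostport%%/*}"      # drop the database name
host="${hostport%%:*}"
port="${hostport##*:}"

echo "host=$host port=$port"    # host=litellm-db.internal port=5432
```

The `.internal` suffix confirms traffic stays on Fly's private network.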
Database Replication
For multi-region HA:
# Create primary database
flyctl postgres create --name litellm-db-primary --region iad
# Add read replicas in other regions
flyctl postgres create \
--name litellm-db-replica-lhr \
--region lhr \
--fork-from litellm-db-primary
flyctl postgres create \
--name litellm-db-replica-sin \
--region sin \
--fork-from litellm-db-primary
Configure read replicas:
[env]
DATABASE_URL = "postgres://..."         # Primary (writes)
DATABASE_REPLICA_URL = "postgres://..." # Read replica
Private Networking
Fly.io provides private IPv6 networking via WireGuard:
Connect Services
# Services communicate via internal DNS
litellm-proxy.internal # Your app
litellm-db.internal # PostgreSQL
litellm-redis.internal # Redis (if deployed)
Connection strings:
# PostgreSQL
DATABASE_URL=postgresql://user:password@litellm-db.internal:5432/litellm
# Redis
REDIS_URL=redis://litellm-redis.internal:6379
Connect External Services
External managed services (AWS RDS, etc.) are reached via their own endpoints or a WireGuard peer. For databases running on Fly, `fly proxy` tunnels a local port into your private network, which is handy for connecting from your workstation:
# Tunnel local port 5432 to the Fly Postgres app
flyctl proxy 5432 -a litellm-db
psql postgres://user:password@localhost:5432/litellm
Configuration File Deployment
Method 1: Bake into Image
Add to Dockerfile:
FROM ghcr.io/berriai/litellm:main-stable

# Copy config into image
COPY config.yaml /app/config.yaml

ENTRYPOINT ["litellm"]
CMD ["--config", "/app/config.yaml", "--port", "4000"]
Method 2: Fly Secrets
Store config as secret:
# Create config.yaml
cat > config.yaml << 'EOF'
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
EOF
# Store as secret (base64 encoded)
flyctl secrets set CONFIG_YAML="$(base64 < config.yaml)"
Update entrypoint to decode:
echo $CONFIG_YAML | base64 -d > /app/config.yaml
litellm --config /app/config.yaml --port 4000
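Before wiring this into the entrypoint, it's worth checking locally that the encode/decode round-trip is lossless (this uses GNU coreutils `base64`; on macOS the decode flag may be `-D` on older systems):

```shell
# Write a small config, encode it the way the secret would be set,
# then decode it back and compare byte-for-byte
cat > config.yaml <<'EOF'
model_list:
  - model_name: gpt-4o
EOF

CONFIG_YAML="$(base64 < config.yaml)"
echo "$CONFIG_YAML" | base64 -d > decoded.yaml

diff config.yaml decoded.yaml && echo "round-trip ok"
```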
Method 3: Fly Volumes
Volumes are region-specific and not replicated. Use for local data only.
# Create volume
flyctl volumes create litellm_data --region iad --size 10
# Mount in fly.toml
[[mounts]]
source = "litellm_data"
destination = "/data"
Secrets Management
Set Secrets
# Set individual secrets
flyctl secrets set LITELLM_MASTER_KEY=sk-1234
flyctl secrets set OPENAI_API_KEY=sk-proj-...
flyctl secrets set ANTHROPIC_API_KEY=sk-ant-...
# Set from file
flyctl secrets import < secrets.txt
secrets.txt:
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=your-key
AZURE_API_BASE=https://your-resource.openai.azure.com
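Since `flyctl secrets import` reads one NAME=value pair per line, a quick format check before importing can catch stray spaces or malformed lines early. The regex is a loose sketch of conventional env-var names:

```shell
# Validate that every line of secrets.txt is NAME=value with no stray spaces
cat > secrets.txt <<'EOF'
LITELLM_MASTER_KEY=sk-1234
OPENAI_API_KEY=sk-proj-test
EOF

# -v inverts the match, so grep succeeds only if a bad line exists
if grep -Evq '^[A-Z_][A-Z0-9_]*=[^ ]' secrets.txt; then
  echo "invalid line in secrets.txt"
else
  echo "secrets.txt format ok"
fi
```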
View Secrets
# List secret names (values hidden)
flyctl secrets list
# Unset secret
flyctl secrets unset OPENAI_API_KEY
Custom Domains
Add Certificate
flyctl certs create api.yourdomain.com
Get DNS Records
flyctl certs show api.yourdomain.com
Shows required DNS records (CNAME or A/AAAA).
Update DNS
Add the following record at your DNS provider:
- Type: CNAME
- Name: api
- Value: litellm-proxy.fly.dev
- TTL: Auto
Verify
flyctl certs check api.yourdomain.com
Fly provisions the TLS certificate automatically once the DNS record resolves.
Scaling
Autoscaling Configuration
[http_service]
internal_port = 4000
force_https = true

# Autoscaling
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 2
# The machine pool created with `flyctl scale count` caps the maximum

[http_service.concurrency]
type = "requests"
hard_limit = 100
soft_limit = 80
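The soft limit is what drives scaling decisions: Fly's proxy prefers machines below it and can auto-start stopped machines once limits are reached. A back-of-envelope sizing, with an assumed peak of 500 concurrent requests (an illustrative number, not from the config above):

```shell
# Machines needed so steady-state concurrency stays under soft_limit (ceiling division)
soft_limit=80
peak_concurrent=500

machines=$(( (peak_concurrent + soft_limit - 1) / soft_limit ))
echo "machines needed: $machines"   # machines needed: 7
```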
Manual Scaling
# Scale to specific count
flyctl scale count 5
# Scale by region
flyctl scale count iad=2 lhr=2 sin=1
# Scale VM resources
flyctl scale vm shared-cpu-2x --memory 4096
# View current scale
flyctl scale show
Resource Presets
| Preset         | CPUs | Memory    | Use Case          |
|----------------|------|-----------|-------------------|
| shared-cpu-1x  | 1    | 256MB-2GB | Development       |
| shared-cpu-2x  | 2    | 512MB-4GB | Small production  |
| shared-cpu-4x  | 4    | 1GB-8GB   | Medium production |
| performance-1x | 1    | 2GB-8GB   | Dedicated CPU     |
| performance-2x | 2    | 4GB-16GB  | High performance  |
Monitoring and Metrics
Fly Metrics
# View dashboard
flyctl dashboard
# Live monitoring
flyctl status
# Resource usage dashboard
flyctl dashboard metrics
# View logs (streams continuously by default)
flyctl logs
Prometheus Integration
Expose metrics to Fly’s Prometheus:
[[metrics]]
port = 4000
path = "/metrics"
Fly scrapes these into its managed Prometheus, queryable at https://api.fly.io/prometheus/<org-slug>/ with a Fly access token.
External Monitoring
Add observability providers:
# Datadog
flyctl secrets set \
USE_DDTRACE=true \
DD_API_KEY=your-key \
DD_SITE=datadoghq.com
# Langfuse
flyctl secrets set \
LANGFUSE_PUBLIC_KEY=pk-... \
LANGFUSE_SECRET_KEY=sk-... \
LANGFUSE_HOST=https://cloud.langfuse.com
Deployment Strategies
Blue-Green Deployment
# Deploy new version without replacing old
flyctl deploy --strategy bluegreen
# Verify new version
curl https://litellm-proxy.fly.dev/health
# If successful, Fly switches traffic to the new machines automatically
# If there are issues, roll back by redeploying the previous image:
flyctl releases
flyctl deploy --image <previous-image>
Canary Deployment
# Boot a single canary machine before rolling out to the rest
flyctl deploy --strategy canary
# Monitor metrics
flyctl dashboard metrics
# Promote to 100%
flyctl deploy --strategy immediate
Rolling Deployment (Default)
# Replace machines one at a time
flyctl deploy --strategy rolling
High Availability Setup
Multi-Region with Load Balancing
app = "litellm-proxy"
primary_region = "iad"
[http_service]
internal_port = 4000
force_https = true
min_machines_running = 2  # Per region

[[vm]]
memory = "2gb"
cpu_kind = "performance"
cpus = 2

Deploy, then pin per-region machine counts (iad = US East, lhr = Europe, nrt = Asia):

flyctl deploy
flyctl scale count iad=2 lhr=2 nrt=1
Health Checks
[[services.http_checks]]
interval = "30s"
timeout = "10s"
grace_period = "40s"
method = "GET"
path = "/health/liveliness"
protocol = "http"

# Headers
[services.http_checks.headers]
Authorization = "Bearer sk-1234"

[[services.tcp_checks]]
interval = "15s"
timeout = "10s"
grace_period = "30s"
Cost Optimization
Pricing Overview
Fly.io Pricing (Pay-as-you-go):

Compute:
- shared-cpu-1x: $0.0000008/sec (~$2.07/month)
- shared-cpu-2x: $0.0000016/sec (~$4.15/month)

Memory:
- 256MB: $0.0000002/sec (~$0.52/month)
- Per GB: $0.0000008/sec (~$2.07/month)

Bandwidth:
- First 100GB: free
- Over 100GB: $0.02/GB

PostgreSQL:
- shared-cpu-1x + 10GB: ~$5/month
- performance-1x + 50GB: ~$30/month
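The monthly figures follow directly from the per-second rates over a 30-day month, e.g. for shared-cpu-1x:

```shell
# $0.0000008/sec x seconds in a 30-day month
awk 'BEGIN { printf "$%.2f/month\n", 0.0000008 * 60 * 60 * 24 * 30 }'
```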
Free Allowance
Free Tier (Hobby Plan):
- 3 shared-cpu-1x VMs (256MB each)
- Up to 160GB storage
- 100GB outbound bandwidth

Good for: development and testing
Optimization Tips
Use Autoscaling
Stop machines when idle:

auto_stop_machines = true
auto_start_machines = true
Right-Size Resources
Start small, scale up based on metrics:

flyctl scale vm shared-cpu-1x --memory 1024
Use Regional Routing
Deploy only in regions with actual traffic.
Optimize Images
Use multi-stage builds and minimize layers:

FROM cgr.dev/chainguard/wolfi-base AS runtime
# Minimal runtime dependencies
Troubleshooting
Deployment Failures
# Check deployment status
flyctl status
# View logs
flyctl logs
# Common issues:

# 1. Health check failing
#    Error: Health checks failed
#    Solution: Increase grace_period or fix the /health endpoint

# 2. Out of memory
#    Error: OOM Killed
#    Solution: Increase VM memory:
flyctl scale vm shared-cpu-2x --memory 2048

# 3. Port binding error
#    Error: listen tcp :4000: bind: address already in use
#    Solution: Bind LiteLLM to the port from the PORT environment variable
Database Connection Issues
# Test database connectivity
flyctl ssh console
psql $DATABASE_URL
# Check database status
flyctl postgres db list --app litellm-db
# View database logs
flyctl logs --app litellm-db
SSH into Machine
# SSH into running machine
flyctl ssh console
# SSH into a specific machine
flyctl ssh console --machine <machine-id>
# Run command
flyctl ssh console -C "ls -la /app"
Restart Machines
# Restart all machines
flyctl apps restart
# Restart a specific machine
flyctl machine restart <machine-id>
Security Best Practices
Network Security
# Force HTTPS
[http_service]
force_https = true

# Internal-only service: no [[services.ports]] entries means nothing is exposed publicly
[[services]]
internal_port = 4000
protocol = "tcp"
Secrets Rotation
# Update secrets without downtime
flyctl secrets set LITELLM_MASTER_KEY=new-key
# Fly automatically restarts machines with new secrets
Private Networking
# Keep sensitive services internal
# Use .internal DNS for service-to-service communication
DATABASE_URL=postgresql://user:password@db.internal:5432/db
Next Steps
- Monitoring: set up comprehensive observability
- High Availability: multi-region HA deployment patterns
- Security: harden your deployment
- Performance: optimize for global traffic