Overview
LLM Gateway Core uses Docker Compose to orchestrate a complete deployment stack including the gateway API, Redis cache, monitoring tools, and frontend interface. This guide covers the full deployment process from building images to running the production stack.
Prerequisites
Before deploying, ensure you have:
Docker Engine 20.10 or later
Docker Compose v2.0 or later
At least 2GB of available RAM
Port availability: 8000, 8501, 6380, 9090, 3000
For local model support via Ollama, you need Ollama running on your host machine at http://localhost:11434
Quick Start
Clone and Configure
Create a .env file in the project root with your configuration: cp .env.example .env
# Edit .env with your settings
See Environment Variables for all available options.
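A minimal `.env` sketch using the variable names that appear in the compose file below (`PORT`, `REDIS_URL`, `OLLAMA_BASE_URL`); the provider key is a placeholder — see `.env.example` for the real variable names:

```ini
# Gateway
PORT=8000

# Cache -- the hostname "redis" resolves inside the Compose network
REDIS_URL=redis://redis:6379/0

# Local models via Ollama running on the host
OLLAMA_BASE_URL=http://host.docker.internal:11434

# Provider API keys (placeholder name -- check .env.example)
# OPENAI_API_KEY=sk-...
```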
Build and Start Services
Deploy the entire stack with a single command: docker-compose up -d --build
This will start all services in detached mode and build the gateway image.
Verify Deployment
Check that all services are running: docker-compose ps
Then access the health endpoint: curl http://localhost:8000/api/v1/health
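The health check above can be wrapped in a small polling script, which is handy in CI where the gateway needs a moment to come up. A sketch — the retry count is arbitrary:

```shell
#!/bin/sh
# wait_for_health URL [RETRIES] -- poll a URL once per second until it
# answers with a success status, printing "healthy" or "unhealthy".
wait_for_health() {
  url=$1
  retries=${2:-30}
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "unhealthy"
  return 1
}

# Usage: wait_for_health http://localhost:8000/api/v1/health
```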
Service Architecture
The Docker Compose stack includes five interconnected services:
Gateway Service
The core FastAPI application that handles LLM requests and routing.
gateway:
  build: .
  ports:
    - "8000:8000"
  environment:
    - PORT=8000
    - REDIS_URL=redis://redis:6379/0
    - OLLAMA_BASE_URL=http://host.docker.internal:11434
  env_file:
    - .env
  depends_on:
    - redis
  extra_hosts:
    - "host.docker.internal:host-gateway"
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
    interval: 10s
    timeout: 5s
    retries: 3
Maps container port 8000 to host port 8000 for API access
Enables the container to access services on the host machine (required for Ollama)
Monitors gateway availability every 10 seconds via the health endpoint
Redis Service
Provides distributed caching and rate limiting functionality.
redis:
  image: redis:alpine
  container_name: llm-gateway-redis
  ports:
    - "6380:6379"
  volumes:
    - redis_data:/data
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 5s
    timeout: 3s
    retries: 5
Exposes Redis on host port 6380 (avoids conflicts with local Redis on 6379)
Persists Redis data across container restarts
Prometheus Service
Collects and stores metrics from the gateway.
prometheus:
  image: prom/prometheus:latest
  ports:
    - "9090:9090"
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
Mounts Prometheus configuration from the project directory
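The mounted prometheus.yml needs a scrape job pointing at the gateway. A minimal sketch — the job name and metrics path are assumptions; confirm them against the gateway's actual metrics endpoint:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'llm-gateway'        # assumed job name
    metrics_path: /metrics         # assumed; verify the gateway's metrics path
    static_configs:
      - targets: ['gateway:8000']  # the Compose service name resolves on the stack network
```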
Grafana Service
Provides visualization dashboards for monitoring.
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
The default Grafana configuration allows anonymous admin access. Change GF_SECURITY_ADMIN_PASSWORD and disable anonymous access in production.
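One way to apply those changes is a production override file layered on top of the base compose file; a sketch (the override filename and the `GRAFANA_ADMIN_PASSWORD` variable are hypothetical names):

```yaml
# docker-compose.prod.yml (hypothetical override file)
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}  # set in .env, never committed
      - GF_AUTH_ANONYMOUS_ENABLED=false
```

Apply it with: docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d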
Frontend Service
Streamlit-based web interface for testing the gateway.
frontend:
  build: .
  ports:
    - "8501:8501"
  command: ["streamlit", "run", "frontend/main.py", "--server.port=8501", "--server.address=0.0.0.0"]
  depends_on:
    - gateway
  environment:
    - PYTHONPATH=/app
Docker Image
The gateway uses a uv-based Dockerfile optimized for production:
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim
WORKDIR /app
ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev
ENV PATH="/app/.venv/bin:$PATH"
COPY . .
ENV PYTHONPATH=/app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Image Features
Base Image: Uses the uv-enabled Python 3.13 slim image for fast dependency installation
Build Cache: Leverages Docker cache mounts for faster rebuilds
Bytecode Compilation: Precompiles Python for improved startup performance
Production Server: Runs the Uvicorn ASGI server for high concurrency
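Because the Dockerfile ends with COPY . ., a .dockerignore keeps secrets and build artifacts out of the image. An illustrative sketch — adjust the entries to the actual project layout:

```text
# .dockerignore (illustrative)
.git
.env
.venv/
__pycache__/
*.pyc
```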
Volume Management
The stack creates a named volume for Redis persistence:
volumes:
  redis_data:
    name: llm-gateway-redis-data
This ensures cached responses and rate limit counters survive container restarts.
Backing Up Redis Data
# Create backup
docker run --rm -v llm-gateway-redis-data:/data -v $(pwd):/backup \
  alpine tar czf /backup/redis-backup.tar.gz -C /data .
# Restore backup
docker run --rm -v llm-gateway-redis-data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/redis-backup.tar.gz -C /data
Service Access
After deployment, access the services at:
Gateway API: http://localhost:8000
Frontend UI: http://localhost:8501
Grafana Dashboard: http://localhost:3000
Prometheus: http://localhost:9090
Verify the gateway is responding: curl http://localhost:8000/api/v1/health
Production Considerations
Security Hardening
The default configuration is optimized for development. Apply these changes for production:
Change Default Credentials: Update the Grafana admin password
Secure Redis: Add Redis password authentication
API Key Management: Use strong, randomly generated API keys
Disable Anonymous Access: Remove anonymous Grafana access
Network Isolation: Use internal Docker networks for service communication
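For the "Secure Redis" item, a sketch of password authentication via `requirepass` — the `REDIS_PASSWORD` variable name is an assumption:

```yaml
services:
  redis:
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
  gateway:
    environment:
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
```

Note that the Redis healthcheck must then authenticate as well, e.g. `redis-cli -a "$REDIS_PASSWORD" ping`.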
Resource Limits
Add resource constraints to prevent resource exhaustion:
gateway:
  deploy:
    resources:
      limits:
        cpus: '1'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M
TLS/SSL Configuration
For production deployments, add a reverse proxy (nginx/Caddy) with TLS:
nginx:
  image: nginx:alpine
  ports:
    - "443:443"
    - "80:80"
  volumes:
    - ./nginx.conf:/etc/nginx/nginx.conf
    - ./certs:/etc/nginx/certs
  depends_on:
    - gateway
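A minimal nginx.conf sketch for the mounted configuration above — the certificate filenames are placeholders:

```nginx
events {}
http {
  server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/fullchain.pem;  # placeholder filename
    ssl_certificate_key /etc/nginx/certs/privkey.pem;    # placeholder filename
    location / {
      proxy_pass http://gateway:8000;  # Compose service name on the stack network
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
    }
  }
  server {
    listen 80;
    return 301 https://$host$request_uri;  # redirect plain HTTP to TLS
  }
}
```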
Scaling Considerations
To scale the gateway horizontally:
# Scale gateway to 3 instances
docker-compose up -d --scale gateway=3
# Add load balancer for distribution
Note that the default "8000:8000" mapping can only bind the host port once, so replicas cannot all publish it; remove the fixed host port and place a load balancer in front of the instances. Ensure Redis is properly configured for distributed rate limiting across instances.
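A hedged sketch of the gateway service edited for scaling — the fixed host mapping is removed so replicas can coexist, and the port is merely exposed on the internal network for a reverse proxy to reach:

```yaml
gateway:
  build: .
  # no host "ports:" mapping -- replicas are reached via the internal network
  expose:
    - "8000"
```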
Troubleshooting
View Service Logs
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f gateway
# Last 100 lines
docker-compose logs --tail=100 gateway
Restart Services
# Restart specific service
docker-compose restart gateway
# Restart all services
docker-compose restart
Health Check Status
# Check container health
docker-compose ps
# Inspect health check details
docker inspect llm-gateway-redis | jq '.[0].State.Health'
Common Issues
Gateway can’t connect to Ollama
Verify Ollama is running: curl http://localhost:11434/api/tags
Check extra_hosts configuration includes host.docker.internal
On Linux, use --network=host or configure proper bridge networking
Redis connection refused
Ensure Redis container is healthy: docker-compose ps redis
Verify REDIS_URL uses correct hostname (redis not localhost)
Check Redis logs: docker-compose logs redis
Port conflicts
Check if ports are already in use: netstat -tuln | grep LISTEN
Modify port mappings in docker-compose.yml if needed
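When remapping, change only the host side of the mapping; for example, to serve the gateway on host port 18000 (an arbitrary free port) while the container still listens on 8000:

```yaml
gateway:
  ports:
    - "18000:8000"  # host:container -- only the host side changes
```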
Maintenance
Updating the Stack
# Pull latest changes
git pull
# Rebuild and restart
docker-compose up -d --build
# Remove old images
docker image prune -f
Clearing Cache
# Clear Redis cache
docker-compose exec redis redis-cli FLUSHALL
# Or restart Redis
docker-compose restart redis
Complete Cleanup
# Stop and remove all containers
docker-compose down
# Remove volumes (WARNING: deletes all data)
docker-compose down -v
# Remove images
docker-compose down --rmi all