Overview

LLM Gateway Core uses Docker Compose to orchestrate a complete deployment stack including the gateway API, Redis cache, monitoring tools, and frontend interface. This guide covers the full deployment process from building images to running the production stack.

Prerequisites

Before deploying, ensure you have:
  • Docker Engine 20.10 or later
  • Docker Compose v2.0 or later
  • At least 2GB of available RAM
  • Port availability: 8000, 8501, 6380, 9090, 3000
For local model support via Ollama, you need Ollama running on your host machine at http://localhost:11434.
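Before starting the stack, you can verify that the required host ports are free with a short shell check (a sketch; it relies on `nc` being installed and simply reports when it is not conclusive):

```shell
#!/bin/sh
# Check that each host port the stack binds is not already in use.
checked=0
for port in 8000 8501 6380 9090 3000; do
  if command -v nc >/dev/null 2>&1 && nc -z localhost "$port" 2>/dev/null; then
    echo "port $port is already in use"
  else
    echo "port $port appears free"
  fi
  checked=$((checked+1))
done
echo "checked $checked ports"
```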

Quick Start

Step 1: Clone and Configure

Create a .env file in the project root with your configuration:
cp .env.example .env
# Edit .env with your settings
See Environment Variables for all available options.
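A minimal .env sketch, using only the variable names that appear in this stack's compose configuration (any provider credential shown is hypothetical — use whatever variable names your deployment defines):

```
# Gateway
PORT=8000
REDIS_URL=redis://redis:6379/0
OLLAMA_BASE_URL=http://host.docker.internal:11434

# Hypothetical provider credential -- replace with your deployment's variable names
# OPENAI_API_KEY=...
```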
Step 2: Build and Start Services

Deploy the entire stack with a single command:
docker-compose up -d --build
This will start all services in detached mode and build the gateway image.
Step 3: Verify Deployment

Check that all services are running:
docker-compose ps
Access the health endpoint:
curl http://localhost:8000/api/v1/health
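The gateway may briefly refuse connections while it starts, so a small polling loop (a sketch, assuming the default port 8000) is more robust than a single request:

```shell
#!/bin/sh
# Poll the gateway health endpoint a few times before giving up.
URL="http://localhost:8000/api/v1/health"
attempts=0
healthy=no
for i in 1 2 3 4 5; do
  attempts=$((attempts+1))
  if curl -fsS "$URL" >/dev/null 2>&1; then
    healthy=yes
    break
  fi
  sleep 1
done
echo "healthy=$healthy after $attempts attempt(s)"
```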

Service Architecture

The Docker Compose stack includes five interconnected services:

Gateway Service

The core FastAPI application that handles LLM requests and routing.
gateway:
  build: .
  ports:
    - "8000:8000"
  environment:
    - PORT=8000
    - REDIS_URL=redis://redis:6379/0
    - OLLAMA_BASE_URL=http://host.docker.internal:11434
  env_file:
    - .env
  depends_on:
    - redis
  extra_hosts:
    - "host.docker.internal:host-gateway"
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
    interval: 10s
    timeout: 5s
    retries: 3
  • ports (array): Maps container port 8000 to host port 8000 for API access
  • extra_hosts (array): Enables the container to access services on the host machine (required for Ollama)
  • healthcheck (object): Monitors gateway availability every 10 seconds via the health endpoint

Redis Service

Provides distributed caching and rate limiting functionality.
redis:
  image: redis:alpine
  container_name: llm-gateway-redis
  ports:
    - "6380:6379"
  volumes:
    - redis_data:/data
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 5s
    timeout: 3s
    retries: 5
  • ports (array): Exposes Redis on host port 6380 (avoids conflicts with a local Redis on 6379)
  • volumes (array): Persists Redis data across container restarts

Prometheus Service

Collects and stores metrics from the gateway.
prometheus:
  image: prom/prometheus:latest
  ports:
    - "9090:9090"
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
  • ports (array): Web UI and API available at http://localhost:9090
  • volumes (array): Mounts Prometheus configuration from the project directory
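A minimal prometheus.yml for this stack could look like the following (a sketch; the metrics path is an assumption — point it at wherever your gateway exposes Prometheus metrics):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: gateway
    metrics_path: /metrics          # assumption: adjust to your gateway's metrics endpoint
    static_configs:
      - targets: ["gateway:8000"]   # Docker DNS name on the compose network
```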

Grafana Service

Provides visualization dashboards for monitoring.
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
The default Grafana configuration allows anonymous admin access. Change GF_SECURITY_ADMIN_PASSWORD and disable anonymous access in production.

Frontend Service

Streamlit-based web interface for testing the gateway.
frontend:
  build: .
  ports:
    - "8501:8501"
  command: ["streamlit", "run", "frontend/main.py", "--server.port=8501", "--server.address=0.0.0.0"]
  depends_on:
    - gateway
  environment:
    - PYTHONPATH=/app

Docker Image

The gateway is built from a uv-based Dockerfile optimized for production:
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim

WORKDIR /app

ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy

RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev

ENV PATH="/app/.venv/bin:$PATH"

WORKDIR /app
COPY . .

ENV PYTHONPATH=/app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Image Features

  • Base Image: Uses uv-enabled Python 3.13 slim image for fast dependency installation
  • Build Cache: Leverages Docker cache mounts for faster rebuilds
  • Bytecode Compilation: Precompiles Python for improved startup performance
  • Production Server: Runs with Uvicorn ASGI server for high concurrency

Volume Management

The stack creates a named volume for Redis persistence:
volumes:
  redis_data:
    name: llm-gateway-redis-data
This ensures cached responses and rate limit counters survive container restarts.

Backing Up Redis Data

# Create backup
docker run --rm -v llm-gateway-redis-data:/data -v $(pwd):/backup \
  alpine tar czf /backup/redis-backup.tar.gz -C /data .

# Restore backup
docker run --rm -v llm-gateway-redis-data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/redis-backup.tar.gz -C /data

Service Access

After deployment, access the services at:
  • Gateway API: http://localhost:8000
  • Frontend (Streamlit): http://localhost:8501
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000
  • Redis: localhost:6380 (host port)
Verify the gateway via the health endpoint:
curl http://localhost:8000/api/v1/health

Production Considerations

Security Hardening

The default configuration is optimized for development. Apply these changes for production:
  1. Change Default Credentials: Update Grafana admin password
  2. Secure Redis: Add Redis password authentication
  3. API Key Management: Use strong, randomly generated API keys
  4. Disable Anonymous Access: Remove anonymous Grafana access
  5. Network Isolation: Use internal Docker networks for service communication
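For example, Redis password authentication can be added at the compose level (a sketch; REDIS_PASSWORD is assumed to be defined in .env rather than written inline):

```yaml
redis:
  image: redis:alpine
  # Require a password for all Redis connections
  command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
  healthcheck:
    test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]

gateway:
  environment:
    # Embed the password in the connection URL the gateway uses
    - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
```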

Resource Limits

Add resource constraints to prevent resource exhaustion:
gateway:
  deploy:
    resources:
      limits:
        cpus: '1'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M

TLS/SSL Configuration

For production deployments, add a reverse proxy (nginx/Caddy) with TLS:
nginx:
  image: nginx:alpine
  ports:
    - "443:443"
    - "80:80"
  volumes:
    - ./nginx.conf:/etc/nginx/nginx.conf
    - ./certs:/etc/nginx/certs
  depends_on:
    - gateway
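A matching nginx.conf could terminate TLS and proxy to the gateway over the compose network (a sketch; the certificate filenames are assumptions):

```nginx
events {}

http {
  server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/fullchain.pem;  # assumption: your cert path
    ssl_certificate_key /etc/nginx/certs/privkey.pem;    # assumption: your key path

    location / {
      proxy_pass http://gateway:8000;  # Docker DNS name for the gateway service
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}
```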

Scaling Considerations

To scale the gateway horizontally:
# Scale gateway to 3 instances
docker-compose up -d --scale gateway=3

# Add load balancer for distribution
Ensure Redis is properly configured for distributed rate limiting across instances.
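Note that a fixed host mapping like "8000:8000" can only bind once, so scaled replicas will conflict on the host port. One approach (a sketch) is to drop the host mapping on the gateway and let the load balancer reach replicas over the internal compose network:

```yaml
gateway:
  expose:
    - "8000"   # internal only; no host port, so scaled replicas don't collide
  # ports: removed -- the reverse proxy/load balancer connects via the compose network
```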

Troubleshooting

View Service Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f gateway

# Last 100 lines
docker-compose logs --tail=100 gateway

Restart Services

# Restart specific service
docker-compose restart gateway

# Restart all services
docker-compose restart

Health Check Status

# Check container health
docker-compose ps

# Inspect health check details
docker inspect llm-gateway-redis | jq '.[0].State.Health'

Common Issues

Gateway can’t connect to Ollama
  • Verify Ollama is running: curl http://localhost:11434/api/tags
  • Check extra_hosts configuration includes host.docker.internal
  • On Linux, use --network=host or configure proper bridge networking
Redis connection refused
  • Ensure Redis container is healthy: docker-compose ps redis
  • Verify REDIS_URL uses correct hostname (redis not localhost)
  • Check Redis logs: docker-compose logs redis
Port conflicts
  • Check if ports are already in use: netstat -tuln | grep LISTEN
  • Modify port mappings in docker-compose.yml if needed

Maintenance

Updating the Stack

# Pull latest changes
git pull

# Rebuild and restart
docker-compose up -d --build

# Remove old images
docker image prune -f

Clearing Cache

# Clear Redis cache
docker-compose exec redis redis-cli FLUSHALL

# Or restart Redis
docker-compose restart redis

Complete Cleanup

# Stop and remove all containers
docker-compose down

# Remove volumes (WARNING: deletes all data)
docker-compose down -v

# Remove images
docker-compose down --rmi all
