Overview

LLM Gateway Core uses Docker Compose to orchestrate a complete deployment stack including the gateway API, Redis cache, monitoring tools, and frontend interface. This guide covers the full deployment process from building images to running the production stack.

Prerequisites

Before deploying, ensure you have:
  • Docker Engine 20.10 or later
  • Docker Compose v2.0 or later
  • At least 2GB of available RAM
  • Port availability: 8000, 8501, 6380, 9090, 3000
For local model support via Ollama, you need Ollama running on your host machine at http://localhost:11434.
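Before starting the stack, you can verify that the required host ports are free with a short shell check (a sketch; it relies on `nc` being installed and simply reports when it is not conclusive):

```shell
#!/bin/sh
# Check that each host port the stack binds is not already in use.
checked=0
for port in 8000 8501 6380 9090 3000; do
  if command -v nc >/dev/null 2>&1 && nc -z localhost "$port" 2>/dev/null; then
    echo "port $port is already in use"
  else
    echo "port $port appears free"
  fi
  checked=$((checked+1))
done
echo "checked $checked ports"
```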

Quick Start

Step 1: Clone and Configure

Create a .env file in the project root with your configuration:
cp .env.example .env
# Edit .env with your settings
See Environment Variables for all available options.
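A minimal .env sketch, using only the variable names that appear in this stack's compose configuration (any provider credential shown is hypothetical — use whatever variable names your deployment defines):

```
# Gateway
PORT=8000
REDIS_URL=redis://redis:6379/0
OLLAMA_BASE_URL=http://host.docker.internal:11434

# Hypothetical provider credential -- replace with your deployment's variable names
# OPENAI_API_KEY=...
```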
Step 2: Build and Start Services

Deploy the entire stack with a single command:
docker-compose up -d --build
This will start all services in detached mode and build the gateway image.
Step 3: Verify Deployment

Check that all services are running:
docker-compose ps
Access the health endpoint:
curl http://localhost:8000/api/v1/health
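The gateway may briefly refuse connections while it starts, so a small polling loop (a sketch, assuming the default port 8000) is more robust than a single request:

```shell
#!/bin/sh
# Poll the gateway health endpoint a few times before giving up.
URL="http://localhost:8000/api/v1/health"
attempts=0
healthy=no
for i in 1 2 3 4 5; do
  attempts=$((attempts+1))
  if curl -fsS "$URL" >/dev/null 2>&1; then
    healthy=yes
    break
  fi
  sleep 1
done
echo "healthy=$healthy after $attempts attempt(s)"
```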

Service Architecture

The Docker Compose stack includes five interconnected services:

Gateway Service

The core FastAPI application that handles LLM requests and routing.
gateway:
  build: .
  ports:
    - "8000:8000"
  environment:
    - PORT=8000
    - REDIS_URL=redis://redis:6379/0
    - OLLAMA_BASE_URL=http://host.docker.internal:11434
  env_file:
    - .env
  depends_on:
    - redis
  extra_hosts:
    - "host.docker.internal:host-gateway"
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
    interval: 10s
    timeout: 5s
    retries: 3
  • ports (array): Maps container port 8000 to host port 8000 for API access
  • extra_hosts (array): Enables the container to access services on the host machine (required for Ollama)
  • healthcheck (object): Monitors gateway availability every 10 seconds via the health endpoint

Redis Service

Provides distributed caching and rate limiting functionality.
redis:
  image: redis:alpine
  container_name: llm-gateway-redis
  ports:
    - "6380:6379"
  volumes:
    - redis_data:/data
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 5s
    timeout: 3s
    retries: 5
  • ports (array): Exposes Redis on host port 6380 (avoids conflicts with a local Redis on 6379)
  • volumes (array): Persists Redis data across container restarts

Prometheus Service

Collects and stores metrics from the gateway.
prometheus:
  image: prom/prometheus:latest
  ports:
    - "9090:9090"
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
  • ports (array): Web UI and API available at http://localhost:9090
  • volumes (array): Mounts Prometheus configuration from the project directory
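A minimal prometheus.yml for this stack could look like the following (a sketch; the metrics path is an assumption — point it at wherever your gateway exposes Prometheus metrics):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: gateway
    metrics_path: /metrics          # assumption: adjust to your gateway's metrics endpoint
    static_configs:
      - targets: ["gateway:8000"]   # Docker DNS name on the compose network
```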

Grafana Service

Provides visualization dashboards for monitoring.
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
The default Grafana configuration allows anonymous admin access. Change GF_SECURITY_ADMIN_PASSWORD and disable anonymous access in production.

Frontend Service

Streamlit-based web interface for testing the gateway.
frontend:
  build: .
  ports:
    - "8501:8501"
  command: ["streamlit", "run", "frontend/main.py", "--server.port=8501", "--server.address=0.0.0.0"]
  depends_on:
    - gateway
  environment:
    - PYTHONPATH=/app

Docker Image

The gateway is built from a uv-based Dockerfile optimized for production:
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim

WORKDIR /app

ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy

RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev

ENV PATH="/app/.venv/bin:$PATH"

WORKDIR /app
COPY . .

ENV PYTHONPATH=/app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Image Features

  • Base Image: Uses uv-enabled Python 3.13 slim image for fast dependency installation
  • Build Cache: Leverages Docker cache mounts for faster rebuilds
  • Bytecode Compilation: Precompiles Python for improved startup performance
  • Production Server: Runs with Uvicorn ASGI server for high concurrency

Volume Management

The stack creates a named volume for Redis persistence:
volumes:
  redis_data:
    name: llm-gateway-redis-data
This ensures cached responses and rate limit counters survive container restarts.

Backing Up Redis Data

# Create backup
docker run --rm -v llm-gateway-redis-data:/data -v $(pwd):/backup \
  alpine tar czf /backup/redis-backup.tar.gz -C /data .

# Restore backup
docker run --rm -v llm-gateway-redis-data:/data -v $(pwd):/backup \
  alpine tar xzf /backup/redis-backup.tar.gz -C /data

Service Access

After deployment, access the services at:
  • Gateway API: http://localhost:8000
  • Frontend (Streamlit): http://localhost:8501
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000
  • Redis: localhost:6380 (host port)
Verify the gateway via the health endpoint:
curl http://localhost:8000/api/v1/health

Production Considerations

Security Hardening

The default configuration is optimized for development. Apply these changes for production:
  1. Change Default Credentials: Update Grafana admin password
  2. Secure Redis: Add Redis password authentication
  3. API Key Management: Use strong, randomly generated API keys
  4. Disable Anonymous Access: Remove anonymous Grafana access
  5. Network Isolation: Use internal Docker networks for service communication
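For example, Redis password authentication can be added at the compose level (a sketch; REDIS_PASSWORD is assumed to be defined in .env rather than written inline):

```yaml
redis:
  image: redis:alpine
  # Require a password for all Redis connections
  command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
  healthcheck:
    test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]

gateway:
  environment:
    # Embed the password in the connection URL the gateway uses
    - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
```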

Resource Limits

Add resource constraints to prevent resource exhaustion:
gateway:
  deploy:
    resources:
      limits:
        cpus: '1'
        memory: 1G
      reservations:
        cpus: '0.5'
        memory: 512M

TLS/SSL Configuration

For production deployments, add a reverse proxy (nginx/Caddy) with TLS:
nginx:
  image: nginx:alpine
  ports:
    - "443:443"
    - "80:80"
  volumes:
    - ./nginx.conf:/etc/nginx/nginx.conf
    - ./certs:/etc/nginx/certs
  depends_on:
    - gateway
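A matching nginx.conf could terminate TLS and proxy to the gateway over the compose network (a sketch; the certificate filenames are assumptions):

```nginx
events {}

http {
  server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/fullchain.pem;  # assumption: your cert path
    ssl_certificate_key /etc/nginx/certs/privkey.pem;    # assumption: your key path

    location / {
      proxy_pass http://gateway:8000;  # Docker DNS name for the gateway service
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}
```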

Scaling Considerations

To scale the gateway horizontally:
# Scale gateway to 3 instances
docker-compose up -d --scale gateway=3

# Add load balancer for distribution
Ensure Redis is properly configured for distributed rate limiting across instances.
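Note that a fixed host mapping like "8000:8000" can only bind once, so scaled replicas will conflict on the host port. One approach (a sketch) is to drop the host mapping on the gateway and let the load balancer reach replicas over the internal compose network:

```yaml
gateway:
  expose:
    - "8000"   # internal only; no host port, so scaled replicas don't collide
  # ports: removed -- the reverse proxy/load balancer connects via the compose network
```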

Troubleshooting

View Service Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f gateway

# Last 100 lines
docker-compose logs --tail=100 gateway

Restart Services

# Restart specific service
docker-compose restart gateway

# Restart all services
docker-compose restart

Health Check Status

# Check container health
docker-compose ps

# Inspect health check details
docker inspect llm-gateway-redis | jq '.[0].State.Health'

Common Issues

Gateway can’t connect to Ollama
  • Verify Ollama is running: curl http://localhost:11434/api/tags
  • Check extra_hosts configuration includes host.docker.internal
  • On Linux, use --network=host or configure proper bridge networking
Redis connection refused
  • Ensure Redis container is healthy: docker-compose ps redis
  • Verify REDIS_URL uses correct hostname (redis not localhost)
  • Check Redis logs: docker-compose logs redis
Port conflicts
  • Check if ports are already in use: netstat -tuln | grep LISTEN
  • Modify port mappings in docker-compose.yml if needed

Maintenance

Updating the Stack

# Pull latest changes
git pull

# Rebuild and restart
docker-compose up -d --build

# Remove old images
docker image prune -f

Clearing Cache

# Clear Redis cache
docker-compose exec redis redis-cli FLUSHALL

# Or restart Redis
docker-compose restart redis

Complete Cleanup

# Stop and remove all containers
docker-compose down

# Remove volumes (WARNING: deletes all data)
docker-compose down -v

# Remove images
docker-compose down --rmi all
