Overview
LLM Gateway Core is configured entirely through environment variables, allowing for flexible deployment without code modifications. This page provides a comprehensive reference for all available variables.
Configuration File
Environment variables are typically defined in a .env file in the project root:
PROVIDER_TIMEOUT_SECONDS=60
PROVIDER_MAX_RETRIES=3
CACHE_TTL_SECONDS=60
RATE_LIMITER_CAPACITY=5
RATE_LIMITER_REFILL_RATE=1
REDIS_URL=redis://127.0.0.1:6380/0
GEMINI_API_KEY=your_api_key_here
OLLAMA_BASE_URL=http://localhost:11434
API_KEYS=sk-gateway-123
The gateway uses Pydantic Settings for configuration management, supporting both .env files and system environment variables.
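To illustrate what that loading step does, here is a stdlib-only sketch of the same behavior: defaults for every variable, overridden by the environment, with integer coercion. The real gateway uses a Pydantic Settings class, so field names here match the variables above but the class itself is illustrative, not the actual implementation.

```python
# Illustrative stand-in for the gateway's Pydantic Settings class.
# Defaults mirror the .env example above; environment values override them.
import os
from dataclasses import dataclass, fields

@dataclass
class Settings:
    PROVIDER_TIMEOUT_SECONDS: int = 60
    PROVIDER_MAX_RETRIES: int = 3
    CACHE_TTL_SECONDS: int = 60
    RATE_LIMITER_CAPACITY: int = 5
    RATE_LIMITER_REFILL_RATE: int = 1
    REDIS_URL: str = "redis://127.0.0.1:6380/0"
    GEMINI_API_KEY: str = ""
    OLLAMA_BASE_URL: str = "http://localhost:11434"
    API_KEYS: str = "sk-gateway-123"

def load_settings(env=None) -> Settings:
    """Override each default with its environment value, coercing int fields."""
    env = os.environ if env is None else env
    kwargs = {}
    for f in fields(Settings):
        if f.name in env:
            kwargs[f.name] = int(env[f.name]) if f.type is int else env[f.name]
    return Settings(**kwargs)

settings = load_settings({"PROVIDER_TIMEOUT_SECONDS": "30"})
# settings.PROVIDER_TIMEOUT_SECONDS is now the int 30; all other fields keep defaults
```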
Complete Variable Reference
PROVIDER_TIMEOUT_SECONDS
Maximum time in seconds to wait for an LLM provider response before timing out.
Type: int
Default: 60
Range: 1 - 300
Required: No
Description:
Controls how long the gateway will wait for a response from LLM providers (Gemini, Ollama) before abandoning the request and returning a timeout error.
Use Cases:
- Increase (90-120s): Complex queries, large context windows, slower models
- Decrease (30-45s): Fast-fail scenarios, real-time applications, high-traffic environments
- Production recommendation: 60 seconds for balanced behavior
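The timeout behavior can be sketched with asyncio: wrap the provider call, and convert an expired wait into a timeout error. The coroutine and error string below are illustrative stand-ins, not the gateway's actual HTTP client or error schema.

```python
# Sketch of provider timeout enforcement with asyncio.wait_for.
import asyncio

async def call_provider(delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a real provider HTTP call
    return "response"

async def call_with_timeout(delay: float, timeout: float) -> str:
    try:
        return await asyncio.wait_for(call_provider(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout_error"  # gateway would return a timeout error to the client

print(asyncio.run(call_with_timeout(0.01, 0.5)))  # fast call completes
print(asyncio.run(call_with_timeout(0.5, 0.01)))  # slow call is abandoned
```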
Example:
# Short timeout for quick responses
PROVIDER_TIMEOUT_SECONDS=30
# Extended timeout for complex queries
PROVIDER_TIMEOUT_SECONDS=120
Impact:
- Too low: Premature timeouts on legitimate slow queries
- Too high: Long wait times on failed/stuck requests
- Affects user experience and resource utilization
PROVIDER_MAX_RETRIES
Number of retry attempts for failed provider requests.
Type: int
Default: 3
Range: 0 - 10
Required: No
Description:
Defines how many times the gateway will retry a failed request to an LLM provider before giving up. Uses exponential backoff between retries.
Retry Logic:
- Only retries on transient errors (network issues, 5xx server errors, timeouts)
- Does NOT retry on client errors (4xx responses, invalid requests)
- Applies exponential backoff: 1s, 2s, 4s, 8s, etc.
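The retry policy above can be sketched as follows. The exception classes are illustrative stand-ins for whatever the gateway raises on 5xx/network errors versus 4xx errors; only the control flow (retry transient, never retry client errors, double the delay each attempt) reflects the documented behavior.

```python
# Sketch of retry-with-exponential-backoff: 1s, 2s, 4s, ... between attempts.
import time

class TransientError(Exception): pass  # network issue, 5xx, timeout
class ClientError(Exception): pass     # 4xx, invalid request

def call_with_retries(fn, max_retries: int, base_delay: float = 1.0):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ClientError:
            raise                      # client errors are never retried
        except TransientError:
            if attempt == max_retries:
                raise                  # retries exhausted, give up
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

For example, with `PROVIDER_MAX_RETRIES=3`, a call that fails twice with transient errors and then succeeds completes on the third attempt.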
Use Cases:
- 0 retries: Development/testing, when immediate feedback is needed
- 2-3 retries: Production default, balances reliability and latency
- 5+ retries: Critical applications where success is paramount
Example:
# No retries (fail fast)
PROVIDER_MAX_RETRIES=0
# Standard production setting
PROVIDER_MAX_RETRIES=3
# High reliability requirement
PROVIDER_MAX_RETRIES=5
Impact:
- More retries: Better reliability, higher latency on failures
- Fewer retries: Lower latency, may fail on transient issues
CACHE_TTL_SECONDS
Time-to-live for cached LLM responses in Redis.
Type: int
Default: 60
Range: 0 - 86400 (0 = disabled, max = 24 hours)
Required: No
Description:
Determines how long cached responses remain valid in Redis before expiring. Caching reduces API costs and improves response times for repeated queries.
Caching Behavior:
- Cache key includes: prompt, model, temperature, and other parameters
- Exact match required for cache hit
- Expired entries are automatically removed by Redis
- Set to 0 to disable caching entirely
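The "exact match" cache behavior follows from how the key is built: hash the full, canonically serialized parameter set, so any change to prompt, model, or temperature produces a different key. The exact key schema below is an assumption for illustration, not the gateway's actual format.

```python
# Hypothetical cache-key construction: deterministic hash over all request parameters.
import hashlib
import json

def cache_key(prompt: str, model: str, temperature: float, **params) -> str:
    payload = json.dumps(
        {"prompt": prompt, "model": model, "temperature": temperature, **params},
        sort_keys=True,  # canonical ordering: same parameters, same serialization
    )
    return "llm-cache:" + hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("Hello", "gemini-pro", 0.7)
k2 = cache_key("Hello", "gemini-pro", 0.7)  # identical request: cache hit
k3 = cache_key("Hello", "gemini-pro", 0.8)  # different temperature: cache miss
```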
Use Cases by TTL:
- 0: Disable caching (always fresh responses)
- 60-300s: Interactive applications (1-5 minutes)
- 3600s: FAQ/documentation bots (1 hour)
- 86400s: Static content, knowledge bases (24 hours)
Example:
# Disable caching
CACHE_TTL_SECONDS=0
# Short cache for dynamic content
CACHE_TTL_SECONDS=60
# Extended cache for stable content
CACHE_TTL_SECONDS=3600
Considerations:
- Longer TTL: Lower costs, potential stale responses
- Shorter TTL: Fresher responses, higher API usage
- Balance based on content volatility and cost sensitivity
RATE_LIMITER_CAPACITY
Maximum burst capacity for the token bucket rate limiter.
Type: int
Default: 5
Range: 1 - 1000
Required: No
Description:
Sets the maximum number of tokens in the rate limiter’s token bucket. This determines how many requests a client can make in a burst before being rate limited.
Token Bucket Algorithm:
- Each client has a separate bucket identified by API key
- Each request consumes 1 token
- Tokens refill at the rate defined by RATE_LIMITER_REFILL_RATE
- Requests are rejected with HTTP 429 when bucket is empty
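A minimal in-memory token bucket implementing the algorithm above looks like this. The gateway's real limiter is backed by Redis so it works across instances; this version is purely illustrative.

```python
# In-memory token bucket: burst up to `capacity`, sustain `refill_rate` per second.
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # buckets start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1             # each request consumes one token
            return True
        return False                     # caller responds with HTTP 429

bucket = TokenBucket(capacity=5, refill_rate=1)
burst = [bucket.allow() for _ in range(6)]
# With capacity 5, the first 5 rapid requests pass and the 6th is rejected
```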
Use Cases:
- Low capacity (5-10): Strict rate limiting, prevent abuse
- Medium capacity (50-100): Normal API usage
- High capacity (500+): Premium clients, internal services
Example:
# Strict limiting for public API
RATE_LIMITER_CAPACITY=5
# Generous limits for paid tiers
RATE_LIMITER_CAPACITY=100
# Unrestricted for internal use
RATE_LIMITER_CAPACITY=1000
Rate Limiting Scenarios:
# Allow 10 burst requests, then 1 per second
RATE_LIMITER_CAPACITY=10
RATE_LIMITER_REFILL_RATE=1
# Allow 100 burst requests, then 10 per second
RATE_LIMITER_CAPACITY=100
RATE_LIMITER_REFILL_RATE=10
RATE_LIMITER_REFILL_RATE
Number of tokens added to the bucket per second.
Type: int
Default: 1
Range: 1 - 1000
Required: No
Description:
Defines the sustained request rate by controlling how many tokens are added to each client’s bucket per second.
Rate Calculations:
REFILL_RATE=1: 1 request/second = 60 requests/minute = 3,600 requests/hour
REFILL_RATE=10: 10 requests/second = 600 requests/minute = 36,000 requests/hour
REFILL_RATE=100: 100 requests/second = 6,000 requests/minute = 360,000 requests/hour
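The table above is simple multiplication, which a helper makes explicit:

```python
# Sustained request rates implied by a given refill rate.
def sustained_rates(refill_per_second: int) -> tuple:
    """Return (requests/minute, requests/hour) for a token refill rate."""
    return refill_per_second * 60, refill_per_second * 3600

per_min, per_hour = sustained_rates(10)
# 10 tokens/sec sustains 600 requests/minute and 36,000 requests/hour
```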
Example:
# Conservative rate for free tier
RATE_LIMITER_REFILL_RATE=1
# Moderate rate for standard tier
RATE_LIMITER_REFILL_RATE=10
# High rate for enterprise tier
RATE_LIMITER_REFILL_RATE=100
Combined Examples:
# Free tier: 5 burst, 1/sec sustained (60/min)
RATE_LIMITER_CAPACITY=5
RATE_LIMITER_REFILL_RATE=1
# Pro tier: 50 burst, 10/sec sustained (600/min)
RATE_LIMITER_CAPACITY=50
RATE_LIMITER_REFILL_RATE=10
# Enterprise: 100 burst, 50/sec sustained (3000/min)
RATE_LIMITER_CAPACITY=100
RATE_LIMITER_REFILL_RATE=50
REDIS_URL
Connection URL for the Redis instance used for caching and rate limiting.
Type: str
Default: redis://127.0.0.1:6380/0
Format: redis://[username:password@]host:port/database
Required: Yes (gateway will not function without Redis)
Description:
Specifies the Redis connection string for the distributed cache and rate limiter storage.
URL Format Components:
- Protocol: redis:// (or rediss:// for TLS)
- Authentication: username:password@ (optional)
- Host: hostname or IP address
- Port: Redis port (default 6379)
- Database: Redis database number (0-15)
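To see how those components map onto a concrete value, the stdlib can parse one of the example URLs directly:

```python
# Decomposing a Redis URL into its documented components.
from urllib.parse import urlsplit

url = urlsplit("redis://admin:secret_password@redis:6379/0")
print(url.scheme)    # redis
print(url.username)  # admin
print(url.password)  # secret_password
print(url.hostname)  # redis
print(url.port)      # 6379
print(url.path)      # /0  (the database number)
```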
Examples by Environment:
# Local development (Redis on host)
REDIS_URL=redis://127.0.0.1:6380/0
# Docker Compose (Redis container)
REDIS_URL=redis://redis:6379/0
# With password authentication
REDIS_URL=redis://:my_secure_password@redis:6379/0
# With username and password
REDIS_URL=redis://admin:secret_password@redis:6379/0
# Remote Redis server
REDIS_URL=redis://redis-prod.example.com:6379/1
# Redis with TLS
REDIS_URL=rediss://redis-prod.example.com:6380/0
# Redis Sentinel and Redis Cluster (high availability)
# Note: most clients (including redis-py) do not accept multi-host redis:// URLs;
# Sentinel and Cluster deployments typically require client-specific configuration
# (a Sentinel service name or a list of cluster nodes) rather than a single URL.
Docker-Specific Configuration:
# docker-compose.yml
services:
gateway:
environment:
- REDIS_URL=redis://redis:6379/0 # Use service name as hostname
redis:
ports:
- "6380:6379" # Expose on host port 6380
Testing Connection:
# Test Redis connectivity
redis-cli -u redis://127.0.0.1:6380/0 ping
# Expected output: PONG
# With authentication
redis-cli -u redis://:password@host:6379/0 ping
Always use authentication in production. Expose Redis only to trusted networks.
GEMINI_API_KEY
API key for Google Gemini LLM integration.
Type: str
Default: "" (empty, Gemini provider disabled)
Format: Alphanumeric string from Google AI Studio
Required: Only if using Gemini (online/fast modes)
Description:
Authentication key for accessing Google’s Gemini API. Required for the gateway to route requests to Gemini models.
Obtaining an API Key:
- Visit Google AI Studio
- Sign in with your Google account
- Click “Create API Key”
- Copy the generated key
- Set it as an environment variable
Example:
# Production key
GEMINI_API_KEY=AIzaSyBxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Development key
GEMINI_API_KEY=AIzaSyDev_xxxxxxxxxxxxxxxxxxxxxxxx
Security Best Practices:
- Never commit API keys to version control
- Use separate keys for development and production
- Rotate keys periodically
- Monitor usage in Google Cloud Console
- Set up billing alerts to prevent unexpected charges
# Always exclude environment files
.env
.env.local
.env.*.local
Usage Validation:
# Test API key validity
curl "https://generativelanguage.googleapis.com/v1/models?key=$GEMINI_API_KEY"
Behavior When Unset:
- Gateway starts successfully
- Requests to Gemini (online/fast modes) return errors
- Ollama (local/secure modes) continue to work
OLLAMA_BASE_URL
Base URL for the Ollama API endpoint.
Type: str
Default: http://localhost:11434
Format: Full HTTP/HTTPS URL
Required: Only if using Ollama (local/secure modes)
Description:
Specifies the endpoint for Ollama API requests. Ollama provides local LLM inference for private, secure, or offline deployments.
Configuration by Deployment:
# Local development (Ollama on host machine)
OLLAMA_BASE_URL=http://localhost:11434
# Docker Compose (accessing host Ollama from container)
OLLAMA_BASE_URL=http://host.docker.internal:11434
# Remote Ollama server
OLLAMA_BASE_URL=http://ollama-server.internal:11434
# Ollama with custom port
OLLAMA_BASE_URL=http://localhost:8080
# HTTPS (with reverse proxy)
OLLAMA_BASE_URL=https://ollama.example.com
Docker Compose Setup:
# docker-compose.yml
services:
gateway:
environment:
- OLLAMA_BASE_URL=http://host.docker.internal:11434
extra_hosts:
- "host.docker.internal:host-gateway" # Required for Docker access to host
Linux Docker Networking:
On Linux, host.docker.internal doesn’t work by default. Use one of these approaches:
# Option 1: Use host network mode
services:
gateway:
network_mode: host
environment:
- OLLAMA_BASE_URL=http://localhost:11434
# Option 2: Use host IP address
services:
gateway:
environment:
- OLLAMA_BASE_URL=http://192.168.1.100:11434
# Option 3: Run Ollama in Docker too
services:
ollama:
image: ollama/ollama
ports:
- "11434:11434"
gateway:
environment:
- OLLAMA_BASE_URL=http://ollama:11434
Verifying Ollama Connectivity:
# Test Ollama is accessible
curl http://localhost:11434/api/tags
# From Docker container
docker-compose exec gateway curl http://host.docker.internal:11434/api/tags
# Expected response: JSON with available models
Installing Ollama:
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama
ollama serve
# Pull a model
ollama pull llama2
API_KEYS
Comma-separated list of valid API keys for gateway authentication.
Type: str
Default: sk-gateway-123
Format: Comma-separated string of keys
Required: Yes (gateway enforces authentication)
Description:
Defines the valid API keys that clients must provide to access the gateway. Multiple keys allow for different clients or access tiers.
Format:
# Single key
API_KEYS=sk-gateway-production-xyz789
# Multiple keys (comma-separated, no spaces)
API_KEYS=sk-gateway-client1,sk-gateway-client2,sk-gateway-admin
# With descriptive prefixes
API_KEYS=sk-prod-web-app,sk-prod-mobile-app,sk-dev-testing
Client Usage:
Clients include the API key in the Authorization header:
# Using Bearer token
curl -X POST http://localhost:8000/api/v1/chat/completions \
-H "Authorization: Bearer sk-gateway-123" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello"}]}'
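The same authenticated request can be built in Python with the stdlib. The URL and path mirror the curl example and are assumptions about your deployment; only the request object is constructed here, not sent.

```python
# Building an authenticated gateway request with urllib (not sent over the network).
import json
from urllib import request

req = request.Request(
    "http://localhost:8000/api/v1/chat/completions",
    data=json.dumps({"messages": [{"role": "user", "content": "Hello"}]}).encode(),
    headers={
        "Authorization": "Bearer sk-gateway-123",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; here we only inspect the auth header
print(req.get_header("Authorization"))  # Bearer sk-gateway-123
```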
Generating Secure Keys:
Always use cryptographically secure random keys in production.
# Using OpenSSL (emits raw hex; prepend your own prefix)
openssl rand -hex 32
# Output: a1b2c3d4e5f6... (use as sk-gateway-a1b2c3d4e5f6...)
# Using Python
python3 -c "import secrets; print('sk-gateway-' + secrets.token_urlsafe(32))"
# Output: sk-gateway-Xy9Zk3Qm7Rp...
# Generate multiple keys
for i in {1..5}; do openssl rand -hex 16 | sed 's/^/sk-prod-/'; done
Key Management Strategies:
# Environment-specific keys
# Development
API_KEYS=sk-dev-local-testing
# Staging
API_KEYS=sk-staging-app1,sk-staging-app2
# Production
API_KEYS=sk-prod-web-Xy9Zk3Qm,sk-prod-mobile-Rp7Tm2Vn,sk-prod-internal-Qs8Un4Wo
Multi-Tier Access:
# Different keys for different service tiers
API_KEYS=sk-free-user1,sk-free-user2,sk-pro-user3,sk-enterprise-user4
# Combine with different rate limiting per key (future enhancement)
Security Recommendations:
- Never use default keys in production (sk-gateway-123 is for development only)
- Rotate keys regularly (quarterly or after suspected compromise)
- Use separate keys per client (enables individual key revocation)
- Monitor key usage (detect unauthorized access or compromised keys)
- Store keys securely (use secrets managers, not plain text files)
Secrets Management:
# AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id llm-gateway/api-keys \
--query SecretString --output text
# HashiCorp Vault
vault kv get -field=api_keys secret/llm-gateway
# Kubernetes Secret
kubectl create secret generic gateway-api-keys --from-literal=keys="sk-prod-..."
Environment-Specific Examples
Local Development
# Provider Configuration
PROVIDER_TIMEOUT_SECONDS=30
PROVIDER_MAX_RETRIES=2
# Cache Configuration
CACHE_TTL_SECONDS=300
REDIS_URL=redis://127.0.0.1:6380/0
# Rate Limiting (generous for testing)
RATE_LIMITER_CAPACITY=100
RATE_LIMITER_REFILL_RATE=10
# Provider Integration
GEMINI_API_KEY=AIzaSy_dev_key_here
OLLAMA_BASE_URL=http://localhost:11434
# Authentication
API_KEYS=sk-dev-test-123
Docker Compose
# Provider Configuration
PROVIDER_TIMEOUT_SECONDS=60
PROVIDER_MAX_RETRIES=3
# Cache Configuration
CACHE_TTL_SECONDS=60
REDIS_URL=redis://redis:6379/0
# Rate Limiting
RATE_LIMITER_CAPACITY=10
RATE_LIMITER_REFILL_RATE=2
# Provider Integration
GEMINI_API_KEY=AIzaSy_your_actual_key
OLLAMA_BASE_URL=http://host.docker.internal:11434
# Authentication
API_KEYS=sk-gateway-docker-xyz789
Production
# Provider Configuration (optimized for reliability)
PROVIDER_TIMEOUT_SECONDS=45
PROVIDER_MAX_RETRIES=3
# Cache Configuration (longer TTL for cost savings)
CACHE_TTL_SECONDS=300
REDIS_URL=redis://:[email protected]:6379/0
# Rate Limiting (strict to prevent abuse)
RATE_LIMITER_CAPACITY=5
RATE_LIMITER_REFILL_RATE=1
# Provider Integration (from secrets manager)
GEMINI_API_KEY=${GEMINI_KEY_FROM_VAULT}
OLLAMA_BASE_URL=http://ollama-prod.internal:11434
# Authentication (secure, rotated keys)
API_KEYS=sk-prod-web-Xy9Zk3Qm,sk-prod-mobile-Rp7Tm2Vn,sk-prod-internal-Qs8Un4Wo
Validation and Troubleshooting
Checking Current Configuration
# View loaded configuration (excluding secrets)
from app.core.config import settings
print(f"Timeout: {settings.PROVIDER_TIMEOUT_SECONDS}s")
print(f"Max Retries: {settings.PROVIDER_MAX_RETRIES}")
print(f"Cache TTL: {settings.CACHE_TTL_SECONDS}s")
print(f"Rate Limit Capacity: {settings.RATE_LIMITER_CAPACITY}")
print(f"Rate Limit Refill: {settings.RATE_LIMITER_REFILL_RATE}/s")
print(f"Redis: {settings.REDIS_URL.split('@')[-1]}") # Hide password
print(f"Ollama: {settings.OLLAMA_BASE_URL}")
print(f"Gemini Configured: {bool(settings.GEMINI_API_KEY)}")
print(f"API Keys Count: {len(settings.API_KEYS.split(','))}")
Common Configuration Errors
Redis Connection Failed
# Error: Connection refused
# Solution: Check Redis is running
redis-cli ping
# Solution: Verify URL hostname (use service name in Docker)
REDIS_URL=redis://redis:6379/0 # Not localhost in Docker!
Ollama Not Accessible
# Error: Connection refused on Ollama requests
# Solution: Test Ollama endpoint
curl http://localhost:11434/api/tags
# Solution: Use host.docker.internal in Docker
OLLAMA_BASE_URL=http://host.docker.internal:11434
Invalid API Key
# Error: 401 Unauthorized
# Solution: Verify API key is in the list
echo $API_KEYS | grep -o 'sk-gateway-123'
# Solution: Ensure no whitespace around the commas
API_KEYS=key1,key2 # ✓ Correct
API_KEYS=key1, key2 # ✗ Space becomes part of the second key
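The whitespace problem is easy to see, assuming the gateway splits API_KEYS on commas without stripping each entry (which the warning above implies):

```python
# Why a stray space breaks authentication: it becomes part of the stored key.
raw_bad = "key1, key2"
raw_good = "key1,key2"
print(raw_bad.split(","))   # ['key1', ' key2']  <- ' key2' never matches 'key2'
print(raw_good.split(","))  # ['key1', 'key2']
```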
Type Validation Error
# Error: validation error for Settings
# Solution: Ensure the value parses as an integer
PROVIDER_TIMEOUT_SECONDS=60 # ✓ Correct
PROVIDER_TIMEOUT_SECONDS=sixty # ✗ Not parseable as an integer
Next Steps