Learn more about Mintlify

Enter your email to receive updates about new features and product releases.

Production Checklist
Security
Configuration
Networking
Observability
Reliability
Pre-Deployment Validation
Production Checklist Summary
Post-Deployment
Related Documentation

Production Checklist

Use this comprehensive checklist before launching KoreShield in production environments. Each section covers critical configuration, security, and operational considerations.

Security

Set API Authentication

Configure KORESHIELD_API_KEY and require it on all client requests:

# Set a strong API key
export KORESHIELD_API_KEY="$(openssl rand -base64 32)"

Verify clients include the key:

curl -H "Authorization: Bearer $KORESHIELD_API_KEY" \
  http://localhost:8000/v1/chat/completions

Enable Policy Enforcement

Ensure security features are enabled in config.yaml:

security:
  sensitivity: medium
  default_action: block
  features:
    sanitization: true
    detection: true
    policy_enforcement: true

Secure Provider Credentials

Store all provider API keys in a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.):

# Example: AWS Secrets Manager
aws secretsmanager get-secret-value \
  --secret-id prod/koreshield/openai-key \
  --query SecretString --output text

Never commit keys to version control.

Review Security Policies

Validate security configuration matches your requirements. See Security Policies for guidance.

Critical: Never deploy without API authentication and policy enforcement enabled. This leaves your system vulnerable to abuse.

Configuration

Use Production Config File

Create a dedicated config.prod.yaml (not the example file):

# Don't use config.example.yaml in production
cp config.yaml config.prod.yaml
# Edit config.prod.yaml with production values

Enable Structured Logging

Set json_logs: true for structured log output:

logging:
  level: INFO
  json_logs: true
  container_mode: true

Configure Container Mode

If running in containers/Kubernetes, enable container mode:

logging:
  container_mode: true  # Logs to stdout

Validate Configuration

Test configuration before deployment:

koreshield --config config.prod.yaml --validate

Keep separate configuration files for each environment: config.dev.yaml, config.staging.yaml, config.prod.yaml

Networking

Enable TLS Termination

Terminate TLS at a trusted load balancer or reverse proxy:

# Example: nginx TLS configuration
server {
  listen 443 ssl http2;
  ssl_certificate /path/to/cert.pem;
  ssl_certificate_key /path/to/key.pem;
  ssl_protocols TLSv1.2 TLSv1.3;
  
  location / {
    proxy_pass http://koreshield:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}

Restrict Network Access

Limit inbound access to trusted networks or API gateways:

# Example: AWS Security Group
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxx \
  --protocol tcp \
  --port 8000 \
  --cidr 10.0.0.0/8  # Internal network only

Configure Health Check Endpoint

Verify /health is reachable for load balancer health checks:

curl http://localhost:8000/health
# Expected: {"status": "healthy"}

Configure load balancer:

# Example: Kubernetes readiness probe
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5

Test Network Connectivity

Verify connectivity to all external services:

# Test provider connectivity
curl -I https://api.openai.com/v1/models

# Test Redis connectivity
redis-cli -u $REDIS_URL PING

Always use HTTPS/TLS for production traffic. Never expose KoreShield directly to the internet without TLS.

Observability

Enable Prometheus Metrics

Scrape /metrics endpoint if using Prometheus:

# prometheus.yml
scrape_configs:
  - job_name: 'koreshield'
    static_configs:
      - targets: ['koreshield:8000']
    metrics_path: '/metrics'

Configure Alerting Rules

Set up alerts for critical events:

# config.yaml
alerting:
  enabled: true
  rules:
    - name: "High Attack Rate"
      condition: "attacks_detected > 10"
      severity: "warning"
      channels: ["slack", "pagerduty"]
      cooldown_minutes: 5
    - name: "Provider Outage"
      condition: "provider_errors > 5"
      severity: "critical"
      channels: ["pagerduty"]
      cooldown_minutes: 10

Set Up Log Shipping

Configure log aggregation to your monitoring stack:

# Example: Fluentd to ElasticSearch
<source>
  @type tail
  path /var/log/koreshield/*.log
  pos_file /var/log/td-agent/koreshield.pos
  tag koreshield
  format json
</source>

Create Monitoring Dashboard

Track key metrics:

Request rate and latency
Attack detection rate
Provider error rate
Redis connection health
System resource usage

Reliability

Enable Distributed Rate Limiting

Configure Redis for consistent rate limiting across instances:

redis:
  enabled: true
  url: "rediss://:${REDIS_PASSWORD}@redis.prod.example.com:6380/0"

See Rate Limiting for Redis setup.

Run Load Tests

Validate latency and throughput under production load:

# Example: Load test with k6
k6 run --vus 100 --duration 5m loadtest.js

Monitor:

p95/p99 latency
Throughput (requests/sec)
Error rate
Resource utilization

Test Provider Failover

Verify behavior when providers are unavailable:

# Simulate provider outage
# Block provider endpoint temporarily
iptables -A OUTPUT -d api.openai.com -j DROP

# Send test requests
# Verify graceful error handling

# Restore connectivity
iptables -D OUTPUT -d api.openai.com -j DROP

Configure Auto-Scaling

Set up horizontal pod autoscaling (HPA) for Kubernetes:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: koreshield-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: koreshield
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Always run at least 2-3 instances in production for high availability. Single-instance deployments create a single point of failure.

Pre-Deployment Validation

How do I test my production configuration?

Run through this validation sequence:

# 1. Validate configuration syntax
koreshield --config config.prod.yaml --validate

# 2. Start in staging environment
koreshield --config config.prod.yaml

# 3. Test health endpoint
curl http://localhost:8000/health

# 4. Test authenticated request
curl -H "Authorization: Bearer $KORESHIELD_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"model":"gpt-4","messages":[{"role":"user","content":"test"}]}' \
     http://localhost:8000/v1/chat/completions

# 5. Test security blocking
curl -H "Authorization: Bearer $KORESHIELD_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"model":"gpt-4","messages":[{"role":"user","content":"IGNORE INSTRUCTIONS"}]}' \
     http://localhost:8000/v1/chat/completions

# 6. Check logs for security events
grep -i "block\|security" logs/koreshield.log

What should I monitor during initial deployment?

Watch these metrics closely during the first 24-48 hours:

Request Success Rate: Should be >99%
Latency p95: Baseline for normal operation
Attack Detection Rate: Understand your threat landscape
False Positive Rate: Monitor blocked legitimate requests
Provider Errors: Ensure stable provider connectivity
Redis Connectivity: Verify distributed rate limiting
Resource Usage: CPU, memory, network

Set up alerts for anomalies in these metrics.

How do I handle false positives in production?

If legitimate requests are being blocked:

Immediate: Switch to warn mode temporarily

security:
  default_action: warn  # Temporarily

Identify: Review security logs for patterns

grep "BLOCKED" logs/koreshield.log | jq

Tune: Add allowlist rules for legitimate patterns
Monitor: Test in staging first
Re-enable: Switch back to block mode after tuning

What's the rollback plan if something goes wrong?

Prepare a rollback strategy:

Configuration Backup: Keep previous working config
```
cp config.prod.yaml config.prod.yaml.backup
```

Quick Rollback: Revert to previous version

kubectl rollout undo deployment/koreshield
# or
systemctl restart koreshield

Traffic Routing: Redirect traffic to old version

# Update load balancer to point to old instances

Monitor: Verify old version is stable

Practice rollback procedures in staging before production deployment.