ClinicalPilot is designed for production deployment in healthcare environments with strict security, compliance, and performance requirements.

Deployment Checklist

1. Environment Configuration

  • Set all required API keys in environment variables (not in code)
  • Use .env for local dev, cloud secret managers for production
  • Enable HTTPS (required for HIPAA)
  • Configure CORS to allow only trusted origins
  • Set LOG_LEVEL=WARNING or ERROR (not DEBUG)
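A fail-fast startup check helps catch missing keys before the app serves traffic. A minimal sketch (the key names are a hypothetical subset of what your deployment actually requires):

```python
import os

# Hypothetical subset of the keys this deployment requires
REQUIRED_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY", "NCBI_EMAIL")

def validate_environment() -> None:
    """Raise at startup if any required key is missing or empty."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Calling this from your application's startup hook turns a silent misconfiguration into an immediate, visible failure.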
2. Database Setup

  • Initialize LanceDB with production data path
  • Ingest clinical guidelines, protocols, drug references
  • Set up automated backups (daily recommended)
  • Configure vector store on persistent storage (not ephemeral)
3. Security Hardening

  • Enable PHI anonymization (Microsoft Presidio)
  • Review and audit anonymized outputs
  • Disable debug endpoints in production
  • Implement API authentication (JWT, OAuth2)
  • Set up rate limiting (protect against abuse)
  • Enable audit logging (all API calls, user actions)
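The rate-limiting item above is usually handled by a gateway or middleware in production, but the core idea can be sketched as an in-process token bucket (an illustration only, not the project's actual limiter):

```python
import time

class TokenBucket:
    """Minimal rate limiter: allow `rate` requests/sec with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens based on elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A per-client instance of this (keyed by API token or IP) is the essence of what dedicated middleware or a reverse proxy does for you at scale.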
4. Observability

  • Configure LangSmith or self-hosted Langfuse
  • Set up application monitoring (Sentry, Datadog, etc.)
  • Configure health check endpoints for load balancers
  • Set up alerting for failures, slow responses, high error rates
5. Performance Optimization

  • Use multiple workers: --workers 4
  • Configure async worker pool size
  • Enable HTTP/2 for faster API responses
  • Set up caching for RAG queries (Redis recommended)
  • Use local LLM (MedGemma) for cost reduction (optional)
6. Compliance

  • Review HIPAA requirements (see HIPAA Compliance)
  • Sign BAA with cloud provider
  • Enable encryption at rest (database, logs)
  • Enable encryption in transit (TLS 1.2+)
  • Configure data retention policies
  • Set up access controls (RBAC)
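Data retention policies ultimately come down to deleting artifacts past their retention window. A minimal sketch of a log-purge job (`purge_old_files` is a hypothetical helper, not part of the codebase):

```python
import time
from pathlib import Path

def purge_old_files(directory: Path, max_age_days: int) -> list[Path]:
    """Delete files older than the retention window; return what was removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in directory.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run from cron (like the backup script below) and log what was removed, so deletions themselves appear in the audit trail.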

Production Architecture

                    ┌─────────────────┐
                    │  Load Balancer  │
                    │   (nginx/ALB)   │
                    └────────┬────────┘

              ┌──────────────┼──────────────┐
              │              │              │
        ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
        │  FastAPI  │  │  FastAPI  │  │  FastAPI  │
        │  Worker 1 │  │  Worker 2 │  │  Worker 3 │
        └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
              │              │              │
              └──────────────┼──────────────┘

              ┌──────────────┼──────────────┐
              │              │              │
        ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
        │  LanceDB  │  │   Redis   │  │  OpenAI   │
        │  (Vector) │  │  (Cache)  │  │    API    │
        └───────────┘  └───────────┘  └───────────┘

Deployment Options

Option 1: Cloud VM (AWS EC2, Azure VM, GCP Compute)

1. Provision VM

Recommended specs:
  • CPU: 4+ cores
  • RAM: 16GB+ (32GB if using local LLM)
  • Storage: 100GB+ SSD
  • OS: Ubuntu 22.04 LTS
2. Install Dependencies

# Update system
sudo apt update && sudo apt upgrade -y

# Install Python 3.11
sudo apt install -y python3.11 python3.11-venv python3-pip

# Install system packages for PDF parsing
sudo apt install -y poppler-utils tesseract-ocr libmagic1

# Install nginx for reverse proxy
sudo apt install -y nginx certbot python3-certbot-nginx
3. Clone and Setup

cd /opt
git clone <your-repo-url> clinicalpilot
cd clinicalpilot

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
4. Configure Environment

# Use cloud secret manager (AWS Secrets Manager, Azure Key Vault)
# or encrypted .env file with restrictive permissions
cp .env.example .env
chmod 600 .env
nano .env  # Add production keys
5. Run with Systemd

Create /etc/systemd/system/clinicalpilot.service:
[Unit]
Description=ClinicalPilot FastAPI Application
After=network.target

[Service]
Type=exec
User=clinicalpilot
Group=clinicalpilot
WorkingDirectory=/opt/clinicalpilot
EnvironmentFile=/opt/clinicalpilot/.env
ExecStart=/opt/clinicalpilot/venv/bin/uvicorn backend.main:app \
  --host 127.0.0.1 \
  --port 8000 \
  --workers 4 \
  --log-level warning
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable clinicalpilot
sudo systemctl start clinicalpilot
sudo systemctl status clinicalpilot
6. Configure Nginx Reverse Proxy

Create /etc/nginx/sites-available/clinicalpilot:
server {
    listen 80;
    server_name clinicalpilot.example.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name clinicalpilot.example.com;

    # SSL certificates (managed by certbot)
    ssl_certificate /etc/letsencrypt/live/clinicalpilot.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/clinicalpilot.example.com/privkey.pem;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Frame-Options "DENY" always;
    add_header X-Content-Type-Options "nosniff" always;

    # Proxy to FastAPI
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # WebSocket support
    location /ws/ {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    # Static files (frontend)
    location /static/ {
        alias /opt/clinicalpilot/frontend/;
        expires 1d;
    }
}
Enable and restart:
sudo ln -s /etc/nginx/sites-available/clinicalpilot /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
7. Enable HTTPS with Let's Encrypt

sudo certbot --nginx -d clinicalpilot.example.com
Certbot sets up automatic certificate renewal for you.

Option 2: Docker + Docker Compose

See Docker Deployment for containerized setup.

Option 3: Kubernetes

For high-availability, multi-region deployments:
  • Use Helm charts for deployment
  • Configure horizontal pod autoscaling (HPA)
  • Use persistent volumes for LanceDB
  • Set up ingress with TLS termination
  • Configure readiness/liveness probes
(Full Kubernetes guide coming soon)

Environment Variables

Never commit .env to Git. Use cloud secret managers in production:
  • AWS: Secrets Manager, Parameter Store
  • Azure: Key Vault
  • GCP: Secret Manager
  • Kubernetes: Sealed Secrets, External Secrets Operator
Production .env example:
# LLM
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
USE_LOCAL_LLM=false

# Groq (AI Chat)
GROQ_API_KEY=gsk_...

# PubMed
NCBI_EMAIL=[email protected]
NCBI_API_KEY=...

# Observability (self-hosted)
LANGCHAIN_TRACING_V2=true
LANGSMITH_API_KEY=lsv2_...  # or use Langfuse

# Application
LOG_LEVEL=WARNING
CORS_ORIGINS=["https://clinicalpilot.example.com"]
EMERGENCY_TIMEOUT_SEC=5
MAX_DEBATE_ROUNDS=3

# Data (use persistent storage)
LANCEDB_PATH=/var/lib/clinicalpilot/lancedb
DRUGBANK_CSV_PATH=/var/lib/clinicalpilot/drugbank/drugbank_vocabulary.csv

Performance Tuning

Uvicorn Workers

# Rule of thumb: (2 × CPU cores) + 1
uvicorn backend.main:app --workers 4
Each worker is a separate process with its own memory space. More workers = more concurrent requests, but also more RAM usage.

Async Pool Size

One approach is to enlarge the default thread pool that asyncio uses for blocking calls (a sketch, e.g. in backend/config.py, invoked from a startup hook; adapt to your startup code):
# Enlarge the default executor used by run_in_executor / asyncio.to_thread
# so blocking work (PDF parsing, sync clients) doesn't starve the event loop
import asyncio
from concurrent.futures import ThreadPoolExecutor

def configure_executor(max_workers: int = 32) -> None:
    asyncio.get_running_loop().set_default_executor(
        ThreadPoolExecutor(max_workers=max_workers)
    )

Caching RAG Queries

Use Redis to cache LanceDB search results:
import json
import redis

cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def search_with_cache(query: str, top_k: int = 5):
    cache_key = f"rag:{query}:{top_k}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)

    results = search(query, top_k)  # your existing LanceDB search function
    cache.setex(cache_key, 3600, json.dumps(results))  # 1 hour TTL
    return results

Health Checks

Configure load balancer health checks:
GET /api/health
Expected response (200 OK):
{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2026-03-03T14:30:00Z"
}
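The response body above can be built with a small helper (a sketch; in the real app a FastAPI route would return this dict):

```python
from datetime import datetime, timezone

def health_payload(version: str = "1.0.0") -> dict:
    """Build the /api/health response body: status, version, UTC timestamp."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"status": "ok", "version": version, "timestamp": ts}
```

Keep the handler dependency-free (no database or LLM calls) so a slow backend does not make the load balancer mark healthy instances as down.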
For Kubernetes:
livenessProbe:
  httpGet:
    path: /api/health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /api/health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5

Monitoring

Application Metrics

Integrate with Prometheus:
pip install prometheus-fastapi-instrumentator
# backend/main.py
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
Instrumentator().instrument(app).expose(app)
Metrics endpoint: GET /metrics

Error Tracking

Integrate Sentry:
pip install sentry-sdk[fastapi]
# backend/main.py
import sentry_sdk

sentry_sdk.init(
    dsn="https://[email protected]/...",
    environment="production",
    traces_sample_rate=0.1,
)

Backup and Disaster Recovery

LanceDB Backups

#!/bin/bash
# Daily LanceDB backup script
BACKUP_DIR="/backups/lancedb"
DATE=$(date +%Y%m%d)

tar -czf "$BACKUP_DIR/lancedb_$DATE.tar.gz" /var/lib/clinicalpilot/lancedb/

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "lancedb_*.tar.gz" -mtime +30 -delete
Add to crontab:
0 2 * * * /opt/clinicalpilot/scripts/backup_lancedb.sh

Configuration Backups

Backup environment variables, secrets, and configuration:
# Encrypted backup
tar -czf config_backup.tar.gz .env backend/config.py
openssl enc -aes-256-cbc -pbkdf2 -salt -in config_backup.tar.gz -out config_backup.tar.gz.enc
rm config_backup.tar.gz

Scaling

Horizontal Scaling

Run multiple instances behind a load balancer:
  • Stateless: FastAPI workers are stateless (no session storage)
  • Shared LanceDB: Use network-attached storage (EFS, NFS, Ceph)
  • Shared cache: Use Redis cluster for distributed caching

Vertical Scaling

Increase instance resources:
  • RAM: For larger embedding models, more workers
  • CPU: For faster LLM inference (local models)
  • Storage: For larger vector stores

Troubleshooting

High Memory Usage

  • Reduce --workers count
  • Use smaller embedding model
  • Enable swap space (not recommended for production)

Slow API Responses

  • Check OpenAI API status
  • Enable LangSmith tracing to identify bottlenecks
  • Add caching for RAG queries
  • Use local LLM for non-critical agents

Database Corruption

If LanceDB becomes corrupted:
# Restore from backup
tar -xzf /backups/lancedb/lancedb_20260303.tar.gz -C /

# Re-initialize if backup unavailable
python -m backend.rag.lancedb_store --init
python -m backend.rag.lancedb_store --ingest /path/to/source/documents

Next Steps

Docker Deployment

Deploy ClinicalPilot with Docker Compose

HIPAA Compliance

Security and compliance requirements
