ClinicalPilot is designed for production deployment in healthcare environments with strict security, compliance, and performance requirements.
Deployment Checklist
Environment Configuration
Production Architecture
            ┌─────────────────┐
            │  Load Balancer  │
            │   (nginx/ALB)   │
            └────────┬────────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
│  FastAPI  │  │  FastAPI  │  │  FastAPI  │
│ Worker 1  │  │ Worker 2  │  │ Worker 3  │
└─────┬─────┘  └─────┬─────┘  └─────┬─────┘
      │              │              │
      └──────────────┼──────────────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
│  LanceDB  │  │   Redis   │  │  OpenAI   │
│ (Vector)  │  │  (Cache)  │  │    API    │
└───────────┘  └───────────┘  └───────────┘
Deployment Options
Option 1: Cloud VM (AWS EC2, Azure VM, GCP Compute)
Provision VM
Recommended specs:
- CPU: 4+ cores
- RAM: 16GB+ (32GB if using local LLM)
- Storage: 100GB+ SSD
- OS: Ubuntu 22.04 LTS
Install Dependencies
# Update system
sudo apt update && sudo apt upgrade -y
# Install Python 3.11
sudo apt install -y python3.11 python3.11-venv python3-pip
# Install system packages for PDF parsing
sudo apt install -y poppler-utils tesseract-ocr libmagic1
# Install nginx for reverse proxy
sudo apt install -y nginx certbot python3-certbot-nginx
Clone and Setup
cd /opt
git clone <your-repo-url> clinicalpilot
cd clinicalpilot
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Configure Environment
# Use cloud secret manager (AWS Secrets Manager, Azure Key Vault)
# or encrypted .env file with restrictive permissions
cp .env.example .env
chmod 600 .env
nano .env # Add production keys
Run with Systemd
Create /etc/systemd/system/clinicalpilot.service:
[Unit]
Description=ClinicalPilot FastAPI Application
After=network.target

[Service]
# uvicorn does not send systemd readiness notifications, so Type=notify would time out
Type=simple
# Assumes a dedicated clinicalpilot user and group that can read /opt/clinicalpilot
User=clinicalpilot
Group=clinicalpilot
WorkingDirectory=/opt/clinicalpilot
EnvironmentFile=/opt/clinicalpilot/.env
ExecStart=/opt/clinicalpilot/venv/bin/uvicorn backend.main:app \
    --host 127.0.0.1 \
    --port 8000 \
    --workers 4 \
    --log-level warning
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable clinicalpilot
sudo systemctl start clinicalpilot
sudo systemctl status clinicalpilot
Configure Nginx Reverse Proxy
Create /etc/nginx/sites-available/clinicalpilot:
server {
    listen 80;
    server_name clinicalpilot.example.com;
    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}
server {
    listen 443 ssl http2;
    server_name clinicalpilot.example.com;
    # SSL certificates (issued by certbot; nginx -t fails until these files exist)
    ssl_certificate /etc/letsencrypt/live/clinicalpilot.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/clinicalpilot.example.com/privkey.pem;
    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Frame-Options "DENY" always;
    add_header X-Content-Type-Options "nosniff" always;
    # Proxy to FastAPI
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    # WebSocket support
    location /ws/ {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
    # Static files (frontend)
    location /static/ {
        alias /opt/clinicalpilot/frontend/;
        expires 1d;
    }
}
Enable and restart:
sudo ln -s /etc/nginx/sites-available/clinicalpilot /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
Enable HTTPS with Let's Encrypt
sudo certbot --nginx -d clinicalpilot.example.com
Certbot sets up automatic renewal during installation. To confirm that renewal works, run: sudo certbot renew --dry-run
Option 2: Docker + Docker Compose
See Docker Deployment for containerized setup.
Option 3: Kubernetes
For high-availability, multi-region deployments:
- Use Helm charts for deployment
- Configure horizontal pod autoscaling (HPA)
- Use persistent volumes for LanceDB
- Set up ingress with TLS termination
- Configure readiness/liveness probes
(Full Kubernetes guide coming soon)
Environment Variables
Never commit .env to Git. Use cloud secret managers in production:
- AWS: Secrets Manager, Parameter Store
- Azure: Key Vault
- GCP: Secret Manager
- Kubernetes: Sealed Secrets, External Secrets Operator
Production .env example:
# LLM
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
USE_LOCAL_LLM=false
# Groq (AI Chat)
GROQ_API_KEY=gsk_...
# PubMed
NCBI_EMAIL=<your-email>
NCBI_API_KEY=...
# Observability (self-hosted)
LANGCHAIN_TRACING_V2=true
LANGSMITH_API_KEY=lsv2_... # or use Langfuse
# Application
LOG_LEVEL=WARNING
CORS_ORIGINS=["https://clinicalpilot.example.com"]
EMERGENCY_TIMEOUT_SEC=5
MAX_DEBATE_ROUNDS=3
# Data (use persistent storage)
LANCEDB_PATH=/var/lib/clinicalpilot/lancedb
DRUGBANK_CSV_PATH=/var/lib/clinicalpilot/drugbank/drugbank_vocabulary.csv
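A startup check can catch missing or malformed settings before the first request arrives. The following is a minimal sketch, not ClinicalPilot's actual configuration code: the helper names are hypothetical, and the choice of which keys count as required is an assumption based on the example above.

```python
import json
import os

# Keys taken from the example above; which ones are truly required is an assumption.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NCBI_EMAIL", "LANCEDB_PATH")


def missing_settings(env=None) -> list:
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def parse_cors_origins(raw: str) -> list:
    """Parse CORS_ORIGINS, which the example stores as a JSON list of strings."""
    origins = json.loads(raw)
    if not isinstance(origins, list) or not all(isinstance(o, str) for o in origins):
        raise ValueError("CORS_ORIGINS must be a JSON list of strings")
    return origins
```

Calling missing_settings() at application startup and refusing to start when the list is non-empty turns a silent misconfiguration into an immediate, visible failure.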
Uvicorn Workers
# Rule of thumb: (2 × CPU cores) + 1, capped by available RAM
uvicorn backend.main:app --workers 4
Each worker is a separate process with its own memory space, so adding workers increases concurrent request capacity and RAM usage together.
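The rule of thumb can be combined with a memory ceiling. This is a minimal sketch: the function name is hypothetical and the 1.5 GB per-worker footprint is an assumed figure that should be replaced with a value measured on your own deployment.

```python
def suggested_workers(cpu_cores: int, ram_gb: float, per_worker_gb: float = 1.5) -> int:
    """Return a worker count bounded by both CPU and RAM.

    per_worker_gb is an assumed footprint; measure the real value in production.
    """
    cpu_bound = 2 * cpu_cores + 1                      # rule of thumb from the text
    ram_bound = max(1, int(ram_gb // per_worker_gb))   # workers that fit in memory
    return max(1, min(cpu_bound, ram_bound))
```

On the recommended 4-core, 16 GB instance this gives suggested_workers(4, 16) == 9, while a memory-constrained 4 GB host would be limited to 2 workers.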
Async Pool Size
In backend/config.py:
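# Note: this sets the default event loop policy, which is already in effect,
# so on its own it does not raise concurrency. Throughput is governed by the
# worker count and by any connection-pool or semaphore limits in the code.
import asyncio
asyncio.set_event_loop_policy(asyncio.DefaultEventLoopPolicy())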
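One common way to control how many outbound calls a single worker keeps in flight is an asyncio.Semaphore. The sketch below is illustrative only: call_upstream is a stand-in for a real LLM or vector search request, and the limit of 8 is an assumed value, not a ClinicalPilot setting.

```python
import asyncio

MAX_CONCURRENT_CALLS = 8  # assumed limit; tune to your upstream rate limits


async def call_upstream(i: int) -> int:
    """Stand-in for an outbound call (e.g. an LLM or vector search request)."""
    await asyncio.sleep(0.01)
    return i * 2


async def bounded_gather(items, limit: int = MAX_CONCURRENT_CALLS):
    """Run call_upstream over items with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(item):
        async with semaphore:  # blocks while `limit` calls are already running
            return await call_upstream(item)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(worker(i) for i in items))
```

Running asyncio.run(bounded_gather(range(5), limit=2)) processes the five items two at a time and returns the results in input order.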
Caching RAG Queries
Use Redis to cache LanceDB search results:
import json

import redis

# `search` is the existing LanceDB search function; import it from its module.
cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def search_with_cache(query: str, top_k: int = 5):
    cache_key = f"rag:{query}:{top_k}"
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    results = search(query, top_k)  # results must be JSON-serializable
    cache.setex(cache_key, 3600, json.dumps(results))  # 1 hour TTL
    return results
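Raw query text in a cache key can be long and may contain clinical details that then appear in Redis key listings. A possible refinement is to hash a normalized form of the query. The helper name below is hypothetical, and whether lowercasing and whitespace collapsing are appropriate depends on how your retrieval treats queries.

```python
import hashlib


def rag_cache_key(query: str, top_k: int, prefix: str = "rag") -> str:
    """Build a fixed-length Redis key from a normalized query."""
    normalized = " ".join(query.lower().split())   # collapse case and whitespace
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}:{top_k}:{digest}"
```

With this helper, queries that differ only in case or spacing share one cache entry, and different top_k values still map to different keys.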
Health Checks
Configure load balancer health checks against the health endpoint:
GET /api/health
Expected response (200 OK):
{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2026-03-03T14:30:00Z"
}
For Kubernetes:
livenessProbe:
  httpGet:
    path: /api/health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /api/health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
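The same check can be scripted for a post-deployment smoke test. This is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes the response body matches the JSON shape shown above.

```python
import json
import urllib.error
import urllib.request


def is_healthy(status_code: int, body: str) -> bool:
    """Return True only for a 200 response whose JSON body reports status 'ok'."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check(url: str, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and evaluate it; network errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read().decode("utf-8", errors="replace"))
    except (urllib.error.URLError, TimeoutError):
        return False
```

For example, check("http://127.0.0.1:8000/api/health") could run on the host right after sudo systemctl start clinicalpilot.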
Monitoring
Application Metrics
Integrate with Prometheus:
pip install prometheus-fastapi-instrumentator
# backend/main.py
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
Instrumentator().instrument(app).expose(app)  # serves metrics at /metrics
Metrics endpoint: GET /metrics
Error Tracking
Integrate Sentry:
pip install "sentry-sdk[fastapi]"
# backend/main.py
import sentry_sdk
sentry_sdk.init(
    dsn="<your-sentry-dsn>",  # placeholder; copy the DSN from your Sentry project settings
    environment="production",
    traces_sample_rate=0.1,  # trace 10% of requests
)
Backup and Disaster Recovery
LanceDB Backups
#!/bin/bash
# Daily backup script
set -euo pipefail
BACKUP_DIR="/backups/lancedb"
DATE=$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
tar -czf "$BACKUP_DIR/lancedb_$DATE.tar.gz" /var/lib/clinicalpilot/lancedb/
# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "lancedb_*.tar.gz" -mtime +30 -delete
Add to crontab:
0 2 * * * /opt/clinicalpilot/scripts/backup_lancedb.sh
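A cron job can fail silently, so it may be worth checking that a recent backup actually exists. The sketch below assumes the lancedb_YYYYMMDD.tar.gz naming used by the script above. The function name is hypothetical.

```python
import re
from datetime import date, datetime

BACKUP_PATTERN = re.compile(r"^lancedb_(\d{8})\.tar\.gz$")


def newest_backup_age_days(filenames, today: date):
    """Return the age in days of the newest backup, or None if none match."""
    dates = []
    for name in filenames:
        match = BACKUP_PATTERN.match(name)
        if match:
            dates.append(datetime.strptime(match.group(1), "%Y%m%d").date())
    if not dates:
        return None
    return (today - max(dates)).days
```

A monitoring job could pass os.listdir("/backups/lancedb") and date.today(), then alert when the result is None or greater than 1.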
Configuration Backups
Backup environment variables, secrets, and configuration:
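# Encrypted backup (prompts for a passphrase; -pbkdf2 requires OpenSSL 1.1.1+ and uses a stronger key derivation)
tar -czf config_backup.tar.gz .env backend/config.py
openssl enc -aes-256-cbc -salt -pbkdf2 -in config_backup.tar.gz -out config_backup.tar.gz.enc
rm config_backup.tar.gz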
Scaling
Horizontal Scaling
Run multiple instances behind a load balancer:
- Stateless: FastAPI workers are stateless (no session storage)
- Shared LanceDB: Use network-attached storage (EFS, NFS, Ceph)
- Shared cache: Use Redis cluster for distributed caching
Vertical Scaling
Increase instance resources:
- RAM: For larger embedding models, more workers
- CPU: For faster LLM inference (local models)
- Storage: For larger vector stores
Troubleshooting
High Memory Usage
- Reduce the --workers count
- Use smaller embedding model
- Enable swap space (not recommended for production)
Slow API Responses
- Check OpenAI API status
- Enable LangSmith tracing to identify bottlenecks
- Add caching for RAG queries
- Use local LLM for non-critical agents
Database Corruption
If LanceDB becomes corrupted:
# Restore from backup
tar -xzf /backups/lancedb/lancedb_20260303.tar.gz -C /
# Re-initialize if backup unavailable
python -m backend.rag.lancedb_store --init
python -m backend.rag.lancedb_store --ingest /path/to/source/documents
Next Steps
- Docker Deployment: Deploy ClinicalPilot with Docker Compose
- HIPAA Compliance: Security and compliance requirements