Prerequisites
- Python 3.10+ (required): install from python.org or with Homebrew (brew install python@<version>, 3.10 or newer)
- Git 2.x+ (required): brew install git or git-scm.com
- Ollama (optional): only needed for the local LLM (MedGemma)
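The prerequisites above can be checked from Python before installing anything else. This is a minimal sketch, not part of ClinicalPilot: the function name check_prerequisites is hypothetical, and it only tests whether git and ollama are on the PATH, not which version they are.

```python
from __future__ import annotations  # keeps the annotations harmless on older interpreters

import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites are visible from this interpreter."""
    return {
        "python>=3.10": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,        # required
        "ollama": shutil.which("ollama") is not None,  # optional, local LLM only
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

A missing ollama entry is only a problem if you plan to set USE_LOCAL_LLM=true.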
Quick Install (5 Minutes)
Install spaCy Model
The PHI anonymization layer requires a spaCy language model. Install directly via pip:
For a smaller model (~12MB vs 560MB), use en_core_web_sm and set SPACY_MODEL=en_core_web_sm in .env.
Run the Server
Detailed Dependencies
Core Framework
LLM & Agents
Vector DB & Embeddings
LanceDB is embedded and serverless, so no separate database server is required. It auto-creates at data/lancedb/ on first run.
Document Parsing
PHI Anonymization
External APIs
Observability
API Keys Setup
OpenAI (Required)
Get API Key
- Go to platform.openai.com/api-keys
- Create a new API key
- Copy the key (starts with sk-)
Groq (Required for AI Chat)
Get API Key
- Go to console.groq.com/keys
- Create a free API key
- Copy the key (starts with gsk_)
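A quick way to catch a mis-pasted key is to check the prefixes mentioned in the steps above. The sketch below is an assumption-laden helper, not ClinicalPilot code: key_problems is a hypothetical name, and a matching prefix does not prove the key is valid.

```python
import os

# Prefixes taken from the steps above; this is a heuristic, not real validation.
EXPECTED_PREFIXES = {
    "OPENAI_API_KEY": "sk-",
    "GROQ_API_KEY": "gsk_",
}


def key_problems(env=None):
    """Return human-readable problems found in the configured API keys."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} does not start with {prefix!r}")
    return problems
```

Calling key_problems() with no arguments reads the current process environment. A missing GROQ_API_KEY only matters if you use the AI Chat feature.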
PubMed / NCBI (Recommended)
Register
- Go to ncbi.nlm.nih.gov/account
- Create a free account
- Navigate to Settings → API Key Management → Create API Key
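To see how the email and API key end up in a PubMed request, here is a minimal sketch that builds an E-utilities ESearch URL. The helper name pubmed_search_url and the choice of the esearch endpoint are illustrative assumptions; how ClinicalPilot itself calls PubMed is not shown in this guide.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, email, api_key=None, retmax=5):
    """Build an ESearch URL; the api_key parameter is added only when provided."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
        "email": email,  # value of NCBI_EMAIL
    }
    if api_key:
        params["api_key"] = api_key  # value of NCBI_API_KEY
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Without an API key the request still works, but at the lower rate limit.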
LangSmith (Optional — Observability)
Sign Up
- Go to smith.langchain.com
- Sign up (free tier available)
- Create an API key
FDA API (Optional)
Request Key
- Go to open.fda.gov/apis/authentication
- Request an API key (instant approval)
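The key is passed to openFDA as a query parameter. The sketch below builds a drug label query as an illustration; drug_label_url is a hypothetical helper, and the drug label endpoint is an assumption about which openFDA dataset you want, not a statement about what ClinicalPilot queries.

```python
from urllib.parse import urlencode

OPENFDA_DRUG_LABEL = "https://api.fda.gov/drug/label.json"


def drug_label_url(generic_name, api_key=None, limit=1):
    """Build an openFDA drug label query, adding api_key only when provided."""
    params = {
        "search": f'openfda.generic_name:"{generic_name}"',
        "limit": limit,
    }
    if api_key:
        params["api_key"] = api_key  # value of FDA_API_KEY
    return f"{OPENFDA_DRUG_LABEL}?{urlencode(params)}"
```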
Data Downloads
DrugBank Open Data (Optional)
Download
- Go to go.drugbank.com/releases/latest#open-data
- Create a free account
- Download DrugBank Vocabulary CSV
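Once downloaded, the vocabulary file can be sanity-checked with the standard csv module. This is a sketch under assumptions: the column names "DrugBank ID" and "Common name" are what the open vocabulary file is expected to contain, so verify them against your download, and load_drug_names is a hypothetical helper. The sample row is made up for illustration.

```python
import csv
import io


def load_drug_names(csv_file):
    """Map lower-cased common names to DrugBank IDs from an open CSV file."""
    names = {}
    for row in csv.DictReader(csv_file):
        drug_id = (row.get("DrugBank ID") or "").strip()
        name = (row.get("Common name") or "").strip()
        if drug_id and name:  # skip rows missing either field
            names[name.lower()] = drug_id
    return names


# Made-up sample data, not a real DrugBank entry.
sample = io.StringIO("DrugBank ID,Common name\nDB99999,Exampledrug\n")
print(load_drug_names(sample))
```

For the real file, open the path from DRUGBANK_CSV_PATH with newline="" and an explicit encoding such as utf-8, then pass the file object in.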
Sample FHIR Data (Included)
The repo includes sample FHIR R4 bundles.
Local LLM Setup (Optional: MedGemma via Ollama)
LanceDB Initialization
LanceDB is serverless and embedded, so no separate server is needed.
Auto-Creation
The vector store auto-creates at data/lancedb/ on first run. No manual setup is required.
Running the Application
Development Mode
Production Mode
Docker (Optional)
Verification Checklist
After installation, verify everything works:
Check API Docs
Open http://localhost:8000/docs for interactive API documentation.
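The same check can be scripted. The sketch below uses only the standard library; docs_reachable is a hypothetical helper name, and the URL is the one given above.

```python
import http.client
import urllib.request


def docs_reachable(url="http://localhost:8000/docs", timeout=3.0):
    """Return True if the docs page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False
```

A False result means the server is not running, is on another port, or returned an error status.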
Troubleshooting
ModuleNotFoundError: No module named 'backend'
Ensure you’re running from the project root:
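A small check can confirm the working directory before starting the server. This sketch assumes the project root contains a top-level directory named backend, which is inferred from the error message rather than stated in this guide; in_project_root is a hypothetical helper.

```python
from pathlib import Path


def in_project_root(directory=None):
    """True if `directory` (default: current directory) contains a `backend` directory."""
    base = Path(directory) if directory is not None else Path.cwd()
    return (base / "backend").is_dir()
```

If in_project_root() returns False, change into the directory where you cloned the repository and try again.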
Presidio / spaCy Errors
Force reinstall and use direct pip install for the model:
If issues persist, use the smaller model:
macOS SSL Certificate Errors
The app auto-patches SSL using certifi. If you still see CERTIFICATE_VERIFY_FAILED:
LanceDB Import Errors
OPENAI_API_KEY not set
Port 8000 Already in Use
Find and kill the process:
Or use a different port:
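If you would rather pick another port than stop the process on 8000, the sketch below probes for a free one by trying to bind to it. Both function names are hypothetical, and the check is conservative: a port that was released very recently may still be reported as busy.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=8000, attempts=20, host="127.0.0.1"):
    """Return the first free port at or after `start`, or None if none is found."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    return None
```

Pass the returned port to whatever command you use to start the server.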
Ollama Connection Refused
Make sure Ollama is running:
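One way to confirm this from Python is to ask the Ollama HTTP API which models are installed. The sketch below calls the /api/tags endpoint at the default OLLAMA_BASE_URL; ollama_models is a hypothetical helper, not part of ClinicalPilot.

```python
import json
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return locally available model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]
```

A return value of None means the server did not answer. An empty list means Ollama is running but no model has been pulled yet.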
Platform-Specific Notes
macOS M1/M2/M3
- All dependencies work natively on Apple Silicon
- Ollama runs natively with Metal acceleration
- The spaCy large model may take a few minutes to download
- SSL certificate fix: The app auto-patches using certifi, but you can also run:
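For your own scripts that make HTTPS calls outside the app, a minimal sketch of trusting the certifi CA bundle explicitly looks like this. It is an illustration only and does not claim to be how ClinicalPilot applies its patch; certifi_ssl_context is a hypothetical helper.

```python
import ssl


def certifi_ssl_context():
    """Build an SSL context that trusts the certifi CA bundle, or None if certifi is missing."""
    try:
        import certifi
    except ImportError:
        return None
    return ssl.create_default_context(cafile=certifi.where())
```

The resulting context can be passed to urllib.request.urlopen through its context argument.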
Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | Yes | - | OpenAI API key for GPT-4o/4o-mini |
| OPENAI_MODEL | No | gpt-4o | Model for main agents |
| OPENAI_FAST_MODEL | No | gpt-4o-mini | Model for Literature agent |
| GROQ_API_KEY | No* | - | Groq API key for AI Chat |
| GROQ_MODEL | No | llama-3.3-70b-versatile | Groq model name |
| USE_LOCAL_LLM | No | false | Use Ollama/MedGemma |
| OLLAMA_BASE_URL | No | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | No | medgemma2:9b | Ollama model name |
| NCBI_EMAIL | Yes** | - | Email for PubMed API |
| NCBI_API_KEY | No | - | PubMed API key (10 req/s vs 3) |
| LANGSMITH_API_KEY | No | - | LangSmith tracing key |
| LANGCHAIN_TRACING_V2 | No | false | Enable LangSmith tracing |
| LANGCHAIN_PROJECT | No | clinicalpilot | LangSmith project name |
| FDA_API_KEY | No | - | FDA API key (higher limits) |
| LANCEDB_PATH | No | data/lancedb | Vector store directory |
| DRUGBANK_CSV_PATH | No | data/drugbank/drugbank_vocabulary.csv | DrugBank data file |
| SPACY_MODEL | No | en_core_web_lg | spaCy model for PHI detection |
| LOG_LEVEL | No | INFO | Logging level |
| CORS_ORIGINS | No | ["*"] | CORS allowed origins |
| EMERGENCY_TIMEOUT_SEC | No | 5 | Emergency mode timeout |
| MAX_DEBATE_ROUNDS | No | 3 | Max debate iterations |
* Required only for the AI Chat feature
** Required by NCBI for E-utilities access
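To illustrate how the defaults in the table combine with your .env values, here is a sketch that loads a handful of the variables. It is not ClinicalPilot's actual settings code: the Settings class, the load_settings helper, the accepted boolean spellings, and the choice to parse CORS_ORIGINS as JSON are all assumptions made for the example.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """A subset of the variables in the table, with the documented defaults."""
    openai_api_key: str
    openai_model: str = "gpt-4o"
    use_local_llm: bool = False
    ollama_base_url: str = "http://localhost:11434"
    spacy_model: str = "en_core_web_lg"
    cors_origins: list = field(default_factory=lambda: ["*"])
    emergency_timeout_sec: int = 5
    max_debate_rounds: int = 3


def load_settings(env=None):
    """Read settings from a mapping (default: os.environ), applying table defaults."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY not set")
    truthy = {"1", "true", "yes"}  # assumed spellings for boolean flags
    return Settings(
        openai_api_key=key,
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        use_local_llm=env.get("USE_LOCAL_LLM", "false").strip().lower() in truthy,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        spacy_model=env.get("SPACY_MODEL", "en_core_web_lg"),
        cors_origins=json.loads(env.get("CORS_ORIGINS", '["*"]')),
        emergency_timeout_sec=int(env.get("EMERGENCY_TIMEOUT_SEC", "5")),
        max_debate_rounds=int(env.get("MAX_DEBATE_ROUNDS", "3")),
    )
```

Passing a plain dict instead of relying on os.environ makes it easy to test a configuration before writing it to .env.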
Updating
To update ClinicalPilot:
Next Steps
- Quickstart: Run your first SOAP note analysis in 5 minutes
- Architecture: Learn how the multi-agent debate system works
- API Reference: Explore all endpoints and schemas
- Configuration: Advanced settings and customization