ClinicalPilot includes a smoke test suite that exercises its critical system components end to end.
Smoke Test Suite
The _smoke_test.sh script runs end-to-end tests covering:
Health Check — API liveness
Full Analysis Pipeline — Complete debate workflow (~100s)
Emergency Mode — Fast-path triage (<5s)
Drug Safety Check — RxNorm/DrugBank/FDA integration
Classifiers — Medical imaging AI endpoints
Running Smoke Tests
Start the Application
python -m uvicorn backend.main:app --reload --port 8000
Run the Test Suite
In a separate terminal:
bash _smoke_test.sh
Or make it executable and run it directly:
chmod +x _smoke_test.sh
./_smoke_test.sh
Review Results
Expected output:
============================================
ClinicalPilot Smoke Test Suite
============================================
=== TEST 1: Health Check ===
PASS: Health check OK
=== TEST 2: Full Analysis Pipeline ===
Sending request... (may take 30-60s)
Completed in 102s
SOAP subjective: 65-year-old male with HTN and Type 2 Diabetes...
SOAP objective: BP 145/92, HR 78, HbA1c 8.2, Creatinine 1.4...
SOAP assessment: Differential diagnoses include...
SOAP plan: Comprehensive treatment plan...
Differentials: 4
- Anemia (high, likely)
- Orthostatic Hypotension (medium, possible)
- CKD Progression (medium, possible)
- Medication Side Effects (low, less_likely)
Citations: 4
Safety flags: 2
- Lisinopril + Potassium: Risk of hyperkalemia...
- Metformin dosing: Reduce dose with elevated creatinine...
Debate rounds: 3
Consensus: true
Model: gpt-4o
Latency: 98234ms
Tokens: 47823
Drug interactions: 1
- Lisinopril x Potassium [major]
Contraindications: 0
Dosing alerts: 1
Population flags: 1
Summary: Multiple medication safety concerns identified...
PASS: Full analysis — all fields present
=== TEST 3: Emergency Mode ===
Completed in 3s
ESI Score: 1
Differentials: 3
- Acute Myocardial Infarction
- Cardiogenic Shock
- Aortic Dissection
Red flags: 4
Call to action: IMMEDIATE ACTIVATION: STEMI protocol, cardiology consult stat...
Safety flags: 2
Latency: 2876ms
PASS: Emergency mode OK
=== TEST 4: Drug Safety Check ===
RxNorm interactions: {...}
DrugBank: {...}
PASS: Safety check OK
=== TEST 5: Classifiers ===
Classifiers available: 4
PASS: Classifiers OK
============================================
ALL TESTS PASSED
============================================
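To make the reporting above concrete, here is a minimal Python sketch of a runner that prints a similar banner, `=== TEST n: ... ===` headers, and PASS/FAIL lines. It is a hypothetical illustration, not the contents of `_smoke_test.sh`. The name `run_suite` is invented, and each check is assumed to signal failure by raising an exception.

```python
# Hypothetical sketch of a smoke-test runner; not the actual _smoke_test.sh.
from typing import Callable

BANNER = "=" * 44


def run_suite(tests: list[tuple[str, Callable[[], None]]]) -> int:
    """Run each named check in order; return 0 if all pass, 1 otherwise."""
    print(BANNER)
    print("ClinicalPilot Smoke Test Suite")
    print(BANNER)
    failures = 0
    for number, (name, check) in enumerate(tests, start=1):
        print(f"=== TEST {number}: {name} ===")
        try:
            check()  # a check signals failure by raising any exception
            print(f"PASS: {name}")
        except Exception as exc:
            failures += 1
            print(f"FAIL: {name}: {exc}")
    print(BANNER)
    print("ALL TESTS PASSED" if failures == 0 else f"{failures} TEST(S) FAILED")
    print(BANNER)
    return 0 if failures == 0 else 1
```

The returned value is meant to be passed to `sys.exit`. A failing run then produces a non-zero exit code, which is what CI uses to mark the job as failed.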
Test Case Details
Test 1: Health Check
Validates:
FastAPI server is running
/api/health endpoint responds
Basic JSON parsing
cURL equivalent:
curl -sf http://localhost:8000/api/health | python3 -m json.tool
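The same check can also be scripted in Python using only the standard library, which helps when `curl` is unavailable. This is a minimal sketch: `check_health` is a hypothetical helper. It assumes only that `/api/health` returns a 2xx status with a JSON body and does not inspect any specific fields.

```python
# Minimal sketch of the health check using only the standard library.
# Assumes /api/health returns a JSON body; specific fields are not checked.
import json
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/api/health answers 2xx with valid JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))  # "Basic JSON parsing"
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections, and timeouts;
        # ValueError covers a body that is not valid UTF-8 JSON.
        return False
```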
Test 2: Full Analysis Pipeline
Validates:
Input parsing: Text to PatientContext
PHI anonymization: Presidio scrubbing
All 4 agents: Clinical, Literature, Safety, Critic
Debate engine: 2-3 rounds with consensus
SOAP generation: All 4 sections populated
Differentials: ≥2 diagnoses with confidence scores
Citations: PubMed references included
Safety flags: Drug interaction warnings
Med Error Panel: Parallel execution
Token tracking: Total token count
Latency: End-to-end timing
Test input:
{
"text" : "65-year-old male with HTN and Type 2 Diabetes. Current medications: Metformin 1000mg BID, Lisinopril 20mg daily, Potassium 20mEq daily. Allergic to Penicillin. BP 145/92, HR 78. HbA1c 8.2, Creatinine 1.4. Presenting with increased fatigue and dizziness."
}
Expected:
✅ 4 SOAP sections (Subjective, Objective, Assessment, Plan)
✅ ≥2 differentials
✅ PubMed citations
✅ Safety flags (Lisinopril + Potassium hyperkalemia warning)
✅ Drug interactions detected
✅ Dosing alerts (Metformin with elevated creatinine)
This test takes ~100 seconds because it makes 14+ LLM calls (3 debate rounds × 4 agents + synthesis + validation). This is expected.
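The "all fields present" assertion for this test can be written as a small validator. This sketch assumes a hypothetical response shape: a `soap` object with four string sections, plus `differentials`, `citations`, and `safety_flags` lists. The real key names may differ, so adapt them to the actual API response.

```python
# Hypothetical sketch: key names (soap, differentials, citations, safety_flags)
# are assumptions and should be matched to the real response schema.
SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")


def full_analysis_problems(response: dict) -> list[str]:
    """Return human-readable problems; an empty list means the response passes."""
    problems = []
    soap = response.get("soap") or {}
    for section in SOAP_SECTIONS:
        if not str(soap.get(section, "")).strip():
            problems.append(f"SOAP section '{section}' is empty")
    if len(response.get("differentials") or []) < 2:
        problems.append("fewer than 2 differentials")
    if not response.get("citations"):
        problems.append("no citations")
    if not response.get("safety_flags"):
        problems.append("no safety flags")
    return problems
```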
Test 3: Emergency Mode
Validates:
Fast path: <5 second response time
ESI scoring: Emergency Severity Index (1-5)
Top differentials: 3 most critical diagnoses
Red flags: Life-threatening indicators
Call to action: Immediate next steps
Safety flags: Critical drug warnings
Test input:
{
"text" : "55-year-old male with sudden onset chest pain, diaphoresis, BP 90/60, HR 130. PMH: DM2, CAD. On aspirin, metformin."
}
Expected:
✅ ESI Score: 1 (highest severity)
✅ Differentials include AMI, Cardiogenic Shock, Aortic Dissection
✅ Red flags: hypotension, tachycardia, chest pain
✅ Response time: <5s
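The emergency-mode expectations can be checked the same way. The field names used here (`esi_score`, `differentials`, `red_flags`, `call_to_action`, `latency_ms`) are hypothetical. The limits come from the list above: ESI between 1 and 5, three top differentials, and a response under 5 seconds.

```python
# Hypothetical sketch: the field names are assumptions, not the documented schema.
def emergency_problems(response: dict, max_latency_ms: float = 5000) -> list[str]:
    """Return problems with an emergency-mode response; empty means it passes."""
    problems = []
    esi = response.get("esi_score")
    if not isinstance(esi, int) or not 1 <= esi <= 5:
        problems.append(f"ESI score must be an integer 1-5, got {esi!r}")
    if len(response.get("differentials") or []) != 3:
        problems.append("expected exactly 3 top differentials")
    if not response.get("red_flags"):
        problems.append("no red flags")
    if not str(response.get("call_to_action", "")).strip():
        problems.append("empty call to action")
    # A missing latency counts as a failure (treated as infinitely slow).
    if response.get("latency_ms", float("inf")) >= max_latency_ms:
        problems.append(f"latency not under {max_latency_ms} ms")
    return problems
```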
Test 4: Drug Safety Check
Validates:
RxNorm API: Drug name resolution
DrugBank: Offline interaction lookup
FDA API: Label information
Parallel execution: All lookups run concurrently
Test input:
curl "http://localhost:8000/api/safety-check?drugs=metformin,lisinopril,potassium"
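"Parallel execution" means the three lookups overlap in time instead of running one after another. The following minimal asyncio sketch illustrates that pattern. `lookup`, `safety_check`, and `run_timed` are hypothetical names, and the lookups only sleep; they do not call RxNorm, DrugBank, or the FDA API.

```python
# Hypothetical sketch: concurrent lookups with asyncio.gather.
# The lookups sleep instead of calling the real RxNorm, DrugBank, or FDA sources.
import asyncio
import time


async def lookup(source: str, drugs: list[str], delay: float) -> dict:
    await asyncio.sleep(delay)  # stands in for network or file I/O
    return {"source": source, "drugs": drugs}


async def safety_check(drugs: list[str]) -> dict:
    # gather schedules all three coroutines at once and preserves argument order
    results = await asyncio.gather(
        lookup("rxnorm", drugs, 0.2),
        lookup("drugbank", drugs, 0.1),
        lookup("fda", drugs, 0.2),
    )
    return {r["source"]: r for r in results}


def run_timed(drugs: list[str]) -> tuple[dict, float]:
    """Run the concurrent check and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(safety_check(drugs))
    return results, time.perf_counter() - start
```

With these delays, `run_timed(["metformin", "lisinopril", "potassium"])` should take roughly as long as the slowest single lookup (about 0.2 s), not the sum of all three (0.5 s).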
Test 5: Classifiers
Validates:
/api/classifiers endpoint
Returns 4 medical imaging classifiers:
Lung Disease
Chest X-ray Disease
Diabetic Retinopathy
Skin Cancer
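A small helper can report which of the four expected classifiers are missing. This sketch assumes, hypothetically, that `/api/classifiers` returns a JSON list of objects that each carry a `name` field. Adjust it to the real payload shape.

```python
# Hypothetical sketch: assumes a JSON list of objects with a "name" field.
EXPECTED_CLASSIFIERS = {
    "Lung Disease",
    "Chest X-ray Disease",
    "Diabetic Retinopathy",
    "Skin Cancer",
}


def missing_classifiers(payload: list[dict]) -> set[str]:
    """Return expected classifier names that are absent from the payload."""
    names = {item.get("name") for item in payload}
    return EXPECTED_CLASSIFIERS - names
```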
Unit Tests (Future)
ClinicalPilot does not yet have unit tests for individual components. Contributions welcome!
Suggested test structure:
tests/
├── test_anonymizer.py # Presidio PHI scrubbing
├── test_parsers.py # FHIR/EHR/text parsing
├── test_agents.py # Individual agent outputs
├── test_debate.py # Debate engine logic
├── test_rag.py # LanceDB search
└── test_safety.py # Drug interaction detection
Example unit test:
# tests/test_anonymizer.py
# Placeholder tokens such as [NAME] and [ID] depend on how anonymize_text is configured.
import pytest
from backend.input_layer.anonymizer import anonymize_text


def test_anonymize_removes_names():
    text = "John Smith, age 45, MRN 123456"
    result = anonymize_text(text)
    assert "John Smith" not in result
    assert "[NAME]" in result


def test_anonymize_removes_mrn():
    text = "MRN: 987654"
    result = anonymize_text(text)
    assert "987654" not in result
    assert "[ID]" in result


def test_anonymize_preserves_medications():
    # Clinical content must survive PHI scrubbing
    text = "Patient on Lisinopril 20mg daily"
    result = anonymize_text(text)
    assert "Lisinopril" in result
    assert "20mg" in result
Running Unit Tests
pip install pytest pytest-asyncio
pytest tests/ -v
CI/CD Integration
Add to GitHub Actions:
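# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v
      # End-to-end tests: the smoke test needs a running server, so start it in
      # the background and wait for /api/health before running the script.
      # LLM and external API keys must also be supplied to the job (e.g. as secrets).
      - run: |
          python -m uvicorn backend.main:app --port 8000 &
          timeout 60 bash -c 'until curl -sf http://localhost:8000/api/health > /dev/null; do sleep 1; done'
          bash _smoke_test.sh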
Expected latencies (with GPT-4o):
| Test | Target | Actual |
|---|---|---|
| Health Check | <100ms | ~50ms |
| Full Analysis | <120s | ~100s |
| Emergency Mode | <5s | ~3s |
| Drug Safety Check | <2s | ~1s |
| Classifiers | <100ms | ~40ms |
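The latency targets listed above can be encoded as a budget and compared against measured timings. This is a minimal sketch: `TARGETS_MS` and `over_budget` are hypothetical names. Because each target is an upper bound ("<"), a measurement equal to or above it counts as over budget.

```python
# Targets from the expected-latency list, converted to milliseconds.
TARGETS_MS = {
    "Health Check": 100,
    "Full Analysis": 120_000,
    "Emergency Mode": 5_000,
    "Drug Safety Check": 2_000,
    "Classifiers": 100,
}


def over_budget(measured_ms: dict[str, float]) -> dict[str, float]:
    """Return the tests whose measured latency is not under their target."""
    return {
        name: ms
        for name, ms in measured_ms.items()
        if name in TARGETS_MS and ms >= TARGETS_MS[name]
    }
```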
Full analysis time depends on LLM API latency. During OpenAI API slowdowns, it may exceed 120s. This is expected.
Debugging Failed Tests
Full Analysis Timeout
If the analysis takes >180s:
Check OpenAI API status: https://status.openai.com
Verify your API key has sufficient quota
Enable debug logging: LOG_LEVEL=DEBUG in .env
Check for rate limits in LangSmith traces
Emergency Mode Slow
If emergency mode takes >5s:
Verify that no debate rounds run (emergency mode should bypass the debate engine)
Check that EMERGENCY_TIMEOUT_SEC=5 in .env
Disable RAG for emergency mode (future optimization)
Drug Safety Check Fails
If RxNorm/FDA APIs are down:
Check NCBI E-utilities status
Verify NCBI_API_KEY and NCBI_EMAIL in .env
Check FDA API status: https://open.fda.gov/status
Test with local DrugBank CSV fallback
Test Data
Sample test cases are in:
data/
├── sample_fhir/ # FHIR R4 bundles
│ ├── stemi_case.json
│ ├── stroke_case.json
│ └── pe_case.json
└── sample_ehr/ # CSV patient data
└── test_patients.csv
Use these for manual testing:
curl -X POST http://localhost:8000/api/upload/fhir \
-F "file=@data/sample_fhir/stemi_case.json"
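Before uploading a sample bundle, it can help to confirm which resources it contains. This sketch relies only on the standard FHIR R4 Bundle layout (`resourceType` and `entry[].resource`). `summarize_bundle` and `summarize_file` are hypothetical helpers, and the contents of the sample files are not assumed.

```python
# Hypothetical helpers: count resource types in a FHIR R4 Bundle.
import json
from collections import Counter
from pathlib import Path


def summarize_bundle(bundle: dict) -> Counter:
    """Return a Counter of resourceType values across the bundle's entries."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError(f"expected a Bundle, got {bundle.get('resourceType')!r}")
    return Counter(
        entry.get("resource", {}).get("resourceType", "<missing>")
        for entry in bundle.get("entry", [])
    )


def summarize_file(path: str) -> Counter:
    """Load a bundle from disk (e.g. a file under data/sample_fhir/) and summarize it."""
    return summarize_bundle(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, `summarize_file("data/sample_fhir/stemi_case.json")` returns a `Counter` that maps each resource type in that case to its number of entries.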
Code Coverage (Future)
Install coverage tools:
pip install pytest-cov
pytest tests/ --cov=backend --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open
Target: >80% code coverage for critical paths (anonymizer, agents, debate engine).
Next Steps
Production Deployment: Deploy ClinicalPilot with CI/CD and automated testing
HIPAA Compliance: Security and compliance requirements for production