ClinicalPilot includes a comprehensive smoke test suite to validate all critical system components.

Smoke Test Suite

The _smoke_test.sh script runs end-to-end tests covering:
  1. Health Check — API liveness
  2. Full Analysis Pipeline — Complete debate workflow (~100s)
  3. Emergency Mode — Fast-path triage (<5s)
  4. Drug Safety Check — RxNorm/DrugBank/FDA integration
  5. Classifiers — Medical imaging AI endpoints

Running Smoke Tests

1. Start the Application

python -m uvicorn backend.main:app --reload --port 8000
2. Run the Test Suite

In a separate terminal:
bash _smoke_test.sh
Or make it executable:
chmod +x _smoke_test.sh
./_smoke_test.sh
3. Review Results

Expected output:
============================================
 ClinicalPilot Smoke Test Suite
============================================

=== TEST 1: Health Check ===
PASS: Health check OK

=== TEST 2: Full Analysis Pipeline ===
Sending request... (may take 30-60s)
Completed in 102s
  SOAP subjective: 65-year-old male with HTN and Type 2 Diabetes...
  SOAP objective: BP 145/92, HR 78, HbA1c 8.2, Creatinine 1.4...
  SOAP assessment: Differential diagnoses include...
  SOAP plan: Comprehensive treatment plan...
  Differentials: 4
    - Anemia (high, likely)
    - Orthostatic Hypotension (medium, possible)
    - CKD Progression (medium, possible)
    - Medication Side Effects (low, less_likely)
  Citations: 4
  Safety flags: 2
    - Lisinopril + Potassium: Risk of hyperkalemia...
    - Metformin dosing: Reduce dose with elevated creatinine...
  Debate rounds: 3
  Consensus: true
  Model: gpt-4o
  Latency: 98234ms
  Tokens: 47823
  Drug interactions: 1
    - Lisinopril x Potassium [major]
  Contraindications: 0
  Dosing alerts: 1
  Population flags: 1
  Summary: Multiple medication safety concerns identified...

PASS: Full analysis — all fields present

=== TEST 3: Emergency Mode ===
Completed in 3s
  ESI Score: 1
  Differentials: 3
    - Acute Myocardial Infarction
    - Cardiogenic Shock
    - Aortic Dissection
  Red flags: 4
  Call to action: IMMEDIATE ACTIVATION: STEMI protocol, cardiology consult stat...
  Safety flags: 2
  Latency: 2876ms
PASS: Emergency mode OK

=== TEST 4: Drug Safety Check ===
  RxNorm interactions: {...}
  DrugBank: {...}
PASS: Safety check OK

=== TEST 5: Classifiers ===
  Classifiers available: 4
PASS: Classifiers OK

============================================
 ALL TESTS PASSED
============================================

Test Case Details

Test 1: Health Check

Validates:
  • FastAPI server is running
  • /api/health endpoint responds
  • Basic JSON parsing
curl equivalent:
curl -sf http://localhost:8000/api/health | python3 -m json.tool
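
The same liveness check can be sketched in Python with only the standard library. This is an illustration, not part of _smoke_test.sh: the helper names fetch_json and health_ok are hypothetical, and because the shape of the /api/health response is not documented here, the sketch only checks that a JSON object comes back.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body; raises on HTTP errors or timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health_ok(base_url: str = "http://localhost:8000",
              fetch: Callable[[str], dict] = fetch_json) -> bool:
    """Return True if /api/health answers with a JSON object."""
    try:
        body = fetch(f"{base_url}/api/health")
    except (OSError, ValueError):
        # URLError and HTTPError subclass OSError;
        # JSONDecodeError subclasses ValueError.
        return False
    return isinstance(body, dict)
```

Passing fetch as a parameter keeps the check testable without a running server; calling health_ok() with no arguments performs a real request against localhost:8000.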

Test 2: Full Analysis Pipeline

Validates:
  • Input parsing: Text to PatientContext
  • PHI anonymization: Presidio scrubbing
  • All 4 agents: Clinical, Literature, Safety, Critic
  • Debate engine: 2-3 rounds with consensus
  • SOAP generation: All 4 sections populated
  • Differentials: ≥2 diagnoses with confidence scores
  • Citations: PubMed references included
  • Safety flags: Drug interaction warnings
  • Med Error Panel: Parallel execution
  • Token tracking: Total token count
  • Latency: End-to-end timing
Test input:
{
  "text": "65-year-old male with HTN and Type 2 Diabetes. Current medications: Metformin 1000mg BID, Lisinopril 20mg daily, Potassium 20mEq daily. Allergic to Penicillin. BP 145/92, HR 78. HbA1c 8.2, Creatinine 1.4. Presenting with increased fatigue and dizziness."
}
Expected:
  • ✅ 4 SOAP sections (Subjective, Objective, Assessment, Plan)
  • ✅ ≥2 differentials
  • ✅ PubMed citations
  • ✅ Safety flags (Lisinopril + Potassium hyperkalemia warning)
  • ✅ Drug interactions detected
  • ✅ Dosing alerts (Metformin with elevated creatinine)
This test takes ~100 seconds because it makes at least 14 LLM calls: 3 debate rounds × 4 agents = 12, plus synthesis and validation. This runtime is expected.
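
The expectations above can be turned into a small response validator. The sketch below is an assumption-heavy illustration: the JSON key names (soap, differentials, citations, safety_flags) are guesses based on the sample output, not the confirmed API schema, so adjust them to the real response.

```python
from typing import Any

# Assumed key names inside the "soap" object (hypothetical schema).
SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")


def analysis_problems(resp: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the response passes."""
    problems = []
    soap = resp.get("soap") or {}
    for section in SOAP_SECTIONS:
        if not str(soap.get(section) or "").strip():
            problems.append(f"SOAP section '{section}' is empty")
    if len(resp.get("differentials") or []) < 2:
        problems.append("fewer than 2 differentials")
    if not resp.get("citations"):
        problems.append("no citations")
    if not resp.get("safety_flags"):
        problems.append("no safety flags")
    return problems
```

Returning a list of problems rather than a single boolean makes a failed run easier to diagnose, since every missing field is reported at once.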

Test 3: Emergency Mode

Validates:
  • Fast path: <5 second response time
  • ESI scoring: Emergency Severity Index (1-5)
  • Top differentials: 3 most critical diagnoses
  • Red flags: Life-threatening indicators
  • Call to action: Immediate next steps
  • Safety flags: Critical drug warnings
Test input:
{
  "text": "55-year-old male with sudden onset chest pain, diaphoresis, BP 90/60, HR 130. PMH: DM2, CAD. On aspirin, metformin."
}
Expected:
  • ✅ ESI Score: 1 (highest severity)
  • ✅ Differentials include AMI, Cardiogenic Shock, Aortic Dissection
  • ✅ Red flags: hypotension, tachycardia, chest pain
  • ✅ Response time: <5s
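
The two checks specific to emergency mode, the time budget and the ESI range, can be sketched as below. The helper names timed, valid_esi, and within_budget are hypothetical; the 5-second budget and the 1-5 ESI range come from the list above.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def valid_esi(score: Any) -> bool:
    """ESI is an integer from 1 (most urgent) to 5; bools are rejected explicitly."""
    return isinstance(score, int) and not isinstance(score, bool) and 1 <= score <= 5


def within_budget(elapsed_s: float, budget_s: float = 5.0) -> bool:
    """The fast path must finish strictly under the budget."""
    return elapsed_s < budget_s
```

In a real run, fn would be whatever function posts the test input to the emergency endpoint; here it is left abstract so the timing logic stays independent of the HTTP client.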

Test 4: Drug Safety Check

Validates:
  • RxNorm API: Drug name resolution
  • DrugBank: Offline interaction lookup
  • FDA API: Label information
  • Parallel execution: All lookups run concurrently
Test input:
curl "http://localhost:8000/api/safety-check?drugs=metformin,lisinopril,potassium"
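
To illustrate what "all lookups run concurrently" means, here is a minimal asyncio sketch. It does not reproduce ClinicalPilot's implementation: the lookups are stubs that sleep instead of calling RxNorm, DrugBank, or the FDA API, and the names make_stub and run_lookups are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Lookup = Callable[[list[str]], Awaitable[dict]]


def make_stub(source: str, delay_s: float = 0.01) -> Lookup:
    """Build a fake lookup that sleeps instead of calling a real API."""
    async def lookup(drugs: list[str]) -> dict:
        await asyncio.sleep(delay_s)  # stands in for network or disk I/O
        return {"source": source, "drugs": drugs}
    return lookup


async def run_lookups(lookups: dict[str, Lookup], drugs: list[str]) -> dict[str, object]:
    """Start every lookup at once; a failing source is kept as an exception value."""
    results = await asyncio.gather(
        *(fn(drugs) for fn in lookups.values()), return_exceptions=True
    )
    return dict(zip(lookups, results))


drugs = ["metformin", "lisinopril", "potassium"]
stubs = {name: make_stub(name) for name in ("rxnorm", "drugbank", "fda")}
results = asyncio.run(run_lookups(stubs, drugs))
```

With return_exceptions=True, one unavailable source does not cancel the others, which matches the idea of falling back to the remaining data when an external API is down.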

Test 5: Classifiers

Validates:
  • /api/classifiers endpoint
  • Returns 4 medical imaging classifiers:
    • Lung Disease
    • Chest X-ray Disease
    • Diabetic Retinopathy
    • Skin Cancer
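
A check for the four expected classifiers might look like the sketch below. The response format of /api/classifiers is not documented here, so the function takes a plain list of names and compares case-insensitively; missing_classifiers is a hypothetical helper.

```python
from typing import Iterable

EXPECTED_CLASSIFIERS = {
    "Lung Disease",
    "Chest X-ray Disease",
    "Diabetic Retinopathy",
    "Skin Cancer",
}


def missing_classifiers(names: Iterable[str]) -> list[str]:
    """Return the expected classifiers absent from names, ignoring case and padding."""
    seen = {name.strip().lower() for name in names}
    return sorted(e for e in EXPECTED_CLASSIFIERS if e.lower() not in seen)
```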

Unit Tests (Future)

ClinicalPilot does not yet have unit tests for individual components. Contributions welcome! Suggested test structure:
tests/
├── test_anonymizer.py       # Presidio PHI scrubbing
├── test_parsers.py          # FHIR/EHR/text parsing
├── test_agents.py           # Individual agent outputs
├── test_debate.py           # Debate engine logic
├── test_rag.py              # LanceDB search
└── test_safety.py           # Drug interaction detection
Example unit test:
# tests/test_anonymizer.py
import pytest
from backend.input_layer.anonymizer import anonymize_text

def test_anonymize_removes_names():
    text = "John Smith, age 45, MRN 123456"
    result = anonymize_text(text)
    assert "John Smith" not in result
    assert "[NAME]" in result

def test_anonymize_removes_mrn():
    text = "MRN: 987654"
    result = anonymize_text(text)
    assert "987654" not in result
    assert "[ID]" in result

def test_anonymize_preserves_medications():
    text = "Patient on Lisinopril 20mg daily"
    result = anonymize_text(text)
    assert "Lisinopril" in result
    assert "20mg" in result

Running Unit Tests

pip install pytest pytest-asyncio
pytest tests/ -v

CI/CD Integration

Add to GitHub Actions:
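# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt pytest pytest-asyncio
      - run: pytest tests/ -v
      # The smoke tests call the live API, so start the server in the same
      # step and wait until /api/health responds before running them.
      # LLM API credentials must be supplied to the job (e.g. as repository secrets).
      - run: |
          python -m uvicorn backend.main:app --port 8000 &
          for i in $(seq 1 30); do
            curl -sf http://localhost:8000/api/health > /dev/null && break
            sleep 1
          done
          bash _smoke_test.sh  # End-to-end tests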

Performance Benchmarks

Expected latencies (with GPT-4o):
Test                 Target    Actual
Health Check         <100ms    ~50ms
Full Analysis        <120s     ~100s
Emergency Mode       <5s       ~3s
Drug Safety Check    <2s       ~1s
Classifiers          <100ms    ~40ms
Full analysis time depends on LLM API latency. During OpenAI API slowdowns, it may exceed 120s. This is expected.
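
The targets in the table can be checked programmatically once latencies have been measured. The sketch below encodes the Target column in seconds; over_target is a hypothetical helper, and how the measurements are collected is left open.

```python
# Targets from the benchmark table, converted to seconds.
TARGETS_S = {
    "Health Check": 0.1,
    "Full Analysis": 120.0,
    "Emergency Mode": 5.0,
    "Drug Safety Check": 2.0,
    "Classifiers": 0.1,
}


def over_target(measured_s: dict[str, float]) -> dict[str, float]:
    """Map each test that misses its target to the number of seconds it is over."""
    return {
        name: round(seconds - TARGETS_S[name], 3)
        for name, seconds in measured_s.items()
        if seconds >= TARGETS_S[name]  # targets are strict upper bounds
    }
```

Feeding in the Actual column returns an empty dict, while a slow full analysis of 130 seconds would be reported as 10 seconds over its target.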

Debugging Failed Tests

Full Analysis Timeout

If the analysis takes >180s:
  1. Check OpenAI API status: https://status.openai.com
  2. Verify your API key has sufficient quota
  3. Enable debug logging: LOG_LEVEL=DEBUG in .env
  4. Check for rate limits in LangSmith traces

Emergency Mode Slow

If emergency mode takes >5s:
  1. Verify no debate rounds are running (should bypass debate)
  2. Check that EMERGENCY_TIMEOUT_SEC=5 in .env
  3. Disable RAG for emergency mode (future optimization)

Drug Safety Check Fails

If RxNorm/FDA APIs are down:
  1. Check NCBI E-utilities status
  2. Verify NCBI_API_KEY and NCBI_EMAIL in .env
  3. Check FDA API status: https://open.fda.gov/status
  4. Test with local DrugBank CSV fallback

Test Data

Sample test cases are in:
data/
├── sample_fhir/           # FHIR R4 bundles
│   ├── stemi_case.json
│   ├── stroke_case.json
│   └── pe_case.json
└── sample_ehr/            # CSV patient data
    └── test_patients.csv
Use these for manual testing:
curl -X POST http://localhost:8000/api/upload/fhir \
  -F "file=@data/sample_fhir/stemi_case.json"
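
Before uploading a sample file, it can help to confirm that it is a FHIR Bundle and see which resources it contains. The sketch below relies only on the standard FHIR JSON layout (a top-level resourceType of "Bundle" and an entry array whose items carry a resource); the helper names are hypothetical and the contents of the sample files are not assumed.

```python
import json
from collections import Counter
from pathlib import Path


def resource_types(bundle: dict) -> Counter:
    """Count the resource types inside a FHIR Bundle given as a parsed dict."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("not a FHIR Bundle")
    return Counter(
        entry["resource"].get("resourceType", "unknown")
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


def summarize_file(path: str) -> Counter:
    """Load a bundle from disk, e.g. data/sample_fhir/stemi_case.json."""
    return resource_types(json.loads(Path(path).read_text(encoding="utf-8")))
```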

Code Coverage (Future)

Install coverage tools:
pip install pytest-cov
pytest tests/ --cov=backend --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open
Target: >80% code coverage for critical paths (anonymizer, agents, debate engine).

Next Steps

Production Deployment

Deploy ClinicalPilot with CI/CD and automated testing

HIPAA Compliance

Security and compliance requirements for production
