Testing

ClinicalPilot includes a smoke test suite that exercises each critical system component end to end.

Smoke Test Suite

The _smoke_test.sh script runs end-to-end tests covering:

Health Check — API liveness
Full Analysis Pipeline — Complete debate workflow (~100s)
Emergency Mode — Fast-path triage (<5s)
Drug Safety Check — RxNorm/DrugBank/FDA integration
Classifiers — Medical imaging AI endpoints

Running Smoke Tests

Start the Application

python -m uvicorn backend.main:app --reload --port 8000

Run the Test Suite

In a separate terminal:

bash _smoke_test.sh

Or make it executable and run it directly:

chmod +x _smoke_test.sh
./_smoke_test.sh

Review Results

Expected output:

============================================
 ClinicalPilot Smoke Test Suite
============================================

=== TEST 1: Health Check ===
PASS: Health check OK

=== TEST 2: Full Analysis Pipeline ===
Sending request... (may take 30-60s)
Completed in 102s
  SOAP subjective: 65-year-old male with HTN and Type 2 Diabetes...
  SOAP objective: BP 145/92, HR 78, HbA1c 8.2, Creatinine 1.4...
  SOAP assessment: Differential diagnoses include...
  SOAP plan: Comprehensive treatment plan...
  Differentials: 4
    - Anemia (high, likely)
    - Orthostatic Hypotension (medium, possible)
    - CKD Progression (medium, possible)
    - Medication Side Effects (low, less_likely)
  Citations: 4
  Safety flags: 2
    - Lisinopril + Potassium: Risk of hyperkalemia...
    - Metformin dosing: Reduce dose with elevated creatinine...
  Debate rounds: 3
  Consensus: true
  Model: gpt-4o
  Latency: 98234ms
  Tokens: 47823
  Drug interactions: 1
    - Lisinopril x Potassium [major]
  Contraindications: 0
  Dosing alerts: 1
  Population flags: 1
  Summary: Multiple medication safety concerns identified...

PASS: Full analysis — all fields present

=== TEST 3: Emergency Mode ===
Completed in 3s
  ESI Score: 1
  Differentials: 3
    - Acute Myocardial Infarction
    - Cardiogenic Shock
    - Aortic Dissection
  Red flags: 4
  Call to action: IMMEDIATE ACTIVATION: STEMI protocol, cardiology consult stat...
  Safety flags: 2
  Latency: 2876ms
PASS: Emergency mode OK

=== TEST 4: Drug Safety Check ===
  RxNorm interactions: {...}
  DrugBank: {...}
PASS: Safety check OK

=== TEST 5: Classifiers ===
  Classifiers available: 4
PASS: Classifiers OK

============================================
 ALL TESTS PASSED
============================================

Test Case Details

Test 1: Health Check

Validates:

FastAPI server is running
/api/health endpoint responds
Basic JSON parsing

curl equivalent:

curl -sf http://localhost:8000/api/health | python3 -m json.tool
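The same check can be scripted in Python when curl is not available. This is a minimal sketch using only the standard library. The fetch parameter is a hypothetical hook added here so the function can be exercised without a running server, and the check only asserts that the body parses as JSON, since the response schema is not documented on this page.

```python
import json
from urllib.request import urlopen


def check_health(base_url="http://localhost:8000", fetch=None):
    """Return the parsed /api/health payload, raising if the check fails."""
    url = f"{base_url.rstrip('/')}/api/health"
    if fetch is None:
        # Default transport: a real HTTP GET with a short timeout.
        def fetch(target):
            with urlopen(target, timeout=5) as resp:
                return resp.status, resp.read()
    status, body = fetch(url)
    if status != 200:
        raise RuntimeError(f"health check returned HTTP {status}")
    # Mirrors `python3 -m json.tool`: fails loudly on a non-JSON body.
    return json.loads(body)
```

With the server running, calling check_health() with no arguments performs the real request against localhost:8000.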

Test 2: Full Analysis Pipeline

Validates:

Input parsing: Text to PatientContext
PHI anonymization: Presidio scrubbing
All 4 agents: Clinical, Literature, Safety, Critic
Debate engine: 2-3 rounds with consensus
SOAP generation: All 4 sections populated
Differentials: ≥2 diagnoses with confidence scores
Citations: PubMed references included
Safety flags: Drug interaction warnings
Med Error Panel: Parallel execution
Token tracking: Total token count
Latency: End-to-end timing

Test input:

{
  "text": "65-year-old male with HTN and Type 2 Diabetes. Current medications: Metformin 1000mg BID, Lisinopril 20mg daily, Potassium 20mEq daily. Allergic to Penicillin. BP 145/92, HR 78. HbA1c 8.2, Creatinine 1.4. Presenting with increased fatigue and dizziness."
}

Expected:

✅ 4 SOAP sections (Subjective, Objective, Assessment, Plan)
✅ ≥2 differentials
✅ PubMed citations
✅ Safety flags (Lisinopril + Potassium hyperkalemia warning)
✅ Drug interactions detected
✅ Dosing alerts (Metformin with elevated creatinine)
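The structural expectations above can be checked programmatically once the response is parsed. The sketch below is illustrative only: the key names soap, differentials, citations, and safety_flags are assumptions, not the documented response schema, and should be adjusted to match the actual API output.

```python
SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")


def find_problems(result):
    """Return a list of readable problems; an empty list means all checks pass."""
    problems = []
    soap = result.get("soap", {})  # hypothetical key name
    for section in SOAP_SECTIONS:
        if not soap.get(section):
            problems.append(f"SOAP section missing or empty: {section}")
    # The test expects at least two differential diagnoses.
    if len(result.get("differentials", [])) < 2:
        problems.append("fewer than 2 differentials")
    if not result.get("citations"):
        problems.append("no citations")
    if not result.get("safety_flags"):
        problems.append("no safety flags")
    return problems
```

Returning a list of problems rather than raising on the first one makes a failed run easier to diagnose, because every missing field is reported at once.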

This test takes ~100 seconds because it makes 14+ LLM calls (3 debate rounds × 4 agents + synthesis + validation). This is expected.
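The call count follows directly from the pipeline shape. As a small worked example (the function and parameter names are hypothetical, and the count is a lower bound because the page says "14+"):

```python
def estimate_llm_calls(debate_rounds=3, agents=4, extra_calls=2):
    """Lower-bound LLM call count for one full analysis.

    extra_calls covers the synthesis and validation steps.
    """
    return debate_rounds * agents + extra_calls
```

With the defaults this gives 3 × 4 + 2 = 14 calls. A debate that reaches consensus after two rounds would need 2 × 4 + 2 = 10.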

Test 3: Emergency Mode

Validates:

Fast path: <5 second response time
ESI scoring: Emergency Severity Index (1-5)
Top differentials: 3 most critical diagnoses
Red flags: Life-threatening indicators
Call to action: Immediate next steps
Safety flags: Critical drug warnings

Test input:

{
  "text": "55-year-old male with sudden onset chest pain, diaphoresis, BP 90/60, HR 130. PMH: DM2, CAD. On aspirin, metformin."
}

Expected:

✅ ESI Score: 1 (highest severity)
✅ Differentials include AMI, Cardiogenic Shock, Aortic Dissection
✅ Red flags: hypotension, tachycardia, chest pain
✅ Response time: <5s
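The timing and ESI expectations can be expressed as two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, the clock parameter exists only to make the timing logic testable, and how the ESI score is exposed in the response is not specified on this page.

```python
import time


def run_within_budget(call, budget_s=5.0, clock=time.perf_counter):
    """Run call() and return (result, elapsed_seconds, within_budget)."""
    start = clock()
    result = call()
    elapsed = clock() - start
    return result, elapsed, elapsed < budget_s


def is_valid_esi(score):
    """ESI is an integer from 1 (most severe) to 5 (least severe)."""
    return type(score) is int and 1 <= score <= 5
```

In a real run, call would wrap the HTTP request to the emergency endpoint, so the measured time includes network and LLM latency.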

Test 4: Drug Safety Check

Validates:

RxNorm API: Drug name resolution
DrugBank: Offline interaction lookup
FDA API: Label information
Parallel execution: All lookups run concurrently

Test input:

curl "http://localhost:8000/api/safety-check?drugs=metformin,lisinopril,potassium"
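When the drug list comes from code rather than a hand-typed command, the query string should be built with proper encoding. A minimal sketch, assuming the endpoint accepts a comma-separated drugs parameter exactly as in the curl example; the function name is hypothetical.

```python
from urllib.parse import urlencode


def safety_check_url(drugs, base_url="http://localhost:8000"):
    """Build the /api/safety-check URL for a list of drug names."""
    names = [d.strip() for d in drugs if d.strip()]
    if not names:
        raise ValueError("at least one drug name is required")
    # safe="," keeps the comma separators literal, matching the curl example;
    # everything else (spaces, slashes, and so on) is percent-encoded.
    query = urlencode({"drugs": ",".join(names)}, safe=",")
    return f"{base_url}/api/safety-check?{query}"
```

For the three drugs in the test input this produces the same URL as the curl command above.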

Test 5: Classifiers

Validates:

/api/classifiers endpoint
Returns 4 medical imaging classifiers:
- Lung Disease
- Chest X-ray Disease
- Diabetic Retinopathy
- Skin Cancer
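A stricter check than counting entries is to compare the reported names against the four expected classifiers. The sketch below assumes the names can be extracted from the response as plain strings matching the labels above; the actual response shape of /api/classifiers is not documented here, so treat the extraction step as an open question.

```python
EXPECTED_CLASSIFIERS = {
    "lung disease",
    "chest x-ray disease",
    "diabetic retinopathy",
    "skin cancer",
}


def missing_classifiers(reported_names):
    """Return the expected classifiers absent from reported_names, sorted."""
    reported = {name.strip().lower() for name in reported_names}
    return sorted(EXPECTED_CLASSIFIERS - reported)
```

An empty result means all four classifiers are available.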

Unit Tests (Future)

ClinicalPilot does not yet have unit tests for individual components. Contributions welcome! Suggested test structure:

tests/
├── test_anonymizer.py       # Presidio PHI scrubbing
├── test_parsers.py          # FHIR/EHR/text parsing
├── test_agents.py           # Individual agent outputs
├── test_debate.py           # Debate engine logic
├── test_rag.py              # LanceDB search
└── test_safety.py           # Drug interaction detection

Example unit test:

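# tests/test_anonymizer.py
# Assumes anonymize_text(text) returns a string and that the anonymizer is
# configured to emit [NAME] and [ID] placeholders; adjust to the actual tokens.
from backend.input_layer.anonymizer import anonymize_text


def test_anonymize_removes_names():
    # A full name must be replaced by the name placeholder.
    text = "John Smith, age 45, MRN 123456"
    result = anonymize_text(text)
    assert "John Smith" not in result
    assert "[NAME]" in result


def test_anonymize_removes_mrn():
    # Medical record numbers are identifiers and must not survive scrubbing.
    text = "MRN: 987654"
    result = anonymize_text(text)
    assert "987654" not in result
    assert "[ID]" in result


def test_anonymize_preserves_medications():
    # Clinical content such as drug names and doses must pass through unchanged.
    text = "Patient on Lisinopril 20mg daily"
    result = anonymize_text(text)
    assert "Lisinopril" in result
    assert "20mg" in result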

Running Unit Tests

pip install pytest pytest-asyncio
pytest tests/ -v

CI/CD Integration

Add to GitHub Actions:

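# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt pytest pytest-asyncio
      - run: pytest tests/ -v
      # The smoke test needs a running server, so start one in the background
      # and wait for /api/health before running the script.
      # Assumption: the settings from .env (LLM API key, etc.) are provided
      # to the job, for example as repository secrets.
      - run: |
          python -m uvicorn backend.main:app --port 8000 &
          timeout 60 bash -c 'until curl -sf http://localhost:8000/api/health; do sleep 1; done'
          bash _smoke_test.sh  # End-to-end tests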
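The smoke test expects the API to be listening before it starts, so a CI job has to wait for the server after launching it. One way to do that without shell loops is a small polling helper. This is a sketch, not part of the project: the function name is hypothetical, and the clock and sleep parameters exist only so the loop can be tested without real delays.

```python
import time


def wait_until_ready(probe, timeout_s=30.0, interval_s=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused while the server is still booting;
            # urllib's URLError is also an OSError subclass.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In CI, probe would issue a GET to /api/health and return True on a 200 response; the job then runs the smoke test only if wait_until_ready returns True.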

Performance Benchmarks

Expected latencies (with GPT-4o):

Test	Target	Actual
Health Check	<100ms	~50ms
Full Analysis	<120s	~100s
Emergency Mode	<5s	~3s
Drug Safety Check	<2s	~1s
Classifiers	<100ms	~40ms

Full analysis time depends on LLM API latency. During OpenAI API slowdowns, it may exceed 120s. This is expected.
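The targets in the table can be encoded as data so that a benchmark run flags regressions automatically. A minimal sketch: the target values are copied from the table above, while the names TARGETS_S and over_budget are hypothetical.

```python
# Targets in seconds, copied from the benchmark table.
TARGETS_S = {
    "Health Check": 0.1,
    "Full Analysis": 120.0,
    "Emergency Mode": 5.0,
    "Drug Safety Check": 2.0,
    "Classifiers": 0.1,
}


def over_budget(measured_s):
    """Return {test: (measured, target)} for tests at or above their target."""
    return {
        name: (value, TARGETS_S[name])
        for name, value in measured_s.items()
        if value >= TARGETS_S[name]
    }
```

Because full analysis time tracks LLM API latency, it may be more practical to treat a Full Analysis overrun as a warning rather than a hard failure.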

Debugging Failed Tests

Full Analysis Timeout

If the analysis takes >180s:

Check OpenAI API status: https://status.openai.com
Verify your API key has sufficient quota
Enable debug logging by setting LOG_LEVEL=DEBUG in .env
Check for rate limits in LangSmith traces

Emergency Mode Slow

If emergency mode takes >5s:

Verify no debate rounds are running (should bypass debate)
Check that EMERGENCY_TIMEOUT_SEC=5 is set in .env
Disable RAG for emergency mode (future optimization)

Drug Safety Check Fails

If RxNorm/FDA APIs are down:

Check NCBI E-utilities status
Verify that NCBI_API_KEY and NCBI_EMAIL are set in .env
Check FDA API status: https://open.fda.gov/status
Test with local DrugBank CSV fallback
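To illustrate what an offline fallback lookup involves, here is a sketch of pairwise interaction matching against a CSV. The column names drug_a, drug_b, and severity are hypothetical and the project's actual DrugBank file layout may differ; the sample row reuses the Lisinopril and Potassium interaction from the expected smoke test output.

```python
import csv
import io
from itertools import combinations


def load_interactions(csv_file):
    """Read interaction rows into {frozenset({drug_a, drug_b}): severity}."""
    table = {}
    for row in csv.DictReader(csv_file):
        pair = frozenset((row["drug_a"].strip().lower(),
                          row["drug_b"].strip().lower()))
        table[pair] = row["severity"].strip()
    return table


def find_interactions(drugs, table):
    """Return (drug, drug, severity) for every known interacting pair."""
    names = sorted({d.strip().lower() for d in drugs})
    return [
        (a, b, table[frozenset((a, b))])
        for a, b in combinations(names, 2)
        if frozenset((a, b)) in table
    ]


# In-memory sample so the sketch runs without the real data file.
SAMPLE_CSV = "drug_a,drug_b,severity\nLisinopril,Potassium,major\n"
```

Keying the table on a frozenset makes the lookup order-independent, so Lisinopril with Potassium and Potassium with Lisinopril resolve to the same row.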

Test Data

Sample test cases are in:

data/
├── sample_fhir/           # FHIR R4 bundles
│   ├── stemi_case.json
│   ├── stroke_case.json
│   └── pe_case.json
└── sample_ehr/            # CSV patient data
    └── test_patients.csv

Use these for manual testing:

curl -X POST http://localhost:8000/api/upload/fhir \
  -F "file=@data/sample_fhir/stemi_case.json"
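Before uploading, it can help to confirm that a file is a well-formed FHIR Bundle and see which resource types it carries. The sketch below relies only on the standard FHIR Bundle layout (a top-level resourceType of "Bundle" and an entry list whose items wrap a resource); the function name is hypothetical and the contents of the sample files are not assumed.

```python
import json
from collections import Counter


def summarize_bundle(raw):
    """Count resource types in a FHIR Bundle given as a JSON string."""
    bundle = json.loads(raw)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("not a FHIR Bundle")
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )
```

For example, pass it the text of data/sample_fhir/stemi_case.json to see how many Patient, Observation, or MedicationRequest resources the upload will contain.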

Code Coverage (Future)

Install coverage tools:

pip install pytest-cov
pytest tests/ --cov=backend --cov-report=html
open htmlcov/index.html  # macOS; use xdg-open on Linux

Target: >80% code coverage for critical paths (anonymizer, agents, debate engine).

Next Steps

Production Deployment

HIPAA Compliance
