Safety System

ClinicalPilot implements multiple layers of safety to prevent medical errors, protect patient privacy, and ensure reliable clinical outputs.

Safety Architecture

Defense in Depth: Multiple independent safety checks ensure errors are caught even if one layer fails.

Medical Error Prevention Panel

Purpose: Comprehensive medication safety review that runs on every case
Source: backend/safety_panel/med_errors.py:49
Execution: Runs in parallel with the debate pipeline via asyncio.gather

Four Safety Domains

Drug-Drug Interactions

Check every medication pair for interactions (contraindicated, major, moderate, minor)

Drug-Disease Contraindications

Cross-reference each drug against patient conditions

Dosing Alerts

Renal/hepatic/weight/age-based dose adjustments

Population Flags

Pregnancy, pediatric, elderly, lactation concerns
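
In ClinicalPilot the pairwise interaction check is delegated to the LLM prompt, not to code. The sketch below only illustrates what "every medication pair" means: n medications yield n(n-1)/2 unordered pairs. The helper name `medication_pairs` is hypothetical and does not appear in the codebase.

```python
from itertools import combinations


def medication_pairs(medications: list[str]) -> list[tuple[str, str]]:
    """Return every unordered pair of distinct medications.

    Names are deduplicated case-insensitively so that "Lisinopril" and
    "lisinopril" do not produce a pair with each other.
    """
    unique: dict[str, str] = {}
    for name in medications:
        cleaned = name.strip()
        # Keep the first-seen spelling for each case-insensitive name.
        unique.setdefault(cleaned.lower(), cleaned)
    return list(combinations(unique.values(), 2))


pairs = medication_pairs(
    ["Lisinopril", "Potassium Chloride", "Metformin", "lisinopril"]
)
for drug_a, drug_b in pairs:
    print(f"{drug_a} + {drug_b}")
```

Three distinct medications give three pairs; five give ten, so the number of pairs to review grows quadratically with the medication list.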

Implementation

# backend/safety_panel/med_errors.py:49-104
async def run_med_error_panel(patient: PatientContext) -> MedErrorPanel:
    """
    Run the Medical Error Prevention Panel.
    This runs on EVERY case, not just drug-specific queries.
    """
    if not patient.medications:
        return MedErrorPanel(summary="No medications found in patient data.")
    
    # Build context
    meds = "\n".join(
        f"- {m.name} {m.dose} {m.frequency} ({m.route})".strip()
        for m in patient.medications
    )
    conditions = "\n".join(f"- {c.display}" for c in patient.conditions)
    demographics = f"Age: {patient.age}, Gender: {patient.gender.value}, Weight: {patient.weight_kg} kg"
    labs = "\n".join(
        f"- {l.name}: {l.value} {l.unit} (ref: {l.reference_range})"
        for l in patient.labs
    )
    allergies = ", ".join(a.substance for a in patient.allergies) or "NKDA"
    
    user_message = f"""## Medications
{meds}

## Conditions
{conditions}

## Demographics
{demographics}

## Labs
{labs}

## Allergies
{allergies}
"""
    
    result = await llm_call(
        system_prompt=MED_ERROR_SYSTEM_PROMPT,
        user_message=user_message,
        json_mode=True,
    )
    
    return _parse_panel(result["content"])

System Prompt

# backend/safety_panel/med_errors.py:24-46
MED_ERROR_SYSTEM_PROMPT = """You are a clinical pharmacist AI performing a comprehensive medication safety review.

Given patient medications, conditions, demographics, and labs, perform ALL of these checks:

1. Drug-Drug Interactions — check every medication pair
2. Drug-Disease Contraindications — check each drug against each condition
3. Dosing Alerts — check for renal/hepatic/weight/age adjustments needed
4. Population Flags — check for pregnancy/pediatric/elderly/lactation concerns

For each finding, provide:
- The specific issue
- The mechanism in plain language
- A clear recommendation

Respond with JSON:
{
  "drug_interactions": [{"drug_a": "...", "drug_b": "...", "severity": "contraindicated|major|moderate|minor", "description": "...", "mechanism": "...", "recommendation": "..."}],
  "contraindications": [{"drug": "...", "disease": "...", "severity": "...", "description": "...", "recommendation": "..."}],
  "dosing_alerts": [{"drug": "...", "alert_type": "renal|hepatic|weight-based|age-based", "description": "...", "recommendation": "..."}],
  "population_flags": [{"drug": "...", "population": "pregnancy|pediatric|elderly|lactation", "category": "...", "description": "...", "recommendation": "..."}],
  "summary": "Overall safety assessment..."
}
"""

Output Schema

# backend/models/safety.py:10-49
class DrugInteraction(BaseModel):
    drug_a: str
    drug_b: str
    severity: str  # "contraindicated", "major", "moderate", "minor"
    description: str
    mechanism: str = ""
    recommendation: str = ""

class DrugDiseaseContraindication(BaseModel):
    drug: str
    disease: str
    severity: str
    description: str
    recommendation: str = ""

class DosingAlert(BaseModel):
    drug: str
    alert_type: str  # "renal", "hepatic", "weight-based", "age-based"
    description: str
    recommendation: str = ""

class PopulationFlag(BaseModel):
    drug: str
    population: str  # "pregnancy", "pediatric", "elderly", "lactation"
    category: str = ""  # e.g. FDA pregnancy category
    description: str
    recommendation: str = ""

class MedErrorPanel(BaseModel):
    drug_interactions: list[DrugInteraction] = Field(default_factory=list)
    contraindications: list[DrugDiseaseContraindication] = Field(default_factory=list)
    dosing_alerts: list[DosingAlert] = Field(default_factory=list)
    population_flags: list[PopulationFlag] = Field(default_factory=list)
    summary: str = ""

Example Output

{
  "drug_interactions": [
    {
      "drug_a": "Lisinopril",
      "drug_b": "Potassium Chloride",
      "severity": "major",
      "description": "Risk of hyperkalemia",
      "mechanism": "ACE inhibitors reduce potassium excretion; supplementation increases serum K+",
      "recommendation": "Monitor potassium levels closely. Consider reducing K+ dose or discontinuing if K+ >5.0 mEq/L"
    }
  ],
  "contraindications": [],
  "dosing_alerts": [
    {
      "drug": "Metformin",
      "alert_type": "renal",
      "description": "Requires dose adjustment if eGFR <45 mL/min",
      "recommendation": "Check renal function. Discontinue if eGFR <30 mL/min (lactic acidosis risk)"
    }
  ],
  "population_flags": [
    {
      "drug": "Lisinopril",
      "population": "elderly",
      "category": "Caution",
      "description": "Increased risk of orthostatic hypotension in elderly patients",
      "recommendation": "Monitor blood pressure standing/sitting. Start at low dose."
    }
  ],
  "summary": "1 major drug interaction, 1 dosing alert, 1 population flag identified. Close monitoring required."
}
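
A consumer of this output might want a quick tally of findings and the most severe interaction. The following is a minimal sketch under that assumption; `count_findings` and `worst_interaction_severity` are hypothetical helpers, and the severity ordering is taken from the values listed in the schema comments.

```python
from typing import Optional

# Severity values from the schema, ordered from least to most severe.
SEVERITY_ORDER = ["minor", "moderate", "major", "contraindicated"]

FINDING_KEYS = (
    "drug_interactions",
    "contraindications",
    "dosing_alerts",
    "population_flags",
)


def count_findings(panel: dict) -> dict[str, int]:
    """Count the findings in each of the four safety domains."""
    return {key: len(panel.get(key, [])) for key in FINDING_KEYS}


def worst_interaction_severity(panel: dict) -> Optional[str]:
    """Return the most severe interaction level, or None if there is none."""
    severities = [
        item.get("severity", "") for item in panel.get("drug_interactions", [])
    ]
    known = [s for s in severities if s in SEVERITY_ORDER]
    if not known:
        return None
    return max(known, key=SEVERITY_ORDER.index)


# Trimmed version of the example output above.
example_panel = {
    "drug_interactions": [
        {"drug_a": "Lisinopril", "drug_b": "Potassium Chloride", "severity": "major"}
    ],
    "contraindications": [],
    "dosing_alerts": [{"drug": "Metformin", "alert_type": "renal"}],
    "population_flags": [{"drug": "Lisinopril", "population": "elderly"}],
}

print(count_findings(example_panel))
print(worst_interaction_severity(example_panel))
```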

Parallel Execution: The Med Error Panel runs concurrently with the debate pipeline, so it adds no latency to overall processing as long as it finishes before the debate does.
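
The effect of running both tasks under asyncio.gather can be sketched with two stand-in coroutines. The names `fake_debate_pipeline` and `fake_med_error_panel` and their sleep durations are assumptions for illustration only; the real pipeline functions are not shown in this section.

```python
import asyncio
import time


async def fake_debate_pipeline() -> str:
    # Stand-in for the slower debate pipeline.
    await asyncio.sleep(0.3)
    return "soap-draft"


async def fake_med_error_panel() -> str:
    # Stand-in for the faster safety panel.
    await asyncio.sleep(0.2)
    return "panel"


async def run_case() -> tuple[str, str, float]:
    start = time.perf_counter()
    # Both coroutines are scheduled together; total time tracks the slower one.
    soap, panel = await asyncio.gather(
        fake_debate_pipeline(),
        fake_med_error_panel(),
    )
    return soap, panel, time.perf_counter() - start


soap, panel, elapsed = asyncio.run(run_case())
print(f"{soap}, {panel}, elapsed {elapsed:.2f}s")
```

Run sequentially, the two stand-ins would take about 0.5s; run together, the elapsed time stays close to the 0.3s of the slower task. The same reasoning implies the panel does add latency whenever it is the slower of the two.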

PHI Anonymization

Purpose: Scrub Protected Health Information before LLM processing
Technology: Microsoft Presidio + spaCy NLP
Source: backend/input_layer/anonymizer.py:78

What Gets Anonymized

Direct Identifiers

Patient names → [PATIENT]
Phone numbers → [PHONE]
Email addresses → [EMAIL]
SSN → [SSN]
MRN → MRN: [REDACTED]

Dates (Targeted)

Date of Birth (DOB) → [DOB REDACTED]
Note: The generic DATE_TIME entity is excluded to avoid false positives on medication dosages

Locations (Filtered)

Geographic locations → [LOCATION]
Filtered: Clinical terms like “oral”, “IV”, “chest” are NOT anonymized

Other PII

Credit card numbers → [REDACTED]
IP addresses → [REDACTED]
Driver’s license → [REDACTED]

Implementation

# backend/input_layer/anonymizer.py:87-148
def _anonymize_presidio(self, text: str) -> str:
    """
    NOTE: We intentionally EXCLUDE DATE_TIME from Presidio entities
    because it causes false positives on medication dosages (e.g. '20mEq'
    gets misidentified as a date). We handle date anonymization via
    post-processing regex instead, which is more targeted.
    """
    entities = [
        "PERSON",
        "PHONE_NUMBER",
        "EMAIL_ADDRESS",
        "US_SSN",
        "CREDIT_CARD",
        "IP_ADDRESS",
        # "DATE_TIME",  # EXCLUDED: causes false positives on med dosages
        "LOCATION",
        "US_DRIVER_LICENSE",
        "MEDICAL_LICENSE",
        "URL",
    ]
    results = self._presidio_analyzer.analyze(
        text=text, entities=entities, language="en"
    )
    
    # Filter out LOCATION entities that look like clinical terms
    clinical_terms = {
        "oral", "iv", "im", "sq", "topical", "rectal", "nasal",
        "left", "right", "bilateral", "chest", "abdomen", "head",
    }
    results = [
        r for r in results
        if not (
            r.entity_type == "LOCATION"
            and text[r.start:r.end].strip().lower() in clinical_terms
        )
    ]
    
    anonymized = self._presidio_anonymizer.anonymize(
        text=text,
        analyzer_results=results,
        operators={
            "PERSON": OperatorConfig("replace", {"new_value": "[PATIENT]"}),
            "PHONE_NUMBER": OperatorConfig("replace", {"new_value": "[PHONE]"}),
            "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "[EMAIL]"}),
            "US_SSN": OperatorConfig("replace", {"new_value": "[SSN]"}),
            "LOCATION": OperatorConfig("replace", {"new_value": "[LOCATION]"}),
            "DEFAULT": OperatorConfig("replace", {"new_value": "[REDACTED]"}),
        },
    )
    
    # Post-process: scrub DOB patterns via regex
    result = anonymized.text
    result = re.sub(
        r"\b(?:DOB|Date of Birth|Birth\s?date)[:\s]*\d{1,2}[/\-]\d{1,2}[/\-]\d{2,4}\b",
        "[DOB REDACTED]",
        result,
        flags=re.I,
    )
    return result

Critical Fix: DATE_TIME entity is excluded from Presidio to prevent false positives on medication dosages (“20mEq” was being flagged as a date). DOB scrubbing uses targeted regex instead.
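
The behavior of the DOB regex can be checked in isolation. This sketch reuses the pattern shown above; the wrapper name `scrub_dob` and the sample note are made up for illustration.

```python
import re

# Same pattern as the post-processing step above, compiled once.
DOB_PATTERN = re.compile(
    r"\b(?:DOB|Date of Birth|Birth\s?date)[:\s]*\d{1,2}[/\-]\d{1,2}[/\-]\d{2,4}\b",
    flags=re.I,
)


def scrub_dob(text: str) -> str:
    """Replace labeled dates of birth with a fixed placeholder."""
    return DOB_PATTERN.sub("[DOB REDACTED]", text)


note = "DOB: 03/14/1962. KCl 20mEq PO daily; last seen 01/05/2024."
print(scrub_dob(note))
```

Because the pattern requires a DOB-style label, the dosage "20mEq" is left alone. The unlabeled visit date also passes through unchanged, which follows directly from how targeted the regex is.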

Clinical Term Protection

Presidio sometimes flags clinical terms as PHI. We filter these:

# backend/input_layer/anonymizer.py:113-124
clinical_terms = {
    "oral", "iv", "im", "sq", "topical", "rectal", "nasal",
    "left", "right", "bilateral", "chest", "abdomen", "head",
}
results = [
    r for r in results
    if not (
        r.entity_type == "LOCATION"
        and text[r.start:r.end].strip().lower() in clinical_terms
    )
]
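
The filter can be exercised without Presidio by using a small stand-in for analyzer results. In this sketch `Span` is a hypothetical dataclass carrying only the three attributes the filter reads (entity_type, start, end), and the sample text is invented.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for an analyzer result."""
    entity_type: str
    start: int
    end: int


CLINICAL_TERMS = {
    "oral", "iv", "im", "sq", "topical", "rectal", "nasal",
    "left", "right", "bilateral", "chest", "abdomen", "head",
}


def drop_clinical_locations(text: str, results: list[Span]) -> list[Span]:
    """Drop LOCATION hits whose matched text is a known clinical term."""
    return [
        r for r in results
        if not (
            r.entity_type == "LOCATION"
            and text[r.start:r.end].strip().lower() in CLINICAL_TERMS
        )
    ]


text = "Metoprolol 25 mg oral BID; transferred from Boston."
results = [
    Span("LOCATION", text.index("oral"), text.index("oral") + len("oral")),
    Span("LOCATION", text.index("Boston"), text.index("Boston") + len("Boston")),
]

kept = drop_clinical_locations(text, results)
print([text[r.start:r.end] for r in kept])
```

Only the genuine place name survives the filter, so "oral" would stay in the text while "Boston" would still be replaced with [LOCATION].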

Output Guardrails

Multiple validation rules prevent unsafe/incomplete outputs:

1. Pydantic Schema Validation

All models declare typed fields, and fields without a default are required:

# backend/models/soap.py
class Differential(BaseModel):
    diagnosis: str  # Required
    likelihood: str  # Required
    reasoning: str  # Required
    confidence: ConfidenceLevel  # Enum: high/medium/low
    supporting_evidence: list[str] = Field(default_factory=list)

class SOAPNote(BaseModel):
    subjective: str  # Required
    objective: str   # Required
    assessment: str  # Required
    plan: str        # Required
    differentials: list[Differential] = Field(default_factory=list)
    citations: list[str] = Field(default_factory=list)
    uncertainty: str = ""
    model_used: str = ""
    latency_ms: int = 0

Invalid data is rejected before output.

2. Minimum Differential Count

Implementation

# backend/validation/validator.py (conceptual)
def validate_output(soap: SOAPNote) -> SOAPNote:
    if len(soap.differentials) < 2:
        # ValueError, since pydantic's ValidationError cannot be built from a plain message
        raise ValueError(
            "SOAP note must include at least 2 differential diagnoses to avoid anchoring bias"
        )
    return soap

Purpose: Prevent anchoring bias by forcing consideration of multiple diagnoses.

3. No Hallucinated Medications

Cross-reference all medications against DrugBank vocabulary:

# backend/guardrails/rules.py (conceptual)
def check_medication_validity(soap: SOAPNote) -> list[str]:
    from backend.external.drugbank import get_drug_vocabulary
    
    valid_drugs = get_drug_vocabulary()
    hallucinations = []
    
    for drug in extract_medications_from_plan(soap.plan):
        if drug.lower() not in valid_drugs:
            hallucinations.append(drug)
    
    if hallucinations:
        # ValueError, since pydantic's ValidationError cannot be built from a plain message
        raise ValueError(f"Hallucinated medications detected: {hallucinations}")
    
    return []

Purpose: Prevent LLM from inventing non-existent drug names.

4. Safety Flag Enforcement

If Med Error Panel flags contraindicated interactions, they MUST appear in Plan:

# backend/validation/validator.py (conceptual)
def enforce_safety_flags(soap: SOAPNote, med_panel: MedErrorPanel) -> SOAPNote:
    contraindicated = [
        i for i in med_panel.drug_interactions 
        if i.severity == "contraindicated"
    ]
    
    for interaction in contraindicated:
        if interaction.description not in soap.plan:
            soap.plan += f"\n\n⚠ CRITICAL: {interaction.description} - {interaction.recommendation}"
    
    return soap

5. MUC Confidence Thresholds

Model Under Certainty (MUC) analysis flags low-confidence outputs:

# backend/guardrails/rules.py (conceptual)
def check_confidence(soap: SOAPNote) -> bool:
    low_confidence_ddx = [
        d for d in soap.differentials 
        if d.confidence == ConfidenceLevel.LOW
    ]
    
    if len(low_confidence_ddx) == len(soap.differentials):
        # All differentials are low confidence
        soap.uncertainty = "⚠ All differentials have low confidence. Human review recommended."
        return False
    
    return True

Human-in-the-Loop

When debate fails to reach consensus or confidence is low:

Flag for Review

state.flagged_for_human = True

UI displays: ⚠ Requires Human Review

Display Dissent Log

Show remaining issues:

"dissent_log": [
  "Clinical agent suggests AMI but troponin negative",
  "Literature agent found conflicting studies on statin dosing"
]

Doctor Edits SOAP

Frontend provides inline editing for all SOAP sections

Optional Re-Debate

POST /api/human-feedback
{
  "soap_edits": {...},
  "re_debate": true,
  "max_iterations": 1
}

Re-run debate with doctor’s edits as additional context

External Safety Data Sources

FDA openFDA

Drug labels, adverse events, interactions via REST API

DrugBank (Open Data)

Drug vocabulary, interactions CSV (free tier)

RxNorm API

Normalized drug names, RxCUI lookups

Parallel Lookups

All external APIs are called in parallel:

# backend/external/fda.py (conceptual)
async def check_drug_interactions(drug_names: list[str]) -> str:
    tasks = [fetch_fda_label(drug) for drug in drug_names]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # ... format results ...

Speedup: 5 drug lookups in ~2s (parallel) vs ~10s (sequential)
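
With return_exceptions=True, a failed lookup comes back as an exception object in the results list instead of cancelling the whole batch. The sketch below shows one way to separate successes from failures; `fetch_label` is a hypothetical stand-in for a network call, and the drug name "unknownol" is invented to trigger a failure.

```python
import asyncio


async def fetch_label(drug: str) -> str:
    """Hypothetical stand-in for a network lookup; fails for one drug."""
    await asyncio.sleep(0.05)
    if drug == "unknownol":
        raise LookupError(f"no label found for {drug}")
    return f"label for {drug}"


async def lookup_all(drugs: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    # gather returns results in the same order as the input awaitables.
    results = await asyncio.gather(
        *(fetch_label(drug) for drug in drugs),
        return_exceptions=True,
    )
    labels: dict[str, str] = {}
    errors: dict[str, str] = {}
    for drug, result in zip(drugs, results):
        if isinstance(result, BaseException):
            errors[drug] = str(result)
        else:
            labels[drug] = result
    return labels, errors


labels, errors = asyncio.run(lookup_all(["lisinopril", "metformin", "unknownol"]))
print(labels)
print(errors)
```

Keeping the failures in a separate mapping lets the caller report which lookups were unavailable without losing the labels that did come back.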

Safety Checklist (Every Case)

✓ PHI Scrubbed

All inputs pass through Presidio before LLM processing

✓ Med Error Panel Run

Drug interactions, contraindications, dosing alerts, and population flags checked

✓ Pydantic Validation

All outputs conform to strict schemas

✓ Minimum 2 Differentials

Prevents anchoring bias

✓ No Hallucinated Meds

Cross-referenced against DrugBank

✓ Safety Flags Enforced

Contraindicated interactions appear in Plan

✓ Confidence Thresholds

Low-confidence outputs flagged for human review

HIPAA Compliance Considerations

Production Deployment: ClinicalPilot requires additional safeguards for HIPAA compliance:

Encryption at rest for all patient data
Audit logging for all data access
BAA (Business Associate Agreement) with cloud provider
Access controls (role-based permissions)
Data retention policies (auto-delete PHI after X days)

Anonymization alone is not sufficient for HIPAA compliance; it is one layer of defense in depth.

Emergency Mode Safety

Even in fast-path mode, safety checks run:

# backend/emergency/emergency.py (conceptual)
async def run_emergency_mode(patient: PatientContext):
    # Run Clinical + Safety in parallel
    clinical, safety = await asyncio.gather(
        run_clinical_agent(patient),
        run_safety_agent(patient),
    )
    
    # Extract critical safety flags
    red_flags = [
        f for f in safety.flags 
        if f.severity in ["contraindicated", "major"]
    ]
    
    return EmergencyResponse(
        differentials=clinical.differentials[:3],
        red_flags=red_flags,  # ← Always included
        esi_score=_calculate_esi(patient),
    )

Safety checks are never skipped: even sub-5s responses include drug interaction checks.

Next Steps

Architecture

Layer-by-layer system overview

Agent Types

Clinical, Literature, Safety, Critic agents

Multi-Agent Debate

How debate drives consensus and refinement
