ClinicalPilot uses four specialized agents that work together through multi-round debate. Each agent has a distinct role, model configuration, and output schema.

Agent Overview

Clinical Agent

Generates differential diagnoses, risk scores, and a SOAP draft using clinical reasoning

Literature Agent

Searches PubMed, Europe PMC, and RAG for evidence to support or refute clinical claims

Safety Agent

Checks drug interactions, contraindications, and dosing alerts via FDA/DrugBank APIs

Critic Agent

Reviews all outputs, identifies contradictions, gaps, and errors to drive consensus

Clinical Agent

Purpose: Generate differential diagnoses, risk scores, and an initial SOAP draft
Model: GPT-4o (or MedGemma-2-9b for local deployment)
Source: backend/agents/clinical.py:18

Input Schema

async def run_clinical_agent(
    patient: PatientContext,  # Unified patient data
    critique: str = "",        # Optional feedback from Critic (Round 2+)
) -> ClinicalAgentOutput

Output Schema

class ClinicalAgentOutput(BaseModel):
    differentials: list[Differential] = Field(default_factory=list)
    risk_scores: dict[str, str] = Field(default_factory=dict)
    soap_draft: str = ""
    reasoning_trace: str = ""
    confidence: ConfidenceLevel = ConfidenceLevel.MEDIUM

class Differential(BaseModel):
    diagnosis: str
    likelihood: str  # "very likely", "possible", "unlikely"
    reasoning: str
    confidence: ConfidenceLevel  # high/medium/low
    supporting_evidence: list[str] = Field(default_factory=list)
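Because the output schema is a Pydantic model, a raw JSON response from the LLM can be validated directly into typed objects. A minimal sketch, assuming Pydantic v2; the `ConfidenceLevel` enum is reconstructed here from the high/medium/low values shown above:

```python
from enum import Enum
from pydantic import BaseModel, Field

class ConfidenceLevel(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class Differential(BaseModel):
    diagnosis: str
    likelihood: str
    reasoning: str
    confidence: ConfidenceLevel
    supporting_evidence: list[str] = Field(default_factory=list)

class ClinicalAgentOutput(BaseModel):
    differentials: list[Differential] = Field(default_factory=list)
    risk_scores: dict[str, str] = Field(default_factory=dict)
    soap_draft: str = ""
    reasoning_trace: str = ""
    confidence: ConfidenceLevel = ConfidenceLevel.MEDIUM

# Validate a raw LLM JSON payload into the typed schema;
# missing fields fall back to their defaults
raw = '{"differentials": [{"diagnosis": "Anemia", "likelihood": "very likely", "reasoning": "Low Hgb", "confidence": "high"}], "confidence": "high"}'
output = ClinicalAgentOutput.model_validate_json(raw)
print(output.differentials[0].diagnosis)  # Anemia
```

Validation failures raise `pydantic.ValidationError`, which gives the orchestrator a clean place to catch malformed agent output.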

Implementation

# backend/agents/clinical.py:18-50
async def run_clinical_agent(
    patient: PatientContext,
    critique: str = "",
) -> ClinicalAgentOutput:
    system_prompt = load_prompt("clinical_system.txt")
    
    user_message = f"""## Patient Context
{patient.to_clinical_summary()}

## Raw Clinical Text
{patient.raw_text or patient.current_prompt}
"""
    
    if critique:
        user_message += f"""
## Critique from Previous Round (address these issues)
{critique}
"""
    
    result = await llm_call(
        system_prompt=system_prompt,
        user_message=user_message,
        json_mode=True,  # Enforce structured JSON output
    )
    
    return _parse_output(result["content"])

System Prompt Highlights

You are an expert clinical reasoning AI assistant. Given a patient presentation,
you will:

1. Generate a differential diagnosis (at least 3-5 possibilities)
2. Calculate relevant risk scores (CHADS2, Wells, PERC, etc.)
3. Draft a SOAP note (Subjective, Objective, Assessment, Plan)
4. Show your reasoning trace (step-by-step clinical logic)

Use evidence-based medicine. Avoid anchoring bias. Consider zebras when
hoofbeats don't match horses.

Respond with JSON:
{
  "differentials": [
    {
      "diagnosis": "...",
      "likelihood": "very likely | possible | unlikely",
      "reasoning": "...",
      "confidence": "high | medium | low",
      "supporting_evidence": ["lab X shows Y", "symptom Z is classic for..."]
    }
  ],
  "risk_scores": {"CHADS2": "3 (moderate risk)", ...},
  "soap_draft": "S: ...\nO: ...\nA: ...\nP: ...",
  "reasoning_trace": "Step 1: ...",
  "confidence": "high | medium | low"
}

Example Output

{
  "differentials": [
    {
      "diagnosis": "Anemia (Iron Deficiency)",
      "likelihood": "very likely",
      "reasoning": "Fatigue, dizziness, Hgb 9.2 g/dL (low), MCV 72 fL (microcytic)",
      "confidence": "high",
      "supporting_evidence": [
        "Hemoglobin 9.2 g/dL (ref: 13-17)",
        "MCV 72 fL suggests iron deficiency",
        "Fatigue and dizziness common in anemia"
      ]
    },
    {
      "diagnosis": "Orthostatic Hypotension",
      "likelihood": "possible",
      "reasoning": "Dizziness on standing, on Lisinopril (ACE inhibitor)",
      "confidence": "medium",
      "supporting_evidence": [
        "Patient reports dizziness when standing",
        "Lisinopril can cause orthostatic hypotension"
      ]
    }
  ],
  "risk_scores": {},
  "soap_draft": "S: 68yo M presents with fatigue and dizziness...\nO: BP 110/70, HR 78...\nA: 1. Anemia 2. Orthostatic Hypotension\nP: CBC, iron studies, orthostatic vitals",
  "reasoning_trace": "Step 1: Note low Hgb → anemia. Step 2: Check MCV → microcytic suggests iron deficiency...",
  "confidence": "high"
}

Literature Agent

Purpose: Search medical literature for evidence to support or refute clinical claims
Model: GPT-4o-mini (faster, cheaper for search tasks)
Source: backend/agents/literature.py:20

Input Schema

async def run_literature_agent(
    patient: PatientContext,
    clinical_output: ClinicalAgentOutput | None = None,  # To search DDx
    critique: str = "",
) -> LiteratureAgentOutput

Output Schema

class LiteratureAgentOutput(BaseModel):
    evidence: list[LiteratureHit] = Field(default_factory=list)
    summary: str = ""
    contradictions: list[str] = Field(default_factory=list)
    confidence: ConfidenceLevel = ConfidenceLevel.MEDIUM

class LiteratureHit(BaseModel):
    title: str
    authors: str = ""
    journal: str = ""
    year: str = ""
    pmid: str = ""
    snippet: str = ""
    relevance: ConfidenceLevel = ConfidenceLevel.MEDIUM

Implementation

# backend/agents/literature.py:76-121
import asyncio

async def _search_pubmed(
    patient: PatientContext,
    clinical_output: ClinicalAgentOutput | None,
) -> str:
    from backend.external.pubmed import search_pubmed
    
    # Build queries from top differentials
    queries = []
    if clinical_output and clinical_output.differentials:
        for d in clinical_output.differentials[:3]:
            queries.append(d.diagnosis)
    
    # Also search primary condition
    if patient.conditions:
        queries.extend(c.display for c in patient.conditions[:2])
    
    if not queries:
        queries = [patient.current_prompt[:100]]
    
    results = []
    # Run all PubMed searches in PARALLEL
    tasks = [search_pubmed(q, max_results=3) for q in queries[:3]]
    all_hits = await asyncio.gather(*tasks, return_exceptions=True)
    
    for hits in all_hits:
        if isinstance(hits, list):
            results.extend(hits)
    
    # Format results
    formatted = []
    for r in results[:8]:  # Max 8 results
        formatted.append(
            f"- [{r.get('title', 'No title')}] "
            f"({r.get('journal', '')}, {r.get('year', '')}). "
            f"PMID: {r.get('pmid', 'N/A')}. "
            f"{r.get('abstract', '')[:200]}..."
        )
    return "\n".join(formatted)
Optimization: Up to three PubMed queries run in parallel via asyncio.gather, reducing latency from ~9s to ~3s.
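The fan-out pattern above can be isolated: `asyncio.gather(..., return_exceptions=True)` lets one failed search degrade gracefully instead of cancelling its siblings, and the `isinstance` check filters out the exceptions. A self-contained sketch with a stubbed search function (the stub is illustrative, not the real `search_pubmed`):

```python
import asyncio

async def fake_search(query: str) -> list[str]:
    # Stand-in for search_pubmed: one query fails, the others succeed
    if query == "bad":
        raise RuntimeError("PubMed timeout")
    return [f"hit for {query}"]

async def main() -> list[str]:
    tasks = [fake_search(q) for q in ["anemia", "bad", "hypotension"]]
    # return_exceptions=True: failures come back as exception objects
    # instead of propagating and cancelling the other tasks
    all_hits = await asyncio.gather(*tasks, return_exceptions=True)
    results: list[str] = []
    for hits in all_hits:
        if isinstance(hits, list):  # skip exceptions from failed queries
            results.extend(hits)
    return results

print(asyncio.run(main()))  # ['hit for anemia', 'hit for hypotension']
```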

Example Output

{
  "evidence": [
    {
      "title": "Iron Deficiency Anemia in Elderly Patients",
      "authors": "Smith J, et al.",
      "journal": "JAMA Intern Med",
      "year": "2022",
      "pmid": "35123456",
      "snippet": "Microcytic anemia with MCV <80 fL is highly specific for iron deficiency...",
      "relevance": "high"
    }
  ],
  "summary": "Found strong evidence supporting iron deficiency anemia. No contradictions with clinical assessment.",
  "contradictions": [],
  "confidence": "high"
}

Safety Agent

Purpose: Check drug interactions, contraindications, and dosing alerts
Model: GPT-4o
Source: backend/agents/safety.py:17

Input Schema

async def run_safety_agent(
    patient: PatientContext,
    proposed_plan: str = "",  # Treatment plan to check
    critique: str = "",
) -> SafetyAgentOutput

Output Schema

class SafetyAgentOutput(BaseModel):
    flags: list[SafetyFlag] = Field(default_factory=list)
    medication_review: str = ""
    dosing_alerts: list[str] = Field(default_factory=list)
    population_warnings: list[str] = Field(default_factory=list)

class SafetyFlag(BaseModel):
    category: str  # "drug-drug", "drug-disease", "dosing", "population"
    severity: str  # "contraindicated", "major", "moderate", "minor"
    description: str
    mechanism: str = ""
    recommendation: str = ""
    drugs_involved: list[str] = Field(default_factory=list)
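Downstream consumers typically want flags ordered by urgency. A minimal sketch that sorts `SafetyFlag`-like records by the severity scale above (the ordering helper is an assumption, not part of the documented API; plain dicts stand in for the Pydantic model):

```python
# Severity scale from the SafetyFlag schema, most urgent first
SEVERITY_ORDER = {"contraindicated": 0, "major": 1, "moderate": 2, "minor": 3}

def triage(flags: list[dict]) -> list[dict]:
    """Sort safety flags so the most severe surface first."""
    return sorted(flags, key=lambda f: SEVERITY_ORDER.get(f["severity"], 99))

flags = [
    {"severity": "minor", "description": "Mild sedation risk"},
    {"severity": "contraindicated", "description": "Absolute contraindication"},
    {"severity": "major", "description": "Hyperkalemia risk"},
]
print([f["severity"] for f in triage(flags)])
# ['contraindicated', 'major', 'minor']
```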

Implementation

# backend/agents/safety.py:65-94
import asyncio
import logging

logger = logging.getLogger(__name__)

async def _get_external_safety_data(patient: PatientContext) -> str:
    if not patient.medications:
        return ""
    
    results = []
    drug_names = [m.name for m in patient.medications]
    
    # Query FDA and DrugBank in PARALLEL
    try:
        from backend.external.fda import check_drug_interactions
        from backend.external.drugbank import lookup_interactions
        
        fda_results, db_results = await asyncio.gather(
            check_drug_interactions(drug_names),
            lookup_interactions(drug_names),
            return_exceptions=True,
        )
        
        if isinstance(fda_results, str) and fda_results:
            results.append(f"FDA Data:\n{fda_results}")
        
        if isinstance(db_results, str) and db_results:
            results.append(f"DrugBank Data:\n{db_results}")
    
    except Exception as e:
        logger.debug(f"External lookup failed: {e}")
    
    return "\n\n".join(results)

Example Output

{
  "flags": [
    {
      "category": "drug-drug",
      "severity": "major",
      "description": "Lisinopril + Potassium Chloride: Risk of hyperkalemia",
      "mechanism": "ACE inhibitors reduce potassium excretion; supplementation increases serum K+",
      "recommendation": "Monitor potassium levels closely. Consider reducing K+ dose or discontinuing if K+ >5.0 mEq/L",
      "drugs_involved": ["Lisinopril", "Potassium Chloride"]
    }
  ],
  "medication_review": "Patient on 5 medications. One major interaction identified. No contraindications.",
  "dosing_alerts": [
    "Metformin requires renal dose adjustment if eGFR <30 mL/min"
  ],
  "population_warnings": [
    "Lisinopril: Caution in elderly due to orthostatic hypotension risk"
  ]
}

Critic Agent

Purpose: Review all agent outputs, identify contradictions and gaps, and drive consensus
Model: GPT-4o
Source: backend/agents/critic.py:22

Input Schema

async def run_critic_agent(
    patient: PatientContext,
    clinical: ClinicalAgentOutput,
    literature: LiteratureAgentOutput,
    safety: SafetyAgentOutput,
) -> CriticOutput

Output Schema

class CriticOutput(BaseModel):
    ehr_contradictions: list[str] = Field(default_factory=list)
    evidence_gaps: list[str] = Field(default_factory=list)
    safety_misses: list[str] = Field(default_factory=list)
    overall_assessment: str = ""
    consensus_reached: bool = False  # Key decision point
    dissent_log: list[str] = Field(default_factory=list)

Implementation

# backend/agents/critic.py:22-85
async def run_critic_agent(
    patient: PatientContext,
    clinical: ClinicalAgentOutput,
    literature: LiteratureAgentOutput,
    safety: SafetyAgentOutput,
) -> CriticOutput:
    system_prompt = load_prompt("critic_system.txt")
    
    # Format all agent outputs for review
    ddx_text = "\n".join(
        f"  {i+1}. {d.diagnosis} ({d.likelihood}, {d.confidence.value} confidence): {d.reasoning}"
        for i, d in enumerate(clinical.differentials)
    )
    
    evidence_text = "\n".join(
        f"  - {e.title} ({e.journal}, {e.year}): {e.snippet}"
        for e in literature.evidence[:5]
    )
    
    safety_text = "\n".join(
        f"  - [{f.severity.upper()}] {f.description} → {f.recommendation}"
        for f in safety.flags
    )
    
    user_message = f"""## Original Patient Context
{patient.to_clinical_summary()}

## Clinical Agent Output
Differentials:
{ddx_text or "  (none provided)"}

Risk Scores: {clinical.risk_scores or "None calculated"}

SOAP Draft:
{clinical.soap_draft or "(none)"}

## Literature Agent Output
Evidence:
{evidence_text or "  (no evidence found)"}

Summary: {literature.summary}
Contradictions: {literature.contradictions or "None"}

## Safety Agent Output
Flags:
{safety_text or "  (no flags)"}

Medication Review: {safety.medication_review}
"""
    
    result = await llm_call(
        system_prompt=system_prompt,
        user_message=user_message,
        json_mode=True,
    )
    
    return _parse_output(result["content"])

System Prompt Highlights

You are a clinical QA reviewer. Your job is to critically evaluate outputs from
three agents (Clinical, Literature, Safety) and determine if they form a coherent,
evidence-backed, safe clinical assessment.

Check for:
1. EHR Contradictions - Does the DDx conflict with patient data?
2. Evidence Gaps - Are clinical claims unsupported by literature?
3. Safety Misses - Were all drug interactions caught?
4. Logical Coherence - Do the agents agree?

Set consensus_reached = true ONLY if:
- No EHR contradictions
- No major evidence gaps
- No missed safety issues
- No significant dissent between agents

If consensus is NOT reached, provide specific, actionable feedback for each agent.

Example Output (No Consensus)

{
  "ehr_contradictions": [
    "DDx includes 'CKD' but patient creatinine is normal (0.9 mg/dL, eGFR >60)"
  ],
  "evidence_gaps": [
    "Claim that 'orthostatic hypotension is common in elderly' lacks citation"
  ],
  "safety_misses": [
    "Lisinopril + Potassium interaction not flagged by Safety Agent"
  ],
  "overall_assessment": "Strong clinical reasoning but needs better lab integration. Safety review incomplete.",
  "consensus_reached": false,
  "dissent_log": [
    "Clinical agent suggests CKD but labs contradict",
    "Safety agent did not check all medication pairs"
  ]
}

Agent Communication Flow


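In code, the round-based flow reduces to a loop gated on the Critic's `consensus_reached` flag: specialist agents run in parallel, the Critic reviews, and its feedback seeds the next round. A minimal sketch with stubbed agents (the orchestrator shape and round cap are assumptions, not the actual `backend` implementation):

```python
import asyncio

MAX_ROUNDS = 3  # assumed cap; the real orchestrator may differ

async def run_agent(name: str, critique: str) -> dict:
    # Stand-in for the Clinical/Literature/Safety agents
    return {"agent": name, "critique_seen": critique}

async def run_critic(outputs: list[dict], round_no: int) -> dict:
    # Stand-in for the Critic: here, consensus arrives on round 2
    return {"consensus_reached": round_no >= 2,
            "feedback": f"round {round_no} feedback"}

async def debate() -> int:
    critique = ""
    for round_no in range(1, MAX_ROUNDS + 1):
        # Specialist agents run in parallel, seeded with prior critique
        outputs = await asyncio.gather(
            run_agent("clinical", critique),
            run_agent("literature", critique),
            run_agent("safety", critique),
        )
        verdict = await run_critic(list(outputs), round_no)
        if verdict["consensus_reached"]:
            return round_no
        critique = verdict["feedback"]  # feeds the next round
    return MAX_ROUNDS

print(asyncio.run(debate()))  # 2
```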
Model Selection Rationale

| Agent      | Model       | Reasoning |
| ---------- | ----------- | --------- |
| Clinical   | GPT-4o      | Requires deep clinical reasoning, differential generation |
| Literature | GPT-4o-mini | Search/summarization task, speed/cost optimized |
| Safety     | GPT-4o      | Safety-critical, needs accuracy over speed |
| Critic     | GPT-4o      | Meta-reasoning, complex evaluation |
Local LLM Option: Set USE_LOCAL_LLM=true and OLLAMA_MODEL=medgemma2:9b to use MedGemma instead of OpenAI. Requires Ollama to be installed.
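The switch described above can be sketched as a small config helper; `USE_LOCAL_LLM` and `OLLAMA_MODEL` are the documented variables, while the helper itself is illustrative:

```python
import os

def resolve_model(agent: str) -> str:
    """Pick a model per agent, honoring the local-LLM override."""
    if os.environ.get("USE_LOCAL_LLM", "").lower() == "true":
        # Local deployment: one Ollama model serves every agent
        return os.environ.get("OLLAMA_MODEL", "medgemma2:9b")
    # Cloud defaults from the table above
    defaults = {"clinical": "gpt-4o", "literature": "gpt-4o-mini",
                "safety": "gpt-4o", "critic": "gpt-4o"}
    return defaults.get(agent, "gpt-4o")

os.environ["USE_LOCAL_LLM"] = "true"
print(resolve_model("clinical"))  # medgemma2:9b
```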

Next Steps

Multi-Agent Debate

How agents work together through debate rounds

Safety System

Medical Error Prevention Panel implementation

Architecture

Layer-by-layer system overview
