High-Level Overview
Processing Time: The full pipeline takes ~100 seconds for complex cases due to 14+ LLM calls across 3 debate rounds. Emergency mode bypasses debate for <5s response.
Layer 1: Input Gateway
Purpose: Accept clinical data from multiple sources and scrub PHI
FHIR API
HL7 FHIR R4 JSON bundles parsed to extract Patient, Condition, Observation, MedicationRequest resources
EHR Upload
PDF/CSV documents parsed via PyPDF2 and Unstructured.io
Free Text
Direct text input or future voice (Whisper STT)
PHI Anonymization
All inputs pass through Microsoft Presidio for PHI scrubbing. The DATE_TIME entity is intentionally excluded from Presidio so that medication dosages like "20mEq" are not misidentified as dates; DOB scrubbing uses a targeted regex instead.
Layer 2: Processing & Parsing
Purpose: Convert diverse inputs into a unified PatientContext schema
Parse Input
Route to appropriate parser based on input type:
- fhir_parser.py - FHIR R4 bundles
- ehr_parser.py - PDF/CSV uploads
- text_parser.py - Free text normalization
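The routing step can be sketched as a simple dispatch table. This is a minimal illustration, not the project's actual code: `parse_fhir` and `parse_text` stand in for the real `fhir_parser.py` and `text_parser.py` modules (the PDF/CSV path via `ehr_parser.py` is omitted since it depends on PyPDF2/Unstructured.io), and the returned dict shapes are assumptions.

```python
# Minimal sketch of the Layer 2 router: dispatch raw input to a parser
# based on its declared input type. Parser functions are illustrative stubs.
import json

def parse_fhir(raw: str) -> dict:
    """Extract the resource types of interest from a FHIR R4 bundle."""
    bundle = json.loads(raw)
    wanted = {"Patient", "Condition", "Observation", "MedicationRequest"}
    resources = [e["resource"] for e in bundle.get("entry", [])
                 if e.get("resource", {}).get("resourceType") in wanted]
    return {"source": "fhir", "resources": resources}

def parse_text(raw: str) -> dict:
    """Normalize free text (collapse whitespace) into a single note field."""
    return {"source": "text", "note": " ".join(raw.split())}

PARSERS = {"fhir": parse_fhir, "text": parse_text}

def parse_input(raw: str, input_type: str) -> dict:
    try:
        return PARSERS[input_type](raw)
    except KeyError:
        raise ValueError(f"unsupported input type: {input_type}")
```

A dispatch table keeps the router open to new sources (e.g. voice transcripts) without touching existing parsers.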
Layer 3: Agent Layer
Four specialized agents generate outputs in parallel (Round 1) or incorporate critique (Round 2+):

| Agent | Model | Purpose | Output | Source |
|---|---|---|---|---|
| Clinical | GPT-4o / MedGemma-2-9b | Generate differential diagnoses, risk scores, SOAP draft | ClinicalAgentOutput | backend/agents/clinical.py:18 |
| Literature | GPT-4o-mini | Search PubMed, Europe PMC, LanceDB RAG for evidence | LiteratureAgentOutput | backend/agents/literature.py:20 |
| Safety | GPT-4o | Check drug interactions, contraindications, dosing alerts | SafetyAgentOutput | backend/agents/safety.py:17 |
| Critic | GPT-4o | Review all outputs, identify contradictions, gaps, errors | CriticOutput | backend/agents/critic.py:22 |
Agent Execution Flow (Round 1)
External API Integration
Agents query external medical databases in parallel.
Layer 4: Debate Engine
Purpose: Multi-round iterative refinement with critic feedback
Round 1: Independent Generation
All agents generate outputs independently (Clinical → Literature & Safety in parallel)
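The Round 1 flow can be sketched with `asyncio.gather`: the Clinical agent runs first, then Literature and Safety consume its output concurrently. The agent coroutines below are illustrative stubs, not the real `backend/agents` implementations, and their return shapes are assumptions.

```python
# Sketch of Round 1: Clinical runs first; Literature and Safety then
# run concurrently on its output via asyncio.gather.
import asyncio

async def clinical_agent(ctx: dict) -> dict:
    return {"differentials": ["CHF", "pneumonia"]}

async def literature_agent(ctx: dict, clinical: dict) -> dict:
    return {"evidence": [f"PubMed hits for {d}" for d in clinical["differentials"]]}

async def safety_agent(ctx: dict, clinical: dict) -> dict:
    return {"alerts": []}

async def run_round_one(ctx: dict) -> dict:
    clinical = await clinical_agent(ctx)
    literature, safety = await asyncio.gather(
        literature_agent(ctx, clinical),
        safety_agent(ctx, clinical),
    )
    return {"clinical": clinical, "literature": literature, "safety": safety}

outputs = asyncio.run(run_round_one({"patient": "demo"}))
```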
Critic Review
Critic agent reviews all outputs and identifies:
- EHR contradictions
- Evidence gaps
- Safety misses
- Points of dissent
Consensus Check
After each round, check whether consensus_reached == True. If yes, end the debate; if not, continue to the next round (max 3).
Fixed Rounds: Debate is deterministic (2-3 rounds max) to avoid infinite loops. This design prioritizes speed and predictability over exhaustive consensus.
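The bounded loop with a consensus check can be sketched as follows. The round and critic callables are stand-ins for the real pipeline; only the control flow (check after each round, hard cap at 3) comes from the text above.

```python
# Sketch of the fixed-round debate loop: the critic's consensus flag is
# checked after every round, and the loop is capped at MAX_ROUNDS so it
# can never run indefinitely.
MAX_ROUNDS = 3

def run_debate(run_round, run_critic) -> tuple[int, bool]:
    """Return (rounds_used, consensus_reached)."""
    for round_no in range(1, MAX_ROUNDS + 1):
        outputs = run_round(round_no)
        critique = run_critic(outputs)
        if critique["consensus_reached"]:
            return round_no, True
    return MAX_ROUNDS, False

# Example: a critic that agrees from round 2 onward ends the debate early.
rounds, ok = run_debate(
    run_round=lambda n: {"round": n},
    run_critic=lambda out: {"consensus_reached": out["round"] >= 2},
)
```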
Layer 0: Medical Error Prevention Panel
Runs in parallel with the debate pipeline via asyncio.gather:
Drug-Drug Interactions
Check every medication pair for interactions (contraindicated, major, moderate, minor)
Drug-Disease Contraindications
Cross-reference each drug against patient conditions
Dosing Alerts
Renal/hepatic/weight/age-based adjustments
Population Flags
Pregnancy, pediatric, elderly, lactation concerns
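The four panel checks above can be run concurrently with `asyncio.gather`. The check bodies below are illustrative stubs (a real panel would query interaction and dosing databases); the patient dict shape and the age-65 elderly cutoff are assumptions for the example.

```python
# Sketch of the Layer 0 panel: four independent checks fanned out with
# asyncio.gather, then collected into a single panel report.
import asyncio
from itertools import combinations

async def drug_drug(meds):
    # Check every medication pair; severity is hard-coded for illustration.
    return [{"pair": p, "severity": "major"} for p in combinations(sorted(meds), 2)]

async def drug_disease(meds, conditions):
    # Cross-reference each drug against each patient condition.
    return [{"drug": m, "condition": c} for m in meds for c in conditions]

async def dosing_alerts(meds):
    # Renal/hepatic/weight/age-based adjustments would go here.
    return []

async def population_flags(patient):
    return ["elderly"] if patient.get("age", 0) >= 65 else []

async def run_panel(patient):
    dd, dz, dose, pop = await asyncio.gather(
        drug_drug(patient["medications"]),
        drug_disease(patient["medications"], patient["conditions"]),
        dosing_alerts(patient["medications"]),
        population_flags(patient),
    )
    return {"interactions": dd, "contraindications": dz, "dosing": dose, "flags": pop}

panel = asyncio.run(run_panel({"age": 70, "medications": ["warfarin", "aspirin"],
                               "conditions": ["peptic ulcer"]}))
```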
Layer 5: Validation & Output
Validate Completeness
validator.py checks:
- All 4 SOAP sections populated (Subjective, Objective, Assessment, Plan)
- ≥2 differential diagnoses
- Safety flags explicitly addressed
- No hallucinated medications (cross-ref DrugBank)
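The checks above can be sketched as a single validation pass. This is a simplified stand-in for `validator.py`: the note dict shape is assumed, and `DRUG_VOCAB` is a tiny placeholder for the DrugBank vocabulary.

```python
# Sketch of Layer 5 completeness validation: returns a list of errors,
# empty when the note passes all checks.
SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")
DRUG_VOCAB = {"metformin", "lisinopril", "warfarin"}  # stand-in for DrugBank

def validate(note: dict) -> list[str]:
    errors = []
    for section in SOAP_SECTIONS:
        if not note.get(section):
            errors.append(f"missing SOAP section: {section}")
    if len(note.get("differentials", [])) < 2:
        errors.append("need >=2 differential diagnoses")
    unknown = set(note.get("medications", [])) - DRUG_VOCAB
    if unknown:
        errors.append(f"unrecognized medications: {sorted(unknown)}")
    return errors
```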
Layer 6: Observability
LangSmith / Langfuse tracing captures:
- Every LLM call (model, tokens, latency)
- Agent inputs/outputs
- Debate round progression
- External API calls (PubMed, FDA, RxNorm)
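The per-call capture can be illustrated with a small tracing decorator that records model, tokens, and latency into an in-memory list, mirroring what LangSmith/Langfuse collect. The field names and `fake_llm_call` are assumptions for the example, not the tracing SDKs' actual API.

```python
# Sketch of LLM-call tracing: a decorator appends one trace record
# (model, call name, latency, token count) per invocation.
import functools
import time

TRACE: list[dict] = []

def traced(model: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({
                "model": model,
                "call": fn.__name__,
                "latency_s": time.perf_counter() - start,
                "tokens": result.get("tokens", 0),
            })
            return result
        return wrapper
    return decorator

@traced(model="gpt-4o")
def fake_llm_call(prompt: str) -> dict:
    # Hypothetical stand-in for a real completion call.
    return {"text": prompt.upper(), "tokens": len(prompt.split())}
```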
Layer 7: Guardrails
Multiple validation layers prevent unsafe outputs:
Pydantic Schema Validation
All models use strict typing; invalid data is rejected before it enters the system.
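A hedged sketch of the strict-typing rule, using Pydantic v2: out-of-range or under-populated fields raise `ValidationError` before the data enters the pipeline. The field names below are illustrative, not the project's actual `ClinicalAgentOutput` schema.

```python
# Sketch of Pydantic schema validation: constrained fields reject
# invalid data at the boundary.
from pydantic import BaseModel, Field, ValidationError

class Differential(BaseModel):
    diagnosis: str
    probability: float = Field(ge=0.0, le=1.0)

class ClinicalAgentOutput(BaseModel):
    differentials: list[Differential] = Field(min_length=2)  # anchoring guard
    soap_draft: str

try:
    # Two violations: probability > 1.0, and only one differential.
    ClinicalAgentOutput(differentials=[{"diagnosis": "CHF", "probability": 1.7}],
                        soap_draft="...")
except ValidationError as exc:
    errors = exc.errors()
```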
No Hallucinated Medications
Cross-reference all medications against DrugBank vocabulary before output
Minimum Differential Count
Require ≥2 differential diagnoses to avoid anchoring bias
Safety Flag Enforcement
If the Medical Error Panel flags contraindicated interactions, they MUST appear in the Plan section
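The enforcement rule can be sketched as a text check: every contraindicated pair flagged by the panel must be mentioned in the Plan, or the note fails the guardrail. The flag dict shape and substring matching are simplifying assumptions (a real check would likely match normalized drug codes, not raw text).

```python
# Sketch of safety flag enforcement: return the contraindicated drug
# pairs that the Plan section fails to mention.
def plan_addresses_flags(plan_text: str, panel_flags: list[dict]) -> list[tuple]:
    missing = []
    plan = plan_text.lower()
    for flag in panel_flags:
        if flag["severity"] != "contraindicated":
            continue  # only contraindicated interactions are mandatory
        if not all(drug.lower() in plan for drug in flag["pair"]):
            missing.append(flag["pair"])
    return missing
```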
MUC Confidence Thresholds
Model Under Certainty (MUC) analysis flags low-confidence outputs for human review
Emergency Mode (Fast Path)
Bypass debate for time-critical cases.
Key Design Decisions
LanceDB over Pinecone
Serverless, embedded, no infrastructure cost - perfect for hackathon pace
Async Python (asyncio)
All agent calls are async for maximum parallelization
Pydantic Everywhere
Type safety, auto-validation, JSON schema generation for API docs
Fixed Debate Rounds
Deterministic 2-3 rounds prevent infinite loops while allowing refinement
Next Steps
Multi-Agent Debate
Deep dive into debate mechanics, consensus algorithms, and critique formatting
Agent Types
Detailed breakdown of Clinical, Literature, Safety, and Critic agents
Safety System
Medical Error Prevention Panel implementation and guardrails