Debate Philosophy
Core Insight: A single LLM can miss critical details, anchor on initial diagnoses, or overlook safety issues. By having specialized agents debate and critique each other, we expose blind spots and force deeper reasoning.
Why Debate Works
Specialization
Each agent focuses on its domain (clinical reasoning, literature, safety) without cognitive overload
Adversarial Review
Critic agent actively looks for flaws, contradictions, and missing evidence
Iterative Refinement
Agents revise outputs based on critique, incorporating new evidence and addressing gaps
Debate Architecture
Round-by-Round Breakdown
Round 1: Independent Generation
Goal: Generate initial outputs without bias from other agents
Clinical Agent Runs First
Generates differential diagnoses, risk scores, and a SOAP draft based on PatientContext.
Literature & Safety Run in Parallel
The Literature agent receives Clinical’s differentials and searches for supporting evidence.
Optimization: With OpenAI or Ollama, Literature and Safety run in parallel. With Groq, they run sequentially with 2s delays to stay within rate limits.
Critic Reviews All Outputs
Receives Clinical, Literature, and Safety outputs, then identifies:
- EHR contradictions: Does the DDx conflict with patient history?
- Evidence gaps: Are claims unsupported by literature?
- Safety misses: Did Safety agent catch all drug interactions?
- Dissent log: What points of disagreement exist between agents?
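The four review categories above map naturally onto a small record type. The following is a minimal sketch, not the project's actual schema: the class name `Critique`, its field names, and the `open_issues` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Critique:
    """Hypothetical container for the four things the Critic reports."""

    ehr_contradictions: list[str] = field(default_factory=list)
    evidence_gaps: list[str] = field(default_factory=list)
    safety_misses: list[str] = field(default_factory=list)
    dissent_log: list[str] = field(default_factory=list)

    def open_issues(self) -> int:
        # Counts concrete findings only; the dissent log is kept for display.
        return (
            len(self.ehr_contradictions)
            + len(self.evidence_gaps)
            + len(self.safety_misses)
        )


critique = Critique(evidence_gaps=["CKD listed despite normal creatinine"])
print(critique.open_issues())
```

The count is only a convenience for display. It is not meant to decide consensus, which the Critic determines through LLM reasoning.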
Round 2+: Revision Based on Critique
Goal: Address gaps and contradictions identified by Critic
How Critique is Formatted for Agents
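The exact format is not reproduced on this page. One plausible approach, sketched here under assumptions, is to render the Critic's findings as a plain-text block appended to each agent's prompt. The function name `format_critique`, the header text, and the category keys are hypothetical.

```python
def format_critique(critique: dict[str, list[str]]) -> str:
    """Render critic findings as plain text for inclusion in an agent prompt."""
    lines = ["CRITIC FEEDBACK FROM PREVIOUS ROUND:"]
    for category, items in critique.items():
        if not items:
            continue  # skip categories with nothing to report
        lines.append(f"{category.replace('_', ' ').title()}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


feedback = format_critique(
    {
        "evidence_gaps": ["CKD listed despite normal creatinine"],
        "safety_misses": [],
    }
)
print(feedback)
```

Dropping empty categories keeps the prompt short, so agents only see the feedback they are expected to act on.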
Agents Revise Outputs
Clinical revises the DDx, Literature runs new search queries, and Safety re-checks interactions.
Critic Reviews Again
Checks whether the issues were addressed. If yes, sets consensus_reached = True. If not, logs the remaining dissent.
Consensus Algorithm
Consensus is qualitative, not quantitative. The Critic agent uses LLM reasoning to determine if outputs are coherent, evidence-backed, and safe.
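Because the stop condition is the Critic's verdict rather than a numeric score, the surrounding control flow can stay simple. Below is a minimal sketch of such a loop. `Verdict`, `run_debate`, and the `run_round` callback are hypothetical names, and mapping "no consensus after 3 rounds" to a human review flag is an assumption based on the flag mentioned under Debate State Tracking.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ROUNDS = 3  # the round limit described on this page


@dataclass
class Verdict:
    consensus_reached: bool
    dissent: list[str]


def run_debate(run_round: Callable[[int, list[str]], Verdict]) -> dict:
    """Run rounds until the Critic reports consensus or MAX_ROUNDS is hit.

    run_round (hypothetical) runs all agents for one round, given the round
    number and the previous round's dissent, and returns the Critic's verdict.
    """
    dissent: list[str] = []
    for round_number in range(1, MAX_ROUNDS + 1):
        verdict = run_round(round_number, dissent)
        if verdict.consensus_reached:
            return {"rounds": round_number, "consensus_reached": True,
                    "needs_human_review": False, "dissent": []}
        dissent = verdict.dissent  # fed back to the agents next round
    return {"rounds": MAX_ROUNDS, "consensus_reached": False,
            "needs_human_review": True, "dissent": dissent}


def fake_round(round_number: int, previous_dissent: list[str]) -> Verdict:
    # Stub: the Critic objects in round 1 and accepts the revision in round 2.
    if round_number == 1:
        return Verdict(False, ["missing evidence for CKD claim"])
    return Verdict(True, [])


result = run_debate(fake_round)
print(result)
```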
Consensus Criteria (from Critic System Prompt)
Example Consensus Decision
Debate State Tracking
The DebateState model captures the full history across rounds. This history makes it possible to:
- Show round-by-round progression in a timeline
- Display critic feedback for each round
- Highlight final consensus or human review flag
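As an illustration of what such a history might hold, here is a minimal sketch using standard-library dataclasses. The real DebateState fields are not shown on this page, so every field name below, along with `RoundRecord` and `timeline`, is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoundRecord:
    round_number: int
    agent_outputs: dict[str, str]  # agent name -> summary of its output
    critic_feedback: str
    consensus_reached: bool


@dataclass
class DebateState:
    rounds: list[RoundRecord] = field(default_factory=list)
    needs_human_review: bool = False

    @property
    def consensus_reached(self) -> bool:
        # Consensus is whatever the Critic decided in the latest round.
        return bool(self.rounds) and self.rounds[-1].consensus_reached

    def timeline(self) -> list[str]:
        # One display line per round, suitable for a timeline view.
        return [
            f"Round {r.round_number}: "
            f"{'consensus' if r.consensus_reached else 'no consensus'}"
            f" - {r.critic_feedback}"
            for r in self.rounds
        ]


state = DebateState()
state.rounds.append(RoundRecord(1, {"clinical": "CKD, Anemia"},
                                "missing evidence for CKD claim", False))
state.rounds.append(RoundRecord(2, {"clinical": "Anemia"},
                                "all issues addressed", True))
print("\n".join(state.timeline()))
```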
Performance Optimizations
Parallel Agent Execution
Literature and Safety run in parallel when possible.
Speedup: ~2× faster than sequential (12s → 6s for Round 1)
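One common way to get this kind of overlap in Python is `asyncio.gather`. The sketch below assumes the agents are exposed as coroutines. `literature_agent`, `safety_agent`, and `run_round_one` are hypothetical stand-ins that sleep instead of calling an LLM.

```python
import asyncio


async def literature_agent(differentials: list[str]) -> str:
    await asyncio.sleep(0.2)  # stand-in for an LLM call
    return f"evidence for {len(differentials)} differentials"


async def safety_agent(medications: list[str]) -> str:
    await asyncio.sleep(0.2)  # stand-in for an LLM call
    return f"checked {len(medications)} medications"


async def run_round_one(differentials: list[str], medications: list[str]):
    # Both coroutines wait concurrently, so the total time is roughly the
    # slower of the two rather than their sum.
    return await asyncio.gather(
        literature_agent(differentials),
        safety_agent(medications),
    )


literature, safety = asyncio.run(
    run_round_one(["Anemia", "Orthostatic Hypotension"],
                  ["Lisinopril", "Potassium"])
)
print(literature, "|", safety)
```

`asyncio.gather` returns results in the order the coroutines were passed, which is why the two values can be unpacked positionally.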
Parallel External API Calls
PubMed searches run 3 queries concurrently.
Speedup: ~3× faster than sequential (9s → 3s)
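A sketch of concurrent searches that tolerates individual failures follows. It makes no real network calls: `search_pubmed` is a hypothetical stand-in for the actual PubMed request, and treating a failed query as an empty result is an assumption, not documented behavior.

```python
import asyncio


async def search_pubmed(query: str) -> list[str]:
    """Hypothetical stand-in for a real PubMed request."""
    if not query:
        raise ValueError("empty query")
    await asyncio.sleep(0.1)  # simulated network latency
    return [f"result for '{query}'"]


async def search_all(queries: list[str]) -> dict[str, list[str]]:
    # return_exceptions=True keeps one failed query from cancelling the rest.
    outcomes = await asyncio.gather(
        *(search_pubmed(q) for q in queries),
        return_exceptions=True,
    )
    return {
        q: ([] if isinstance(outcome, Exception) else outcome)
        for q, outcome in zip(queries, outcomes)
    }


queries = ["orthostatic hypotension elderly", "", "ACE inhibitor side effects"]
papers = asyncio.run(search_all(queries))
print(papers)
```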
Groq Rate Limit Handling
When using Groq (free tier), agents run sequentially with delays between calls.
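A minimal sketch of sequential execution with a fixed pause is shown below. The 2s value comes from the Round 1 note on this page. The helper name `run_sequentially` and the injectable `sleep` parameter are hypothetical, the latter added so the pause can be observed without actually waiting.

```python
import time
from typing import Callable

GROQ_DELAY_SECONDS = 2.0  # the 2s delay mentioned for Groq on this page


def run_sequentially(
    tasks: list[Callable[[], str]],
    delay: float = GROQ_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Run tasks one at a time, pausing between consecutive calls."""
    results = []
    for index, task in enumerate(tasks):
        if index > 0:
            sleep(delay)  # no pause is needed before the first call
        results.append(task())
    return results


pauses: list[float] = []
outputs = run_sequentially(
    [lambda: "literature done", lambda: "safety done"],
    sleep=pauses.append,  # record the pauses instead of sleeping
)
print(outputs, pauses)
```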
Med Error Panel Runs in Parallel
Safety panel runs alongside debate, not after.
Speedup: No added latency for safety panel
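Running the panel alongside the debate can be sketched by starting it as a background task before the debate begins. Both coroutines here are hypothetical stand-ins, and the sketch assumes the panel does not depend on the debate's output.

```python
import asyncio


async def run_debate() -> str:
    await asyncio.sleep(0.2)  # stand-in for the multi-round debate
    return "debate result"


async def run_med_error_panel() -> str:
    await asyncio.sleep(0.1)  # stand-in for the medication error checks
    return "panel result"


async def run_pipeline() -> tuple[str, str]:
    # Start the panel first so it makes progress while the debate runs.
    panel_task = asyncio.create_task(run_med_error_panel())
    debate = await run_debate()
    panel = await panel_task  # usually already finished by this point
    return debate, panel


debate, panel = asyncio.run(run_pipeline())
print(debate, "|", panel)
```

As long as the panel finishes before the debate does, awaiting it at the end adds no extra wall-clock time.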
Debate vs. Emergency Mode
| Feature | Debate Mode | Emergency Mode |
|---|---|---|
| Rounds | 2-3 rounds | 0 (single pass) |
| Agents | Clinical, Literature, Safety, Critic | Clinical, Safety only |
| Consensus | Yes (Critic-driven) | No |
| Latency | ~100 seconds | <5 seconds |
| Use Case | Complex cases, outpatient | Time-critical, ED triage |
| Output | Full SOAP note | Top 3 DDx + red flags |
Emergency Mode is suited to:
- ESI Level 1-2 (immediate/emergent)
- Chest pain, stroke symptoms, severe trauma
- Any case where debate latency is unacceptable
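The criteria above could be turned into a simple routing check. This is a minimal sketch only: `choose_mode` and `EMERGENCY_COMPLAINTS` are hypothetical, and exact string matching on the chief complaint is a placeholder for whatever triage logic the system actually uses.

```python
EMERGENCY_COMPLAINTS = {"chest pain", "stroke symptoms", "severe trauma"}


def choose_mode(esi_level: int, chief_complaint: str) -> str:
    """Return "emergency" or "debate" for a case (illustrative only)."""
    if esi_level <= 2:  # ESI 1-2: immediate/emergent
        return "emergency"
    if chief_complaint.strip().lower() in EMERGENCY_COMPLAINTS:
        return "emergency"
    return "debate"


print(choose_mode(1, "fatigue"))
print(choose_mode(3, "Chest Pain"))
print(choose_mode(4, "fatigue and dizziness"))
```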
Handling No Consensus
If consensus is not reached after 3 rounds, the debate stops and the result carries the human review flag instead of a final consensus.
Example Debate Timeline
Case: 68yo Male with Fatigue & Dizziness
Round 1 (45s):
- Clinical: DDx = CKD, Anemia, Orthostatic Hypotension
- Literature: Found 4 PubMed papers on orthostatic hypotension in elderly
- Safety: Flagged hyperkalemia risk from Lisinopril + Potassium
- Critic: ✗ No consensus - missing evidence for CKD claim (normal creatinine)
Round 2:
- Clinical: Revised DDx = Anemia, Orthostatic Hypotension, Medication Side Effects (removed CKD)
- Literature: Added 2 citations on ACE inhibitor side effects
- Safety: Re-confirmed Lisinopril + K interaction
- Critic: ✓ Consensus - all issues addressed, outputs coherent
Configuration
Next Steps
Agent Types
Deep dive into each agent’s implementation and prompts
Safety System
Medical Error Prevention Panel and guardrails
Architecture
Layer-by-layer system overview