ClinicalPilot’s Human-in-the-Loop (HITL) workflow allows clinicians to provide feedback on AI-generated SOAP notes and trigger re-analysis with up to 1 iteration of the debate pipeline.

Overview

Iterative Refinement

Doctors edit SOAP notes and add feedback → system re-runs analysis

Max 1 Re-Debate

Prevents infinite loops — only 1 re-analysis per case allowed

Full Context Preservation

Original patient data + doctor feedback merged for re-analysis

Audit Trail

All edits and feedback logged for compliance and learning

Workflow

1

Initial AI Analysis

Submit case via /api/analyze → receive SOAP note
import requests

response = requests.post(
    "https://api.clinicalpilot.ai/api/analyze",
    json={"text": "68yo male with chest pain, troponin 1.2..."}
)
response.raise_for_status()
soap = response.json()["soap"]
2

Doctor Reviews Output

Clinician reads the SOAP note and identifies:
  • Missing differentials
  • Incorrect assessments
  • Overlooked safety concerns
  • Desired plan modifications
3

Doctor Provides Feedback

Two input methods, which can be combined: editing the SOAP text directly, or adding free-text feedback (shown in the next step).
Editing the SOAP text directly:
{
  "edited_soap": "Assessment: Top differential is NSTEMI, not unstable angina. Patient has elevated troponin (1.2) and dynamic ECG changes. Risk factors: HTN, DM2, smoking.",
  "original_text": "68yo male with chest pain, troponin 1.2...",
  "feedback": ""
}
4

Submit Feedback

POST to /api/human-feedback:
curl -X POST https://api.clinicalpilot.ai/api/human-feedback \
  -H "Content-Type: application/json" \
  -d '{
    "edited_soap": "...",
    "original_text": "68yo male with chest pain, troponin 1.2...",
    "feedback": "Consider NSTEMI over unstable angina..."
  }'
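The same request from a Python client (a sketch; `build_feedback_payload` is a hypothetical helper that mirrors the server-side validation, and the actual POST is shown commented out since it requires a live endpoint):

```python
import json

def build_feedback_payload(original_text, edited_soap="", feedback=""):
    """Mirror the server-side check: at least one of edited_soap/feedback."""
    if not edited_soap and not feedback:
        raise ValueError("Must provide edited_soap or feedback")
    return {
        "edited_soap": edited_soap,
        "original_text": original_text,
        "feedback": feedback,
    }

payload = build_feedback_payload(
    original_text="68yo male with chest pain, troponin 1.2...",
    feedback="Consider NSTEMI over unstable angina...",
)
# POST it exactly like the curl example:
# requests.post("https://api.clinicalpilot.ai/api/human-feedback", json=payload)
print(json.dumps(payload, indent=2))
```

Validating the payload client-side avoids a round trip just to receive the 400 the server would return for an empty submission.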
5

System Re-Analyzes

ClinicalPilot merges original case + feedback:
backend/main.py:278
# Re-run with feedback as additional context
enhanced_text = f"{original_text}\n\n---\nDoctor Feedback: {feedback}\nEdited SOAP: {edited_soap}"
request = AnalysisRequest(text=enhanced_text)
soap, debate_state = await full_pipeline(request)
The full debate pipeline runs again:
  • Clinical Agent sees doctor’s corrections
  • Literature Agent searches based on feedback keywords
  • Safety Agent re-checks with new considerations
  • Critic Agent reconciles AI + human input
6

Return Updated SOAP

Response includes:
  • New SOAP note (incorporating feedback)
  • Updated debate summary
  • Note indicating re-analysis completed
{
  "soap": { ... },
  "debate": { ... },
  "note": "Re-analysis with human feedback completed"
}

API Reference

Endpoint

POST /api/human-feedback
Submit doctor feedback and trigger re-analysis

Request Body

{
  "edited_soap": "string",      // Optional: Doctor's edited SOAP text
  "original_text": "string",    // Required: Original patient data submitted to /api/analyze
  "feedback": "string"          // Optional: Doctor's commentary/corrections
}
edited_soap (string, optional)
The clinician’s manually edited SOAP note. If provided, the re-analysis will prioritize this version.
original_text (string, required)
The original clinical input text from the initial /api/analyze request. This ensures the system has full patient context.
feedback (string, optional)
Structured feedback from the clinician (e.g., “Missing differential: aortic dissection. Add CT angio to plan.”). This is appended to the prompt for re-analysis.

Response

{
  "soap": {
    "subjective": "...",
    "objective": "...",
    "assessment": "...",
    "plan": "...",
    "differentials": [...],
    "safety_flags": [...]
  },
  "debate": {
    "round_number": 3,
    "final_consensus": true,
    "clinical_outputs": [...],
    "critic_outputs": [...]
  },
  "note": "Re-analysis with human feedback completed"
}
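A client might surface the re-analysis note and any safety flags like this (a sketch against the response shape above; `summarize_response` is an illustrative helper, not part of the API):

```python
def summarize_response(resp: dict) -> str:
    """Build a one-line summary from the /api/human-feedback response."""
    flags = resp.get("soap", {}).get("safety_flags", [])
    note = resp.get("note", "")
    return f"{note} | {len(flags)} safety flag(s)"

# Example response trimmed to the fields the helper reads:
example = {
    "soap": {"safety_flags": ["Hold anticoagulation until dissection ruled out"]},
    "debate": {"final_consensus": True},
    "note": "Re-analysis with human feedback completed",
}
print(summarize_response(example))
# → Re-analysis with human feedback completed | 1 safety flag(s)
```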

Code Implementation

backend/main.py:262
@app.post("/api/human-feedback", response_model=dict)
async def human_feedback(payload: dict):
    """
    Human-in-the-loop: doctor edits SOAP → triggers re-debate (max 1 iteration).
    """
    edited_soap = payload.get("edited_soap", "")
    original_text = payload.get("original_text", "")
    feedback = payload.get("feedback", "")

    if not edited_soap and not feedback:
        raise HTTPException(400, "Must provide edited_soap or feedback")

    try:
        from backend.agents.orchestrator import full_pipeline
        from backend.models.patient import AnalysisRequest

        # Re-run with feedback as additional context
        enhanced_text = f"{original_text}\n\n---\nDoctor Feedback: {feedback}\nEdited SOAP: {edited_soap}"
        request = AnalysisRequest(text=enhanced_text)
        soap, debate_state = await full_pipeline(request)

        return {
            "soap": soap.model_dump(),
            "debate": debate_state.model_dump(),
            "note": "Re-analysis with human feedback completed",
        }
    except Exception as e:
        logger.exception("Human feedback re-analysis failed")
        raise HTTPException(500, str(e))
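The endpoint currently accepts a raw `dict`; a typed request model would push that validation into the schema layer. A sketch using a stdlib dataclass (in FastAPI the idiomatic choice would be a Pydantic model with the same fields; this is not the shipped code):

```python
from dataclasses import dataclass

@dataclass
class HumanFeedbackRequest:
    original_text: str      # required: original /api/analyze input
    edited_soap: str = ""   # optional: doctor's edited SOAP
    feedback: str = ""      # optional: free-text corrections

    def __post_init__(self):
        # Same rules the endpoint enforces by hand today.
        if not self.original_text:
            raise ValueError("original_text is required")
        if not self.edited_soap and not self.feedback:
            raise ValueError("Must provide edited_soap or feedback")

req = HumanFeedbackRequest(
    original_text="68yo male with chest pain, troponin 1.2...",
    feedback="Consider NSTEMI over unstable angina.",
)
print(req.feedback)
```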

Use Cases

Scenario: AI outputs differential for chest pain but misses aortic dissection.
Doctor Action:
{
  "feedback": "Patient has back pain radiating to shoulders — consider aortic dissection. Add CT angio chest to rule out before starting anticoagulation."
}
Re-Analysis Output:
  • Aortic dissection added to differentials
  • Plan updated: “CT angio chest to r/o dissection before heparin”
  • Safety flag: “Hold anticoagulation until dissection ruled out”
Scenario: AI classifies NSTEMI as low-risk (HEART score 3), but patient has 3-vessel CAD.
Doctor Action:
{
  "feedback": "Patient has known 3-vessel CAD from prior cath (2024-01-15). This is high-risk ACS, not low-risk. Recommend cath, not stress test."
}
Re-Analysis Output:
  • Risk stratification corrected to high-risk
  • Plan changed from “Outpatient stress test” to “Admit, cardiology consult, likely cath within 24h”
Scenario: AI plan includes metformin, but patient has acute kidney injury (Cr 3.2).
Doctor Action:
{
  "feedback": "Patient has AKI (Cr 3.2, baseline 1.1). Metformin is contraindicated. Hold metformin, start insulin sliding scale instead."
}
Re-Analysis Output:
  • Safety Agent flags metformin contraindication
  • Plan updated: “Hold metformin. Start insulin SSI. Recheck Cr in 24h.”
Scenario: AI recommends older treatment protocol (e.g., 2019 sepsis guidelines), but institution uses updated 2023 protocol.
Doctor Action:
{
  "feedback": "Use 2023 Surviving Sepsis Campaign guidelines: 1-hour bundle, lactate-guided resuscitation, early vasopressors if MAP <65 after 30mL/kg bolus."
}
Re-Analysis Output:
  • Plan updated to reflect 2023 guidelines
  • Citations updated to reference 2023 Surviving Sepsis Campaign

Design Rationale

Why Max 1 Re-Debate?

The system limits re-analysis to 1 iteration to:
  1. Prevent Infinite Loops: Without a cap, doctors could repeatedly re-submit feedback, causing exponential LLM calls.
  2. Encourage Finalization: After 1 re-analysis, the doctor should finalize the SOAP manually if still unsatisfied.
  3. Cost Control: Each full pipeline run costs ~$0.50-$1.00 in LLM API calls. Unlimited iterations would be prohibitively expensive.
Future Enhancement: Allow configurable max_iterations per user role (e.g., attending = 2, resident = 1).

How Feedback Is Merged

The system appends feedback to the original text:
enhanced_text = f"{original_text}\n\n---\nDoctor Feedback: {feedback}\nEdited SOAP: {edited_soap}"
This ensures:
  • Full Context: Agents see original patient data + doctor’s corrections
  • Clear Delineation: Separator (---) marks where human input begins
  • Prompt Engineering: Agents are trained to weigh human feedback heavily in debate rounds
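Because the separator is fixed, downstream tooling can recover the two halves deterministically (a small sketch; splitting on the separator is an illustration, not something the agents are documented to do):

```python
# Rebuild an enhanced prompt in the documented format, then split it back.
enhanced_text = (
    "68yo male with chest pain, troponin 1.2..."
    "\n\n---\n"
    "Doctor Feedback: Consider NSTEMI.\nEdited SOAP: Assessment: NSTEMI."
)
original, _, human_input = enhanced_text.partition("\n\n---\n")
print(original)      # patient data only
print(human_input)   # doctor feedback + edited SOAP
```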

Frontend Integration

The ClinicalPilot frontend provides a SOAP Editor for HITL:
frontend/index.html (excerpt)
function submitFeedback() {
  const editedSOAP = document.getElementById('soap-editor').value;
  const feedback = document.getElementById('feedback-textarea').value;
  const originalText = sessionStorage.getItem('original_case_text');

  fetch('/api/human-feedback', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      edited_soap: editedSOAP,
      original_text: originalText,
      feedback: feedback
    })
  })
  .then(res => res.json())
  .then(data => {
    // Display updated SOAP
    renderSOAP(data.soap);
    showNotification('Re-analysis complete — review updated SOAP note');
  });
}

UI Flow

1

Initial SOAP Display

After analysis completes, SOAP is shown with an “Edit SOAP” button.
2

Doctor Clicks Edit

SOAP text becomes editable in a <textarea>. A separate “Feedback” field appears for commentary.
3

Submit Feedback

Doctor clicks “Re-Analyze with Feedback” → triggers /api/human-feedback.
4

Loading State

UI shows spinner: “Re-running analysis with your feedback… (~100s)”
5

Updated SOAP Displayed

New SOAP replaces old version. Banner indicates: “Updated based on your feedback.”

Audit Trail

For compliance (HIPAA, medico-legal), all HITL interactions should be logged:
# Pseudocode for future audit logging
import logging

audit_logger = logging.getLogger("audit")

@app.post("/api/human-feedback")
async def human_feedback(payload: dict, user: User = Depends(get_current_user)):
    case_id = payload.get("case_id", "unknown")  # assumed client-supplied
    audit_logger.info(
        f"HITL Re-Analysis | User: {user.email} | "
        f"Case ID: {case_id} | Feedback: {(payload.get('feedback') or '')[:100]}..."
    )
    # ... re-run pipeline ...
    audit_logger.info(
        f"HITL Complete | Case ID: {case_id} | Updated SOAP saved"
    )
Production Requirement: Implement audit logging before deploying to clinical environments. Logs should capture:
  • User ID
  • Timestamp
  • Original SOAP
  • Edited SOAP
  • Feedback text
  • Re-analysis output
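A minimal audit record covering those fields might look like this (a sketch; the field names and JSON-lines format are assumptions, not a shipped schema):

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class HITLAuditRecord:
    user_id: str
    case_id: str
    original_soap: str
    edited_soap: str
    feedback: str
    reanalysis_output: str
    timestamp: str = ""

    def to_json_line(self) -> str:
        d = asdict(self)
        # Stamp at write time if the caller didn't supply one.
        d["timestamp"] = d["timestamp"] or datetime.now(timezone.utc).isoformat()
        return json.dumps(d)

record = HITLAuditRecord(
    user_id="dr.smith@hospital.org",
    case_id="case-123",
    original_soap="Assessment: unstable angina...",
    edited_soap="Assessment: NSTEMI...",
    feedback="Troponin 1.2, dynamic ECG changes",
    reanalysis_output="Re-analysis with human feedback completed",
)
print(record.to_json_line())
```

One JSON line per event keeps the log append-only and easy to ship to an external audit store.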

Performance Considerations

Latency

Step | Duration | Notes
Submit feedback | ~50ms | API call
Re-run full pipeline | ~100s | Same as initial analysis
Update frontend | ~200ms | Render new SOAP
Total | ~100s | Same cost as initial run
Cost: Each HITL re-analysis costs the same as the initial analysis (~$0.50-$1.00 in LLM API fees).

Optimization Ideas

Delta Re-Analysis

Only re-run agents affected by feedback (e.g., if feedback is about differentials, skip Safety Agent)

Caching

Cache Literature Agent PubMed results if feedback doesn’t change search queries

Async Notification

Return immediately, send email/SMS when re-analysis completes (for long cases)

Partial SOAP Update

Allow doctors to flag specific sections for re-generation (“Re-generate Plan only”)
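The caching idea, for instance, could start as a simple memo keyed by the normalized search query (a sketch; `search_pubmed` is a hypothetical stand-in for the Literature Agent's real lookup):

```python
from functools import lru_cache

CALLS = 0  # count upstream lookups to demonstrate the cache

@lru_cache(maxsize=256)
def search_pubmed(query: str) -> tuple:
    """Hypothetical stand-in for the Literature Agent's PubMed lookup."""
    global CALLS
    CALLS += 1
    return (f"results for: {query}",)  # placeholder payload

def literature_search(query: str) -> tuple:
    # Normalize so trivial feedback wording changes still hit the cache.
    return search_pubmed(" ".join(query.lower().split()))

literature_search("NSTEMI  management")
literature_search("nstemi management")  # cache hit: no second lookup
print(CALLS)  # → 1
```

Normalizing before caching matters here: doctor feedback often rephrases the same clinical question, and an exact-string cache would miss those repeats.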

Best Practices

1

Be Specific in Feedback

✅ Good: “Add CT angio to rule out aortic dissection — patient has back pain radiating to shoulders”
❌ Bad: “Plan is incomplete”
2

Reference Clinical Data

Cite labs, vitals, or exam findings:
✅ “Troponin is 1.2 (not 0.12) — this is NSTEMI, not unstable angina”
3

Suggest Evidence-Based Changes

Reference guidelines when correcting:
✅ “Per 2023 AHA STEMI guidelines, door-to-balloon should be <90 min, not <120 min”
4

Use Edited SOAP for Major Rewrites

If >50% of SOAP needs changes, edit the SOAP directly rather than writing long feedback.

Limitations

Known Constraints:
  1. Max 1 Iteration: After 1 re-analysis, further changes require manual SOAP editing (no more AI re-runs).
  2. No Session Persistence: If user refreshes the page, original case text must be re-entered.
  3. Feedback Format: Currently free text. Future: structured feedback fields (“Add differential”, “Correct plan”).
  4. No Multi-User Collaboration: If 2 doctors edit the same case, last submission wins (no merge conflict resolution).

Future Enhancements

Structured Feedback Forms

Guided UI: “Add differential”, “Flag safety issue”, “Correct lab value”

Version History

Track all SOAP versions (v1 = AI, v2 = after feedback, v3 = manual edits)

Multi-User Review

Allow attending + resident to both provide feedback → system reconciles

Reinforcement Learning

Use doctor corrections to fine-tune Clinical Agent (RLHF)
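The version-history idea above, for example, could be tracked as an append-only list per case (a sketch of the v1/v2/v3 scheme; class and field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class SOAPVersion:
    version: int
    source: str     # "ai", "feedback", or "manual"
    soap_text: str

@dataclass
class CaseHistory:
    versions: list = field(default_factory=list)

    def add(self, source: str, soap_text: str) -> SOAPVersion:
        v = SOAPVersion(len(self.versions) + 1, source, soap_text)
        self.versions.append(v)
        return v

history = CaseHistory()
history.add("ai", "Assessment: unstable angina...")       # v1 = AI output
history.add("feedback", "Assessment: NSTEMI...")          # v2 = after feedback
print([(v.version, v.source) for v in history.versions])
# → [(1, 'ai'), (2, 'feedback')]
```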

Next Steps

Full Analysis API

Learn how the multi-agent debate pipeline works

Emergency Mode

Fast-path triage for time-critical cases
