The AI Chat feature provides fast conversational clinical Q&A powered by Groq’s Llama 3.3 70B, delivering sub-second responses without the overhead of the full agent pipeline.

Overview

Sub-Second Latency

Typical response time: 300-800ms (vs. 5-12s for GPT-4o)

Multi-Turn Conversations

Maintains full conversation history for follow-up questions

Clinical System Prompt

Evidence-based, structured formatting, safety-first approach

Provider Fallback

Tries OpenAI first if configured, falls back to Groq

Key Differences vs. Full Analysis

| Feature | AI Chat | Full Analysis (/api/analyze) |
|---------|---------|------------------------------|
| Latency | <1s | ~100s |
| Agents | Single LLM call | 14+ agent calls (Clinical, Literature, Safety, Critic × 3 rounds) |
| Output | Conversational text | Structured SOAP note |
| Literature Search | ❌ No | ✅ PubMed + RAG |
| Drug Safety | ❌ No | ✅ DrugBank + RxNorm + FDA |
| Debate | ❌ No | ✅ Multi-round |
| Use Case | Quick Q&A, clarification | Full clinical assessment |
Future Enhancement: AI Chat will connect to LanceDB RAG for grounded answers with PubMed citations. Until then, it relies on the model’s training knowledge.

API Usage

Endpoint

curl -X POST https://api.clinicalpilot.ai/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the diagnostic criteria for STEMI?"}
    ]
  }'

Response Format

{
  "reply": "**STEMI Diagnostic Criteria (ACC/AHA Guidelines)**\n\n1. **ECG Findings** (≥2 contiguous leads):\n   - ≥1mm ST elevation in limb leads (I, II, III, aVL, aVF)\n   - ≥2mm ST elevation in precordial leads (V1-V6)\n   - New left bundle branch block (LBBB)\n\n2. **Clinical Presentation**:\n   - Chest pain lasting >20 minutes\n   - Radiation to arm/jaw/back\n   - Diaphoresis, nausea, dyspnea\n\n3. **Biomarkers** (supportive, NOT required for STEMI diagnosis):\n   - Elevated troponin (develops 2-4h after symptom onset)\n   - CK-MB (less specific)\n\n**Key Point**: STEMI is an **ECG diagnosis** — do NOT wait for troponin to activate cath lab. Door-to-balloon goal: <90 minutes.\n\n*Source: 2013 ACCF/AHA STEMI Guidelines (Circulation 2013;127:e362)*",
  "model": "llama-3.3-70b-versatile",
  "provider": "groq",
  "latency_ms": 620,
  "tokens": 312
}

Multi-Turn Conversations

AI Chat maintains full conversation history for context-aware follow-ups.
import requests

conversation = [
    {"role": "user", "content": "What are the causes of elevated troponin?"},
]

# First message
response = requests.post(
    "https://api.clinicalpilot.ai/api/chat",
    json={"messages": conversation}
)
assistant_reply = response.json()["reply"]
conversation.append({"role": "assistant", "content": assistant_reply})

# Follow-up question
conversation.append({
    "role": "user",
    "content": "Which of those causes would present with ST elevation?"
})
response = requests.post(
    "https://api.clinicalpilot.ai/api/chat",
    json={"messages": conversation}
)
print(response.json()["reply"])
The conversation array grows with each exchange. The full history is sent on every request, so the model maintains context.
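The pattern above can be wrapped in a small helper that appends each exchange and resends the full history. A minimal sketch — the `ChatSession` class and its `transport` hook are illustrative conveniences, not part of the API:

```python
# Minimal multi-turn chat helper, sketched against the /api/chat contract
# shown above. The `transport` parameter is a hypothetical injection point
# so the class can be exercised without a live server; by default it posts
# with `requests`.


class ChatSession:
    """Accumulates conversation history and resends the full list each turn."""

    def __init__(self, base_url="https://api.clinicalpilot.ai", transport=None):
        self.base_url = base_url
        self.messages = []  # alternating user/assistant messages
        self._transport = transport or self._http_post

    def _http_post(self, url, payload):
        import requests  # deferred so a fake transport works offline

        resp = requests.post(url, json=payload, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def ask(self, question):
        self.messages.append({"role": "user", "content": question})
        data = self._transport(
            f"{self.base_url}/api/chat", {"messages": self.messages}
        )
        self.messages.append({"role": "assistant", "content": data["reply"]})
        return data["reply"]
```

With this, the two-request example above collapses to `session.ask("What are the causes of elevated troponin?")` followed by `session.ask("Which of those causes would present with ST elevation?")`.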

System Prompt

The AI Chat endpoint uses a dedicated clinical system prompt optimized for decision support:
backend/main.py:368
CHAT_SYSTEM_PROMPT = """You are ClinicalPilot AI, a clinical decision-support assistant for healthcare professionals.

Your role:
- Answer clinical questions with evidence-based, concise responses.
- Help with differential diagnoses, drug interactions, guideline lookups, lab interpretation, and clinical reasoning.
- Always cite relevant guidelines (e.g., ACC/AHA, WHO, UpToDate) when applicable.
- If a question involves patient safety, flag it clearly.
- Use structured formatting (bullet points, numbered lists) for clarity.
- If you are unsure, say so — never fabricate clinical information.

You are NOT a replacement for clinical judgment. Always remind users that your answers are for educational/decision-support purposes only.

IMPORTANT: In future you will have access to a LanceDB vector store with indexed medical literature for RAG-enhanced answers. For now, rely on your training knowledge."""

Design Principles

  • Cites guidelines (ACC/AHA, WHO, ESC, NICE, UpToDate)
  • References clinical trials when applicable
  • Distinguishes between established practice and emerging evidence
  • Uses bullet points for differential diagnoses
  • Numbered lists for step-by-step workflows
  • Tables for drug comparisons
  • Bold text for safety warnings
  • Flags contraindications
  • Highlights drug interactions
  • Notes when immediate action is needed (“Activate STEMI code”)
  • Disclaimers about AI limitations
  • Says “I don’t know” when uncertain
  • Suggests consulting specialists for edge cases
  • Reminds users that answers are for educational purposes

Provider Fallback Logic

The /api/chat endpoint implements graceful fallback:

Code Implementation

backend/main.py:383
@app.post("/api/chat")
async def chat(payload: dict):
    """AI chat — tries OpenAI first, falls back to Groq."""
    messages = payload.get("messages", [])
    if not messages:
        raise HTTPException(400, "No messages provided")

    openai_key = get_effective("openai_api_key")
    groq_key = get_effective("groq_api_key")

    if not openai_key and not groq_key:
        raise HTTPException(
            503,
            "No LLM API key configured. Please set your OpenAI API key in Settings, or add GROQ_API_KEY to .env"
        )

    full_messages = [{"role": "system", "content": CHAT_SYSTEM_PROMPT}] + [
        {"role": m["role"], "content": m["content"]} for m in messages
    ]

    # Try OpenAI first
    if openai_key:
        try:
            from openai import OpenAI as SyncOpenAI

            client = SyncOpenAI(api_key=openai_key)

            t0 = time.time()
            completion = client.chat.completions.create(
                model=settings.openai_model,
                messages=full_messages,
                temperature=0.3,
                max_tokens=2048,
            )
            latency_ms = int((time.time() - t0) * 1000)

            reply = completion.choices[0].message.content
            return {
                "reply": reply,
                "model": settings.openai_model,
                "provider": "openai",
                "latency_ms": latency_ms,
                "tokens": completion.usage.total_tokens,
            }
        except Exception as e:
            logger.warning(f"OpenAI chat failed ({e}), falling back to Groq")
            if not groq_key:
                raise HTTPException(500, f"OpenAI chat failed and no Groq fallback: {str(e)}")

    # Fallback to Groq
    try:
        from groq import Groq

        client = Groq(api_key=groq_key)

        t0 = time.time()
        completion = client.chat.completions.create(
            model=settings.groq_model,
            messages=full_messages,
            temperature=0.3,
            max_tokens=2048,
        )
        latency_ms = int((time.time() - t0) * 1000)

        reply = completion.choices[0].message.content
        return {
            "reply": reply,
            "model": settings.groq_model,
            "provider": "groq",
            "latency_ms": latency_ms,
            "tokens": completion.usage.total_tokens,
        }
    except Exception as e:
        logger.exception("Chat failed on all providers")
        raise HTTPException(500, f"Chat failed: {str(e)}")
Why OpenAI First? If both keys are set, OpenAI GPT-4o typically has higher clinical accuracy than Llama 3.3 70B. Groq is used as a speed fallback when OpenAI is unavailable or rate-limited.
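Because every response includes a `provider` field, clients can track how often requests actually fall back to Groq. A hypothetical monitoring sketch — the `ProviderMonitor` name is an assumption, and interpreting Groq responses as fallbacks only holds when an OpenAI key is configured:

```python
# Hypothetical sketch: tally the `provider` field from /api/chat responses
# to see how often the OpenAI -> Groq fallback fires. Only meaningful when
# an OpenAI key is configured (otherwise every request is served by Groq).
from collections import Counter


class ProviderMonitor:
    def __init__(self):
        self.counts = Counter()

    def record(self, response_json: dict) -> None:
        self.counts[response_json.get("provider", "unknown")] += 1

    def fallback_rate(self) -> float:
        """Fraction of responses served by Groq (the fallback provider)."""
        total = sum(self.counts.values())
        return self.counts["groq"] / total if total else 0.0
```

A persistently high fallback rate usually means the OpenAI key is invalid or rate-limited.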

Configuration

Environment Variables

.env
# Groq (required for AI Chat)
GROQ_API_KEY=gsk_...
GROQ_MODEL=llama-3.3-70b-versatile  # Default model

# OpenAI (optional, used if available)
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o

Runtime API Key Setup

You can set API keys at runtime (not persisted to disk):
curl -X POST https://api.clinicalpilot.ai/api/set-api-key \
  -H "Content-Type: application/json" \
  -d '{
    "openai_api_key": "sk-proj-...",
    "groq_api_key": "gsk_..."
  }'

Use Cases

Example: “What’s the Wells score for DVT?”

Response:
**Wells Score for DVT**

| Criterion | Points |
|-----------|--------|
| Active cancer | +1 |
| Paralysis/recent plaster cast | +1 |
| Bedridden >3 days or major surgery <12 weeks | +1 |
| Tenderness along deep vein | +1 |
| Entire leg swollen | +1 |
| Calf swelling >3cm vs. other leg | +1 |
| Pitting edema | +1 |
| Collateral superficial veins | +1 |
| Alternative diagnosis more likely | -2 |

**Interpretation**:
- ≥2: DVT likely (D-dimer if negative → US)
- <2: DVT unlikely (D-dimer if negative → stop workup)
Example: “Patient has K+ 5.8. What should I do?”

Response:
**Hyperkalemia (K+ 5.8 mEq/L)**

**Severity**: Moderate (5.5-6.5)

**Immediate Actions**:
1. **ECG** — Look for peaked T waves, widened QRS, loss of P waves
2. If ECG changes present:
   - Calcium gluconate 10% 10mL IV over 2-3 min (membrane stabilization)
3. Shift K+ intracellularly:
   - Insulin 10 units IV + D50 25g IV
   - Albuterol 10-20mg nebulized
4. Remove K+ from body:
   - Furosemide 40mg IV (if euvolemic)
   - Patiromer or sodium polystyrene sulfonate (slower)

**Recheck K+ in 2h**

**⚠️ Hold ACE-I, ARB, spironolactone until K+ normalized**
Example: “65yo with new onset atrial fibrillation. What’s on the differential?”

Response:
**Causes of New Onset Atrial Fibrillation**

**Cardiac**:
- Hypertensive heart disease
- Coronary artery disease / ACS
- Valvular disease (mitral stenosis, regurgitation)
- Cardiomyopathy (dilated, hypertrophic)
- Pericarditis / myocarditis

**Pulmonary**:
- Pulmonary embolism ⚠️
- COPD exacerbation
- Obstructive sleep apnea

**Metabolic / Endocrine**:
- Hyperthyroidism (check TSH)
- Electrolyte disturbances (↓K+, ↓Mg2+)

**Toxic / Iatrogenic**:
- Alcohol ("holiday heart")
- Stimulants (caffeine, cocaine)
- Medications (theophylline, beta-agonists)

**Workup**:
- ECG, troponin, TSH, BMP, CXR, echo
- D-dimer if PE suspected
Example: “Can I give azithromycin to a patient on warfarin?”

Response:
**⚠️ Azithromycin + Warfarin Interaction**

**Risk**: Moderate

**Mechanism**: Azithromycin inhibits CYP3A4 and may potentiate warfarin's anticoagulant effect → ↑ INR → bleeding risk

**Management**:
1. **Can prescribe**, but monitor closely:
   - Check INR 2-3 days after starting azithromycin
   - Recheck INR 2-3 days after stopping
2. Consider alternative antibiotic if possible:
   - Doxycycline (fewer interactions)
   - Amoxicillin (if appropriate for indication)

**Patient Education**:
- Watch for bleeding signs (bruising, gum bleeding, black stools)
- Avoid NSAIDs during azithromycin course

Performance Benchmarks

Latency Comparison

| Provider | Model | Typical Latency | Tokens/sec |
|----------|-------|-----------------|------------|
| Groq | Llama 3.3 70B | 300-800ms | ~500 |
| OpenAI | GPT-4o | 5-12s | ~50 |
| OpenAI | GPT-4o-mini | 2-4s | ~100 |

Token Usage

Typical chat exchange:
  • System prompt: ~200 tokens
  • User question: 20-100 tokens
  • Assistant response: 200-500 tokens
  • Total per exchange: ~500-800 tokens
Groq offers 25 req/min free tier (enough for demos). For production, consider Groq’s pay-as-you-go pricing.
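Because the full history is resent on every request (see Multi-Turn Conversations), cumulative token usage grows quadratically with the number of exchanges. A rough estimator — the constants are ballpark midpoints of the figures above, not tokenizer-accurate counts:

```python
# Rough token-growth estimate for multi-turn chat. The constants are
# ballpark midpoints of the ranges listed above, not real tokenizer counts.
SYSTEM_TOKENS = 200
QUESTION_TOKENS = 60
REPLY_TOKENS = 350


def tokens_for_turn(turn: int) -> int:
    """Tokens billed on the Nth exchange (1-indexed): system prompt, all
    prior Q/A pairs, the new question, and the new reply."""
    history = (turn - 1) * (QUESTION_TOKENS + REPLY_TOKENS)
    return SYSTEM_TOKENS + history + QUESTION_TOKENS + REPLY_TOKENS


def total_tokens(turns: int) -> int:
    """Cumulative tokens across a conversation of `turns` exchanges."""
    return sum(tokens_for_turn(t) for t in range(1, turns + 1))
```

Under these assumptions a single exchange costs about 610 tokens (within the ~500-800 range above), while a 10-turn conversation totals roughly 24,550 — worth keeping in mind when budgeting against rate limits.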

Future Enhancements

RAG Integration

Connect to LanceDB with indexed PubMed articles for grounded answers with citations

Agent Mode Toggle

Allow users to trigger full agent pipeline from chat (“Run full analysis on this case”)

Image Upload

Support ECG, X-ray, CT uploads for multimodal analysis

Chat History Persistence

Save conversations to database for later review

Best Practices

1. Ask Specific Questions

✅ Good: “What’s the HEART score for chest pain?”
❌ Bad: “Tell me about chest pain”
2. Provide Context for Case-Specific Questions

✅ Good: “68yo male with troponin 2.1, BP 90/60. Is this STEMI or NSTEMI?”
❌ Bad: “Is this STEMI?”
3. Use Follow-Up Questions

The model has conversation history. Ask clarifying questions:
  1. “What are causes of elevated troponin?”
  2. “Which of those would have ST elevation?”
  3. “What’s the door-to-balloon goal for STEMI?”
4. Cross-Reference Critical Decisions

AI Chat is for decision support, not replacement of clinical judgment. Always verify critical treatment decisions with:
  • Primary literature
  • Local protocols
  • Senior clinician consultation

Next Steps

Full Analysis

Run the multi-agent debate pipeline for complex cases

Emergency Mode

Fast-path triage for time-critical scenarios
