The AI Chat feature provides fast conversational clinical Q&A powered by Groq’s Llama 3.3 70B, delivering sub-second responses without the overhead of the full agent pipeline.

Overview

Sub-Second Latency

Typical response time: 300-800ms (vs. 5-12s for GPT-4o)

Multi-Turn Conversations

Maintains full conversation history for follow-up questions

Clinical System Prompt

Evidence-based, structured formatting, safety-first approach

Provider Fallback

Tries OpenAI first if configured, falls back to Groq

Key Differences vs. Full Analysis

| Feature | AI Chat | Full Analysis (/api/analyze) |
|---------|---------|------------------------------|
| Latency | <1s | ~100s |
| Agents | Single LLM call | 14+ agent calls (Clinical, Literature, Safety, Critic × 3 rounds) |
| Output | Conversational text | Structured SOAP note |
| Literature Search | ❌ No | ✅ PubMed + RAG |
| Drug Safety | ❌ No | ✅ DrugBank + RxNorm + FDA |
| Debate | ❌ No | ✅ Multi-round |
| Use Case | Quick Q&A, clarification | Full clinical assessment |
Future Enhancement: AI Chat will connect to LanceDB RAG for grounded answers with PubMed citations. Until then, it relies on the model’s training knowledge.

API Usage

Endpoint

curl -X POST https://api.clinicalpilot.ai/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the diagnostic criteria for STEMI?"}
    ]
  }'

Response Format

{
  "reply": "**STEMI Diagnostic Criteria (ACC/AHA Guidelines)**\n\n1. **ECG Findings** (≥2 contiguous leads):\n   - ≥1mm ST elevation in limb leads (I, II, III, aVL, aVF)\n   - ≥2mm ST elevation in precordial leads (V1-V6)\n   - New left bundle branch block (LBBB)\n\n2. **Clinical Presentation**:\n   - Chest pain lasting >20 minutes\n   - Radiation to arm/jaw/back\n   - Diaphoresis, nausea, dyspnea\n\n3. **Biomarkers** (supportive, NOT required for STEMI diagnosis):\n   - Elevated troponin (develops 2-4h after symptom onset)\n   - CK-MB (less specific)\n\n**Key Point**: STEMI is an **ECG diagnosis** — do NOT wait for troponin to activate cath lab. Door-to-balloon goal: <90 minutes.\n\n*Source: 2013 ACCF/AHA STEMI Guidelines (Circulation 2013;127:e362)*",
  "model": "llama-3.3-70b-versatile",
  "provider": "groq",
  "latency_ms": 620,
  "tokens": 312
}

Multi-Turn Conversations

AI Chat maintains full conversation history for context-aware follow-ups.
import requests

conversation = [
    {"role": "user", "content": "What are the causes of elevated troponin?"},
]

# First message
response = requests.post(
    "https://api.clinicalpilot.ai/api/chat",
    json={"messages": conversation}
)
assistant_reply = response.json()["reply"]
conversation.append({"role": "assistant", "content": assistant_reply})

# Follow-up question
conversation.append({
    "role": "user",
    "content": "Which of those causes would present with ST elevation?"
})
response = requests.post(
    "https://api.clinicalpilot.ai/api/chat",
    json={"messages": conversation}
)
print(response.json()["reply"])
The conversation array grows with each exchange. The full history is sent on every request, so the model maintains context.
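The pattern above can be wrapped in a small helper that appends each exchange and resends the full history. A minimal sketch — the `ChatSession` class and its `transport` hook are illustrative conveniences, not part of the API:

```python
# Minimal multi-turn chat helper, sketched against the /api/chat contract
# shown above. The `transport` parameter is a hypothetical injection point
# so the class can be exercised without a live server; by default it posts
# with `requests`.


class ChatSession:
    """Accumulates conversation history and resends the full list each turn."""

    def __init__(self, base_url="https://api.clinicalpilot.ai", transport=None):
        self.base_url = base_url
        self.messages = []  # alternating user/assistant messages
        self._transport = transport or self._http_post

    def _http_post(self, url, payload):
        import requests  # deferred so a fake transport works offline

        resp = requests.post(url, json=payload, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def ask(self, question):
        self.messages.append({"role": "user", "content": question})
        data = self._transport(
            f"{self.base_url}/api/chat", {"messages": self.messages}
        )
        self.messages.append({"role": "assistant", "content": data["reply"]})
        return data["reply"]
```

With this, the two-request example above collapses to `session.ask("What are the causes of elevated troponin?")` followed by `session.ask("Which of those causes would present with ST elevation?")`.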

System Prompt

The AI Chat endpoint uses a dedicated clinical system prompt optimized for decision support:
backend/main.py:368
CHAT_SYSTEM_PROMPT = """You are ClinicalPilot AI, a clinical decision-support assistant for healthcare professionals.

Your role:
- Answer clinical questions with evidence-based, concise responses.
- Help with differential diagnoses, drug interactions, guideline lookups, lab interpretation, and clinical reasoning.
- Always cite relevant guidelines (e.g., ACC/AHA, WHO, UpToDate) when applicable.
- If a question involves patient safety, flag it clearly.
- Use structured formatting (bullet points, numbered lists) for clarity.
- If you are unsure, say so — never fabricate clinical information.

You are NOT a replacement for clinical judgment. Always remind users that your answers are for educational/decision-support purposes only.

IMPORTANT: In future you will have access to a LanceDB vector store with indexed medical literature for RAG-enhanced answers. For now, rely on your training knowledge."""

Design Principles

  • Cites guidelines (ACC/AHA, WHO, ESC, NICE, UpToDate)
  • References clinical trials when applicable
  • Distinguishes between established practice and emerging evidence
  • Uses bullet points for differential diagnoses
  • Numbered lists for step-by-step workflows
  • Tables for drug comparisons
  • Bold text for safety warnings
  • Flags contraindications
  • Highlights drug interactions
  • Notes when immediate action is needed (“Activate STEMI code”)
  • Disclaimers about AI limitations
  • Says “I don’t know” when uncertain
  • Suggests consulting specialists for edge cases
  • Reminds users that answers are for educational purposes

Provider Fallback Logic

The /api/chat endpoint implements graceful fallback:

Code Implementation

backend/main.py:383
@app.post("/api/chat")
async def chat(payload: dict):
    """AI chat — tries OpenAI first, falls back to Groq."""
    messages = payload.get("messages", [])
    if not messages:
        raise HTTPException(400, "No messages provided")

    openai_key = get_effective("openai_api_key")
    groq_key = get_effective("groq_api_key")

    if not openai_key and not groq_key:
        raise HTTPException(
            503,
            "No LLM API key configured. Please set your OpenAI API key in Settings, or add GROQ_API_KEY to .env"
        )

    full_messages = [{"role": "system", "content": CHAT_SYSTEM_PROMPT}] + [
        {"role": m["role"], "content": m["content"]} for m in messages
    ]

    # Try OpenAI first
    if openai_key:
        try:
            from openai import OpenAI as SyncOpenAI

            client = SyncOpenAI(api_key=openai_key)

            t0 = time.time()
            completion = client.chat.completions.create(
                model=settings.openai_model,
                messages=full_messages,
                temperature=0.3,
                max_tokens=2048,
            )
            latency_ms = int((time.time() - t0) * 1000)

            reply = completion.choices[0].message.content
            return {
                "reply": reply,
                "model": settings.openai_model,
                "provider": "openai",
                "latency_ms": latency_ms,
                "tokens": completion.usage.total_tokens,
            }
        except Exception as e:
            logger.warning(f"OpenAI chat failed ({e}), falling back to Groq")
            if not groq_key:
                raise HTTPException(500, f"OpenAI chat failed and no Groq fallback: {str(e)}")

    # Fallback to Groq
    try:
        from groq import Groq

        client = Groq(api_key=groq_key)

        t0 = time.time()
        completion = client.chat.completions.create(
            model=settings.groq_model,
            messages=full_messages,
            temperature=0.3,
            max_tokens=2048,
        )
        latency_ms = int((time.time() - t0) * 1000)

        reply = completion.choices[0].message.content
        return {
            "reply": reply,
            "model": settings.groq_model,
            "provider": "groq",
            "latency_ms": latency_ms,
            "tokens": completion.usage.total_tokens,
        }
    except Exception as e:
        logger.exception("Chat failed on all providers")
        raise HTTPException(500, f"Chat failed: {str(e)}")
Why OpenAI First? If both keys are set, OpenAI GPT-4o typically has higher clinical accuracy than Llama 3.3 70B. Groq is used as a speed fallback when OpenAI is unavailable or rate-limited.
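Because every response includes a `provider` field, clients can track how often requests actually fall back to Groq. A hypothetical monitoring sketch — the `ProviderMonitor` name is an assumption, and interpreting Groq responses as fallbacks only holds when an OpenAI key is configured:

```python
# Hypothetical sketch: tally the `provider` field from /api/chat responses
# to see how often the OpenAI -> Groq fallback fires. Only meaningful when
# an OpenAI key is configured (otherwise every request is served by Groq).
from collections import Counter


class ProviderMonitor:
    def __init__(self):
        self.counts = Counter()

    def record(self, response_json: dict) -> None:
        self.counts[response_json.get("provider", "unknown")] += 1

    def fallback_rate(self) -> float:
        """Fraction of responses served by Groq (the fallback provider)."""
        total = sum(self.counts.values())
        return self.counts["groq"] / total if total else 0.0
```

A persistently high fallback rate usually means the OpenAI key is invalid or rate-limited.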

Configuration

Environment Variables

.env
# Groq (required for AI Chat)
GROQ_API_KEY=gsk_...
GROQ_MODEL=llama-3.3-70b-versatile  # Default model

# OpenAI (optional, used if available)
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o

Runtime API Key Setup

You can set API keys at runtime (not persisted to disk):
curl -X POST https://api.clinicalpilot.ai/api/set-api-key \
  -H "Content-Type: application/json" \
  -d '{
    "openai_api_key": "sk-proj-...",
    "groq_api_key": "gsk_..."
  }'

Use Cases

Example: “What’s the Wells score for DVT?”

Response:
**Wells Score for DVT**

| Criterion | Points |
|-----------|--------|
| Active cancer | +1 |
| Paralysis/recent plaster cast | +1 |
| Bedridden >3 days or major surgery <12 weeks | +1 |
| Tenderness along deep vein | +1 |
| Entire leg swollen | +1 |
| Calf swelling >3cm vs. other leg | +1 |
| Pitting edema | +1 |
| Collateral superficial veins | +1 |
| Alternative diagnosis more likely | -2 |

**Interpretation**:
- ≥2: DVT likely (D-dimer if negative → US)
- <2: DVT unlikely (D-dimer if negative → stop workup)
Example: “Patient has K+ 5.8. What should I do?”

Response:
**Hyperkalemia (K+ 5.8 mEq/L)**

**Severity**: Moderate (5.5-6.5)

**Immediate Actions**:
1. **ECG** — Look for peaked T waves, widened QRS, loss of P waves
2. If ECG changes present:
   - Calcium gluconate 10% 10mL IV over 2-3 min (membrane stabilization)
3. Shift K+ intracellularly:
   - Insulin 10 units IV + D50 25g IV
   - Albuterol 10-20mg nebulized
4. Remove K+ from body:
   - Furosemide 40mg IV (if euvolemic)
   - Patiromer or sodium polystyrene sulfonate (slower)

**Recheck K+ in 2h**

**⚠️ Hold ACE-I, ARB, spironolactone until K+ normalized**
Example: “65yo with new onset atrial fibrillation. What’s on the differential?”

Response:
**Causes of New Onset Atrial Fibrillation**

**Cardiac**:
- Hypertensive heart disease
- Coronary artery disease / ACS
- Valvular disease (mitral stenosis, regurgitation)
- Cardiomyopathy (dilated, hypertrophic)
- Pericarditis / myocarditis

**Pulmonary**:
- Pulmonary embolism ⚠️
- COPD exacerbation
- Obstructive sleep apnea

**Metabolic / Endocrine**:
- Hyperthyroidism (check TSH)
- Electrolyte disturbances (↓K+, ↓Mg2+)

**Toxic / Iatrogenic**:
- Alcohol ("holiday heart")
- Stimulants (caffeine, cocaine)
- Medications (theophylline, beta-agonists)

**Workup**:
- ECG, troponin, TSH, BMP, CXR, echo
- D-dimer if PE suspected
Example: “Can I give azithromycin to a patient on warfarin?”

Response:
**⚠️ Azithromycin + Warfarin Interaction**

**Risk**: Moderate

**Mechanism**: Azithromycin inhibits CYP3A4 and may potentiate warfarin's anticoagulant effect → ↑ INR → bleeding risk

**Management**:
1. **Can prescribe**, but monitor closely:
   - Check INR 2-3 days after starting azithromycin
   - Recheck INR 2-3 days after stopping
2. Consider alternative antibiotic if possible:
   - Doxycycline (fewer interactions)
   - Amoxicillin (if appropriate for indication)

**Patient Education**:
- Watch for bleeding signs (bruising, gum bleeding, black stools)
- Avoid NSAIDs during azithromycin course

Performance Benchmarks

Latency Comparison

| Provider | Model | Typical Latency | Tokens/sec |
|----------|-------|-----------------|------------|
| Groq | Llama 3.3 70B | 300-800ms | ~500 |
| OpenAI | GPT-4o | 5-12s | ~50 |
| OpenAI | GPT-4o-mini | 2-4s | ~100 |

Token Usage

Typical chat exchange:
  • System prompt: ~200 tokens
  • User question: 20-100 tokens
  • Assistant response: 200-500 tokens
  • Total per exchange: ~500-800 tokens
Groq offers 25 req/min free tier (enough for demos). For production, consider Groq’s pay-as-you-go pricing.
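Because the full history is resent on every request (see Multi-Turn Conversations), cumulative token usage grows quadratically with the number of exchanges. A rough estimator — the constants are ballpark midpoints of the figures above, not tokenizer-accurate counts:

```python
# Rough token-growth estimate for multi-turn chat. The constants are
# ballpark midpoints of the ranges listed above, not real tokenizer counts.
SYSTEM_TOKENS = 200
QUESTION_TOKENS = 60
REPLY_TOKENS = 350


def tokens_for_turn(turn: int) -> int:
    """Tokens billed on the Nth exchange (1-indexed): system prompt, all
    prior Q/A pairs, the new question, and the new reply."""
    history = (turn - 1) * (QUESTION_TOKENS + REPLY_TOKENS)
    return SYSTEM_TOKENS + history + QUESTION_TOKENS + REPLY_TOKENS


def total_tokens(turns: int) -> int:
    """Cumulative tokens across a conversation of `turns` exchanges."""
    return sum(tokens_for_turn(t) for t in range(1, turns + 1))
```

Under these assumptions a single exchange costs about 610 tokens (within the ~500-800 range above), while a 10-turn conversation totals roughly 24,550 — worth keeping in mind when budgeting against rate limits.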

Future Enhancements

RAG Integration

Connect to LanceDB with indexed PubMed articles for grounded answers with citations

Agent Mode Toggle

Allow users to trigger full agent pipeline from chat (“Run full analysis on this case”)

Image Upload

Support ECG, X-ray, CT uploads for multimodal analysis

Chat History Persistence

Save conversations to database for later review

Best Practices

1. Ask Specific Questions

✅ Good: “What’s the HEART score for chest pain?”
❌ Bad: “Tell me about chest pain”
2. Provide Context for Case-Specific Questions

✅ Good: “68yo male with troponin 2.1, BP 90/60. Is this STEMI or NSTEMI?”
❌ Bad: “Is this STEMI?”
3. Use Follow-Up Questions

The model has conversation history. Ask clarifying questions:
  1. “What are causes of elevated troponin?”
  2. “Which of those would have ST elevation?”
  3. “What’s the door-to-balloon goal for STEMI?”
4. Cross-Reference Critical Decisions

AI Chat is for decision support, not replacement of clinical judgment. Always verify critical treatment decisions with:
  • Primary literature
  • Local protocols
  • Senior clinician consultation

Next Steps

Full Analysis

Run the multi-agent debate pipeline for complex cases

Emergency Mode

Fast-path triage for time-critical scenarios
