AI Best Practices

Comprehensive guide to maximizing AI performance, minimizing costs, and maintaining clinical safety in Paw & Care’s AI-powered features.

Clinical Safety

Human-in-the-Loop Workflow

Critical Rule: All AI-generated content requires veterinarian review before finalization

AI Generates Draft

AI creates SOAP notes, clinical insights, or triage recommendations.

Status: Draft (not visible to other staff, not part of legal record)

Veterinarian Reviews

DVM reads AI output, makes edits, adds missing information.

Required Actions:

Read all four SOAP sections
Verify vitals and measurements
Check medication names and dosages
Confirm diagnosis accuracy

Veterinarian Approves

DVM clicks “Finalize” to approve record.

Status: Finalized (immutable, part of legal medical record)

Audit Trail Created

System logs: user ID, timestamp, original AI output, final edited version
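The draft-to-finalized transition and its audit entry can be sketched as below. This is only an illustration under assumptions: the `SoapRecord` and `AuditEntry` shapes, the field names, and the `finalizeRecord` function are hypothetical and are not taken from the Paw & Care codebase.

```typescript
// Hypothetical record shape; field names are illustrative only.
type RecordStatus = "draft" | "finalized";

interface SoapRecord {
  id: string;
  status: RecordStatus;
  aiOutput: string; // original AI-generated draft
  content: string;  // current (possibly edited) text
}

interface AuditEntry {
  recordId: string;
  userId: string;
  timestamp: string;
  originalAiOutput: string;
  finalVersion: string;
}

// Finalize a draft after veterinarian review. Returns a new frozen record
// plus the audit entry; throws if the record is not a draft.
function finalizeRecord(
  record: SoapRecord,
  reviewerId: string,
  editedContent: string,
  now: Date = new Date(),
): { record: Readonly<SoapRecord>; audit: AuditEntry } {
  if (record.status !== "draft") {
    throw new Error(`Record ${record.id} is already finalized`);
  }
  const finalized = Object.freeze({
    ...record,
    status: "finalized" as const,
    content: editedContent,
  });
  const audit: AuditEntry = {
    recordId: record.id,
    userId: reviewerId,
    timestamp: now.toISOString(),
    originalAiOutput: record.aiOutput,
    finalVersion: editedContent,
  };
  return { record: finalized, audit };
}
```

Keeping the original AI output alongside the final version is what makes the audit entry useful later: the two can be compared to see how much the veterinarian changed.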

Review Checklists

Dictation Best Practices

Recording Technique

For Accurate Transcription:

Environment:

✅ Record in quiet exam room
✅ Close door to muffle hallway noise
✅ Turn off loud equipment (fans, monitors)
❌ Avoid recording during barking/meowing

Microphone:

✅ Hold phone 6-12 inches from mouth
✅ Use wired headset for noisy environments
✅ Check mic isn’t covered by hand/case
❌ Don’t speak into bottom of phone (speaker, not mic)

Speaking:

✅ Normal conversational pace
✅ Clear enunciation of medical terms
✅ Pause briefly between sentences
❌ Don’t rush through vitals

Content Structure

Recommended Dictation Flow:

1. Patient Introduction (5-10 seconds)
   "This is Max, a 5-year-old male neutered beagle."

2. Subjective (30-60 seconds)
   "Owner reports he's been vomiting for 2 days, about 4 episodes.
   No diarrhea. Appetite decreased. Still drinking water.
   No known toxin exposure."

3. Objective (60-90 seconds)
   "Physical exam: temperature 101.2, heart rate 92, respiratory rate 24.
   Body condition score 6 out of 9, mild overweight.
   Abdomen: soft, non-painful on palpation.
   No foreign body felt. Hydration: pink mucous membranes,
   capillary refill time under 2 seconds."

4. Assessment (20-40 seconds)
   "Assessment: acute gastroenteritis, likely dietary indiscretion.
   Differentials include pancreatitis, foreign body, parasites."

5. Plan (40-60 seconds)
   "Plan: fecal test, anti-emetic injection, send home with
   bland diet instructions and metronidazole 250 milligrams
   twice daily for 5 days. Recheck if vomiting continues or
   worsens. Emergency clinic if sees blood or becomes lethargic."

Total dictation time: 3-5 minutes for comprehensive SOAP note

Medical Terminology Tips

Four tips, detailed in order below:

Spell Out First Use
Use Full Medical Terms
Clarify Ambiguous Sounds
Correct Mistakes Immediately

Uncommon Terms: Spell first occurrence

"Diagnosed with Bordetella, B-O-R-D-E-T-E-L-L-A, bronchiseptica."
"Brachycephalic, B-R-A-C-H-Y-C-E-P-H-A-L-I-C, airway syndrome."

After Spelling: Use normally

"Bordetella vaccination recommended annually."

Preferred: Latin/medical terminology

✅ "Otitis externa"
✅ "Canine parvovirus"
✅ "Feline upper respiratory infection"

Avoid: Layman’s terms (less precise)

❌ "Ear infection"
❌ "Parvo"
❌ "Cat cold"

Homophones: Distinguish similar-sounding words

"Right ear" → "R-I-G-H-T ear, not left"
"4 milliliters" → "four M-L, not milligrams"
"Amoxicillin" → "A-M-O-X-I-C-I-L-L-I-N, not amoxycillin"

Units: Always spell ambiguous units

"10 milligrams per kilogram" (not "10 mg/kg" spoken quickly)

If You Misspeak:

"Temperature 102... correction, 101.2 degrees Fahrenheit."
"Prescribed amoxicillin... no, sorry, Clavamox."

AI Will Use Latest Version: Corrections override earlier statements

Cost Optimization

Token Management

Four techniques, detailed in order below:

Use Browser SpeechRecognition
Adjust Detail Level
Batch Processing
Cache Common Queries

Free Live Transcription:

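// Enable browser speech API during recording.
// Chrome and Safari expose the constructor under the webkit prefix.
const SpeechRecognitionImpl = window.SpeechRecognition ?? window.webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();
recognition.start();

// After recording, check if live transcript is good
async function chooseTranscript(liveTranscript, liveAccuracyEstimate, audio) {
  if (liveTranscript.length > 100 && liveAccuracyEstimate > 0.8) {
    // Use free browser transcript, skip Whisper API
    return liveTranscript;
  }
  // Send to Whisper for high accuracy
  return await whisperAPI(audio);
}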

Cost Savings: ~40% reduction in Whisper API calls
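The snippet above does not say where `liveAccuracyEstimate` comes from. One possibility, offered only as an assumption, is to average the `confidence` values that the Web Speech API reports on each recognized alternative. The helper names below are hypothetical, and confidence is a browser-reported score rather than a measured accuracy, so the 0.8 threshold may need tuning per browser.

```typescript
// Average the per-segment confidence values (0 to 1) reported by the browser.
// Returns 0 when there are no segments so the caller falls back to Whisper.
function estimateLiveAccuracy(confidences: number[]): number {
  if (confidences.length === 0) return 0;
  const sum = confidences.reduce((total, c) => total + c, 0);
  return sum / confidences.length;
}

// Same decision rule as the snippet above: long enough and confident enough.
function shouldUseLiveTranscript(transcript: string, confidences: number[]): boolean {
  return transcript.length > 100 && estimateLiveAccuracy(confidences) > 0.8;
}
```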

Concise vs Detailed Templates:

// For routine wellness exams (simple case)
detailLevel: 'concise'  // Shorter prompt, fewer tokens

// For complex medical cases
detailLevel: 'detailed'  // Comprehensive prompt, more tokens

Token Difference:

Concise: ~300 prompt tokens
Detailed: ~600 prompt tokens

Cost Savings: ~50% fewer prompt tokens for simple cases
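A minimal sketch of choosing the detail level per visit and estimating the prompt-token saving, using the approximate token counts listed above. The visit type names and both function names are assumptions made for illustration.

```typescript
type DetailLevel = "concise" | "detailed";
// Hypothetical visit categories; adjust to the practice's own templates.
type VisitType = "wellness" | "dental" | "surgery" | "complex";

// Approximate prompt sizes from the figures above.
const PROMPT_TOKENS: Record<DetailLevel, number> = { concise: 300, detailed: 600 };

// Routine wellness exams get the short prompt; everything else gets the full one.
function chooseDetailLevel(visitType: VisitType): DetailLevel {
  return visitType === "wellness" ? "concise" : "detailed";
}

// Fraction of prompt tokens saved relative to always using the detailed prompt.
function promptTokenSavings(visitType: VisitType): number {
  const chosen = PROMPT_TOKENS[chooseDetailLevel(visitType)];
  return 1 - chosen / PROMPT_TOKENS.detailed;
}
```

The saving applies to prompt tokens only; the tokens in the generated note are billed separately.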

Multiple Insights in One Call:

// ❌ Bad: 3 separate API calls
const diagnosis = await getDiagnosis(soap);
const risks = await getRisks(soap);
const suggestions = await getSuggestions(soap);

// ✅ Good: 1 API call returning all 3
const { diagnoses, risks, suggestions } = await getClinicalInsights(soap);

Cost Savings: ~67% fewer API calls (1 call vs 3), and the SOAP note is sent as input once instead of three times
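When one call returns all three lists, it helps to validate the combined response before using it. The sketch below assumes the model is asked to return a JSON object with `diagnoses`, `risks`, and `suggestions` as arrays of strings; that response shape and the `parseClinicalInsights` name are assumptions, not the documented behavior of `getClinicalInsights`.

```typescript
interface ClinicalInsights {
  diagnoses: string[];
  risks: string[];
  suggestions: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse the single combined response and confirm all three lists are present.
function parseClinicalInsights(raw: string): ClinicalInsights {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const { diagnoses, risks, suggestions } = data as Record<string, unknown>;
  if (!isStringArray(diagnoses) || !isStringArray(risks) || !isStringArray(suggestions)) {
    throw new Error("Response must contain diagnoses, risks, and suggestions as string arrays");
  }
  return { diagnoses, risks, suggestions };
}
```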

Deduplicate Identical Requests:

// Wrapped in a function so the early return is valid
async function getOrGenerateSOAP(transcription, templateId) {
  const cacheKey = hashContent(transcription + templateId);
  const cached = await redis.get(cacheKey);

  if (cached) {
    return JSON.parse(cached);  // Free!
  }

  const result = await generateSOAP(transcription);
  await redis.set(cacheKey, JSON.stringify(result), 'EX', 3600);  // 1 hour
  return result;
}

Use Cases: Rare (each dictation unique), but useful for demos/testing
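For demos and tests, the same idea can be tried without Redis. The sketch below is an in-memory stand-in, not the production cache: `TtlCache` and `cacheKeyInput` are hypothetical names. It also builds the key input by JSON-encoding the pair, which is one way to avoid two different (transcription, template) pairs concatenating to the same string.

```typescript
// Build an unambiguous key input: JSON-encoding the pair keeps
// ("ab", "c") distinct from ("a", "bc"), which plain concatenation would merge.
function cacheKeyInput(transcription: string, templateId: string): string {
  return JSON.stringify([transcription, templateId]);
}

// Minimal in-memory cache with per-entry expiry (a stand-in for Redis).
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```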

Usage Monitoring

Set Monthly Budget:

const monthlyBudget = {
  whisper: 100,  // $100/month for transcription
  gpt4: 50,      // $50/month for SOAP generation
  total: 150,
};

if (currentMonthUsage.whisper > monthlyBudget.whisper * 0.8) {
  sendAlert('Whisper API usage at 80% of budget');
  // Consider switching to browser SpeechRecognition only
}
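The same 80% check can be applied to every service in the budget rather than to Whisper alone. This is a sketch under assumptions: `budgetAlerts` is a hypothetical helper that returns the alert messages instead of sending them, and the service names mirror the budget object above.

```typescript
type Service = "whisper" | "gpt4";

// Return one alert message per service whose spend exceeds the threshold
// share of its monthly budget (0.8 = 80%).
function budgetAlerts(
  usage: Record<Service, number>,
  budget: Record<Service, number>,
  threshold: number = 0.8,
): string[] {
  const alerts: string[] = [];
  for (const service of Object.keys(budget) as Service[]) {
    const ratio = usage[service] / budget[service];
    if (ratio > threshold) {
      alerts.push(`${service} usage at ${Math.round(ratio * 100)}% of budget`);
    }
  }
  return alerts;
}
```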

Track Per-User Costs:

SELECT 
  user_id,
  user_name,
  SUM(cost_usd) as total_cost,
  COUNT(*) as api_calls,
  AVG(cost_usd) as avg_cost_per_call
FROM api_usage
WHERE timestamp >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY user_id, user_name
ORDER BY total_cost DESC;

Accuracy Optimization

Prompt Engineering

Four techniques, detailed in order below:

Be Specific
Provide Examples
Set Constraints
Request Structured Output

❌ Vague Prompt:

"Generate a SOAP note from this transcription."

✅ Specific Prompt:

You are an expert veterinary medical scribe.
Generate a structured SOAP note (Subjective, Objective, Assessment, Plan)
from the following dictation.

Patient: Max (Beagle, 5 years old)
Template: Standard SOAP
Detail Level: Concise

Return valid JSON with keys: subjective, objective, assessment, plan.
If information is missing for a section, write "No information provided."
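A prompt like the one above is usually assembled from patient and template data rather than typed by hand. The builder below is a minimal sketch: the `PatientInfo` shape and the `buildSoapPrompt` name are assumptions, and the trailing "Dictation:" section is added here for illustration.

```typescript
// Hypothetical patient shape used only for this example.
interface PatientInfo {
  name: string;
  breed: string;
  ageYears: number;
}

// Assemble the specific prompt shown above from structured inputs.
function buildSoapPrompt(
  patient: PatientInfo,
  template: string,
  detailLevel: string,
  transcription: string,
): string {
  return [
    "You are an expert veterinary medical scribe.",
    "Generate a structured SOAP note (Subjective, Objective, Assessment, Plan)",
    "from the following dictation.",
    "",
    `Patient: ${patient.name} (${patient.breed}, ${patient.ageYears} years old)`,
    `Template: ${template}`,
    `Detail Level: ${detailLevel}`,
    "",
    "Return valid JSON with keys: subjective, objective, assessment, plan.",
    'If information is missing for a section, write "No information provided."',
    "",
    "Dictation:",
    transcription,
  ].join("\n");
}
```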

Few-Shot Learning:

Example Input:
"Bella is a 3 year old cat. Owner says she's been vomiting."

Example Output:
{
  "subjective": "3-year-old feline presented with vomiting. Duration and frequency not specified.",
  "objective": "No physical exam information provided.",
  "assessment": "Acute vomiting, etiology unknown.",
  "plan": "Pending physical examination and further history."
}

Now process the actual dictation:
[User's transcription]

Prevent Hallucinations:

CRITICAL RULES:
- Do NOT invent vitals if not mentioned (write "Not recorded")
- Do NOT assume test results not stated in dictation
- Do NOT add diagnostic tests not explicitly mentioned
- If diagnosis unclear, write "Pending further diagnostics"
- Only include information directly stated in the transcription

Enforce JSON Format:

Return ONLY valid JSON. No markdown code fences, no explanations.

Required format:
{
  "subjective": "string",
  "objective": "string",
  "assessment": "string",
  "plan": "string"
}
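Even with a strict format instruction, the response is worth validating before it reaches the review screen. The parser below is a sketch under assumptions: `parseSoapNote` and `stripFence` are hypothetical helpers, and tolerating a stray code-fence wrapper is a defensive choice rather than something the source prescribes.

```typescript
interface SoapNote {
  subjective: string;
  objective: string;
  assessment: string;
  plan: string;
}

const SOAP_KEYS = ["subjective", "objective", "assessment", "plan"] as const;
const FENCE = "`".repeat(3); // three backticks, built indirectly

// Remove an accidental code-fence wrapper if the model added one anyway.
function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    // Drop the whole opening fence line, which may carry a language tag.
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
  }
  if (text.endsWith(FENCE)) {
    text = text.slice(0, -FENCE.length);
  }
  return text.trim();
}

// Parse the response and require all four sections to be strings.
function parseSoapNote(raw: string): SoapNote {
  const data: unknown = JSON.parse(stripFence(raw));
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const fields = data as Record<string, unknown>;
  const note = {} as Record<(typeof SOAP_KEYS)[number], string>;
  for (const key of SOAP_KEYS) {
    const value = fields[key];
    if (typeof value !== "string") {
      throw new Error(`Missing or non-string field: ${key}`);
    }
    note[key] = value;
  }
  return note;
}
```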

Model Selection

Options: gpt-4o-mini (recommended), gpt-4-turbo, gpt-3.5-turbo

The notes below describe gpt-4o-mini.

Use For:

Routine SOAP notes
Clinical insights
Billing extraction

Advantages:

70% cheaper than GPT-4
Faster (8-12s vs 15-20s)
Good structured output

Sufficient For: 95% of veterinary documentation

Temperature Settings

// Medical documentation (factual, low variation)
temperature: 0.3

// Clinical insights (some creativity acceptable)
temperature: 0.4

// Client communication (more natural variation)
temperature: 0.7

Lower temperature = more consistent, factual, but potentially repetitive
Higher temperature = more creative, varied, but potentially inconsistent
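One way to keep these values consistent across features is a single lookup keyed by task. This is a minimal sketch: the task names and the `requestOptions` helper are assumptions, and the default model is simply the recommended one from Model Selection.

```typescript
// Hypothetical task names for the three cases listed above.
type AiTask = "documentation" | "clinicalInsights" | "clientCommunication";

const TEMPERATURE_BY_TASK: Record<AiTask, number> = {
  documentation: 0.3,
  clinicalInsights: 0.4,
  clientCommunication: 0.7,
};

// Build the model and temperature portion of a request for a given task.
function requestOptions(
  task: AiTask,
  model: string = "gpt-4o-mini",
): { model: string; temperature: number } {
  return { model, temperature: TEMPERATURE_BY_TASK[task] };
}
```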

Quality Assurance

Regular Audits

Weekly Spot Checks

Sample: 10 random AI-generated SOAP notes

Check For:

Hallucinated vitals (numbers not in dictation)
Incorrect patient names
Inappropriate diagnoses
Missing information from dictation

Action: If >10% error rate, adjust prompts or retrain
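Part of the spot check for hallucinated vitals can be automated by flagging numbers that appear in the note but not in the transcription. The sketch below assumes the transcription contains numbers as digits (for example "101.2") rather than spelled-out words, and it only produces candidates for human review; `findUnsupportedNumbers` is a hypothetical helper.

```typescript
// Extract numeric tokens such as "101.2" or "92" from free text.
function extractNumbers(text: string): string[] {
  return text.match(/\d+(?:\.\d+)?/g) ?? [];
}

// Return each number that appears in the SOAP note but nowhere in the dictation.
function findUnsupportedNumbers(transcription: string, soapText: string): string[] {
  const spoken = new Set(extractNumbers(transcription));
  const unsupported = extractNumbers(soapText).filter((n) => !spoken.has(n));
  return Array.from(new Set(unsupported)); // de-duplicate, keep first-seen order
}
```

A flagged number is not necessarily an error (a dose may be reformatted, for instance), so the result is a prompt for the reviewer rather than a verdict.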

Monthly Accuracy Review

Metrics to Track:

Transcription WER (Word Error Rate)
SOAP note veterinarian acceptance rate
Clinical insight relevance score
Emergency detection false positive rate

Benchmarks:

WER: < 5% (excellent)
Acceptance: > 85% (good)
Insight relevance: > 70% (good)
Emergency false positives: < 3% (acceptable)
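Word Error Rate is the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below uses the standard definition; the lowercase whitespace tokenization and the decision to reject an empty reference are simplifying assumptions.

```typescript
// Lowercase and split on whitespace; punctuation handling is left out for brevity.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("Reference transcript must contain at least one word");
  }
  // previous[j] = distance between the first i-1 reference words and first j hypothesis words
  let previous: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = previous[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current[j] = Math.min(substitution, deletion, insertion);
    }
    previous = current;
  }
  return previous[hyp.length] / ref.length;
}
```

For example, recognizing "temperature 101 heart rate 92" against the reference "temperature 101.2 heart rate 92" is one substitution in five words, a WER of 20%.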

Quarterly User Feedback

Survey Veterinarians:

“How often do you use AI dictation?” (adoption rate)
“How much editing do AI notes require?” (1-5 scale)
“Have you caught any dangerous AI errors?” (safety)
“What features would improve AI accuracy?” (feedback)

Error Reporting

Implement Feedback Loop:

// Add "Report Error" button to SOAP notes
const reportAIError = async (recordId: string, errorType: string, description: string) => {
  // supabase-js reports failures in the returned object instead of throwing
  const { error } = await supabase.from('ai_error_reports').insert({
    record_id: recordId,
    error_type: errorType,  // 'hallucination', 'misrecognition', 'missing_info'
    description,
    reported_by: currentUser.id,
    timestamp: new Date().toISOString(),
  });
  if (error) throw error;

  // Alert engineering team if critical
  if (errorType === 'hallucination') {
    await sendSlackAlert(`Critical AI error reported in record ${recordId}`);
  }
};

Error Categories:

Hallucination: AI invented information not in dictation
Misrecognition: Whisper transcribed word incorrectly
Missing Information: AI omitted content from dictation
Inappropriate Suggestion: Clinical insight not relevant/safe

Training & Onboarding

Staff Training Checklist

Best Practice Documentation

Create Practice-Specific Guide:

# [Practice Name] AI Usage Guidelines

## Recording Dictations
- Always record in Exam Room 1 or 2 (quietest)
- Use iPad, not iPhone (better mic)
- Speak directly into bottom edge of iPad

## Common Misrecognitions at Our Practice
- "Rhodesian Ridgeback" → Spell every time
- "Carprofen" → Say "Rimadyl" instead
- Dr. Smith's accent: Speak 10% slower for better accuracy

## Custom Abbreviations
- Say "rabies vaccination" not "rabies vax" (AI doesn't recognize "vax")
- Say "heartworm prevention" not "HW prev"

## Template Preferences
- Wellness exams: Standard SOAP (Concise)
- Dental procedures: Dental - Canine template (Detailed)
- Surgeries: Surgery Report template (Detailed)

Troubleshooting

AI Keeps Hallucinating Vitals

Symptom: SOAP notes include temperature/heart rate not mentioned in dictation

Root Cause: Prompt doesn’t emphasize “only use stated information”

Solution: Update system prompt:

CRITICAL: If vitals are not mentioned in the dictation,
write "Vitals: Not recorded" in the Objective section.
Do NOT invent plausible values.

Whisper Transcribes Breed Names Wrong

Symptom: “Rhodesian Ridgeback” becomes “Rosy and Ridgeback”

Solution 1: Spell breed name in dictation

"This is a Rhodesian Ridgeback, spelled R-H-O-D-E-S-I-A-N Ridgeback."

Solution 2: Add to Whisper prompt parameter

prompt: "Breed names: Rhodesian Ridgeback, Weimaraner, Shih Tzu."

Solution 3: Post-processing correction

corrections: { 'Rosy and Ridgeback': 'Rhodesian Ridgeback' }
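A corrections map like this can be applied to the transcript before SOAP generation. The sketch below is one possible implementation, not the product's own: `applyCorrections` is a hypothetical helper, and matching whole phrases case-insensitively is an assumption about the desired behavior.

```typescript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each known misrecognition with its correction, matching whole
// phrases regardless of letter case.
function applyCorrections(transcript: string, corrections: Record<string, string>): string {
  let result = transcript;
  for (const [wrong, right] of Object.entries(corrections)) {
    const pattern = new RegExp(`\\b${escapeRegExp(wrong)}\\b`, "gi");
    // A replacer function inserts the correction literally, even if it contains "$".
    result = result.replace(pattern, () => right);
  }
  return result;
}
```

Usage: `applyCorrections(text, { 'Rosy and Ridgeback': 'Rhodesian Ridgeback' })` returns the text with every occurrence of the misheard phrase replaced.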

Clinical Insights All Low Relevance

Symptom: Insights marked “Not Relevant” by vets >50% of the time

Causes:

Insights too generic (“Consider blood work”)
Not species-specific
Suggests tests practice doesn’t offer

Solution: Improve prompt specificity

Generate insights specific to [species] medicine.
Only suggest diagnostics available at general practice clinics.
Be specific ("Feline diabetes panel" not "blood work").

Luna AI Books Wrong Appointments

Symptom: Appointments scheduled at unavailable times

Cause: check_availability function returning incorrect data

Debug:

Test function URL directly (quote the URL so the shell does not interpret the ?): curl "https://your-api.com/api/appointments/available?date=2026-03-15"
Verify response format matches Retell expectations
Check database query for off-by-one errors (timezone issues)

Solution: Fix backend endpoint logic
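A common source of the off-by-one error mentioned in the debug steps is that JavaScript parses a date-only string such as "2026-03-15" as midnight UTC, which is still the previous day in time zones west of UTC. The sketch below demonstrates the effect; `localDateString` is a hypothetical helper, and "America/Los_Angeles" is only an example clinic time zone.

```typescript
// Format an instant as YYYY-MM-DD in a given IANA time zone.
function localDateString(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const get = (type: string): string =>
    parts.find((part) => part.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// A date-only ISO string is parsed as midnight UTC.
const requested = new Date("2026-03-15");

console.log(localDateString(requested, "UTC"));                 // 2026-03-15
console.log(localDateString(requested, "America/Los_Angeles")); // 2026-03-14
```

If the availability query compares this instant against slots stored in clinic-local time, it looks up the wrong day. Passing the date and the clinic time zone explicitly to the query is one way to avoid that.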

Summary Checklist

Next Steps

SOAP Generation

Implement AI dictation workflow

Clinical Insights

Configure diagnosis suggestions

Voice Assistant

Set up Luna AI phone system

Overview

Return to AI & ML overview
