OpenAI Integration
Paw & Care uses OpenAI’s GPT-4 and Whisper models for AI-powered clinical documentation. This guide covers API setup, configuration, rate limiting, and cost optimization.
Prerequisites
Billing Setup
Add payment method in Settings → Billing (required for API access)
API Key Generation
Create API key at Settings → API keys (starts with sk-proj-...)
Usage Limits
Set monthly spending cap in Settings → Limits (recommended: $50-100/month per vet)
Protect Your API Key: Never commit API keys to version control or expose them in client-side code. Use environment variables and backend proxying.
Environment Setup
Backend Configuration
Add OpenAI API key to server .env file:
# OpenAI Configuration
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_ORG_ID=org-xxxxxxxxxxxxxxxxxxxxxxxx # Optional
# API Base URL (leave default unless using proxy)
OPENAI_API_BASE=https://api.openai.com/v1
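A small startup check can fail fast when the key is missing, rather than surfacing a confusing 401 on the first AI request. This is a sketch; checkOpenAIEnv is a hypothetical helper name, not part of the codebase:

```typescript
// Validate the OpenAI key at server startup. The "sk-" prefix check is a
// sanity check only; it catches pasted-in placeholders, not invalid keys.
function checkOpenAIEnv(env: Record<string, string | undefined>): string {
  const key = env.OPENAI_API_KEY;
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set; add it to the server .env file');
  }
  if (!key.startsWith('sk-')) {
    throw new Error('OPENAI_API_KEY looks malformed (expected an "sk-" prefix)');
  }
  return key;
}

// Call once at boot, before constructing the OpenAI client:
// const apiKey = checkOpenAIEnv(process.env);
```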
Initialize OpenAI Client
In server/index.ts:
import OpenAI from 'openai';
import dotenv from 'dotenv';
dotenv.config();
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: process.env.OPENAI_ORG_ID, // Optional
});

// Health check endpoint
app.get('/api/health', async (req, res) => {
  res.json({
    status: 'ok',
    services: {
      openai: process.env.OPENAI_API_KEY ? 'configured' : 'missing',
    },
  });
});
Test API connection with: curl http://localhost:3000/api/health
GPT-4 Configuration
Model Selection
GPT-4o-mini
Best for: SOAP notes, clinical insights, billing extraction
Pricing (as of 2024):
Input: $0.15 per 1M tokens
Output: $0.60 per 1M tokens
Characteristics:
70% cheaper than GPT-4 standard
Faster response (8-12s vs 15-20s)
128K context window
Good structured output (JSON)
Use Case:
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: transcription },
  ],
  temperature: 0.3,
  max_tokens: 2000,
});

GPT-4
Best for: Complex clinical reasoning, rare diseases
Pricing:
Input: $10 per 1M tokens
Output: $30 per 1M tokens
Characteristics:
Highest accuracy
Better medical knowledge
Slower (15-20s)
128K context window
When to Use: Specialist consultations, second opinions

GPT-3.5 Turbo
Best for: Simple text formatting, low-budget deployments
Pricing:
Input: $0.50 per 1M tokens
Output: $1.50 per 1M tokens
Characteristics:
Cheapest option
Fast (3-5s)
Lower quality for medical terminology
16K context window
Not Recommended: Medical accuracy insufficient for clinical notes

Paw & Care Default: gpt-4o-mini provides the best balance of cost, speed, and quality
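The pricing tables above can be folded into a small per-call cost estimator for comparing models. A sketch; the rates are copied from this page (as of 2024) and will drift:

```typescript
// USD per 1M tokens, copied from the pricing tables above (as of 2024).
const PRICING = {
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'gpt-4': { input: 10, output: 30 },
  'gpt-3.5-turbo': { input: 0.5, output: 1.5 },
} as const;

type Model = keyof typeof PRICING;

// Cost of a single completion given its token usage.
function estimateCost(model: Model, inputTokens: number, outputTokens: number): number {
  const rate = PRICING[model];
  return (inputTokens / 1_000_000) * rate.input + (outputTokens / 1_000_000) * rate.output;
}
```

For example, a completion with 1,500 input and 800 output tokens costs well under a tenth of a cent on gpt-4o-mini, while the same call on gpt-4 costs over 50x more.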
Temperature Settings
Controls randomness/creativity:
// Medical documentation (factual, consistent)
temperature: 0.3
// Clinical insights (some creativity for suggestions)
temperature: 0.4
// Marketing copy (creative, varied)
temperature: 0.8
For SOAP Notes:
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  temperature: 0.3, // Low temp = more deterministic
  max_tokens: 2000,
  messages: [ ... ],
});
Token Limits
The examples in this guide cap completions with max_tokens: 2000, which bounds per-call cost while leaving room for a detailed SOAP note.
System Prompts
Critical for quality output. Example SOAP generation prompt:
const systemPrompt = `You are an expert veterinary medical scribe AI.
Generate structured clinical notes from the following veterinary dictation.
Be ${detailLevel === 'concise' ? 'concise and brief' : 'thorough and detailed'}.
Template: "${templateName}"
Sections to fill:
${sectionGuide}
${patientName ? `Patient: ${patientName} (${species}, ${breed})` : ''}
Return a JSON object with exactly these keys: ${returnKeys.join(', ')}.
Each value should be a well-formatted string with appropriate clinical detail.
If the transcription doesn't contain information for a section, write "No [section name] information provided."
Only return valid JSON, no markdown or explanation.`;
Prompt Engineering Tips :
Be explicit about output format (“Only return valid JSON”)
Provide examples of desired output (few-shot learning)
Specify medical terminology style (“Use clinical terms like ‘otitis externa’ not ‘ear infection’”)
Include patient context (species, breed) for better accuracy
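The tips above can be combined in a few-shot setup. The example below is illustrative: every detail (patient names, findings) is invented, and the single assistant turn shows the model the exact JSON shape expected back:

```typescript
// Illustrative few-shot message array applying the prompt-engineering tips:
// explicit JSON-only instruction, one worked example, patient context.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const messages: ChatMessage[] = [
  {
    role: 'system',
    content:
      'You are a veterinary medical scribe. Use clinical terms (e.g. "otitis externa", not "ear infection"). ' +
      'Only return valid JSON with keys: subjective, objective, assessment, plan.',
  },
  // One worked example (few-shot) showing the desired output format.
  {
    role: 'user',
    content: 'Patient: Bella (canine, Labrador). Owner reports head shaking for 3 days...',
  },
  {
    role: 'assistant',
    content: JSON.stringify({
      subjective: 'Owner reports head shaking and left ear scratching for 3 days.',
      objective: 'Left ear canal erythematous with dark brown discharge.',
      assessment: 'Suspect otitis externa, left ear.',
      plan: 'Ear cytology; topical therapy pending results.',
    }),
  },
  // The real request, with patient context for better accuracy.
  {
    role: 'user',
    content: 'Patient: Max (canine, Beagle). ' + 'transcription text goes here',
  },
];
```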
Whisper Configuration
Audio Transcription
Whisper-1 model for speech-to-text:
import fs from 'fs';
import os from 'os';
import path from 'path';

app.post('/api/ai/transcribe', async (req, res) => {
  const { audio, mimeType } = req.body;
  if (!audio) {
    return res.status(400).json({ error: 'Missing audio payload' });
  }

  // Decode base64 and write to a temp file (the SDK needs a file stream)
  const buffer = Buffer.from(audio, 'base64');
  const ext = mimeType?.includes('mp4') ? 'mp4' : 'webm';
  const tmpPath = path.join(os.tmpdir(), `dictation-${Date.now()}.${ext}`);
  fs.writeFileSync(tmpPath, buffer);

  try {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(tmpPath),
      model: 'whisper-1',
      language: 'en', // Improves accuracy for English
      response_format: 'text', // 'json' | 'text' | 'srt' | 'vtt'
      // prompt: "Medical terminology: otitis, auscultation..." // Optional hint
    });
    return res.json({ transcription });
  } finally {
    fs.unlinkSync(tmpPath); // Clean up
  }
});
Pricing
Whisper-1 : $0.006 per minute of audio
Examples :
2-minute dictation: $0.012
5-minute dictation: $0.030
10-minute dictation: $0.060
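The per-minute rate makes cost trivial to estimate from recording duration (a one-liner using the $0.006/minute figure above):

```typescript
// Whisper-1 charges by audio duration: $0.006 per minute.
const WHISPER_USD_PER_MINUTE = 0.006;

function whisperCost(durationSeconds: number): number {
  return (durationSeconds / 60) * WHISPER_USD_PER_MINUTE;
}
```

whisperCost(300) reproduces the 5-minute example above: $0.030.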
Whisper cost is typically 10-50x more than the GPT-4 token cost per SOAP note
Supported formats:
mp3, mp4, mpeg, mpga
m4a, wav, webm
File Size Limit : 25 MB
Recommended Format : webm with Opus codec (best compression for voice)
// Client-side recording
const recorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm;codecs=opus',
});
Language & Prompt Hints
const transcription = await openai.audio.transcriptions.create({
  file: audioStream,
  model: 'whisper-1',
  language: 'en', // ISO-639-1 code
  prompt: "Medical veterinary dictation. Common terms: auscultation, palpation, otitis externa, Bordetella.",
});
Prompt Parameter: Provide medical terminology hints to improve accuracy on rare veterinary terms (optional but helpful)
Rate Limiting
OpenAI API Limits
Tier 1 (New Accounts)
Requirements: $5 spent
Limits:
500 requests per minute (RPM)
30,000 tokens per minute (TPM)
200 requests per day (RPD)
Sufficient For: 1-2 veterinarians, light usage

Tier 2
Requirements: $50 spent + 7 days
Limits:
5,000 RPM
450,000 TPM
10,000 RPD
Sufficient For: 5-10 veterinarians

Tier 3
Requirements: $100 spent + 7 days
Limits:
10,000 RPM
2,000,000 TPM
No daily limit
Sufficient For: 20+ veterinarians, busy practices
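The TPM figures above can be sanity-checked against expected load. A rough sketch; the 4,000 tokens-per-note figure is a hypothetical ballpark (system prompt and transcription input plus a max_tokens: 2000 completion), not a measured value:

```typescript
// Hypothetical average token footprint of one SOAP generation call.
const TOKENS_PER_SOAP_NOTE = 4_000;

// How many SOAP notes a tier's tokens-per-minute limit can absorb.
function maxNotesPerMinute(tpmLimit: number): number {
  return Math.floor(tpmLimit / TOKENS_PER_SOAP_NOTE);
}
```

Under this assumption, Tier 1's 30,000 TPM supports about 7 notes per minute, far more than 1-2 vets dictate; the 200 requests-per-day cap is the binding limit there.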
Backend Rate Limiter
Implement practice-level rate limiting:
import rateLimit from 'express-rate-limit';

const openaiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute per practice
  message: { error: 'Too many AI requests. Please wait a moment and try again.' },
  standardHeaders: true,
  legacyHeaders: false,
});

app.use('/api/ai/', openaiLimiter);
Retry Logic
Handle rate limit errors gracefully:
async function callOpenAIWithRetry(fn: () => Promise<any>, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Rate limited: wait and retry with exponential backoff
        const waitTime = Math.pow(2, i) * 1000; // 1s, 2s, 4s
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }
      throw error; // Not a rate limit, or max retries exceeded
    }
  }
}

// Usage
const completion = await callOpenAIWithRetry(() =>
  openai.chat.completions.create({ ... })
);
Cost Optimization
Token Management
Dynamic Prompt Sizing
Adjust prompt length based on transcription length:
const systemPrompt = transcription.length > 1000
  ? getDetailedPrompt() // Full context
  : getConcisePrompt(); // Minimal tokens
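To drive decisions like this length check by tokens rather than characters, a common heuristic is ~4 characters per token for English prose (an approximation, not the tokenizer's exact count):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Use an official tokenizer (e.g. tiktoken) when exact counts matter.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

By this estimate, a 1,000-character transcription corresponds to roughly 250 tokens.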
Caching Responses
Cache identical requests (rare for transcriptions):
const cacheKey = hashContent(transcription + templateId);
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
Batch Processing
Process multiple sections in one API call:
// Instead of 4 API calls (one per SOAP section),
// make 1 API call returning all 4 sections as JSON
Browser SpeechRecognition
Use the free browser API for live transcription, and only call Whisper when needed:
if (liveTranscriptRef.current.trim()) {
  setTranscription(liveTranscriptRef.current);
  // Skip the Whisper API call, saving $0.006/min
}
Monitoring Costs
let monthlyTokenUsage = { input: 0, output: 0, cost: 0 };
const completion = await openai.chat.completions.create({ ... });
const inputTokens = completion.usage?.prompt_tokens || 0;
const outputTokens = completion.usage?.completion_tokens || 0;
const inputCost = (inputTokens / 1_000_000) * 0.15; // gpt-4o-mini input pricing
const outputCost = (outputTokens / 1_000_000) * 0.60; // gpt-4o-mini output pricing
monthlyTokenUsage.input += inputTokens;
monthlyTokenUsage.output += outputTokens;
monthlyTokenUsage.cost += inputCost + outputCost;
// Log to database for billing
await supabase.from('api_usage').insert({
  practice_id: practiceId,
  service: 'openai-gpt4',
  tokens_input: inputTokens,
  tokens_output: outputTokens,
  cost_usd: inputCost + outputCost,
  timestamp: new Date(),
});
Usage Alerts
Set spending cap and alert thresholds:
if (monthlyTokenUsage.cost > 80) { // 80% of $100 budget
  sendAlertEmail({
    to: '[email protected]',
    subject: 'OpenAI API usage at 80%',
    body: `Current month: $${monthlyTokenUsage.cost.toFixed(2)}`,
  });
}
Error Handling
Common Errors
401 Unauthorized
Cause: Invalid API key
Response:
{
  "error": {
    "message": "Incorrect API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
Solution: Check OPENAI_API_KEY in .env

429 Rate Limit
Cause: Exceeded RPM/TPM limits
Response:
{
  "error": {
    "message": "Rate limit reached",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
Solution: Implement exponential backoff retry

400 Bad Request
Cause: Invalid parameters (e.g., model name typo)
Response:
{
  "error": {
    "message": "model 'gpt-4-mini' does not exist",
    "type": "invalid_request_error"
  }
}
Solution: Verify model name, check API documentation

500 Server Error
Cause: OpenAI service outage
Response:
{
  "error": {
    "message": "The server had an error processing your request",
    "type": "server_error"
  }
}
Solution: Retry with exponential backoff, check status.openai.com
Error Handling Pattern
try {
  const completion = await openai.chat.completions.create({ ... });
  return res.json({ soap: parsedResponse });
} catch (error: any) {
  console.error('[OpenAI Error]', error);

  // Structured error response
  if (error.status === 429) {
    return res.status(429).json({
      error: 'AI service is busy. Please try again in a moment.',
      retryAfter: 60, // seconds
    });
  }
  if (error.status === 401) {
    return res.status(500).json({
      error: 'AI service configuration error. Contact support.',
    });
  }
  return res.status(500).json({
    error: error.message || 'AI service unavailable.',
  });
}
Testing
Test OpenAI Connection
# Start backend server
npm run dev:server
# Test transcription endpoint
curl -X POST http://localhost:3000/api/ai/transcribe \
-H "Content-Type: application/json" \
-d '{
"audio": "<base64-encoded-audio>",
"mimeType": "audio/webm"
}'
# Test SOAP generation
curl -X POST http://localhost:3000/api/ai/generate-soap \
-H "Content-Type: application/json" \
-d '{
"transcription": "Max is a 5 year old beagle...",
"templateName": "Standard SOAP",
"sectionKeys": ["subjective", "objective", "assessment", "plan"]
}'
Unit Tests
import fs from 'fs';
import { describe, it, expect } from 'vitest';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

describe('OpenAI Integration', () => {
  it('should transcribe audio', async () => {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream('test-audio.webm'),
      model: 'whisper-1',
    });
    // Default (json) response format returns an object with a `text` field
    expect(transcription.text).toContain('test');
  });

  it('should generate SOAP notes', async () => {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'user', content: 'Generate SOAP note for...' },
      ],
    });
    const response = JSON.parse(completion.choices[0].message.content ?? '{}');
    expect(response).toHaveProperty('subjective');
    expect(response).toHaveProperty('objective');
  });
});
Security Best Practices
Critical Security Rules :
❌ Never expose API key in client-side code
✅ Always proxy through backend server
✅ Set monthly spending limits in OpenAI dashboard
✅ Rotate API keys every 90 days
✅ Use environment variables for all secrets
✅ Implement rate limiting per practice/user
✅ Log all API usage for audit trail
❌ Never log full API responses (may contain PHI)
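The last two rules can be enforced in one place by logging a fixed shape that has no field for response bodies. A sketch; the field names are illustrative, not a fixed schema:

```typescript
// Audit-log entry for one AI call: usage metadata only, never the
// transcription or completion text (PHI risk).
interface UsageLogEntry {
  practiceId: string;
  model: string;
  tokensInput: number;
  tokensOutput: number;
}

function toUsageLog(
  practiceId: string,
  model: string,
  usage?: { prompt_tokens?: number; completion_tokens?: number },
): UsageLogEntry {
  return {
    practiceId,
    model,
    tokensInput: usage?.prompt_tokens ?? 0,
    tokensOutput: usage?.completion_tokens ?? 0,
  };
}
```

Pass completion.usage in directly; because UsageLogEntry has no free-form text field, response content cannot leak into the audit trail by accident.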
Next Steps
SOAP Generation: Implement the voice-to-SOAP workflow
Clinical Insights: Generate AI diagnosis suggestions
Whisper Speech: Deep dive into audio transcription
Best Practices: Optimize AI accuracy and cost