OpenAI Integration
Paw & Care uses OpenAI’s GPT-4 and Whisper models for AI-powered clinical documentation. This guide covers API setup, configuration, rate limiting, and cost optimization.
Prerequisites
Billing Setup
Add payment method in Settings → Billing (required for API access)
API Key Generation
Create API key at Settings → API keys (starts with sk-proj-...)
Usage Limits
Set monthly spending cap in Settings → Limits (recommended: $50-100/month per vet)
Protect Your API Key: Never commit API keys to version control or expose them in client-side code. Use environment variables and backend proxying.
Environment Setup
Backend Configuration
Add OpenAI API key to server .env file:
# OpenAI Configuration
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_ORG_ID=org-xxxxxxxxxxxxxxxxxxxxxxxx # Optional
# API Base URL (leave default unless using proxy)
OPENAI_API_BASE=https://api.openai.com/v1
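A small startup check can fail fast when the key is missing, rather than surfacing a confusing 401 on the first AI request. This is a sketch; checkOpenAIEnv is a hypothetical helper name, not part of the codebase:

```typescript
// Validate the OpenAI key at server startup. The "sk-" prefix check is a
// sanity check only; it catches pasted-in placeholders, not invalid keys.
function checkOpenAIEnv(env: Record<string, string | undefined>): string {
  const key = env.OPENAI_API_KEY;
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set; add it to the server .env file');
  }
  if (!key.startsWith('sk-')) {
    throw new Error('OPENAI_API_KEY looks malformed (expected an "sk-" prefix)');
  }
  return key;
}

// Call once at boot, before constructing the OpenAI client:
// const apiKey = checkOpenAIEnv(process.env);
```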
Initialize OpenAI Client
In server/index.ts:
import OpenAI from 'openai';
import dotenv from 'dotenv';
dotenv.config();
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: process.env.OPENAI_ORG_ID, // Optional
});

// Health check endpoint
app.get('/api/health', async (req, res) => {
  res.json({
    status: 'ok',
    services: {
      openai: process.env.OPENAI_API_KEY ? 'configured' : 'missing',
    },
  });
});
Test API connection with: curl http://localhost:3000/api/health
GPT-4 Configuration
Model Selection
GPT-4o-mini
Best for: SOAP notes, clinical insights, billing extraction
Pricing (as of 2024):
Input: $0.15 per 1M tokens
Output: $0.60 per 1M tokens
Characteristics:
70% cheaper than GPT-4 standard
Faster response (8-12s vs 15-20s)
128K context window
Good structured output (JSON)
Use Case:
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: transcription },
  ],
  temperature: 0.3,
  max_tokens: 2000,
});

GPT-4
Best for: Complex clinical reasoning, rare diseases
Pricing:
Input: $10 per 1M tokens
Output: $30 per 1M tokens
Characteristics:
Highest accuracy
Better medical knowledge
Slower (15-20s)
128K context window
When to Use: Specialist consultations, second opinions

GPT-3.5 Turbo
Best for: Simple text formatting, low-budget deployments
Pricing:
Input: $0.50 per 1M tokens
Output: $1.50 per 1M tokens
Characteristics:
Cheapest option
Fast (3-5s)
Lower quality for medical terminology
16K context window
Not Recommended: Medical accuracy insufficient for clinical notes

Paw & Care Default: gpt-4o-mini provides the best balance of cost, speed, and quality
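The pricing tables above can be folded into a small per-call cost estimator for comparing models. A sketch; the rates are copied from this page (as of 2024) and will drift:

```typescript
// USD per 1M tokens, copied from the pricing tables above (as of 2024).
const PRICING = {
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'gpt-4': { input: 10, output: 30 },
  'gpt-3.5-turbo': { input: 0.5, output: 1.5 },
} as const;

type Model = keyof typeof PRICING;

// Cost of a single completion given its token usage.
function estimateCost(model: Model, inputTokens: number, outputTokens: number): number {
  const rate = PRICING[model];
  return (inputTokens / 1_000_000) * rate.input + (outputTokens / 1_000_000) * rate.output;
}
```

For example, a completion with 1,500 input and 800 output tokens costs well under a tenth of a cent on gpt-4o-mini, while the same call on gpt-4 costs over 50x more.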
Temperature Settings
Controls randomness/creativity:
// Medical documentation (factual, consistent)
temperature: 0.3
// Clinical insights (some creativity for suggestions)
temperature: 0.4
// Marketing copy (creative, varied)
temperature: 0.8
For SOAP Notes:
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  temperature: 0.3, // Low temp = more deterministic
  max_tokens: 2000,
  messages: [ ... ],
});
Token Limits
The examples in this guide cap completions with max_tokens: 2000, which bounds per-call cost while leaving room for a detailed SOAP note.
System Prompts
Critical for quality output. Example SOAP generation prompt:
const systemPrompt = `You are an expert veterinary medical scribe AI.
Generate structured clinical notes from the following veterinary dictation.
Be ${detailLevel === 'concise' ? 'concise and brief' : 'thorough and detailed'}.
Template: "${templateName}"
Sections to fill:
${sectionGuide}
${patientName ? `Patient: ${patientName} (${species}, ${breed})` : ''}
Return a JSON object with exactly these keys: ${returnKeys.join(', ')}.
Each value should be a well-formatted string with appropriate clinical detail.
If the transcription doesn't contain information for a section, write "No [section name] information provided."
Only return valid JSON, no markdown or explanation.`;
Prompt Engineering Tips :
Be explicit about output format (“Only return valid JSON”)
Provide examples of desired output (few-shot learning)
Specify medical terminology style (“Use clinical terms like ‘otitis externa’ not ‘ear infection’”)
Include patient context (species, breed) for better accuracy
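The tips above can be combined in a few-shot setup. The example below is illustrative: every detail (patient names, findings) is invented, and the single assistant turn shows the model the exact JSON shape expected back:

```typescript
// Illustrative few-shot message array applying the prompt-engineering tips:
// explicit JSON-only instruction, one worked example, patient context.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const messages: ChatMessage[] = [
  {
    role: 'system',
    content:
      'You are a veterinary medical scribe. Use clinical terms (e.g. "otitis externa", not "ear infection"). ' +
      'Only return valid JSON with keys: subjective, objective, assessment, plan.',
  },
  // One worked example (few-shot) showing the desired output format.
  {
    role: 'user',
    content: 'Patient: Bella (canine, Labrador). Owner reports head shaking for 3 days...',
  },
  {
    role: 'assistant',
    content: JSON.stringify({
      subjective: 'Owner reports head shaking and left ear scratching for 3 days.',
      objective: 'Left ear canal erythematous with dark brown discharge.',
      assessment: 'Suspect otitis externa, left ear.',
      plan: 'Ear cytology; topical therapy pending results.',
    }),
  },
  // The real request, with patient context for better accuracy.
  {
    role: 'user',
    content: 'Patient: Max (canine, Beagle). ' + 'transcription text goes here',
  },
];
```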
Whisper Configuration
Audio Transcription
Whisper-1 model for speech-to-text:
import fs from 'fs';
import os from 'os';
import path from 'path';

app.post('/api/ai/transcribe', async (req, res) => {
  const { audio, mimeType } = req.body;
  if (!audio) {
    return res.status(400).json({ error: 'Missing audio payload' });
  }

  // Decode base64 and write to a temp file (the SDK needs a file stream)
  const buffer = Buffer.from(audio, 'base64');
  const ext = mimeType?.includes('mp4') ? 'mp4' : 'webm';
  const tmpPath = path.join(os.tmpdir(), `dictation-${Date.now()}.${ext}`);
  fs.writeFileSync(tmpPath, buffer);

  try {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(tmpPath),
      model: 'whisper-1',
      language: 'en', // Improves accuracy for English
      response_format: 'text', // 'json' | 'text' | 'srt' | 'vtt'
      // prompt: "Medical terminology: otitis, auscultation..." // Optional hint
    });
    return res.json({ transcription });
  } finally {
    fs.unlinkSync(tmpPath); // Clean up
  }
});
Pricing
Whisper-1 : $0.006 per minute of audio
Examples :
2-minute dictation: $0.012
5-minute dictation: $0.030
10-minute dictation: $0.060
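The per-minute rate makes cost trivial to estimate from recording duration (a one-liner using the $0.006/minute figure above):

```typescript
// Whisper-1 charges by audio duration: $0.006 per minute.
const WHISPER_USD_PER_MINUTE = 0.006;

function whisperCost(durationSeconds: number): number {
  return (durationSeconds / 60) * WHISPER_USD_PER_MINUTE;
}
```

whisperCost(300) reproduces the 5-minute example above: $0.030.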
Whisper cost is typically 10-50x more than the GPT-4 token cost per SOAP note
Supported formats:
mp3, mp4, mpeg, mpga
m4a, wav, webm
File Size Limit : 25 MB
Recommended Format : webm with Opus codec (best compression for voice)
// Client-side recording
const recorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm;codecs=opus',
});
Language & Prompt Hints
const transcription = await openai.audio.transcriptions.create({
  file: audioStream,
  model: 'whisper-1',
  language: 'en', // ISO-639-1 code
  prompt: "Medical veterinary dictation. Common terms: auscultation, palpation, otitis externa, Bordetella.",
});
Prompt Parameter: Provide medical terminology hints to improve accuracy on rare veterinary terms (optional but helpful)
Rate Limiting
OpenAI API Limits
Tier 1 (New Accounts)
Requirements: $5 spent
Limits:
500 requests per minute (RPM)
30,000 tokens per minute (TPM)
200 requests per day (RPD)
Sufficient For: 1-2 veterinarians, light usage

Tier 2
Requirements: $50 spent + 7 days
Limits:
5,000 RPM
450,000 TPM
10,000 RPD
Sufficient For: 5-10 veterinarians

Tier 3
Requirements: $100 spent + 7 days
Limits:
10,000 RPM
2,000,000 TPM
No daily limit
Sufficient For: 20+ veterinarians, busy practices
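The TPM figures above can be sanity-checked against expected load. A rough sketch; the 4,000 tokens-per-note figure is a hypothetical ballpark (system prompt and transcription input plus a max_tokens: 2000 completion), not a measured value:

```typescript
// Hypothetical average token footprint of one SOAP generation call.
const TOKENS_PER_SOAP_NOTE = 4_000;

// How many SOAP notes a tier's tokens-per-minute limit can absorb.
function maxNotesPerMinute(tpmLimit: number): number {
  return Math.floor(tpmLimit / TOKENS_PER_SOAP_NOTE);
}
```

Under this assumption, Tier 1's 30,000 TPM supports about 7 notes per minute, far more than 1-2 vets dictate; the 200 requests-per-day cap is the binding limit there.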
Backend Rate Limiter
Implement practice-level rate limiting:
import rateLimit from 'express-rate-limit';

const openaiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute per practice
  message: { error: 'Too many AI requests. Please wait a moment and try again.' },
  standardHeaders: true,
  legacyHeaders: false,
});

app.use('/api/ai/', openaiLimiter);
Retry Logic
Handle rate limit errors gracefully:
async function callOpenAIWithRetry(fn: () => Promise<any>, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Rate limited: wait and retry with exponential backoff
        const waitTime = Math.pow(2, i) * 1000; // 1s, 2s, 4s
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }
      throw error; // Not a rate limit, or max retries exceeded
    }
  }
}

// Usage
const completion = await callOpenAIWithRetry(() =>
  openai.chat.completions.create({ ... })
);
Cost Optimization
Token Management
Dynamic Prompt Sizing
Adjust prompt length based on transcription length:
const systemPrompt = transcription.length > 1000
  ? getDetailedPrompt() // Full context
  : getConcisePrompt(); // Minimal tokens
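To drive decisions like this length check by tokens rather than characters, a common heuristic is ~4 characters per token for English prose (an approximation, not the tokenizer's exact count):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Use an official tokenizer (e.g. tiktoken) when exact counts matter.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

By this estimate, a 1,000-character transcription corresponds to roughly 250 tokens.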
Caching Responses
Cache identical requests (rare for transcriptions):
const cacheKey = hashContent(transcription + templateId);
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
Batch Processing
Process multiple sections in one API call:
// Instead of 4 API calls (one per SOAP section),
// make 1 API call returning all 4 sections as JSON
Browser SpeechRecognition
Use the free browser API for live transcription, and only call Whisper when needed:
if (liveTranscriptRef.current.trim()) {
  setTranscription(liveTranscriptRef.current);
  // Skip the Whisper API call, saving $0.006/min
}
Monitoring Costs
let monthlyTokenUsage = { input: 0, output: 0, cost: 0 };
const completion = await openai.chat.completions.create({ ... });
const inputTokens = completion.usage?.prompt_tokens || 0;
const outputTokens = completion.usage?.completion_tokens || 0;
const inputCost = (inputTokens / 1_000_000) * 0.15; // gpt-4o-mini input pricing
const outputCost = (outputTokens / 1_000_000) * 0.60; // gpt-4o-mini output pricing
monthlyTokenUsage.input += inputTokens;
monthlyTokenUsage.output += outputTokens;
monthlyTokenUsage.cost += inputCost + outputCost;
// Log to database for billing
await supabase.from('api_usage').insert({
  practice_id: practiceId,
  service: 'openai-gpt4',
  tokens_input: inputTokens,
  tokens_output: outputTokens,
  cost_usd: inputCost + outputCost,
  timestamp: new Date(),
});
Usage Alerts
Set spending cap and alert thresholds:
if (monthlyTokenUsage.cost > 80) { // 80% of $100 budget
  sendAlertEmail({
    to: '[email protected]',
    subject: 'OpenAI API usage at 80%',
    body: `Current month: $${monthlyTokenUsage.cost.toFixed(2)}`,
  });
}
Error Handling
Common Errors
401 Unauthorized
Cause: Invalid API key
Response:
{
  "error": {
    "message": "Incorrect API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
Solution: Check OPENAI_API_KEY in .env

429 Rate Limit
Cause: Exceeded RPM/TPM limits
Response:
{
  "error": {
    "message": "Rate limit reached",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
Solution: Implement exponential backoff retry

400 Bad Request
Cause: Invalid parameters (e.g., model name typo)
Response:
{
  "error": {
    "message": "model 'gpt-4-mini' does not exist",
    "type": "invalid_request_error"
  }
}
Solution: Verify model name, check API documentation

500 Server Error
Cause: OpenAI service outage
Response:
{
  "error": {
    "message": "The server had an error processing your request",
    "type": "server_error"
  }
}
Solution: Retry with exponential backoff, check status.openai.com
Error Handling Pattern
try {
  const completion = await openai.chat.completions.create({ ... });
  return res.json({ soap: parsedResponse });
} catch (error: any) {
  console.error('[OpenAI Error]', error);

  // Structured error response
  if (error.status === 429) {
    return res.status(429).json({
      error: 'AI service is busy. Please try again in a moment.',
      retryAfter: 60, // seconds
    });
  }
  if (error.status === 401) {
    return res.status(500).json({
      error: 'AI service configuration error. Contact support.',
    });
  }
  return res.status(500).json({
    error: error.message || 'AI service unavailable.',
  });
}
Testing
Test OpenAI Connection
# Start backend server
npm run dev:server
# Test transcription endpoint
curl -X POST http://localhost:3000/api/ai/transcribe \
-H "Content-Type: application/json" \
-d '{
"audio": "<base64-encoded-audio>",
"mimeType": "audio/webm"
}'
# Test SOAP generation
curl -X POST http://localhost:3000/api/ai/generate-soap \
-H "Content-Type: application/json" \
-d '{
"transcription": "Max is a 5 year old beagle...",
"templateName": "Standard SOAP",
"sectionKeys": ["subjective", "objective", "assessment", "plan"]
}'
Unit Tests
import fs from 'fs';
import { describe, it, expect } from 'vitest';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

describe('OpenAI Integration', () => {
  it('should transcribe audio', async () => {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream('test-audio.webm'),
      model: 'whisper-1',
    });
    // Default (json) response format returns an object with a `text` field
    expect(transcription.text).toContain('test');
  });

  it('should generate SOAP notes', async () => {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'user', content: 'Generate SOAP note for...' },
      ],
    });
    const response = JSON.parse(completion.choices[0].message.content ?? '{}');
    expect(response).toHaveProperty('subjective');
    expect(response).toHaveProperty('objective');
  });
});
Security Best Practices
Critical Security Rules :
❌ Never expose API key in client-side code
✅ Always proxy through backend server
✅ Set monthly spending limits in OpenAI dashboard
✅ Rotate API keys every 90 days
✅ Use environment variables for all secrets
✅ Implement rate limiting per practice/user
✅ Log all API usage for audit trail
❌ Never log full API responses (may contain PHI)
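The last two rules can be enforced in one place by logging a fixed shape that has no field for response bodies. A sketch; the field names are illustrative, not a fixed schema:

```typescript
// Audit-log entry for one AI call: usage metadata only, never the
// transcription or completion text (PHI risk).
interface UsageLogEntry {
  practiceId: string;
  model: string;
  tokensInput: number;
  tokensOutput: number;
}

function toUsageLog(
  practiceId: string,
  model: string,
  usage?: { prompt_tokens?: number; completion_tokens?: number },
): UsageLogEntry {
  return {
    practiceId,
    model,
    tokensInput: usage?.prompt_tokens ?? 0,
    tokensOutput: usage?.completion_tokens ?? 0,
  };
}
```

Pass completion.usage in directly; because UsageLogEntry has no free-form text field, response content cannot leak into the audit trail by accident.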
Next Steps
SOAP Generation: Implement the voice-to-SOAP workflow
Clinical Insights: Generate AI diagnosis suggestions
Whisper Speech: Deep dive into audio transcription
Best Practices: Optimize AI accuracy and cost