Overview
Invoice OCR uses OpenRouter as the LLM gateway, providing access to models from OpenAI, Google, Anthropic, and others through a unified API. OpenRouter handles:
- Model routing: Single endpoint for 100+ models
- PDF parsing: Built-in plugins for document extraction
- Caching: Annotation system to avoid re-parsing
- Fallbacks: Automatic retry with alternate providers
API Endpoint
Base URL: https://openrouter.ai/api/v1/chat/completions
Compatibility: OpenAI-compatible chat completions format
Authentication
Location: app/api/ocr-structured-v4/route.ts:215-221
- Sign up at openrouter.ai
- Generate an API key from the dashboard
- Add it to .env.local as OPENROUTER_API_KEY
Request Headers
Location: app/api/ocr-structured-v4/route.ts:282-288
Required Headers
| Header | Value | Purpose |
|---|---|---|
| Content-Type | application/json | Standard REST API |
| Authorization | Bearer ${OPENROUTER_API_KEY} | Authentication |
Optional Headers
| Header | Environment Variable | Default | Purpose |
|---|---|---|---|
| HTTP-Referer | OPENROUTER_SITE_URL | http://localhost:3000 | Usage tracking, required for some models |
| X-Title | OPENROUTER_APP_NAME | Invoice OCR | App identifier in OpenRouter dashboard |
OpenRouter uses HTTP-Referer for attribution.
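The header tables above can be assembled in one place. The following is a minimal sketch, not the code in route.ts: the helper name buildOpenRouterHeaders and the Env type are hypothetical, while the header names, environment variables, and defaults are taken from the tables.

```typescript
// Hypothetical helper: collects the required and optional headers listed above.
// `env` is passed in (for example process.env) so the function stays testable.
type Env = Record<string, string | undefined>;

function buildOpenRouterHeaders(env: Env): Record<string, string> {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) {
    // Failing early avoids sending an unauthenticated request (HTTP 401).
    throw new Error("OPENROUTER_API_KEY is not set");
  }
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    // Optional headers fall back to the defaults from the table.
    "HTTP-Referer": env.OPENROUTER_SITE_URL ?? "http://localhost:3000",
    "X-Title": env.OPENROUTER_APP_NAME ?? "Invoice OCR",
  };
}
```

Passing the environment as an argument, rather than reading it inside the function, keeps the key on the server and makes the defaults easy to check in a unit test.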
Request Payload
Location: app/api/ocr-structured-v4/route.ts:228-254
Basic Structure
Model Selection
Location: app/api/ocr-structured-v4/route.ts:223-224
| Model ID | Provider | Cost (per 1M tokens) | Best For |
|---|---|---|---|
| google/gemini-2.5-flash | Google | ~$0.07 input | Default: Fast, accurate, cheap |
| google/gemini-2.0-flash | Google | ~$0.05 input | Legacy fallback |
| openai/gpt-4o-mini | OpenAI | ~$0.15 input | Structured output |
| openai/o3-mini | OpenAI | ~$1.00 input | Complex reasoning |
| anthropic/claude-3.5-sonnet | Anthropic | ~$3.00 input | High-quality extraction |
Temperature
Location: app/api/ocr-structured-v4/route.ts:230
Response Format
Location: app/api/ocr-structured-v4/route.ts:231
Some models ignore response_format and wrap the JSON in markdown; the JSON coercion step (app/api/ocr-structured-v4/route.ts:309-351) strips markdown anyway.
File Attachments
Images
Location: app/api/ocr-structured-v4/route.ts:249
app/api/ocr-structured-v4/route.ts:25-29
PDFs
Location: app/api/ocr-structured-v4/route.ts:240-247
- Data URL: data:application/pdf;base64,...
- Public URL: https://example.com/invoice.pdf
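To make the image and PDF cases concrete, here is a hedged sketch of how a file could be turned into a message content part. The part shapes (image_url for images, file with filename and file_data for PDFs) follow OpenRouter's multimodal message format as I understand it and should be checked against the current OpenRouter documentation; buildFilePart and its default filename are hypothetical and are not taken from route.ts.

```typescript
// Content parts in the OpenAI-compatible chat format (shapes are an assumption
// based on OpenRouter's documented multimodal inputs).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "file"; file: { filename: string; file_data: string } };

// Hypothetical helper: `url` may be a data URL or a public URL, as listed above.
function buildFilePart(
  mimeType: string,
  url: string,
  filename: string = "invoice.pdf",
): ContentPart {
  if (mimeType === "application/pdf") {
    return { type: "file", file: { filename, file_data: url } };
  }
  if (mimeType.startsWith("image/")) {
    return { type: "image_url", image_url: { url } };
  }
  throw new Error(`Unsupported MIME type: ${mimeType}`);
}
```

The returned part would sit alongside a text part in the user message's content array.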
PDF Plugins
Location: app/api/ocr-structured-v4/route.ts:268-277
Configuration
Engine Types
| Engine | Method | Best For | Cost |
|---|---|---|---|
| pdf-text | Text extraction | Digital PDFs with selectable text | $0.001/page |
| mistral-ocr | Mistral Pixtral OCR | Scanned PDFs, images embedded in PDF | $0.01/page |
| native | Model’s built-in | Models with native PDF support (GPT-4o, Claude 3.5) | Varies |
pdf-text (fastest, cheapest for most invoices)
When to use mistral-ocr:
- Scanned/photographed documents
- Poor-quality text extraction with pdf-text
- Handwritten annotations
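The engine choice above is passed to OpenRouter through the plugins field of the request body. The sketch below assumes the file-parser plugin shape from OpenRouter's PDF documentation; the helper names are hypothetical, and falling back to pdf-text when OPENROUTER_PDF_ENGINE is unset or unrecognized is an assumption rather than confirmed behavior of route.ts.

```typescript
type PdfEngine = "pdf-text" | "mistral-ocr" | "native";

const PDF_ENGINES: readonly PdfEngine[] = ["pdf-text", "mistral-ocr", "native"];

// Hypothetical: map the raw OPENROUTER_PDF_ENGINE value to a known engine.
// Assumption: unknown or missing values fall back to pdf-text.
function resolvePdfEngine(raw: string | undefined): PdfEngine {
  const match = PDF_ENGINES.find((engine) => engine === raw);
  return match ?? "pdf-text";
}

// Plugin entry for the request body ({ ..., plugins: buildPdfPlugins(engine) }).
function buildPdfPlugins(engine: PdfEngine) {
  return [{ id: "file-parser", pdf: { engine } }];
}
```

For a scanned invoice, buildPdfPlugins(resolvePdfEngine("mistral-ocr")) selects the OCR engine at the higher per-page cost shown in the table.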
Custom Plugin Override
Location: app/api/ocr-structured-v4/route.ts:20-21
Annotations (Caching)
Location: app/api/ocr-structured-v4/route.ts:256-265
Purpose
When re-processing the same PDF with different prompts, OpenRouter can skip re-parsing if you pass the annotations from the previous response.
Usage
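A minimal sketch of reusing annotations, assuming they arrive on the assistant message of the first response and are sent back on an assistant message in the follow-up request (the pattern OpenRouter documents for PDF inputs). The type and function names are hypothetical, and route.ts may assemble the messages differently.

```typescript
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: unknown;
  annotations?: unknown[];
};

// Hypothetical helper: builds the message list for a second pass over the same PDF.
function buildFollowUpMessages(
  originalUserMessage: ChatMessage,
  previousAssistant: { content: string; annotations?: unknown[] },
  newPrompt: string,
): ChatMessage[] {
  const assistant: ChatMessage = {
    role: "assistant",
    content: previousAssistant.content,
  };
  // Only attach annotations when the first response actually returned some.
  if (previousAssistant.annotations && previousAssistant.annotations.length > 0) {
    assistant.annotations = previousAssistant.annotations;
  }
  return [originalUserMessage, assistant, { role: "user", content: newPrompt }];
}
```

The original user message still carries the file part; the annotations are what allow the parsing step to be skipped.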
Example Flow
First request (no annotations):
Response Handling
Success Response
Location: app/api/ocr-structured-v4/route.ts:299-306
Error Response
Location: app/api/ocr-structured-v4/route.ts:291-296
| Status | Cause | Solution |
|---|---|---|
| 401 | Invalid API key | Check OPENROUTER_API_KEY in .env.local |
| 402 | Insufficient credits | Add credits at openrouter.ai |
| 429 | Rate limit exceeded | Wait or upgrade plan |
| 502 | Model unavailable | Retry or switch model |
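The status table can be turned into actionable messages in the route's error path. This is a sketch only: the function names are hypothetical, and treating 429 and 502 as the retryable cases is an inference from the Solution column, not documented behavior of route.ts.

```typescript
// Remediation hints keyed by HTTP status, copied from the table above.
const OPENROUTER_ERROR_HINTS: Record<number, string> = {
  401: "Invalid API key: check OPENROUTER_API_KEY in .env.local",
  402: "Insufficient credits: add credits at openrouter.ai",
  429: "Rate limit exceeded: wait or upgrade plan",
  502: "Model unavailable: retry or switch model",
};

function explainOpenRouterError(status: number): string {
  return OPENROUTER_ERROR_HINTS[status] ?? `Unexpected OpenRouter error (HTTP ${status})`;
}

// Assumption: only the "wait" and "retry" rows are worth retrying automatically.
function isRetryable(status: number): boolean {
  return status === 429 || status === 502;
}
```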
JSON Coercion
Location: app/api/ocr-structured-v4/route.ts:309-351
Even with response_format: {type: "json_object"}, some models may return:
- Markdown code fences: ```json\n{...}\n```
- Union types: "price_mode": "WITH_TAX" | "WITHOUT_TAX"
- Invalid values: NaN, Infinity
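The three failure modes above can be handled with a small pre-parse cleanup. The following is an illustrative sketch, not the implementation at route.ts:309-351. Keeping the first member of a union and replacing NaN/Infinity with null are assumptions, and the regular expressions only cover values in object-property position; they could also touch matching text inside string values.

```typescript
// Hypothetical cleanup pass applied before JSON.parse.
function coerceJson(raw: string): unknown {
  let text = raw.trim();

  // 1. Strip a surrounding markdown code fence, with or without a "json" tag.
  const fence = text.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i);
  if (fence) {
    text = fence[1] ?? text;
  }

  // 2. Collapse union-style values ("A" | "B") to their first member (assumption).
  text = text.replace(/("(?:[^"\\]|\\.)*")(?:\s*\|\s*"(?:[^"\\]|\\.)*")+/g, "$1");

  // 3. Replace non-JSON numeric literals in value position with null (assumption).
  text = text.replace(/:\s*(?:NaN|-?Infinity)\b/g, ": null");

  // Still throws a SyntaxError if the text is not valid JSON after cleanup.
  return JSON.parse(text);
}
```

Callers would typically catch the SyntaxError and surface it as the "Model did not return valid JSON" error described under Troubleshooting.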
Cost Optimization
Token Usage
System prompt: ~2,600 characters = ~650 tokens
Schema: ~4,000 characters = ~1,000 tokens
Invoice image: ~1,000-2,000 tokens (depends on resolution)
Response: ~2,000-5,000 tokens (depends on items)
Total per invoice: ~5,000-9,000 tokens
Estimated costs (gemini-2.5-flash @ $0.07/1M input, $0.30/1M output):
- Input: 6,000 tokens × $0.07/1M = **$0.00042**
- Output: 3,000 tokens × $0.30/1M = **$0.00090**
- Total per invoice: ~$0.0013 (0.13 cents)
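The per-invoice estimate is plain arithmetic on token counts and per-million rates, which a small helper can reproduce for other models. The function name is hypothetical; the rates in the usage note are the approximate figures quoted in this section and will drift as pricing changes.

```typescript
// Cost in USD given token counts and prices per one million tokens.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputUsdPerMillion: number,
  outputUsdPerMillion: number,
): { input: number; output: number; total: number } {
  const input = (inputTokens / 1_000_000) * inputUsdPerMillion;
  const output = (outputTokens / 1_000_000) * outputUsdPerMillion;
  return { input, output, total: input + output };
}
```

With 6,000 input tokens at $0.07/1M and 3,000 output tokens at $0.30/1M, estimateCostUsd(6000, 3000, 0.07, 0.30) gives roughly $0.00042 + $0.00090 = $0.00132, matching the estimate above.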
Batching
For processing multiple invoices, send requests in parallel while staying within the rate limits:
- 200 requests/minute
- 1M tokens/day
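One way to send requests in parallel without bursting past a request limit is to cap the number of in-flight calls. The helper below is a generic sketch: mapWithConcurrency is a hypothetical name, the worker stands in for whatever function calls the OCR route, and the cap only bounds concurrency, so per-minute and per-day budgets still need separate tracking.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

If any worker rejects, the returned promise rejects as well; wrap the worker in a try/catch when one failed invoice should not abort the batch.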
Model Selection Strategy
Development/Testing:
- Use google/gemini-2.5-flash (fast, cheap)

- Use openai/gpt-4o-mini for critical invoices
- Fall back to Gemini for simple layouts

- Use anthropic/claude-3.5-sonnet for:
  - Multi-page invoices with inconsistent layouts
  - Handwritten annotations
  - Tables spanning pages
Monitoring
OpenRouter Dashboard
Location: openrouter.ai/activity
Metrics:
- Requests per model
- Token usage
- Error rates
- Cost breakdown
Application-Level Logging
Add logging to API routes to track:
- Which models perform best
- Average processing time
- Reconciliation success rate
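As an illustration of what such logging could feed, the sketch below aggregates per-request log entries into the three figures listed above. The entry shape (model, durationMs, reconciled) is entirely hypothetical; the fields actually logged by the application are not shown in this document.

```typescript
// Hypothetical log entry recorded once per OCR request.
interface OcrLogEntry {
  model: string;
  durationMs: number;
  reconciled: boolean;
}

interface ModelSummary {
  count: number;
  avgDurationMs: number;
  reconciliationRate: number; // fraction of requests that reconciled, 0..1
}

function summarizeByModel(entries: readonly OcrLogEntry[]): Map<string, ModelSummary> {
  const totals = new Map<string, { count: number; durationMs: number; reconciled: number }>();
  entries.forEach((entry) => {
    const t = totals.get(entry.model) ?? { count: 0, durationMs: 0, reconciled: 0 };
    t.count += 1;
    t.durationMs += entry.durationMs;
    if (entry.reconciled) t.reconciled += 1;
    totals.set(entry.model, t);
  });

  const summary = new Map<string, ModelSummary>();
  totals.forEach((t, model) => {
    summary.set(model, {
      count: t.count,
      avgDurationMs: t.durationMs / t.count,
      reconciliationRate: t.reconciled / t.count,
    });
  });
  return summary;
}
```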
Security
API Key Protection
Never expose the API key in frontend code.
Rate Limiting
Add rate-limiting middleware to API routes.
Input Validation
Location: app/api/ocr-structured-v4/route.ts:199-205
- File size (under 10MB)
- MIME type (image/* or application/pdf)
- Model ID (whitelist allowed models)
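The three checks above can be sketched as a single validation function. This is not the code at route.ts:199-205: the function name and input shape are hypothetical, 10MB is interpreted as 10 × 1024 × 1024 bytes, and the whitelist simply reuses the model IDs from the Model Selection table.

```typescript
// Assumption: "10MB" means 10 MiB.
const MAX_FILE_BYTES = 10 * 1024 * 1024;

// Assumption: the whitelist mirrors the Model Selection table.
const ALLOWED_MODELS: readonly string[] = [
  "google/gemini-2.5-flash",
  "google/gemini-2.0-flash",
  "openai/gpt-4o-mini",
  "openai/o3-mini",
  "anthropic/claude-3.5-sonnet",
];

// Returns a list of validation errors; an empty list means the input is acceptable.
function validateOcrInput(input: { sizeBytes: number; mimeType: string; model: string }): string[] {
  const errors: string[] = [];
  if (input.sizeBytes >= MAX_FILE_BYTES) {
    errors.push("File must be under 10MB");
  }
  if (!(input.mimeType.startsWith("image/") || input.mimeType === "application/pdf")) {
    errors.push(`Unsupported MIME type: ${input.mimeType}`);
  }
  if (ALLOWED_MODELS.indexOf(input.model) === -1) {
    errors.push(`Model not allowed: ${input.model}`);
  }
  return errors;
}
```

Returning all errors at once, rather than throwing on the first, lets the route send a single 400 response that lists everything the client needs to fix.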
Testing
Mock Responses
For unit tests, mock OpenRouter.
Integration Tests
Use a test API key with the invoices in public/test-invoices/.
Troubleshooting
Issue: Model returns invalid JSON
Symptoms: "Model did not return valid JSON" error
Causes:
- Model doesn’t support response_format: {type: "json_object"}
- System prompt not clear enough
- Invoice too complex for model
Solutions:
- Check model capabilities: OpenRouter Models
- Add "Output ONLY the JSON object, no commentary" to user message
- Switch to a more capable model (e.g., GPT-4o)
Issue: PDF parsing fails
Symptoms: Empty or garbled text extraction
Causes:
- Scanned PDF (no text layer)
- Complex layout (tables, multi-column)
- Non-English characters
Solutions:
- Switch to OPENROUTER_PDF_ENGINE=mistral-ocr
- Try model with native PDF support: openai/gpt-4o
- Pre-process PDF with OCR tool before upload
Issue: High costs
Symptoms: Unexpected charges in dashboard
Causes:
- Using expensive models for simple invoices
- Re-parsing same PDF without annotations
- Large images not resized
Solutions:
- Default to gemini-2.5-flash, upgrade only when needed
- Implement annotation caching (see above)
- Resize images to max 1200px width before upload
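For the 1200px resize suggestion, the target dimensions can be computed before handing the image to whatever resizing library the project uses. A minimal sketch with a hypothetical helper name; it preserves aspect ratio and never upscales.

```typescript
// Computes dimensions that fit within maxWidth while preserving aspect ratio.
function fitToMaxWidth(
  width: number,
  height: number,
  maxWidth: number = 1200,
): { width: number; height: number } {
  if (width <= maxWidth) {
    // Already small enough: leave the image untouched.
    return { width, height };
  }
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}
```

A 2400 × 3200 scan becomes 1200 × 1600, which cuts the pixel count to a quarter and typically reduces image token usage accordingly.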
Next Steps
- OCR Processing Flow - Full pipeline
- Reconciliation Logic - Post-processing
- Environment Setup - Configuration guide
