Invoice OCR supports multiple vision models through OpenRouter. This guide helps you choose the right model based on accuracy, speed, and cost requirements.

Default Model

Invoice OCR defaults to google/gemini-2.0-flash when OPENROUTER_MODEL is not set.

Why Gemini 2.0 Flash?
  • Fast processing (typically 1-3 seconds per invoice)
  • Excellent accuracy for structured documents
  • Cost-effective at ~$0.001-$0.003 per invoice
  • Native multimodal support (no image-to-text conversion)
  • Large context window (1M tokens)
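The fallback behavior can be sketched as follows; `resolveModel` is an illustrative helper, not part of the Invoice OCR codebase:

```typescript
// Resolve the model ID, falling back to the documented default
// when OPENROUTER_MODEL is unset or blank.
function resolveModel(env: Record<string, string | undefined>): string {
  const configured = env.OPENROUTER_MODEL?.trim();
  return configured ? configured : "google/gemini-2.0-flash";
}
```

In a server route you would pass `process.env` as the argument.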

Model Comparison

google/gemini-2.0-flash

Best for: Production use, high volume
  • Speed: Fast (1-3s)
  • Accuracy: High
  • Cost: $0.001-$0.003/invoice
  • Context: 1M tokens
Ideal for:
  • Processing hundreds of invoices
  • Real-time API integrations
  • Cost-sensitive applications

openai/gpt-4o

Best for: Maximum accuracy
  • Speed: Medium (3-5s)
  • Accuracy: Highest
  • Cost: $0.01-$0.03/invoice
  • Context: 128k tokens
Ideal for:
  • Complex invoices with handwriting
  • Multi-page PDFs with tables
  • Critical financial documents

anthropic/claude-3.5-sonnet

Best for: Complex reasoning
  • Speed: Medium (3-5s)
  • Accuracy: Very High
  • Cost: $0.015-$0.04/invoice
  • Context: 200k tokens
Ideal for:
  • Invoices with complex tax calculations
  • Multi-currency documents
  • Detailed reconciliation needs

openai/gpt-4o-mini

Best for: Budget-conscious development
  • Speed: Very Fast (under 2s)
  • Accuracy: Good
  • Cost: $0.0005-$0.001/invoice
  • Context: 128k tokens
Ideal for:
  • Development and testing
  • Simple invoices
  • High-volume, low-stakes use cases

Performance Matrix

| Model | Speed | Accuracy | Cost (avg) | Best For |
|---|---|---|---|---|
| google/gemini-2.0-flash | ⚡⚡⚡ | ⭐⭐⭐⭐ | $0.002 | Production default |
| openai/gpt-4o | ⚡⚡ | ⭐⭐⭐⭐⭐ | $0.020 | Maximum accuracy |
| openai/gpt-4o-mini | ⚡⚡⚡⚡ | ⭐⭐⭐ | $0.001 | Budget/testing |
| anthropic/claude-3.5-sonnet | ⚡⚡ | ⭐⭐⭐⭐⭐ | $0.025 | Complex reasoning |
| anthropic/claude-3-opus | | ⭐⭐⭐⭐⭐ | $0.040 | Highest accuracy |
| anthropic/claude-3-haiku | ⚡⚡⚡⚡ | ⭐⭐⭐ | $0.001 | Fast & cheap |
Costs are approximate and based on typical invoice processing (1-2 page documents with 500-2000 tokens of output). Actual costs vary based on document complexity, page count, and output verbosity.

Choosing a Model

By Use Case

High-volume production
Recommended: google/gemini-2.0-flash
Why:
  • Processes 1000+ invoices/day cost-effectively
  • Fast enough for real-time user experiences
  • High accuracy for standard invoices
  • Reliable and well-supported
Configuration:
.env.local
OPENROUTER_MODEL=google/gemini-2.0-flash
Maximum accuracy
Recommended: openai/gpt-4o or anthropic/claude-3.5-sonnet
Why:
  • Highest accuracy for complex invoices
  • Better handling of edge cases
  • More robust number extraction
  • Strong reasoning for tax calculations
Configuration:
.env.local
OPENROUTER_MODEL=openai/gpt-4o
Development and testing
Recommended: openai/gpt-4o-mini
Why:
  • Very low cost during development
  • Fast iteration cycles
  • Good enough for testing workflows
  • Upgrade to production model when deploying
Configuration:
.env.local
OPENROUTER_MODEL=openai/gpt-4o-mini
Multi-page documents
Recommended: anthropic/claude-3.5-sonnet
Why:
  • 200k token context window
  • Excellent multi-page reasoning
  • Strong table extraction
  • Detailed reconciliation capabilities
Configuration:
.env.local
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
Poor-quality images
Recommended: openai/gpt-4o
Why:
  • Best OCR capabilities
  • Handles blurry or rotated images
  • Good with handwritten notes
  • Robust to image quality issues
Configuration:
.env.local
OPENROUTER_MODEL=openai/gpt-4o

By Document Type

| Document Type | Recommended Model | Reason |
|---|---|---|
| Standard tax invoices | gemini-2.0-flash | Fast, accurate, cost-effective |
| GST invoices (India) | gemini-2.0-flash | Native handling of complex schemas |
| Multi-page invoices | claude-3.5-sonnet | Large context, strong reasoning |
| Handwritten invoices | gpt-4o | Best OCR capabilities |
| Scanned PDFs | gpt-4o | Robust to image quality |
| Simple receipts | gpt-4o-mini | Expensive models are overkill |
| International invoices | claude-3.5-sonnet | Multi-language, multi-currency |
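The recommendations above can be encoded as a simple routing table; the document-type keys here are illustrative, not part of the Invoice OCR API:

```typescript
// Illustrative routing table derived from the recommendations above.
const MODEL_BY_DOCUMENT_TYPE: Record<string, string> = {
  "standard-invoice": "google/gemini-2.0-flash",
  "gst-invoice": "google/gemini-2.0-flash",
  "multi-page": "anthropic/claude-3.5-sonnet",
  "handwritten": "openai/gpt-4o",
  "scanned-pdf": "openai/gpt-4o",
  "receipt": "openai/gpt-4o-mini",
  "international": "anthropic/claude-3.5-sonnet",
};

function modelForDocument(type: string): string {
  // Unknown types fall back to the production default.
  return MODEL_BY_DOCUMENT_TYPE[type] ?? "google/gemini-2.0-flash";
}
```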

Cost Optimization

Strategies

1. Use Cheaper Models for Simple Invoices

Route simple invoices to gpt-4o-mini and complex ones to gpt-4o:
const model = invoice.pageCount > 2 || invoice.hasHandwriting
  ? 'openai/gpt-4o'
  : 'openai/gpt-4o-mini';
2. Cache Annotations for Multi-Page PDFs

Reuse OpenRouter’s file parsing to avoid re-parsing costs:
// First request
const response = await fetch('/api/ocr-structured-v4', {
  method: 'POST',
  body: JSON.stringify({ pdfBase64: '...' })
});
const result = await response.json();

// Store annotations for reuse
const annotations = result._annotations;

// Subsequent requests with same PDF
const response2 = await fetch('/api/ocr-structured-v4', {
  method: 'POST',
  body: JSON.stringify({
    pdfBase64: '...',
    annotations // Skips re-parsing
  })
});
3. Use Gemini for Production

gemini-2.0-flash offers the best cost/performance ratio:
  • 5-10x cheaper than GPT-4o
  • Similar accuracy for structured documents
  • Faster processing
4. Monitor Usage

Track costs per model in OpenRouter dashboard:
  1. Go to openrouter.ai/activity
  2. Filter by model
  3. Identify expensive requests
  4. Optimize or switch models

Cost Estimation

Example: Processing 1000 invoices/month
| Model | Cost per Invoice | Monthly Cost |
|---|---|---|
| gpt-4o-mini | $0.001 | $1 |
| gemini-2.0-flash | $0.002 | $2 |
| gemini-pro | $0.005 | $5 |
| gpt-4o | $0.020 | $20 |
| claude-3.5-sonnet | $0.025 | $25 |
| claude-3-opus | $0.040 | $40 |
Hybrid approach: Use gemini-2.0-flash for 90% of invoices and gpt-4o for the 10% that fail validation. This gives you high accuracy at ~$4/month for 1000 invoices.
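The blended cost of the hybrid approach works out as follows (a minimal sketch; the function name is illustrative):

```typescript
// Blended monthly cost: a cheap model handles most invoices,
// an expensive model handles the remainder.
function hybridMonthlyCost(
  invoices: number,
  cheapCost: number,     // $ per invoice on the cheap model
  cheapShare: number,    // fraction routed to the cheap model
  expensiveCost: number, // $ per invoice on the expensive model
): number {
  return (
    invoices * cheapShare * cheapCost +
    invoices * (1 - cheapShare) * expensiveCost
  );
}

// 1000 invoices: 90% gemini-2.0-flash ($0.002), 10% gpt-4o ($0.020)
hybridMonthlyCost(1000, 0.002, 0.9, 0.02); // ≈ 3.8, i.e. ~$4/month
```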

Per-Request Model Override

You can override the default model on a per-request basis:
const response = await fetch('/api/ocr-structured-v4', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    imageBase64: '...',
    model: 'openai/gpt-4o' // Override
  })
});
This allows you to:
  • Use different models for different invoice types
  • Retry failed extractions with a more powerful model
  • A/B test models for accuracy
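The retry-with-a-stronger-model pattern can be sketched generically; the helper below is illustrative, and the `extract` and `isValid` callbacks stand in for your own OCR call and validation logic:

```typescript
// Try each model in order until one produces a result that passes
// validation; if none do, return the last model's result.
async function extractWithFallback<T>(
  models: string[],
  extract: (model: string) => Promise<T>,
  isValid: (result: T) => boolean,
): Promise<T> {
  let last: T | undefined;
  for (const model of models) {
    last = await extract(model);
    if (isValid(last)) return last;
  }
  return last as T;
}
```

For example, `extract` could POST to `/api/ocr-structured-v4` with the given `model`, and `isValid` could check that the extracted totals reconcile.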

Model Availability

Check available models in real-time:
1. Via OpenRouter Dashboard

Browse all models at openrouter.ai/models and filter by:
  • Vision support
  • Context length
  • Pricing
  • Provider
2. Via API Endpoint

Invoice OCR includes a models endpoint:
curl http://localhost:3000/api/models
Returns:
{
  "models": [
    {
      "id": "google/gemini-2.0-flash",
      "name": "Gemini 2.0 Flash",
      "contextLength": 1000000,
      "pricing": {
        "promptPer1k": 0.025,
        "completionPer1k": 0.1,
        "imagePer1k": 0.025
      },
      "isVision": true
    }
  ],
  "cached": true,
  "count": 47
}
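The response can be filtered client-side, e.g. to show only vision-capable models with enough context for multi-page invoices; the interface and threshold below are illustrative:

```typescript
// Shape of one entry in the /api/models response (fields as
// shown in the example payload above).
interface ModelInfo {
  id: string;
  name: string;
  contextLength: number;
  isVision: boolean;
}

// Keep only vision models whose context window is at least
// minContext tokens (100k is an arbitrary example threshold).
function visionModels(models: ModelInfo[], minContext = 100_000): ModelInfo[] {
  return models.filter((m) => m.isVision && m.contextLength >= minContext);
}
```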
3. Via UI

The Invoice OCR web interface displays available models in the model selector dropdown.

PDF-Specific Considerations

PDF Engine Selection

The PDF parsing engine can impact model performance:
.env.local
OPENROUTER_PDF_ENGINE=pdf-text  # Default: fast text extraction
# OPENROUTER_PDF_ENGINE=mistral-ocr  # For scanned PDFs
# OPENROUTER_PDF_ENGINE=native  # Provider's native handling
Recommendations:
  • pdf-text: Use with any model for digital PDFs
  • mistral-ocr: Use with gemini-2.0-flash or gpt-4o for scanned PDFs
  • native: Let the model provider handle PDF parsing
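These recommendations reduce to a small decision function; `isScanned` stands in for your own detection logic (e.g. checking whether the PDF has a text layer), and the helper itself is illustrative:

```typescript
type PdfEngine = "pdf-text" | "mistral-ocr" | "native";

// Pick a parsing engine per the recommendations above.
function pickPdfEngine(isScanned: boolean, useProviderNative = false): PdfEngine {
  if (useProviderNative) return "native"; // let the provider parse the PDF
  return isScanned ? "mistral-ocr" : "pdf-text";
}
```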

Multi-Page PDFs

Models with larger context windows handle multi-page invoices better:
| Pages | Recommended Model | Context Needed |
|---|---|---|
| 1-2 | Any model | ~4k tokens |
| 3-5 | gemini-2.0-flash | ~20k tokens |
| 6-10 | claude-3.5-sonnet | ~50k tokens |
| 10+ | gemini-2.0-flash | ~100k+ tokens |
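A page-count-based chooser following this guidance might look like the sketch below (an assumption, not Invoice OCR's actual routing logic):

```typescript
// Rough model choice by page count, mirroring the guidance above.
function modelForPageCount(pages: number): string {
  if (pages <= 5) return "google/gemini-2.0-flash";   // 1-2 pages: any model works
  if (pages <= 10) return "anthropic/claude-3.5-sonnet";
  return "google/gemini-2.0-flash"; // 1M-token context covers 10+ pages
}
```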

Troubleshooting

Error: “Model not found” or “Invalid model”
Solutions:
  • Check model ID spelling (case-sensitive)
  • Verify model is available: openrouter.ai/models
  • Try the /api/models endpoint to see available vision models
  • Use a known-good model like google/gemini-2.0-flash
Problem: Model not extracting data correctly
Solutions:
  1. Try a more powerful model (gpt-4o or claude-3.5-sonnet)
  2. Check image quality (resolution, rotation, clarity)
  3. Use mistral-ocr PDF engine for scanned documents
  4. Review the raw OCR output via /api/ocr to diagnose issues
Problem: API requests taking too long
Solutions:
  • Switch to faster models (gpt-4o-mini, gemini-2.0-flash)
  • Reduce PDF page count (extract relevant pages only)
  • Use pdf-text engine instead of mistral-ocr
  • Check OpenRouter status: status.openrouter.ai
Problem: API costs higher than expected
Solutions:
  • Switch default model to gemini-2.0-flash or gpt-4o-mini
  • Implement caching for repeated PDFs
  • Monitor usage per model in OpenRouter dashboard
  • Set credit limits on API keys

Next Steps

  • Environment Variables: Configure OPENROUTER_MODEL in .env.local
  • API Reference: Learn how to override models per-request
  • Quick Start: Start processing invoices
  • OpenRouter Dashboard: Monitor usage and costs
