Overview

Invoice OCR processes PDFs through one of three parsing engines, selectable per deployment or per request. Unlike images, PDFs go through text extraction in addition to visual analysis, and every page of the document is considered.

PDF vs Image Processing

Understanding the difference between PDF and image processing:

Images

  • Direct visual analysis by AI
  • OCR reads text from pixels
  • Single page only
  • Faster processing

PDFs

  • Text extraction + visual analysis
  • Multi-page support
  • Plugin-based parsing
  • Comprehensive data extraction

PDF Input Methods

The API accepts PDFs in two formats:

Method 1: Base64 Data URLs

The most common method used by the UI:
// Source: components/ocr-uploader.tsx:158-159
const body = isPdf
  ? { pdfBase64: preview, filename: file.name || "document.pdf", model }
  : { imageBase64: preview, mimeType: file.type, model };
How it works:
  1. File is read as a data URL using FileReader.readAsDataURL()
  2. The data URL (format: data:application/pdf;base64,<content>) is sent in the request body
  3. The API route converts it to the format expected by OpenRouter
// Source: app/api/ocr-structured-v4/route.ts:31-34
function toPdfDataUrl(pdfBase64: string) {
  if (pdfBase64.startsWith("data:")) return pdfBase64;
  return `data:application/pdf;base64,${pdfBase64}`;
}
The system automatically detects data URL format and avoids double-wrapping.
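As a rough illustration of the request-building step, the sketch below packages a data URL into the body shape the uploader sends. `buildOcrBody` and `UploadLike` are hypothetical names, and the way `isPdf` is derived here is an assumption; only the body field names come from the uploader excerpt.

```typescript
// Hypothetical helper: only the body field names come from the uploader excerpt.
interface UploadLike {
  name: string;    // file name as reported by the browser, may be empty
  type: string;    // MIME type, e.g. "application/pdf" or "image/png"
  dataUrl: string; // result of FileReader.readAsDataURL()
}

type OcrBody =
  | { pdfBase64: string; filename: string; model: string }
  | { imageBase64: string; mimeType: string; model: string };

function buildOcrBody(file: UploadLike, model: string): OcrBody {
  // Assumption: treat the upload as a PDF if either the MIME type
  // or the data URL prefix says so.
  const isPdf =
    file.type === "application/pdf" ||
    file.dataUrl.startsWith("data:application/pdf;base64,");

  return isPdf
    ? { pdfBase64: file.dataUrl, filename: file.name || "document.pdf", model }
    : { imageBase64: file.dataUrl, mimeType: file.type, model };
}
```

Because the route's `toPdfDataUrl` leaves an existing `data:` prefix untouched, passing the full data URL through unchanged is safe.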

Method 2: Public URLs

For server-to-server integrations:
{
  "pdfUrl": "https://example.com/invoices/invoice-123.pdf",
  "model": "google/gemini-2.5-flash"
}
Use cases:
  • Invoices stored in cloud storage (S3, GCS)
  • Webhook integrations
  • Batch processing from external systems
When using pdfUrl, ensure the URL is publicly accessible or includes authentication tokens in the URL itself. The API cannot pass custom headers to the PDF fetch.
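A server-to-server caller might guard against non-fetchable URLs before sending the request. The following is a minimal sketch under that assumption; `buildPdfUrlRequest` is a hypothetical helper and the check is deliberately loose (it only confirms an absolute http or https URL).

```typescript
// Hypothetical helper: the field names pdfUrl and model come from the JSON example.
interface PdfUrlRequest {
  pdfUrl: string;
  model: string;
}

function buildPdfUrlRequest(pdfUrl: string, model: string): PdfUrlRequest {
  // Any credentials must already be part of the URL (for example a
  // pre-signed query string), because custom headers cannot be forwarded.
  if (!/^https?:\/\/\S+$/i.test(pdfUrl)) {
    throw new Error(`Not an absolute http(s) URL: ${pdfUrl}`);
  }
  return { pdfUrl, model };
}
```

Storage-specific schemes such as `s3://` or `gs://` are rejected by this check; convert them to a public or pre-signed HTTPS URL first.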

PDF Parsing Engines

Invoice OCR supports three PDF parsing engines configured via the OPENROUTER_PDF_ENGINE environment variable:

pdf-text (Default)

OPENROUTER_PDF_ENGINE=pdf-text
Characteristics:
  • Extracts text layers from native PDFs
  • Fast and accurate for digitally created invoices
  • Preserves structure and layout hints
  • Best for: ERP-generated invoices, e-invoices, programmatically created PDFs
Example configuration:
// Source: app/api/ocr-structured-v4/route.ts:269-275
if (isPdf) {
  const engine = process.env.OPENROUTER_PDF_ENGINE || "pdf-text";
  const plugins: unknown = body.plugins || [
    {
      id: "file-parser",
      pdf: { engine },
    },
  ];
  (payload as Record<string, unknown>).plugins = plugins as unknown;
}

mistral-ocr

OPENROUTER_PDF_ENGINE=mistral-ocr
Characteristics:
  • Uses Mistral AI’s OCR capabilities
  • Handles scanned documents and images embedded in PDFs
  • Better for handwritten or low-quality scans
  • Best for: Scanned invoices, photos of paper documents
If your PDFs are scanned images or photos rather than native digital documents, switch to mistral-ocr for better accuracy.

native

OPENROUTER_PDF_ENGINE=native
Characteristics:
  • Uses the model’s built-in PDF understanding
  • No preprocessing or text extraction
  • Relies entirely on model’s multimodal capabilities
  • Best for: Testing, comparing approaches, or when other engines fail
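The "Best for" guidance for the three engines can be summarized as an ordered list of engines to try per document type. This is only a sketch: `DocumentKind`, `ENGINE_ORDER`, and `enginesToTry` are hypothetical, and the fallback order is an assumption drawn from the descriptions above rather than behavior implemented by the API.

```typescript
type PdfEngine = "pdf-text" | "mistral-ocr" | "native";

// Hypothetical classification of an incoming invoice.
type DocumentKind = "digital" | "scanned" | "unknown";

// Assumed order: start with the engine described as the best fit,
// and keep "native" last as the option for when other engines fail.
const ENGINE_ORDER: Record<DocumentKind, PdfEngine[]> = {
  digital: ["pdf-text", "mistral-ocr", "native"],
  scanned: ["mistral-ocr", "native"],
  unknown: ["pdf-text", "mistral-ocr", "native"],
};

function enginesToTry(kind: DocumentKind): PdfEngine[] {
  // Return a copy so callers cannot mutate the shared table.
  return [...ENGINE_ORDER[kind]];
}
```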

Plugin Configuration

You can override the PDF engine on a per-request basis:
// Custom plugin configuration
const response = await fetch("/api/ocr-structured-v4", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pdfBase64: pdfDataUrl,
    filename: "invoice.pdf",
    model: "google/gemini-2.5-flash",
    plugins: [
      {
        id: "file-parser",
        pdf: { engine: "mistral-ocr" }
      }
    ]
  })
});
Custom plugins in the request body take precedence over the OPENROUTER_PDF_ENGINE environment variable.
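The precedence rule can be sketched as a small pure function. `resolvePlugins` is a hypothetical name. Unlike the route excerpt, this variant also ignores an empty `plugins` array and falls back to `pdf-text` for unrecognized engine names; both behaviors are assumptions added for illustration.

```typescript
const KNOWN_ENGINES: readonly string[] = ["pdf-text", "mistral-ocr", "native"];

function resolvePlugins(
  requestPlugins: unknown,
  envEngine: string | undefined
): unknown {
  // 1. Plugins supplied in the request body win outright.
  if (Array.isArray(requestPlugins) && requestPlugins.length > 0) {
    return requestPlugins;
  }
  // 2. Otherwise use the environment value, if it names a known engine.
  // 3. Otherwise fall back to the documented default, pdf-text.
  const engine =
    envEngine !== undefined && KNOWN_ENGINES.includes(envEngine)
      ? envEngine
      : "pdf-text";
  return [{ id: "file-parser", pdf: { engine } }];
}
```

In a route handler this would be called as `resolvePlugins(body.plugins, process.env.OPENROUTER_PDF_ENGINE)`.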

Multi-Page Processing

The system processes all pages in a PDF automatically:
// System prompt excerpt from: app/api/ocr-structured-v4/route.ts:145
"If a PDF is provided, consider ALL pages. Prefer HSN tables and tax summaries found on any page as anchors."
Key behaviors:
1. Full document analysis

The AI analyzes every page of the PDF, not just the first page.

2. Cross-page reconciliation

Data from summary pages (totals, HSN tables) is used to verify line items from earlier pages.

3. Duplicate detection

The system treats duplicate copies (Original/Duplicate/Transporter) as a single invoice.
// From system prompt
"Treat duplicate copies (Original/Duplicate/Transporter) as one invoice."

OpenRouter Annotations

To avoid re-parsing PDFs on subsequent requests, use annotations:
// First request - parse the PDF
const firstResponse = await fetch("/api/ocr-structured-v4", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pdfBase64: pdfDataUrl,
    filename: "invoice.pdf"
  })
});

const firstData = await firstResponse.json();
const annotations = firstData._annotations; // OpenRouter metadata

// Second request - reuse parsed content
const secondResponse = await fetch("/api/ocr-structured-v4", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pdfBase64: pdfDataUrl,
    filename: "invoice.pdf",
    annotations: annotations // Skip re-parsing
  })
});
How it works:
// Source: app/api/ocr-structured-v4/route.ts:256-265
if (body.annotations) {
  const msgs = payload.messages as Array<Record<string, unknown>>;
  msgs.push({
    role: "assistant",
    content: "Previous file parse metadata",
    annotations: body.annotations as unknown,
  });
}
Using annotations can significantly reduce processing costs for PDFs, especially large multi-page documents. The AI reuses the parsed text without re-processing the binary file.
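One way to reuse annotations across requests is a small client-side cache. The sketch below is illustrative only: `AnnotationCache` is a hypothetical class, and keying on filename plus payload length is a cheap approximation of document identity (a content hash would be more reliable).

```typescript
interface OcrRequestBody {
  pdfBase64: string;
  filename: string;
  annotations?: unknown;
}

// Hypothetical in-memory cache for the _annotations value of earlier responses.
class AnnotationCache {
  private readonly store = new Map<string, unknown>();

  // Approximate identity: same filename and same payload length.
  private key(pdfBase64: string, filename: string): string {
    return `${filename}:${pdfBase64.length}`;
  }

  // Store annotations from a response, ignoring missing values.
  remember(pdfBase64: string, filename: string, annotations: unknown): void {
    if (annotations !== undefined && annotations !== null) {
      this.store.set(this.key(pdfBase64, filename), annotations);
    }
  }

  // Build a request body, attaching cached annotations when available.
  buildBody(pdfBase64: string, filename: string): OcrRequestBody {
    const cached = this.store.get(this.key(pdfBase64, filename));
    return cached === undefined
      ? { pdfBase64, filename }
      : { pdfBase64, filename, annotations: cached };
  }
}
```

After the first response arrives, call `remember(pdfDataUrl, "invoice.pdf", firstData._annotations)`; later calls to `buildBody` for the same document then include the `annotations` field.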

Optimization Best Practices

Choose the Right Engine

For invoices generated by accounting software, ERPs, or e-invoicing systems:
OPENROUTER_PDF_ENGINE=pdf-text
  • Fastest processing
  • Highest accuracy for text extraction
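  • Preserves layout structure
For scanned invoices or photos of paper documents, use mistral-ocr instead; keep native for testing or for cases where the other engines fail.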

File Size Optimization

Large PDFs can slow down processing and increase costs:
1. Compress PDFs

Use tools like Adobe Acrobat, Ghostscript, or online compressors:
# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=compressed.pdf input.pdf

2. Reduce image resolution

If the PDF contains embedded images, reduce their DPI:
  • 150 DPI is sufficient for text extraction
  • 300 DPI for high-quality scans

3. Remove unnecessary pages

Strip out blank pages, covers, or non-invoice content before upload.
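File size matters more than it first appears, because Base64 encodes every 3 bytes as 4 characters, so the request body is about one third larger than the PDF on disk. The arithmetic is sketched below; `fitsBudget` and its `maxChars` parameter are hypothetical, since the request-size limit depends on your deployment.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters,
// padding the final group to a full 4 characters.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Fixed prefix of a PDF data URL.
const PDF_DATA_URL_PREFIX = "data:application/pdf;base64,";

function pdfDataUrlLength(byteLength: number): number {
  return PDF_DATA_URL_PREFIX.length + base64Length(byteLength);
}

// Hypothetical guard: maxChars is whatever budget your deployment allows.
function fitsBudget(byteLength: number, maxChars: number): boolean {
  return pdfDataUrlLength(byteLength) <= maxChars;
}
```

For example, a 3,000,000-byte PDF yields 4,000,000 Base64 characters plus the short data URL prefix.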

Model Selection

Different models have different PDF processing capabilities:
// Source: components/ocr-uploader.tsx:430-435
<option value="google/gemini-2.5-flash">Gemini 2.5 Flash (Recommended)</option>
<option value="openai/gpt-4o-mini">GPT-4o Mini</option>
<option value="openai/o3-mini">OpenAI o3-mini</option>
<option value="google/gemini-2.5-pro">Gemini 2.5 Pro</option>
<option value="openai/gpt-5-reasoning">GPT-5 Reasoning</option>
Recommendations:
  • Gemini 2.5 Flash: Best balance of speed, cost, and accuracy for PDFs
  • Gemini 2.5 Pro: Highest accuracy for complex multi-page invoices
  • GPT-4o Mini: Good alternative for structured extraction

Debugging PDF Processing

If you encounter issues with PDF extraction:

Use Raw Text Mode

Switch to “Raw Text” extraction mode to see what the model reads:
// Source: components/ocr-uploader.tsx:155
const endpoint =
  mode === "structured"
    ? extractor === "v4"
      ? "/api/ocr-structured-v4"
      : "/api/ocr-structured"
    : "/api/ocr";
1. Set mode to 'Raw Text'

In the UI, select Raw Text from the Extraction Mode dropdown.

2. Upload and extract

Upload your PDF and click Extract Data.

3. Review raw text

Examine the extracted text to verify the PDF parser is reading the content correctly.
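If you want to run the same comparison from a script rather than the UI, the endpoint choice can be expressed as a function. This is a sketch: `resolveEndpoint` is a hypothetical name, and the values `"raw"` and `"legacy"` are assumptions, since the uploader excerpt only shows the `"structured"` and `"v4"` comparisons.

```typescript
// Assumed value sets: only "structured" and "v4" appear in the uploader excerpt.
type Mode = "structured" | "raw";
type Extractor = "v4" | "legacy";

function resolveEndpoint(mode: Mode, extractor: Extractor): string {
  // Any non-structured mode goes to the raw text endpoint.
  if (mode !== "structured") return "/api/ocr";
  // Structured mode picks between the v4 and the older extractor.
  return extractor === "v4" ? "/api/ocr-structured-v4" : "/api/ocr-structured";
}
```

Sending the same PDF to `/api/ocr` and to a structured endpoint shows whether a problem lies in text extraction or in the structuring step.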

Check API Response

Inspect the full API response for debugging information:
// Source: app/api/ocr-structured-v4/route.ts:299-306
const json = await response.json();
const content: unknown = json?.choices?.[0]?.message?.content;
if (!content) {
  return NextResponse.json(
    { error: "No content returned from model" },
    { status: 500 }
  );
}
If the API returns “No content returned from model”, the PDF may be corrupted, password-protected, or in an unsupported format.
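On the client side, that error can be turned into an actionable message. The sketch below assumes only what the route excerpt shows (a JSON body with an `error` string and a 500 status); `describeOcrResponse` and `OcrResult` are hypothetical names.

```typescript
interface OcrResult {
  ok: boolean;
  message: string;
}

function describeOcrResponse(status: number, json: unknown): OcrResult {
  // Pull out the error field if the body is an object that has one.
  const error =
    typeof json === "object" && json !== null && "error" in json
      ? String((json as { error: unknown }).error)
      : undefined;

  if (status >= 200 && status < 300 && error === undefined) {
    return { ok: true, message: "Extraction succeeded" };
  }
  if (error === "No content returned from model") {
    return {
      ok: false,
      message:
        "The model returned no content. The PDF may be corrupted, " +
        "password-protected, or in an unsupported format.",
    };
  }
  return { ok: false, message: error ?? `Request failed with status ${status}` };
}
```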

Common PDF Issues

Password-Protected PDFs

Problem: The PDF requires a password to open.
Solution: Remove the password before uploading:
# Using qpdf
qpdf --decrypt --password=yourpassword input.pdf output.pdf
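To catch protected files before upload, you can scan the raw bytes for an `/Encrypt` entry, which encrypted PDFs declare in their trailer. This is a heuristic sketch, not a PDF parser: `looksEncrypted` is a hypothetical helper, and the same byte sequence appearing elsewhere in a file would produce a false positive.

```typescript
// Heuristic only: an encrypted PDF declares an /Encrypt entry in its trailer
// (or cross-reference stream dictionary). This scans for those bytes anywhere.
function looksEncrypted(bytes: Uint8Array): boolean {
  const marker = "/Encrypt";
  outer: for (let i = 0; i + marker.length <= bytes.length; i++) {
    for (let j = 0; j < marker.length; j++) {
      // Mismatch at this offset: move on to the next start position.
      if (bytes[i + j] !== marker.charCodeAt(j)) continue outer;
    }
    return true; // every marker byte matched at offset i
  }
  return false;
}
```

In the browser, the bytes can come from `new Uint8Array(await file.arrayBuffer())`; files flagged this way can be decrypted with qpdf before they are sent.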

Scanned Images as PDFs

Problem: Low accuracy for PDFs that are just scanned images.
Solution: Switch to the mistral-ocr engine:
OPENROUTER_PDF_ENGINE=mistral-ocr

Multi-Language PDFs

Problem: Invoices with mixed languages (e.g., English and Hindi).
Solution: Gemini models handle multi-language content well. Ensure your system prompt doesn’t restrict language:
// The v4 API automatically supports multi-language
// meta.language field captures detected language

Next Steps

Understanding Reconciliation

Learn how the system validates and reconciles invoice totals

Review Tool

Debug extraction issues using the review tool
