Processing PDFs

Overview

Invoice OCR processes PDFs through a plugin-based parser with three selectable engines, plus options for reducing cost and processing time. Unlike images, PDFs go through a text-extraction step in addition to visual analysis, and they can span multiple pages.

PDF vs Image Processing

PDFs and images are processed differently:

Images

Direct visual analysis by AI
OCR reads text from pixels
Single page only
Faster processing

PDFs

Text extraction + visual analysis
Multi-page support
Plugin-based parsing
Comprehensive data extraction

PDF Input Methods

The API accepts PDFs in two formats:

Method 1: Base64 Data URLs

The most common method used by the UI:

// Source: components/ocr-uploader.tsx:158-159
const body = isPdf
  ? { pdfBase64: preview, filename: file.name || "document.pdf", model }
  : { imageBase64: preview, mimeType: file.type, model };

How it works:

The file is read as a data URL using FileReader.readAsDataURL().
The data URL (format: data:application/pdf;base64,<content>) is sent in the request body.
The API route converts it to the data URL format expected by OpenRouter, adding the prefix only when it is missing.

// Source: app/api/ocr-structured-v4/route.ts:31-34
function toPdfDataUrl(pdfBase64: string) {
  if (pdfBase64.startsWith("data:")) return pdfBase64;
  return `data:application/pdf;base64,${pdfBase64}`;
}

The system automatically detects data URL format and avoids double-wrapping.
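
A client can run a cheap sanity check before sending the payload. The sketch below repeats toPdfDataUrl from the route and adds two hypothetical helpers, base64Payload and looksLikePdf, which are not part of the source. It assumes the file starts with the usual %PDF- header followed by a version digit, which always encodes to the base64 prefix JVBERi0.

```typescript
const PDF_DATA_URL_PREFIX = "data:application/pdf;base64,";

// Same behavior as the route helper quoted above: wrap only when needed.
function toPdfDataUrl(pdfBase64: string): string {
  if (pdfBase64.startsWith("data:")) return pdfBase64;
  return `${PDF_DATA_URL_PREFIX}${pdfBase64}`;
}

// Hypothetical helper: strip any data URL header, return the base64 payload.
function base64Payload(input: string): string {
  if (!input.startsWith("data:")) return input;
  const comma = input.indexOf(",");
  return comma === -1 ? "" : input.slice(comma + 1);
}

// Hypothetical helper: "%PDF-" plus a version digit encodes to "JVBERi0".
function looksLikePdf(input: string): boolean {
  return base64Payload(input).startsWith("JVBERi0");
}
```

Calling toPdfDataUrl twice returns the same string as calling it once, which is the double-wrapping guarantee described above. A file with junk bytes before the header would fail looksLikePdf, so treat a negative result as a warning rather than proof.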

Method 2: Public URLs

For server-to-server integrations:

{
  "pdfUrl": "https://example.com/invoices/invoice-123.pdf",
  "model": "google/gemini-2.5-flash"
}

Use cases:

Invoices stored in cloud storage (S3, GCS)
Webhook integrations
Batch processing from external systems

When using pdfUrl, ensure the URL is publicly accessible or carries its credentials in the URL itself (for example, a pre-signed S3 or GCS URL). The API cannot pass custom headers when fetching the PDF.
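
The two input methods can share one request builder. This is a minimal sketch: the field names (pdfBase64, filename, pdfUrl, model) come from the examples above, while PdfRequestBody and buildPdfRequestBody are hypothetical names introduced here, and the rule "anything starting with http:// or https:// is a public URL" is an assumption.

```typescript
// Request body shapes; field names follow the examples above.
type PdfRequestBody =
  | { pdfBase64: string; filename: string; model?: string }
  | { pdfUrl: string; model?: string };

// Hypothetical helper: pick the input method from the shape of `source`.
function buildPdfRequestBody(
  source: string,
  options: { filename?: string; model?: string } = {}
): PdfRequestBody {
  const isRemote = /^https?:\/\//i.test(source);
  const body: PdfRequestBody = isRemote
    ? { pdfUrl: source }
    : { pdfBase64: source, filename: options.filename ?? "document.pdf" };
  // Only include `model` when the caller chose one.
  if (options.model) body.model = options.model;
  return body;
}
```

The result can be passed to JSON.stringify and sent as the POST body, as in the other examples on this page.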

PDF Parsing Engines

Invoice OCR supports three PDF parsing engines configured via the OPENROUTER_PDF_ENGINE environment variable:

pdf-text (Default)

OPENROUTER_PDF_ENGINE=pdf-text

Characteristics:

Extracts text layers from native PDFs
Fast and accurate for digitally created invoices
Preserves structure and layout hints
Best for: ERP-generated invoices, e-invoices, programmatically created PDFs

Example configuration:

// Source: app/api/ocr-structured-v4/route.ts:269-275
if (isPdf) {
  const engine = process.env.OPENROUTER_PDF_ENGINE || "pdf-text";
  const plugins: unknown = body.plugins || [
    {
      id: "file-parser",
      pdf: { engine },
    },
  ];
  (payload as Record<string, unknown>).plugins = plugins as unknown;
}

mistral-ocr

OPENROUTER_PDF_ENGINE=mistral-ocr

Characteristics:

Uses Mistral AI’s OCR capabilities
Handles scanned documents and images embedded in PDFs
Better for handwritten or low-quality scans
Best for: Scanned invoices, photos of paper documents

If your PDFs are scanned images or photos rather than native digital documents, switch to mistral-ocr for better accuracy.

native

OPENROUTER_PDF_ENGINE=native

Characteristics:

Uses the model’s built-in PDF understanding
No preprocessing or text extraction
Relies entirely on model’s multimodal capabilities
Best for: Testing, comparing approaches, or when other engines fail

Plugin Configuration

You can override the PDF engine on a per-request basis:

// Custom plugin configuration
const response = await fetch("/api/ocr-structured-v4", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pdfBase64: pdfDataUrl,
    filename: "invoice.pdf",
    model: "google/gemini-2.5-flash",
    plugins: [
      {
        id: "file-parser",
        pdf: { engine: "mistral-ocr" }
      }
    ]
  })
});

Custom plugins in the request body take precedence over the OPENROUTER_PDF_ENGINE environment variable.
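
The precedence can be sketched as a pure function: request plugins first, then the environment variable, then the pdf-text default. resolvePlugins and isPdfEngine are hypothetical names, and the environment is passed in as a plain object so the function can be tested. Unlike the route code quoted earlier, this sketch also rejects unknown engine names and treats an empty plugins array as absent; both are assumptions, not documented behavior.

```typescript
type PdfEngine = "pdf-text" | "mistral-ocr" | "native";

const PDF_ENGINES: readonly string[] = ["pdf-text", "mistral-ocr", "native"];

function isPdfEngine(value: unknown): value is PdfEngine {
  return typeof value === "string" && PDF_ENGINES.indexOf(value) !== -1;
}

// Hypothetical helper mirroring the documented precedence.
function resolvePlugins(
  requestPlugins: unknown,
  env: Record<string, string | undefined>
): unknown[] {
  // 1. Plugins sent in the request body win outright.
  if (Array.isArray(requestPlugins) && requestPlugins.length > 0) {
    return requestPlugins;
  }
  // 2. Otherwise use the environment variable, 3. falling back to pdf-text.
  const fromEnv = env["OPENROUTER_PDF_ENGINE"];
  const engine: PdfEngine = isPdfEngine(fromEnv) ? fromEnv : "pdf-text";
  return [{ id: "file-parser", pdf: { engine } }];
}
```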

Multi-Page Processing

The system processes all pages in a PDF automatically:

// System prompt excerpt from: app/api/ocr-structured-v4/route.ts:145
"If a PDF is provided, consider ALL pages. Prefer HSN tables and tax summaries found on any page as anchors."

Key behaviors:

Full document analysis

The AI analyzes every page of the PDF, not just the first page.

Cross-page reconciliation

Data from summary pages (totals, HSN tables) is used to verify line items from earlier pages.

Duplicate detection

The system treats duplicate copies (Original/Duplicate/Transporter) as a single invoice.

// From system prompt
"Treat duplicate copies (Original/Duplicate/Transporter) as one invoice."

OpenRouter Annotations

To avoid re-parsing PDFs on subsequent requests, use annotations:

// First request - parse the PDF
const firstResponse = await fetch("/api/ocr-structured-v4", {
  method: "POST",
  body: JSON.stringify({
    pdfBase64: pdfDataUrl,
    filename: "invoice.pdf"
  })
});

const firstData = await firstResponse.json();
const annotations = firstData._annotations; // OpenRouter metadata

// Second request - reuse parsed content
const secondResponse = await fetch("/api/ocr-structured-v4", {
  method: "POST",
  body: JSON.stringify({
    pdfBase64: pdfDataUrl,
    filename: "invoice.pdf",
    annotations: annotations // Skip re-parsing
  })
});

How it works:

// Source: app/api/ocr-structured-v4/route.ts:256-265
if (body.annotations) {
  const msgs = payload.messages as Array<Record<string, unknown>>;
  msgs.push({
    role: "assistant",
    content: "Previous file parse metadata",
    annotations: body.annotations as unknown,
  });
}

Using annotations can significantly reduce processing costs for PDFs, especially large multi-page documents. The AI reuses the parsed text without re-processing the binary file.
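
One way to apply this across requests is a small in-memory cache keyed by document. The sketch below is illustrative only: AnnotationCache, remember, and buildBody are hypothetical names, the cache key is whatever stable identifier the caller chooses, and the _annotations and annotations field names are taken from the example above.

```typescript
// Hypothetical in-memory cache: maps a caller-chosen document key to the
// annotations returned by an earlier request.
class AnnotationCache {
  private readonly store = new Map<string, unknown>();

  // Save the annotations from a parsed API response, if it carried any.
  remember(key: string, responseJson: { _annotations?: unknown }): void {
    if (responseJson._annotations !== undefined) {
      this.store.set(key, responseJson._annotations);
    }
  }

  // Build a request body, attaching cached annotations when available.
  buildBody(
    key: string,
    pdfBase64: string,
    filename: string
  ): Record<string, unknown> {
    const body: Record<string, unknown> = { pdfBase64, filename };
    if (this.store.has(key)) {
      body["annotations"] = this.store.get(key);
    }
    return body;
  }
}
```

The first call to buildBody for a key produces a plain request. After remember has stored the response, later calls for the same key include the annotations field.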

Optimization Best Practices

Choose the Right Engine

Digital PDFs

For invoices generated by accounting software, ERPs, or e-invoicing systems:

OPENROUTER_PDF_ENGINE=pdf-text

Fastest processing
Highest accuracy for text extraction
Preserves layout structure

Scanned PDFs

For scanned paper invoices or photo-based PDFs:

OPENROUTER_PDF_ENGINE=mistral-ocr

Better OCR for unclear text
Handles skewed or rotated documents
Supports handwritten annotations

Mixed or Unknown

When you’re unsure or have a mix of both:

OPENROUTER_PDF_ENGINE=native

Let the model decide the best approach
Works as a fallback option
Useful for testing and comparison
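
If the caller already knows what kind of document it is sending, the three cases above can be applied per request instead of through the environment variable. This is a minimal sketch; PdfKind, ENGINE_BY_KIND, and pluginsFor are hypothetical names, and how a document gets its kind label is left to the caller.

```typescript
type PdfKind = "digital" | "scanned" | "unknown";
type PdfEngine = "pdf-text" | "mistral-ocr" | "native";

// The engine recommended above for each kind of document.
const ENGINE_BY_KIND: Record<PdfKind, PdfEngine> = {
  digital: "pdf-text",
  scanned: "mistral-ocr",
  unknown: "native",
};

// Hypothetical helper: build the plugins array for a per-request override.
function pluginsFor(kind: PdfKind): Array<{ id: string; pdf: { engine: PdfEngine } }> {
  return [{ id: "file-parser", pdf: { engine: ENGINE_BY_KIND[kind] } }];
}
```

The returned array goes in the plugins field of the request body.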

File Size Optimization

Large PDFs can slow down processing and increase costs:

Compress PDFs

Use tools like Adobe Acrobat, Ghostscript, or online compressors:

# Using Ghostscript
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=compressed.pdf input.pdf

Reduce image resolution

If the PDF contains embedded images, reduce their DPI:

150 DPI is sufficient for text extraction
300 DPI for high-quality scans

Remove unnecessary pages

Strip out blank pages, covers, or non-invoice content before upload.
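
File size matters more than it first appears, because a PDF sent as a data URL grows by about one third: base64 encodes every 3 bytes as 4 characters. The sketch below estimates the encoded size before upload. MAX_REQUEST_BYTES is a hypothetical limit chosen for illustration; the real limit depends on your deployment.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters,
// padding the final group.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical limit for illustration only (4 MiB).
const MAX_REQUEST_BYTES = 4 * 1024 * 1024;

// Estimate whether the data URL for a file of `fileBytes` fits the limit.
function fitsInRequest(
  fileBytes: number,
  maxBytes: number = MAX_REQUEST_BYTES
): boolean {
  const prefixLength = "data:application/pdf;base64,".length;
  return prefixLength + base64Length(fileBytes) <= maxBytes;
}
```

The estimate ignores the other JSON fields in the body, so leave some headroom.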

Model Selection

Different models have different PDF processing capabilities:

// Source: components/ocr-uploader.tsx:430-435
<option value="google/gemini-2.5-flash">Gemini 2.5 Flash (Recommended)</option>
<option value="openai/gpt-4o-mini">GPT-4o Mini</option>
<option value="openai/o3-mini">OpenAI o3-mini</option>
<option value="google/gemini-2.5-pro">Gemini 2.5 Pro</option>
<option value="openai/gpt-5-reasoning">GPT-5 Reasoning</option>

Recommendations:

Gemini 2.5 Flash: Best balance of speed, cost, and accuracy for PDFs
Gemini 2.5 Pro: Highest accuracy for complex multi-page invoices
GPT-4o Mini: Good alternative for structured extraction

Debugging PDF Processing

If you encounter issues with PDF extraction:

Use Raw Text Mode

Switch to “Raw Text” extraction mode to see what the model reads:

// Source: components/ocr-uploader.tsx:155
const endpoint =
  mode === "structured"
    ? extractor === "v4"
      ? "/api/ocr-structured-v4"
      : "/api/ocr-structured"
    : "/api/ocr";

Set mode to 'Raw Text'

In the UI, select Raw Text from the Extraction Mode dropdown.

Upload and extract

Upload your PDF and click Extract Data.

Review raw text

Examine the extracted text to verify the PDF parser is reading the content correctly.

Check API Response

Inspect the full API response for debugging information:

// Source: app/api/ocr-structured-v4/route.ts:299-306
const json = await response.json();
const content: unknown = json?.choices?.[0]?.message?.content;
if (!content) {
  return NextResponse.json(
    { error: "No content returned from model" },
    { status: 500 }
  );
}

If the API returns “No content returned from model”, the PDF may be corrupted, password-protected, or in an unsupported format.
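
On the client side, the error body can be turned into an actionable message. This is a minimal sketch: the { error: string } shape and the "No content returned from model" message come from the route code above, while isOcrErrorBody and describeFailure are hypothetical helpers.

```typescript
interface OcrErrorBody {
  error: string;
}

// Type guard for the error shape returned by the route.
function isOcrErrorBody(value: unknown): value is OcrErrorBody {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { error?: unknown }).error === "string"
  );
}

// Hypothetical helper: map a response status and parsed body to a hint.
function describeFailure(status: number, body: unknown): string {
  if (status < 400) return "ok";
  const message = isOcrErrorBody(body) ? body.error : "Unknown error";
  if (message === "No content returned from model") {
    return `${message}: check that the PDF is not corrupted, password-protected, or in an unsupported format.`;
  }
  return `Request failed with status ${status}: ${message}`;
}
```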

Common PDF Issues

Password-Protected PDFs

Problem: The PDF requires a password to open.

Solution: Remove the password before uploading:

# Using qpdf
qpdf --decrypt --password=yourpassword input.pdf output.pdf
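
Encrypted files can often be caught before upload. Encrypted PDFs normally carry an /Encrypt entry in their trailer dictionary, so a byte search for that name works as a rough check. The sketch below is a heuristic, not a parser: containsAscii and looksEncrypted are hypothetical helpers, and the check can report a false positive if the same bytes happen to appear elsewhere in the file.

```typescript
// Search `haystack` for the ASCII bytes of `needle`.
function containsAscii(haystack: Uint8Array, needle: string): boolean {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle.charCodeAt(j)) continue outer;
    }
    return true;
  }
  return false;
}

// Heuristic: the presence of "/Encrypt" suggests the PDF is encrypted.
function looksEncrypted(pdfBytes: Uint8Array): boolean {
  return containsAscii(pdfBytes, "/Encrypt");
}
```

In a browser, the bytes could come from new Uint8Array(await file.arrayBuffer()) before the file is converted to a data URL.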

Scanned Images as PDFs

Problem: Low accuracy for PDFs that are just scanned images.

Solution: Switch to the mistral-ocr engine:

OPENROUTER_PDF_ENGINE=mistral-ocr

Multi-Language PDFs

Problem: Invoices with mixed languages (e.g., English and Hindi).

Solution: Gemini models handle multi-language content well. Ensure your system prompt doesn’t restrict language:

// The v4 API automatically supports multi-language
// meta.language field captures detected language

Next Steps

Understanding Reconciliation

Review Tool
