
Overview

The Invoice OCR system processes documents through a multi-stage pipeline: Upload → Route Selection → OpenRouter Processing → Reconciliation → Display. It accepts both images and PDFs and offers two extraction modes: raw text and structured JSON.

Flow Diagram

┌──────────────┐
│   User       │
│   Upload     │
└──────┬───────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  ocr-uploader.tsx (Frontend)            │
│  - File validation (image/PDF)          │
│  - Base64 encoding                      │
│  - Mode selection (raw/structured)      │
│  - Extractor selection (v4/compact)     │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  Route Selection                         │
│  ┌─────────────────────────────────┐   │
│  │ Raw Mode → /api/ocr             │   │
│  │ Structured + v4 → /api/ocr-     │   │
│  │                   structured-v4  │   │
│  │ Structured + compact → /api/ocr-│   │
│  │                   structured     │   │
│  └─────────────────────────────────┘   │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  API Route Handler                       │
│  - Validate input                        │
│  - Build OpenRouter payload              │
│  - Add system prompt + schema            │
│  - Configure plugins (PDF)               │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  OpenRouter API Call                     │
│  https://openrouter.ai/api/v1/          │
│         chat/completions                 │
│                                          │
│  Headers:                                │
│  - Authorization: Bearer <API_KEY>       │
│  - HTTP-Referer: <SITE_URL>             │
│  - X-Title: <APP_NAME>                  │
│                                          │
│  Body:                                   │
│  - model (gemini-2.5-flash, gpt-4o...)  │
│  - messages (system + user + file)       │
│  - response_format: {type: json_object}  │
│  - plugins (for PDF parsing)             │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  Response Processing                     │
│  - Coerce to valid JSON                  │
│  - Strip markdown code fences            │
│  - Handle union types                    │
│  - Replace NaN/Infinity with null        │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  Reconciliation (if structured)          │
│  - reconcileV4() for v4 schema           │
│  - reconcile() for compact schema        │
│  - Try multiple hypotheses               │
│  - Pick best match                       │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  Return to Frontend                      │
│  - Structured JSON with reconciliation   │
│  - Or raw text (for raw mode)            │
└──────┬──────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────┐
│  Display Components                      │
│  - invoice-viewer-v4.tsx (v4 schema)    │
│  - invoice-viewer.tsx (compact schema)   │
│  - Confetti on success                   │
│  - Show reconciliation status            │
└─────────────────────────────────────────┘

Stage 1: Upload & Validation

File Input

Location: components/ocr-uploader.tsx:106-143
The uploader accepts files via:
  • File input: Click to browse
  • Drag & drop: Drop files directly onto the upload area
const handleFile = (f: File | null) => {
  // ...
  const pdf = f.type === "application/pdf" || f.name.toLowerCase().endsWith(".pdf");
  setIsPdf(pdf);
  const reader = new FileReader();
  reader.onload = () => setPreview(reader.result as string);
  reader.readAsDataURL(f); // Convert to base64 data URL
};
Supported formats:
  • Images: image/* (PNG, JPG, JPEG, WebP)
  • Documents: application/pdf
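The format check can be expressed as a small pure function. This is an illustrative sketch rather than the uploader's code: classifyUpload is a hypothetical name, the PDF rule is copied from handleFile (MIME type or a .pdf extension), and treating every other image/* MIME type as an image is an assumption based on the list above.

```typescript
// Hypothetical helper, not from components/ocr-uploader.tsx.
type UploadKind = "pdf" | "image" | "unsupported";

function classifyUpload(mimeType: string, fileName: string): UploadKind {
  // Same PDF rule as handleFile: MIME type or a case-insensitive .pdf extension
  if (mimeType === "application/pdf" || fileName.toLowerCase().endsWith(".pdf")) {
    return "pdf";
  }
  // Assumption: any image/* MIME type is accepted as an image
  if (mimeType.startsWith("image/")) {
    return "image";
  }
  return "unsupported";
}
```

Checking the extension as well as the MIME type matters because some browsers and operating systems report an empty type for dropped files.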

Mode Selection

Location: components/ocr-uploader.tsx:437-461
Users choose:
  1. Extraction Mode: raw (plain text) or structured (JSON)
  2. Schema Format (if structured): v4 (India GST) or compact (legacy)
  3. AI Model: Gemini 2.5 Flash (default), GPT-4o Mini, o3-mini, etc.
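The mode and schema choices determine which API route handles the request, as in the Route Selection box of the flow diagram. A minimal sketch, assuming the UI stores these choices as the strings "raw"/"structured" and "v4"/"compact" (selectRoute is a hypothetical helper):

```typescript
// Hypothetical sketch of the routing rule shown in the flow diagram.
type Mode = "raw" | "structured";
type Extractor = "v4" | "compact";

function selectRoute(mode: Mode, extractor: Extractor): string {
  // Raw mode always uses the plain-text route; the extractor is ignored
  if (mode === "raw") return "/api/ocr";
  // Structured mode picks the route that matches the schema
  return extractor === "v4" ? "/api/ocr-structured-v4" : "/api/ocr-structured";
}
```

The model choice does not affect routing; it travels in the request body as the optional model field.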

Stage 2: API Route Processing

Route: /api/ocr (Raw Text)

Location: app/api/ocr/route.ts:31-131
Purpose: Extract plain text from an image or PDF.
Request payload:
{
  imageBase64?: string,  // Data URL or base64
  pdfUrl?: string,       // Public URL
  pdfBase64?: string,    // Base64 PDF
  filename?: string,     // PDF filename
  model?: string         // Model override
}
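The excerpt does not show how the route validates this payload. One plausible check, written as a hypothetical validateOcrRequest helper, assumes that exactly one of the three sources must be present and that pdfUrl must be an http(s) URL; both rules are assumptions, not documented behavior.

```typescript
// Hypothetical request shape mirroring the payload fields listed above.
interface OcrRequest {
  imageBase64?: string;
  pdfUrl?: string;
  pdfBase64?: string;
  filename?: string;
  model?: string;
}

// Returns an error message, or null when the request looks usable.
function validateOcrRequest(body: OcrRequest): string | null {
  const sources = [body.imageBase64, body.pdfUrl, body.pdfBase64].filter(
    (v) => typeof v === "string" && v.length > 0,
  );
  // Assumption: exactly one source per request
  if (sources.length === 0) return "Provide imageBase64, pdfUrl, or pdfBase64";
  if (sources.length > 1) return "Provide only one of imageBase64, pdfUrl, pdfBase64";
  // Assumption: a public URL must use http or https
  if (body.pdfUrl && !/^https?:\/\//i.test(body.pdfUrl)) return "pdfUrl must be an http(s) URL";
  return null;
}
```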
Key logic (app/api/ocr/route.ts:61-103):
const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    "HTTP-Referer": site,
    "X-Title": title,
  },
  body: JSON.stringify({
    model,
    temperature: 0,
    messages: [
      {
        role: "system",
        content: "You are an OCR extractor. Return only the raw, verbatim text...",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Extract all text..." },
          // Image or PDF file attachment
        ],
      },
    ],
  }),
});
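The "// Image or PDF file attachment" placeholder stands for the file part of the user message. For images, OpenRouter accepts the OpenAI-style image_url content part. The sketch below assumes that raw base64 without a data: prefix should be wrapped as PNG; toDataUrl and buildImageContent are hypothetical names.

```typescript
// Hypothetical helpers for the image branch of the user message.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

function toDataUrl(imageBase64: string, mimeType = "image/png"): string {
  // FileReader.readAsDataURL already yields a data URL; bare base64 gets wrapped
  // (assumption: PNG when the type is unknown)
  return imageBase64.startsWith("data:") ? imageBase64 : `data:${mimeType};base64,${imageBase64}`;
}

function buildImageContent(prompt: string, imageBase64: string): ContentPart[] {
  return [
    { type: "text", text: prompt },
    { type: "image_url", image_url: { url: toDataUrl(imageBase64) } },
  ];
}
```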

Route: /api/ocr-structured-v4 (India GST Schema)

Location: app/api/ocr-structured-v4/route.ts:196-406
Purpose: Extract structured invoice data with the v4 schema, then reconcile it.
Request payload (the /api/ocr fields, plus):
{
  // ... standard fields
  annotations?: unknown,  // Pass-through for cached parsing
  plugins?: unknown       // Override PDF plugins
}
System prompt (app/api/ocr-structured-v4/route.ts:136-194):
  • 2,600+ character prompt defining extraction rules
  • Includes complete JSON schema (150 lines)
  • Specifies decision rules for price mode, discounts, GST split
  • Enforces 2-decimal precision and normalization
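The prompt's exact wording is not reproduced here, but two of the numeric rules it covers can be illustrated. Under Indian GST, an intra-state supply splits the tax equally between CGST and SGST, while an inter-state supply is charged as IGST. The sketch assumes tax is computed on the line's taxable value and rounded to 2 decimals, and it derives SGST as the remainder so the two halves always sum to the rounded tax; round2 and splitGst are hypothetical.

```typescript
// Hypothetical illustration; not the prompt's actual rules.
const round2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

interface GstSplit { cgst: number; sgst: number; igst: number }

function splitGst(taxableValue: number, ratePercent: number, interState: boolean): GstSplit {
  const tax = round2((taxableValue * ratePercent) / 100);
  // Inter-state supply: the whole tax is IGST
  if (interState) return { cgst: 0, sgst: 0, igst: tax };
  // Intra-state supply: CGST takes half, SGST takes the remainder
  const half = round2(tax / 2);
  return { cgst: half, sgst: round2(tax - half), igst: 0 };
}
```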
Response processing (app/api/ocr-structured-v4/route.ts:308-351):
const coerceToJson = (raw: unknown): unknown => {
  // Normalize model output to a trimmed string (assumed: the excerpt omits how s is initialized)
  let s = typeof raw === "string" ? raw.trim() : String(raw ?? "");

  // Strip markdown code fences
  if (s.startsWith("```")) {
    s = s.replace(/^```[a-zA-Z]*\n/, "").replace(/```\s*$/, "").trim();
  }

  // Keep only the span from the first "{" to the last "}"
  const firstBrace = s.indexOf("{");
  const lastBrace = s.lastIndexOf("}");
  if (firstBrace !== -1 && lastBrace > firstBrace) {
    s = s.slice(firstBrace, lastBrace + 1);
  }

  // Collapse two-member string unions ("key": "a" | "b") to the first member
  s = s.replace(/"([^"]+)"\s*:\s*"([^"]+)"\s*\|\s*"([^"]+)"/g, '"$1": "$2"');
  // Remove trailing commas before } or ]
  s = s.replace(/,\s*([}\]])/g, "$1");
  // Replace NaN, Infinity, and -Infinity (including the sign) with null
  s = s.replace(/-?\bInfinity\b|\bNaN\b/g, "null");

  return JSON.parse(s);
};
Reconciliation (app/api/ocr-structured-v4/route.ts:391-398):
try {
  const doc = parsed as V4Doc;
  const out = reconcileV4(doc);  // Apply reconciliation engine
  return NextResponse.json(out);
} catch {
  return NextResponse.json(parsed); // Return raw if reconciliation fails
}
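The flow diagram describes reconciliation as trying multiple hypotheses and picking the best match. reconcileV4 itself is not shown, so the following is only a sketch of that idea under simplified assumptions: each line has a quantity, price, and tax rate, and the two competing hypotheses are whether printed prices include tax. Line, Hypothesis, and pickBestHypothesis are hypothetical names.

```typescript
// Hypothetical sketch of hypothesis-based reconciliation; not reconcileV4.
interface Line { qty: number; price: number; taxRate: number } // taxRate in percent

interface Hypothesis { name: string; total: (lines: Line[]) => number }

const round2 = (x: number): number => Math.round(x * 100) / 100;

const hypotheses: Hypothesis[] = [
  {
    // Printed prices exclude tax, so tax is added per line
    name: "price_excludes_tax",
    total: (lines) => round2(lines.reduce((s, l) => s + l.qty * l.price * (1 + l.taxRate / 100), 0)),
  },
  {
    // Printed prices already include tax
    name: "price_includes_tax",
    total: (lines) => round2(lines.reduce((s, l) => s + l.qty * l.price, 0)),
  },
];

// Pick the hypothesis whose computed total is closest to the printed total.
function pickBestHypothesis(lines: Line[], printedTotal: number) {
  let best = { name: "", computed: 0, error: Number.POSITIVE_INFINITY };
  for (const h of hypotheses) {
    const computed = h.total(lines);
    const error = Math.abs(computed - printedTotal);
    if (error < best.error) best = { name: h.name, computed, error };
  }
  return { ...best, error: round2(best.error) };
}
```

A fuller version would add hypotheses for discount placement and header charges; a tolerance such as the viewer's 0.05 would then decide whether the best hypothesis counts as matched.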

Route: /api/ocr-structured (Compact Schema)

Location: app/api/ocr-structured/route.ts:156-295
Purpose: Legacy schema organized into voucher, items, and party sections.
Schema (app/api/ocr-structured/route.ts:21-120):
{
  voucher: {
    invoice_number: string,
    invoice_date: string,
    invoice_discount: string,
    invoice_discount_mode: "before_tax" | "after_tax" | "",
    round_off: string,
    total_invoice_amount: string,
    additional_charges: [{ name, amount, tax_rate, amount_includes_tax }],
    reconciliation: { status: "matched" | "unmatched" }
  },
  items: [{ price, unit, name, hsn_sac_code, quantity, tax_rate, discount_rate }],
  party: { party_gstin_number, party_name, party_address, ... }
}
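Because the compact schema stores amounts as strings and records whether the invoice discount applies before or after tax, computing an expected total means parsing numbers and branching on invoice_discount_mode. A simplified sketch, assuming item-level discount_rate and additional_charges are ignored and that an empty mode behaves like after_tax (computeTotal and CompactItem are hypothetical):

```typescript
// Hypothetical sketch; not code from app/api/ocr-structured/route.ts.
type DiscountMode = "before_tax" | "after_tax" | "";

// Subset of the schema's item fields (amounts are strings)
interface CompactItem { price: string; quantity: string; tax_rate: string }

const num = (s: string): number => (s.trim() === "" ? 0 : Number(s));

function computeTotal(items: CompactItem[], discount: string, mode: DiscountMode, roundOff: string): number {
  const subtotal = items.reduce((s, it) => s + num(it.price) * num(it.quantity), 0);
  const tax = items.reduce((s, it) => s + (num(it.price) * num(it.quantity) * num(it.tax_rate)) / 100, 0);
  const d = num(discount);
  let total: number;
  if (mode === "before_tax" && subtotal > 0) {
    // Discount lowers the taxable value, so tax shrinks proportionally
    total = subtotal - d + tax * (1 - d / subtotal);
  } else {
    // "after_tax" (and, by assumption, ""): subtract the discount from the taxed total
    total = subtotal + tax - d;
  }
  return Math.round((total + num(roundOff)) * 100) / 100;
}
```

Comparing the result with total_invoice_amount is one way to derive reconciliation.status, though the route's actual rule is not shown.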

Stage 3: PDF Handling

Plugin Configuration

Location: app/api/ocr-structured-v4/route.ts:268-277
if (isPdf) {
  const engine = process.env.OPENROUTER_PDF_ENGINE || "pdf-text";
  const plugins: unknown = body.plugins || [
    {
      id: "file-parser",
      pdf: { engine },
    },
  ];
  (payload as Record<string, unknown>).plugins = plugins as unknown;
}
Available engines (set with OPENROUTER_PDF_ENGINE in .env.local; a plugins array in the request body overrides it):
  • pdf-text (default): Text extraction only
  • mistral-ocr: OCR for scanned PDFs
  • native: Use model’s native PDF support
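The plugin block passes OPENROUTER_PDF_ENGINE to OpenRouter unchanged, so a typo such as mistral_ocr only surfaces as an API error. If earlier detection is wanted, a hypothetical resolvePdfEngine helper can restrict the value to the three engines listed above and fall back to pdf-text; the validation is an addition, not current behavior.

```typescript
// Hypothetical helper mirroring the plugin block above, with the engine validated.
const PDF_ENGINES = ["pdf-text", "mistral-ocr", "native"] as const;
type PdfEngine = (typeof PDF_ENGINES)[number];

function resolvePdfEngine(value: string | undefined): PdfEngine {
  // Unknown or missing values fall back to the documented default
  return (PDF_ENGINES as readonly string[]).includes(value ?? "") ? (value as PdfEngine) : "pdf-text";
}

function buildPdfPlugins(value: string | undefined) {
  return [{ id: "file-parser", pdf: { engine: resolvePdfEngine(value) } }];
}
```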

Message Format for PDFs

Location: app/api/ocr-structured-v4/route.ts:238-251
messages: [
  { role: "system", content: SYSTEM_PROMPT },
  {
    role: "user",
    content: [
      { type: "text", text: "Return ONLY JSON matching the provided schema." },
      {
        type: "file",
        file: {
          filename: body.filename || "invoice.pdf",
          file_data: pdfData,  // Data URL or public URL
        },
      },
    ],
  },
]
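The excerpt does not show how pdfData is derived from the request. A sketch, assuming pdfBase64 takes precedence over pdfUrl and that bare base64 should be wrapped in a data:application/pdf;base64, prefix; buildPdfFilePart is a hypothetical name, while the type/file/filename/file_data shape follows the message above.

```typescript
// Hypothetical helper producing the "file" content part for a PDF.
interface PdfSource { pdfUrl?: string; pdfBase64?: string; filename?: string }

function buildPdfFilePart(src: PdfSource) {
  let fileData: string;
  if (src.pdfBase64) {
    // Accept a full data URL or bare base64 (assumption: base64 wins over a URL)
    fileData = src.pdfBase64.startsWith("data:")
      ? src.pdfBase64
      : `data:application/pdf;base64,${src.pdfBase64}`;
  } else if (src.pdfUrl) {
    // Public URLs are passed through unchanged
    fileData = src.pdfUrl;
  } else {
    throw new Error("Either pdfBase64 or pdfUrl is required");
  }
  return {
    type: "file" as const,
    file: { filename: src.filename || "invoice.pdf", file_data: fileData },
  };
}
```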

Annotations (Caching)

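Location: app/api/ocr-structured-v4/route.ts:256-265
To avoid re-parsing costs for the same PDF, annotations from an earlier parse (the annotations field of the request body) are appended to the conversation as an assistant message: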
if (body.annotations) {
  const msgs = payload.messages as Array<Record<string, unknown>>;
  msgs.push({
    role: "assistant",
    content: "Previous file parse metadata",
    annotations: body.annotations as unknown,
  });
}
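The route only forwards annotations it receives, so the caller has to keep them between requests. One hypothetical client-side approach caches them by a SHA-256 hash of the file data. It assumes the annotations arrive on choices[0].message.annotations of the OpenRouter response, which is worth checking against the current OpenRouter PDF documentation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache: file hash -> annotations from an earlier response.
const annotationCache = new Map<string, unknown>();

const fileKey = (fileData: string): string => createHash("sha256").update(fileData).digest("hex");

// Assumed response shape: annotations on choices[0].message.annotations
interface ChatResponse { choices?: { message?: { annotations?: unknown } }[] }

function rememberAnnotations(fileData: string, response: ChatResponse): void {
  const annotations = response.choices?.[0]?.message?.annotations;
  if (annotations !== undefined) annotationCache.set(fileKey(fileData), annotations);
}

// Adds cached annotations (if any) to the request body sent to the route.
function withCachedAnnotations(fileData: string, body: Record<string, unknown>): Record<string, unknown> {
  const cached = annotationCache.get(fileKey(fileData));
  return cached === undefined ? body : { ...body, annotations: cached };
}
```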

Stage 4: Frontend Display

Invoice Viewer V4

Location: components/invoice-viewer-v4.tsx:12-225
Key features:
  • Reconciliation status badge (green = matched; red = absolute error above 0.05, or no positive printed grand total)
  • Document header (supplier, invoice number, date)
  • Items table with computed columns
  • Header discounts and charges breakdown
  • Totals summary with printed vs computed comparison
  • Alternates trace for debugging
Reconciliation check (components/invoice-viewer-v4.tsx:13-14):
const doc = React.useMemo(() => reconcileV4(data), [data]);
const matched = (doc.reconciliation?.error_absolute ?? 0) <= 0.05 
               && (doc.printed?.grand_total ?? 0) > 0;
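Pulled out as a pure function, the check is easy to unit test. Note what the ?? defaults imply: a missing error_absolute counts as 0 (a perfect match), while a missing or non-positive printed grand total always yields unmatched. isMatched and ReconciledDoc are hypothetical names.

```typescript
// Hypothetical standalone version of the viewer's matched check.
interface ReconciledDoc {
  reconciliation?: { error_absolute?: number };
  printed?: { grand_total?: number };
}

const TOLERANCE = 0.05; // same threshold as the viewer

function isMatched(doc: ReconciledDoc): boolean {
  // Missing error defaults to 0; missing grand total defaults to 0 and fails the > 0 test
  return (doc.reconciliation?.error_absolute ?? 0) <= TOLERANCE && (doc.printed?.grand_total ?? 0) > 0;
}
```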

Success Feedback

Location: components/ocr-uploader.tsx:189-194
// Celebration effect on success
setShowConfetti(true);
setJustCompleted(true);
setTimeout(() => setShowConfetti(false), 100);
setTimeout(() => setJustCompleted(false), 3000);
The confetti animation and shimmer effect provide immediate visual feedback; the confetti flag is cleared after 100 ms and the just-completed flag after 3 seconds.

Error Handling

API Errors

OpenRouter failure (app/api/ocr-structured-v4/route.ts:291-296):
if (!response.ok) {
  const err = await response.text();
  return NextResponse.json(
    { error: `OpenRouter error: ${response.status} ${err}` },
    { status: 500 }
  );
}
JSON parsing failure (app/api/ocr-structured-v4/route.ts:385-388):
return NextResponse.json(
  { error: `Model did not return valid JSON (${message})`, error_excerpt: excerpt },
  { status: 500 }
);
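Both failures return HTTP 500 with an error string, and the JSON failure also includes error_excerpt. A hypothetical client-side helper can turn such a body into the message shown in the alert below; it assumes any non-OK response may carry this shape, and describeApiError is not from the codebase.

```typescript
// Hypothetical client helper: build a display message from a failed response.
interface ApiErrorBody { error?: string; error_excerpt?: string }

function describeApiError(status: number, body: unknown): string {
  const b = (typeof body === "object" && body !== null ? body : {}) as ApiErrorBody;
  const base = typeof b.error === "string" && b.error ? b.error : `Request failed with status ${status}`;
  // Append a short excerpt of the model output when JSON parsing failed
  return typeof b.error_excerpt === "string" && b.error_excerpt
    ? `${base}\n\nExcerpt: ${b.error_excerpt.slice(0, 200)}`
    : base;
}
```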

Frontend Error Display

Location: components/ocr-uploader.tsx:515-523
{error && (
  <div className="flex items-start gap-2 p-4 bg-red-50 ... rounded-lg" role="alert">
    <svg>...</svg>
    <div className="text-sm text-red-800">{error}</div>
  </div>
)}

Performance Optimization

Loading States

Ticking timer (components/ocr-uploader.tsx:43-57):
React.useEffect(() => {
  let id: number | null = null;
  if (loading) {
    if (startRef.current == null) startRef.current = Date.now();
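    // Update the elapsed time every 100ms to show live progress
    id = window.setInterval(() => {
      if (startRef.current != null) setDurationMs(Date.now() - startRef.current);
    }, 100);
  }
  // Clear the interval when loading ends or the component unmounts
  return () => {
    if (id != null) window.clearInterval(id);
  };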
}, [loading]);
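The timer stores raw milliseconds in durationMs; how the uploader renders that value is not shown. A hypothetical formatter that matches the 100 ms update rate by showing tenths of a second:

```typescript
// Hypothetical formatter for durationMs; not from components/ocr-uploader.tsx.
function formatDuration(ms: number): string {
  if (ms < 1000) return `${Math.round(ms)} ms`;
  const seconds = ms / 1000;
  // Tenths of a second match the 100ms tick
  if (seconds < 60) return `${seconds.toFixed(1)} s`;
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${m}m ${s.toString().padStart(2, "0")}s`;
}
```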

Memoization

Location: components/invoice-viewer-v4.tsx:13
const doc = React.useMemo(() => reconcileV4(data), [data]);
Reconciliation only runs when data changes, not on every render.
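useMemo compares each dependency with Object.is, so reconcileV4 reruns whenever data is a new object, even if its contents are identical. The hypothetical memoizeLast below reproduces that single-entry, reference-based caching outside React.

```typescript
// Hypothetical illustration of useMemo-style caching keyed on object identity.
function memoizeLast<A extends object, R>(fn: (arg: A) => R): (arg: A) => R {
  let lastArg: A | undefined;
  let lastResult: R | undefined;
  let hasResult = false;
  return (arg: A): R => {
    // Object.is mirrors how React compares dependency array entries
    if (hasResult && Object.is(arg, lastArg)) return lastResult as R;
    const result = fn(arg);
    lastArg = arg;
    lastResult = result;
    hasResult = true;
    return result;
  };
}
```

Parents should therefore pass a stable data object (for example, the parsed API response kept in state) rather than rebuilding it on every render.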

Environment Variables

Required:
  • OPENROUTER_API_KEY: Authentication for OpenRouter API
Optional:
  • OPENROUTER_MODEL: Default model (fallback: google/gemini-2.0-flash)
  • OPENROUTER_SITE_URL: Referer for OpenRouter (default: http://localhost:3000)
  • OPENROUTER_APP_NAME: App title (default: Invoice OCR)
  • OPENROUTER_PDF_ENGINE: PDF parsing engine (pdf-text | mistral-ocr | native)
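A hypothetical loader that applies the defaults listed above in one place. Throwing when the API key is missing is a choice made for this sketch; the routes may instead report the error per request.

```typescript
// Hypothetical config loader; defaults taken from the list above.
interface OcrConfig {
  apiKey: string;
  model: string;
  siteUrl: string;
  appName: string;
  pdfEngine: string;
}

function loadConfig(env: Record<string, string | undefined>): OcrConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is not set");
  return {
    apiKey,
    model: env.OPENROUTER_MODEL || "google/gemini-2.0-flash",
    siteUrl: env.OPENROUTER_SITE_URL || "http://localhost:3000",
    appName: env.OPENROUTER_APP_NAME || "Invoice OCR",
    pdfEngine: env.OPENROUTER_PDF_ENGINE || "pdf-text", // default from the plugin block
  };
}
```

In a Next.js route it could be called as loadConfig(process.env).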
