Overview
The Invoice OCR system processes documents through a multi-stage pipeline: Upload → API Selection → OpenRouter Processing → Reconciliation → Display. This architecture supports both images and PDFs with multiple extraction modes.
Stage 1: Upload & Validation
File Input
Location: components/ocr-uploader.tsx:106-143
The uploader accepts files via:
- File input: Click to browse
- Drag & drop: Drop files directly onto the upload area
Supported formats:
- Images: image/* (PNG, JPG, JPEG, WebP)
- Documents: application/pdf
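As a rough illustration of the validation step, the accepted types above can be checked with a small classifier. This is a sketch only: `classifyUpload` and `FileKind` are hypothetical names, and the actual check in components/ocr-uploader.tsx may be stricter or structured differently.

```typescript
// Hypothetical helper, not taken from the repository.
// Only the accepted MIME types (image/* and application/pdf) come from the docs.
type FileKind = "image" | "pdf";

function classifyUpload(mimeType: string): FileKind | null {
  const normalized = mimeType.trim().toLowerCase();
  if (normalized === "application/pdf") return "pdf";
  if (normalized.startsWith("image/")) return "image";
  return null; // unsupported type: the caller should reject the file
}
```

A caller would typically pass `file.type` from the browser `File` object and show an error when the result is `null`.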
Mode Selection
Location: components/ocr-uploader.tsx:437-461
Users choose:
- Extraction Mode: raw (plain text) or structured (JSON)
- Schema Format (if structured): v4 (India GST) or compact (legacy)
- AI Model: Gemini 2.5 Flash (default), GPT-4o Mini, o3-mini, etc.
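The mode and schema choices map onto the three API routes described in Stage 2. A minimal sketch of that mapping follows; `selectRoute` is a hypothetical name, and defaulting the schema to `v4` is an assumption rather than confirmed behavior.

```typescript
type ExtractionMode = "raw" | "structured";
type SchemaFormat = "v4" | "compact";

// Hypothetical helper: picks the API route for the user's selection.
// The default schema of "v4" is an assumption for illustration.
function selectRoute(mode: ExtractionMode, schema: SchemaFormat = "v4"): string {
  if (mode === "raw") return "/api/ocr";
  return schema === "v4" ? "/api/ocr-structured-v4" : "/api/ocr-structured";
}
```

The schema argument is ignored for raw mode, which matches the "if structured" condition on the Schema Format option.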
Stage 2: API Route Processing
Route: /api/ocr (Raw Text)
Location: app/api/ocr/route.ts:31-131
Purpose: Extract plain text from image/PDF
Request payload:
Related code: app/api/ocr/route.ts:61-103
Route: /api/ocr-structured-v4 (India GST Schema)
Location: app/api/ocr-structured-v4/route.ts:196-406
Purpose: Extract structured invoice data using v4 schema with reconciliation
Request payload (same as /api/ocr plus):
Prompt (app/api/ocr-structured-v4/route.ts:136-194):
- 2,600+ character prompt defining extraction rules
- Includes complete JSON schema (150 lines)
- Specifies decision rules for price mode, discounts, GST split
- Enforces 2-decimal precision and normalization
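The prompt asks the model for 2-decimal values. A caller that wants to normalize amounts defensively on its own side could use something like the sketch below. `round2` and `formatAmount` are hypothetical helpers and are not taken from the route's code.

```typescript
// Hypothetical helpers for 2-decimal normalization of monetary amounts.
// Number.EPSILON nudges values such as 1.005 that sit just below a rounding
// boundary because of binary floating-point representation.
function round2(value: number): number {
  return Math.round((value + Number.EPSILON) * 100) / 100;
}

// Renders an amount with exactly two decimal places.
function formatAmount(value: number): string {
  return round2(value).toFixed(2);
}
```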
Related code: app/api/ocr-structured-v4/route.ts:308-351, app/api/ocr-structured-v4/route.ts:391-398
Route: /api/ocr-structured (Compact Schema)
Location: app/api/ocr-structured/route.ts:156-295
Purpose: Legacy schema with voucher, items, party structure
Schema (app/api/ocr-structured/route.ts:21-120):
Stage 3: PDF Handling
Plugin Configuration
Location: app/api/ocr-structured-v4/route.ts:268-277
The PDF engine is set in .env.local through OPENROUTER_PDF_ENGINE:
- pdf-text (default): Text extraction only
- mistral-ocr: OCR for scanned PDFs
- native: Use the model's native PDF support
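A sketch of how the engine setting might be turned into a request `plugins` array. The `file-parser` plugin id and the `pdf.engine` field follow OpenRouter's PDF plugin format as I understand it, so verify them against the current OpenRouter documentation and the route's code. `resolvePdfEngine` and `buildPdfPlugins` are hypothetical names.

```typescript
type PdfEngine = "pdf-text" | "mistral-ocr" | "native";

const PDF_ENGINES: readonly PdfEngine[] = ["pdf-text", "mistral-ocr", "native"];

// Falls back to "pdf-text" (the documented default) for missing or unknown values.
function resolvePdfEngine(raw: string | undefined): PdfEngine {
  const match = PDF_ENGINES.find((engine) => engine === raw);
  return match ?? "pdf-text";
}

// Assumed plugin shape; the raw value would come from OPENROUTER_PDF_ENGINE.
function buildPdfPlugins(raw: string | undefined) {
  return [{ id: "file-parser", pdf: { engine: resolvePdfEngine(raw) } }];
}
```

Validating the value up front means a typo in .env.local falls back to the default engine instead of sending an unknown engine upstream.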
Message Format for PDFs
Location: app/api/ocr-structured-v4/route.ts:238-251
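For orientation, here is a sketch of how a user message could carry either an image or a PDF. The `image_url` and `file` content-part shapes follow OpenRouter's chat format as I understand it and should be checked against the route's code. `buildUserContent` and its parameters are hypothetical.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "file"; file: { filename: string; file_data: string } };

// Builds the content array for one user message.
// `base64` is the file body already encoded as base64 (no data-URL prefix).
function buildUserContent(
  prompt: string,
  mimeType: string,
  base64: string,
  filename: string,
): ContentPart[] {
  const dataUrl = `data:${mimeType};base64,${base64}`;
  const attachment: ContentPart =
    mimeType === "application/pdf"
      ? { type: "file", file: { filename, file_data: dataUrl } }
      : { type: "image_url", image_url: { url: dataUrl } };
  return [{ type: "text", text: prompt }, attachment];
}
```

PDFs go out as a `file` part so the configured PDF engine can parse them, while images use the `image_url` part.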
Annotations (Caching)
Location: app/api/ocr-structured-v4/route.ts:256-265
Annotations from an earlier parse are reused to avoid re-parsing costs for the same PDF.
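One way annotation reuse can work, sketched under assumptions: OpenRouter returns file annotations on the assistant message, and sending them back in a later request lets the service skip parsing the PDF again. The `ChatMessage` type and `buildFollowUpMessages` are hypothetical, and the route's actual caching logic may differ.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: unknown;
  annotations?: unknown[]; // assumed field name for parsed-file annotations
}

// Builds the message list for a follow-up request about the same PDF,
// echoing back any annotations from the previous assistant reply.
function buildFollowUpMessages(
  original: ChatMessage,
  reply: ChatMessage,
  followUpText: string,
): ChatMessage[] {
  const assistantTurn: ChatMessage = { role: "assistant", content: reply.content };
  if (reply.annotations && reply.annotations.length > 0) {
    assistantTurn.annotations = reply.annotations;
  }
  return [original, assistantTurn, { role: "user", content: followUpText }];
}
```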
Stage 4: Frontend Display
Invoice Viewer V4
Location: components/invoice-viewer-v4.tsx:12-225
Key features:
- Reconciliation status badge (green = matched, red = error > 0.05)
- Document header (supplier, invoice number, date)
- Items table with computed columns
- Header discounts and charges breakdown
- Totals summary with printed vs computed comparison
- Alternates trace for debugging
Related code: components/invoice-viewer-v4.tsx:13-14
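The badge rule (green when matched, red when the error exceeds 0.05) can be sketched as a comparison between the printed and computed totals. `reconciliationStatus` and `TOLERANCE` are hypothetical names, and the viewer may derive the error differently.

```typescript
// Tolerance taken from the badge rule: red when error > 0.05.
const TOLERANCE = 0.05;

type ReconStatus = "matched" | "error";

// Compares the total printed on the invoice with the recomputed total.
function reconciliationStatus(printedTotal: number, computedTotal: number): ReconStatus {
  const error = Math.abs(printedTotal - computedTotal);
  return error > TOLERANCE ? "error" : "matched";
}
```

Differences that land exactly on the tolerance are sensitive to floating-point representation, so a real implementation may want to round the error to two decimals before comparing.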
Success Feedback
Location: components/ocr-uploader.tsx:189-194
Error Handling
API Errors
OpenRouter failure (app/api/ocr-structured-v4/route.ts:291-296):
Related code: app/api/ocr-structured-v4/route.ts:385-388
Frontend Error Display
Location: components/ocr-uploader.tsx:515-523
Performance Optimization
Loading States
Ticking timer (components/ocr-uploader.tsx:43-57):
Memoization
Location: components/invoice-viewer-v4.tsx:13
The memoized value is recomputed only when data changes, not on every render.
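The component presumably relies on React's `useMemo` for this, which is an assumption because the original snippet is not shown. The underlying idea can be illustrated outside React with a single-entry memoizer. `memoizeLast` is a hypothetical helper that mimics one dependency compared with `Object.is`.

```typescript
// Caches the most recent result and recomputes only when the argument changes
// identity (Object.is), similar in spirit to useMemo with a single dependency.
function memoizeLast<A, R>(compute: (arg: A) => R): (arg: A) => R {
  let hasValue = false;
  let lastArg: A | undefined;
  let lastResult: R | undefined;
  return (arg: A): R => {
    if (!hasValue || !Object.is(lastArg, arg)) {
      lastResult = compute(arg);
      lastArg = arg;
      hasValue = true;
    }
    return lastResult as R;
  };
}
```

Because the comparison is by identity, passing a new object with identical contents still triggers a recompute, which is also how a `useMemo` dependency array behaves.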
Environment Variables
Required:
- OPENROUTER_API_KEY: Authentication for OpenRouter API
Optional:
- OPENROUTER_MODEL: Default model (fallback: google/gemini-2.0-flash)
- OPENROUTER_SITE_URL: Referer for OpenRouter (default: http://localhost:3000)
- OPENROUTER_APP_NAME: App title (default: Invoice OCR)
- OPENROUTER_PDF_ENGINE: PDF parsing engine (pdf-text | mistral-ocr | native)
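A sketch of how these variables might be read with their documented defaults. `OcrConfig`, `loadConfig`, and `buildHeaders` are hypothetical names. Mapping the site URL and app name to the `HTTP-Referer` and `X-Title` headers follows OpenRouter's attribution headers as I understand them, and the `pdf-text` default for the engine is taken from the PDF handling section.

```typescript
interface OcrConfig {
  apiKey: string;
  model: string;
  siteUrl: string;
  appName: string;
  pdfEngine: string;
}

// Reads settings from an env-like object (for example process.env in Node).
function loadConfig(env: Record<string, string | undefined>): OcrConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is required");
  return {
    apiKey,
    model: env.OPENROUTER_MODEL || "google/gemini-2.0-flash",
    siteUrl: env.OPENROUTER_SITE_URL || "http://localhost:3000",
    appName: env.OPENROUTER_APP_NAME || "Invoice OCR",
    pdfEngine: env.OPENROUTER_PDF_ENGINE || "pdf-text",
  };
}

// Assumed header mapping for OpenRouter requests.
function buildHeaders(config: OcrConfig): Record<string, string> {
  return {
    Authorization: `Bearer ${config.apiKey}`,
    "Content-Type": "application/json",
    "HTTP-Referer": config.siteUrl,
    "X-Title": config.appName,
  };
}
```

Failing fast on a missing API key surfaces configuration problems at startup rather than on the first upload.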
Next Steps
- Reconciliation Logic - Deep dive into v4 engine
- OpenRouter Integration - API details and headers
- Testing Guide - How to test the OCR flow
