Overview
The V4 reconciliation engine (lib/invoice_v4.ts) transforms raw OCR output into mathematically self-consistent invoice data. It respects printed anchors (HSN tax tables, subtotals, grand total) and tries multiple interpretations to find the best match.
Goal: Take the model’s structured JSON and make the numbers add up like a real invoice, with minimal error vs. the printed grand total.
Core Principles
Anchor-First
Printed values (HSN tax table, taxable subtotal, grand total) are treated as ground truth
Multi-Pass
Tests multiple price mode hypotheses (WITH_TAX, WITHOUT_TAX) and picks the best
Smart Allocation
Distributes header discounts across GST buckets to match printed GST amounts
Tolerance-Based
Accepts small round-off errors (≤ ₹1.02) as valid matches
Reconciliation Phases
The reconcileV4() function runs in six phases:
Phase 1: Item Line Recomputation
Purpose: Derive line-level ex-tax amounts from rate, quantity, and discounts.
Choose Best Source of Truth
For each item, the engine evaluates three candidates:
- Computed: qty × rate_ex_tax × (1 - d1%) × (1 - d2%) - flat_discount
- Printed: Line amount from OCR, adjusted for price mode (WITH_TAX → divide by (1 + GST%), WITHOUT_TAX → use as-is)
- Model Hint: Explicit amount_ex_tax_after_discount, if provided
It then selects one using these rules, in order:
- If printed amount ≈ pre-discount gross and a discount exists → use computed discounted value
- Otherwise, prefer printed (but never exceed computed discounted value)
- Fallback to model hint or computed value
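The selection rules above can be sketched roughly as follows. This is a minimal illustration, not the code in lib/invoice_v4.ts: the `LineInput` shape, the function names, and the use of the ₹0.05 line-matching tolerance for the "≈" comparison are all assumptions.

```typescript
// Hypothetical types and names; the real engine's shapes may differ.
type PriceMode = "WITH_TAX" | "WITHOUT_TAX";

interface LineInput {
  qty: number;
  rateExTax: number;
  d1Pct?: number;          // first percent discount, e.g. 10 means 10%
  d2Pct?: number;          // second percent discount
  flatDiscount?: number;   // flat rupee discount on the line
  gstPct: number;
  printedAmount?: number;  // line amount as read by OCR
  modelHintExTax?: number; // amount_ex_tax_after_discount, if the model gave one
}

const LINE_TOLERANCE = 0.05; // assumed: the "Line Matching" tolerance

// Candidate 1: qty × rate × (1 - d1%) × (1 - d2%) - flat_discount
function computedExTax(l: LineInput): number {
  const gross = l.qty * l.rateExTax;
  return gross * (1 - (l.d1Pct ?? 0) / 100) * (1 - (l.d2Pct ?? 0) / 100) - (l.flatDiscount ?? 0);
}

// Candidate 2: printed amount, converted to ex-tax when the price mode is WITH_TAX
function printedExTax(l: LineInput, mode: PriceMode): number | undefined {
  if (l.printedAmount === undefined) return undefined;
  return mode === "WITH_TAX" ? l.printedAmount / (1 + l.gstPct / 100) : l.printedAmount;
}

function chooseLineExTax(l: LineInput, mode: PriceMode): number {
  const gross = l.qty * l.rateExTax;
  const computed = computedExTax(l);
  const printed = printedExTax(l, mode);
  const hasDiscount = computed < gross - LINE_TOLERANCE;

  if (printed !== undefined) {
    // Printed amount looks like the pre-discount gross: use the computed discounted value.
    if (hasDiscount && Math.abs(printed - gross) <= LINE_TOLERANCE) return computed;
    // Otherwise prefer printed, but never exceed the computed discounted value.
    return Math.min(printed, computed);
  }
  // No printed amount: fall back to the model hint, then to the computed value.
  return l.modelHintExTax ?? computed;
}
```

For a line with qty 10, rate 100, a 10% discount and a printed amount of 1000 (the pre-discount gross), this sketch returns the computed 900 rather than the printed figure.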
Normalize GST Rate
Snaps raw GST percentages to standard slabs.
Examples:
- 17.8 → 18 (within 0.75 tolerance)
- 0.18 → 18 (auto-scale fractions)
- 5.1 → 5
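A minimal sketch of slab snapping, consistent with the three examples above. The slab list and the rule "values strictly between 0 and 1 are fractions" are assumptions made for illustration; the engine's actual list and scaling rule are not shown in this page.

```typescript
// Assumed slab list; the engine may recognise more (or different) slabs.
const GST_SLABS: readonly number[] = [0, 5, 12, 18, 28];
const SLAB_TOLERANCE = 0.75; // percentage points, from the tolerance table

function normalizeGstRate(raw: number): number {
  // Assumption: a value strictly between 0 and 1 is a fraction (0.18 means 18%).
  const pct = raw > 0 && raw < 1 ? raw * 100 : raw;

  // Find the nearest slab.
  let nearest = pct;
  let nearestDist = Infinity;
  for (const slab of GST_SLABS) {
    const dist = Math.abs(pct - slab);
    if (dist < nearestDist) {
      nearest = slab;
      nearestDist = dist;
    }
  }

  // Snap only when the nearest slab is within tolerance; otherwise keep the value.
  return nearestDist <= SLAB_TOLERANCE ? nearest : pct;
}
```

Under these assumptions a rate such as 15 is left untouched, because neither 12 nor 18 is within 0.75 of it.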
Phase 2: Header Discounts
Purpose: Apply sequential voucher-level discounts (Trade Discount, Special Discount, etc.) before GST.
- Percent Discounts
- Absolute Discounts
- Smart Allocation (with GST target)
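As a rough illustration of the first two discount types, the sketch below applies a list of header discounts one after another to a running ex-tax total. The `HeaderDiscount` type and `applyHeaderDiscounts` name are hypothetical, and clamping the result at zero is an assumption.

```typescript
// Hypothetical discount shape for illustration only.
type HeaderDiscount =
  | { kind: "percent"; value: number }   // e.g. 10 means 10%
  | { kind: "absolute"; value: number }; // flat rupee amount

function applyHeaderDiscounts(itemsExTax: number, discounts: HeaderDiscount[]): number {
  let running = itemsExTax;
  for (const d of discounts) {
    // Each discount acts on the total left by the previous one.
    running = d.kind === "percent" ? running * (1 - d.value / 100) : running - d.value;
  }
  // Assumption: a discounted total never goes negative.
  return Math.max(running, 0);
}
```

Starting from ₹1000, [10%, ₹100] leaves 800 while [₹100, 10%] leaves 810, which is why the discounts need to be captured in printed order.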
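Discounts are applied sequentially, with percent discounts compounding multiplicatively. Order matters once an absolute discount is in the sequence: [10%, ₹100] ≠ [₹100, 10%]. Two percent discounts alone, such as [10%, 5%] and [5%, 10%], give the same result up to intermediate rounding.
Phase 3: Printed Anchors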
Purpose: Scale items to match printed taxable values when available.
Scenario A: HSN Tax Table Exists
Best case: Invoice includes a detailed HSN-wise tax breakdown.
The engine:
- Groups table rows by total GST rate (e.g., 9% + 9% = 18%)
- Scales item lines within each bucket to match printed taxable value exactly
- Recomputes discounts, GST, totals to maintain consistency
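The first two steps above can be sketched as a group-then-scale pass. The `HsnRow` and `Line` shapes and both function names are assumptions; the recomputation of discounts, GST and totals in the third step is omitted.

```typescript
// Hypothetical shapes for illustration.
interface HsnRow { taxableValue: number; cgstPct: number; sgstPct: number; igstPct: number; }
interface Line { gstPct: number; amountExTax: number; }

// Step 1: sum printed taxable value per total GST rate (CGST + SGST + IGST).
function printedTaxableByRate(rows: HsnRow[]): Map<number, number> {
  const buckets = new Map<number, number>();
  for (const r of rows) {
    const rate = r.cgstPct + r.sgstPct + r.igstPct; // e.g. 9 + 9 + 0 = 18
    buckets.set(rate, (buckets.get(rate) ?? 0) + r.taxableValue);
  }
  return buckets;
}

// Step 2: scale every line in a bucket by (printed taxable / current taxable).
function scaleLinesToBuckets(lines: Line[], rows: HsnRow[]): Line[] {
  const targets = printedTaxableByRate(rows);
  const current = new Map<number, number>();
  for (const l of lines) {
    current.set(l.gstPct, (current.get(l.gstPct) ?? 0) + l.amountExTax);
  }
  return lines.map((l) => {
    const target = targets.get(l.gstPct);
    const sum = current.get(l.gstPct) ?? 0;
    // Leave the line alone when its rate has no printed row or nothing to scale.
    if (target === undefined || sum === 0) return l;
    return { ...l, amountExTax: l.amountExTax * (target / sum) };
  });
}
```

Proportional scaling keeps each line's share of its bucket unchanged while making the bucket total equal the printed taxable value.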
Scenario B: Printed Taxable Subtotal Only
Fallback: Invoice prints “Total Taxable Value” without per-HSN breakdown.
The engine:
- Estimates whether taxable_subtotal includes taxable charges:
  - If charges exist → targetItemsOnly = taxable_subtotal - charges_ex_tax
  - Else → targetItemsOnly = taxable_subtotal
- Computes reduction needed: cut = current_items_ex_tax - targetItemsOnly
- Uses smart allocation (greedy bucket selection) guided by printed GST total
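The target and cut formulas above translate directly into code; the allocation step is sketched here under a strong assumption, since the engine's actual greedy rule is not described in this page. In this illustration the whole cut goes to the single GST bucket that brings the recomputed GST total closest to the printed one. `Bucket`, `itemsTarget`, `gstTotal` and `allocateCut` are hypothetical names, and `printedGst` is assumed to be the GST attributable to items only.

```typescript
// Hypothetical bucket shape: taxable value of all items at one GST rate.
interface Bucket { gstPct: number; taxable: number; }

// targetItemsOnly = taxable_subtotal - charges_ex_tax when charges exist.
function itemsTarget(taxableSubtotal: number, chargesExTax: number): number {
  return chargesExTax > 0 ? taxableSubtotal - chargesExTax : taxableSubtotal;
}

function gstTotal(buckets: Bucket[]): number {
  return buckets.reduce((sum, b) => sum + (b.taxable * b.gstPct) / 100, 0);
}

// One possible reading of "greedy bucket selection": try the cut on each bucket
// and keep the trial whose GST total is closest to the printed GST.
function allocateCut(buckets: Bucket[], cut: number, printedGst: number): Bucket[] {
  let best = buckets; // unchanged if no bucket can absorb the cut
  let bestErr = Infinity;
  buckets.forEach((b, i) => {
    if (b.taxable < cut) return; // this bucket is too small to take the whole cut
    const trial = buckets.map((x, j) => (j === i ? { ...x, taxable: x.taxable - cut } : x));
    const err = Math.abs(gstTotal(trial) - printedGst);
    if (err < bestErr) {
      best = trial;
      bestErr = err;
    }
  });
  return best;
}
```

For example, with items at ₹1000 (18%) and ₹500 (5%), a printed taxable subtotal of ₹1450 and ₹50 of charges, the target is ₹1400 and the cut is ₹100. If the printed GST on items is ₹187, the cut lands on the 18% bucket, since 900 × 18% + 500 × 5% = 187.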
Scenario C: No Anchors
Rare: No printed taxable subtotal or HSN table.
The engine skips anchor adjustments and relies on:
- Line-level printed amounts (if price mode is WITHOUT_TAX)
- Computed values from rate × qty × discounts
- Final grand total matching via round-off
Phase 4: Charges
Purpose: Add freight, packing, insurance, etc. with inferred GST when needed.
If gst_rate_hint is missing, the engine uses the weighted average GST rate from items.
Phase 5: Totals & TCS
Purpose: Compute final taxable, GST, and grand totals.
Decide Non-Taxable Charges Inclusion
Some invoices include non-taxable charges (e.g., late fees) in the grand total, others don’t. The engine tests both options and picks the one with lower error.
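Two of the decisions in Phases 4 and 5 can be sketched briefly: inferring a charge's GST rate when gst_rate_hint is missing, and choosing whether non-taxable charges belong in the grand total. The function names and the choice to weight the average by ex-tax amount are assumptions, not confirmed behaviour of the engine.

```typescript
// Hypothetical item shape for illustration.
interface ItemLine { amountExTax: number; gstPct: number; }

// Assumed weighting: each item's GST rate counts in proportion to its ex-tax amount.
function weightedAvgGstPct(items: ItemLine[]): number {
  const base = items.reduce((sum, i) => sum + i.amountExTax, 0);
  if (base === 0) return 0;
  return items.reduce((sum, i) => sum + i.amountExTax * i.gstPct, 0) / base;
}

// Phase 4: use the hint when present, otherwise the weighted average.
function chargeGstPct(gstRateHint: number | undefined, items: ItemLine[]): number {
  return gstRateHint ?? weightedAvgGstPct(items);
}

// Phase 5: compute the grand total both ways and keep the one closer to the printed total.
function decideNonTaxable(totalBefore: number, nonTaxable: number, printedGrandTotal: number) {
  const errWith = Math.abs(totalBefore + nonTaxable - printedGrandTotal);
  const errWithout = Math.abs(totalBefore - printedGrandTotal);
  const include = errWith < errWithout;
  return {
    include,
    grandTotal: include ? totalBefore + nonTaxable : totalBefore,
    errorAbs: Math.min(errWith, errWithout),
  };
}
```

With ₹800 of items at 18% and ₹200 at 5%, a charge without a hint would be taxed at 15.4% under this weighting.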
Phase 6: Multi-Hypothesis Testing
Purpose: Try multiple interpretations and pick the best match.
The reconcileV4() function tests four scenarios:
Candidate 1: as_is
Uses rate_ex_tax and price mode hints from the model as-is.
Best for: Invoices where the model correctly identified ex-tax rates.
Candidate 2: as_is_items_only_when_no_hsn
When no HSN table exists, assumes printed taxable subtotal = items only (excludes taxable charges).
Best for: Invoices that print “Taxable Value” before “Add: Freight” line.
Candidate 3: from_printed_with_tax
Reinterprets printed per-unit rate as including GST.
Best for: Retail invoices where MRP/selling price includes GST.
Candidate 4: from_printed_without_tax
Treats printed rate as ex-tax directly.
Best for: B2B invoices with clear “Rate” and “GST” columns.
When picking the best candidate, the engine penalizes:
- High absolute error vs. printed grand total
- Large round-off values (> ₹1.00)
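A minimal sketch of how candidates might be compared. The scoring formula is hypothetical: it adds the absolute error against the printed grand total to any round-off in excess of ₹1.00, and the engine's real weights are not documented here. `rateExTaxFromPrinted` illustrates how Candidates 3 and 4 differ in reading the printed rate.

```typescript
type PriceMode = "WITH_TAX" | "WITHOUT_TAX";

// Hypothetical result shape for one reconciliation attempt.
interface CandidateResult {
  name: string;      // e.g. "as_is", "from_printed_with_tax"
  grandTotal: number;
  roundOff: number;
}

const ROUND_OFF_LIMIT = 1.0; // round-off beyond this is penalized

// Candidate 3 divides out GST; Candidate 4 takes the printed rate as ex-tax.
function rateExTaxFromPrinted(printedRate: number, gstPct: number, mode: PriceMode): number {
  return mode === "WITH_TAX" ? printedRate / (1 + gstPct / 100) : printedRate;
}

// Assumed scoring: absolute error plus the part of the round-off above the limit.
function penalty(c: CandidateResult, printedGrandTotal: number): number {
  const errorAbs = Math.abs(c.grandTotal - printedGrandTotal);
  const excessRoundOff = Math.max(0, Math.abs(c.roundOff) - ROUND_OFF_LIMIT);
  return errorAbs + excessRoundOff;
}

// Lowest penalty wins; ties keep the earlier candidate.
function pickBest(candidates: CandidateResult[], printedGrandTotal: number): CandidateResult | undefined {
  let best: CandidateResult | undefined;
  for (const c of candidates) {
    if (best === undefined || penalty(c, printedGrandTotal) < penalty(best, printedGrandTotal)) {
      best = c;
    }
  }
  return best;
}
```

Keeping ties with the earlier candidate means as_is is preferred whenever a reinterpretation does not strictly improve the match.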
Tolerance Levels
| Tolerance Type | Value | Purpose |
|---|---|---|
| Line Matching | ₹0.05 | Float precision tolerance when comparing printed vs. computed line amounts |
| GST Slab Snapping | 0.75% | Snap to nearest standard slab if within tolerance (e.g., 17.8% → 18%) |
| HSN Table Scaling | ₹0.75 | Scale printed bucket totals to match taxable subtotal if difference > threshold |
| Acceptable Round-Off | ≤ ₹1.02 | Maximum round-off automatically adopted during reconciliation |
| Matched Status | ≤ ₹0.05 | Error threshold for reconciliation.status = "matched" |
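For callers that want to mirror these thresholds in their own checks, the table can be expressed as constants. The object and helper names below are hypothetical; only the numeric values come from the table.

```typescript
// Values copied from the tolerance table; names are illustrative.
const TOLERANCES = {
  lineMatching: 0.05,       // ₹, printed vs. computed line amounts
  gstSlabSnapPct: 0.75,     // percentage points
  hsnTableScaling: 0.75,    // ₹
  acceptableRoundOff: 1.02, // ₹, maximum round-off adopted automatically
  matchedStatus: 0.05,      // ₹, error threshold for status "matched"
} as const;

// True when the remaining error is small enough for status "matched".
function isMatched(errorAbs: number): boolean {
  return errorAbs <= TOLERANCES.matchedStatus;
}

// True when a difference is small enough to be absorbed as round-off.
function canAdoptRoundOff(diff: number): boolean {
  return Math.abs(diff) <= TOLERANCES.acceptableRoundOff;
}
```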
Validation Logic
The reconciliation engine exposes detailed metadata for validation:
1. Error Absolute
2. Alternates Considered
Use this to debug why a specific interpretation was chosen.
3. Warnings
Check for special cases.
Real-World Examples
- Perfect Match
- HSN Scaling
- With Minor Error
- Requires Review
Integration Guide
Step 1: Call the API
Step 2: Validate Reconciliation
Step 3: Handle Edge Cases
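For Steps 2 and 3, a client-side check on an already-parsed response might look like the sketch below. Apart from reconciliation.status and alternates_considered, which this page mentions, the field names (error_absolute, warnings), the shape of each alternate, and the verdict labels are assumptions; adjust them to the actual response.

```typescript
// Assumed response shape; verify against the real API output.
interface Reconciliation {
  status: string;
  error_absolute: number;
  alternates_considered?: { name: string; error_absolute: number }[];
  warnings?: string[];
}

type Verdict = "accept" | "accept_with_warnings" | "review";

const MATCHED_THRESHOLD = 0.05; // ₹, from the tolerance table

// Step 2: accept matched results, flag anything else for manual review.
function validateReconciliation(r: Reconciliation): Verdict {
  if (r.status === "matched" && r.error_absolute <= MATCHED_THRESHOLD) {
    return (r.warnings?.length ?? 0) > 0 ? "accept_with_warnings" : "accept";
  }
  return "review";
}

// Step 3: list alternates that would have produced a smaller error,
// which can hint at a wrong price mode.
function betterAlternates(r: Reconciliation): string[] {
  return (r.alternates_considered ?? [])
    .filter((a) => a.error_absolute < r.error_absolute)
    .map((a) => a.name);
}
```

Routing "accept_with_warnings" separately lets a pipeline auto-approve clean matches while still sampling the special cases.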
Debugging Tips
Large Error (> ₹10)
Common causes:
- Missing items on subsequent pages (multi-page PDF)
- Charges not extracted (Freight, Packing)
- TCS not detected
What to check:
- Check meta.pages_processed: it should match the PDF page count
- Compare totals.items_ex_tax with printed subtotal
- Look for charge keywords in raw OCR output
- Verify TCS line if invoice mentions “TCS”
GST Mismatch
Common causes:
- CGST/SGST vs. IGST confusion
- Charges using wrong GST rate
- HSN table not detected
What to check:
- Check gst.cgst + gst.sgst + gst.igst per item
- Verify doc_level.place_of_supply_state_code matches supplier state for intra-state
- Look at printed.hsn_tax_table: it should be non-empty if a table exists
Round-Off > ₹1
Common causes:
- Incorrect price mode (WITH_TAX vs. WITHOUT_TAX)
- Sequential discounts not captured
- Printed subtotal includes/excludes charges ambiguity
What to check:
- Check alternates_considered: if from_printed_with_tax has lower error, price mode might be wrong
- Verify header_discounts array: it should have all printed discounts in the correct order
- Review charges[].taxable: it might need manual override
Use the Review Tool
Navigate to /review in the UI and paste the LangFuse trace or API response.
The tool will:
- Auto-detect invoice objects in nested JSON
- Display reconciliation breakdown with color-coded errors
- Show line-by-line math (qty × rate, discounts, tax)
- Highlight which candidate was chosen and why
Related Topics
OCR Modes
Compare Raw, Structured, and V4 modes
Review Tool
Debug reconciliation with the LangFuse review interface
