Overview

The V4 reconciliation engine (lib/invoice_v4.ts) transforms raw OCR output into mathematically self-consistent invoice data. It respects printed anchors (HSN tax tables, subtotals, grand total) and tries multiple interpretations to find the best match.
Goal: Take the model’s structured JSON and make the numbers add up like a real invoice, with minimal error vs. the printed grand total.

Core Principles

Anchor-First

Printed values (HSN tax table, taxable subtotal, grand total) are treated as ground truth

Multi-Pass

Tests multiple price mode hypotheses (WITH_TAX, WITHOUT_TAX) and picks the best

Smart Allocation

Distributes header discounts across GST buckets to match printed GST amounts

Tolerance-Based

Accepts small round-off errors (≤ ₹1.02) as valid matches

Reconciliation Phases

The reconcileV4() function runs in six phases:

Phase 1: Item Line Recomputation

Purpose: Derive line-level ex-tax amounts from rate, quantity, and discounts.
Step 1: Choose Best Source of Truth

For each item, the engine evaluates three candidates:
  1. Computed: qty × rate_ex_tax × (1 - d1%) × (1 - d2%) - flat_discount
  2. Printed: Line amount from OCR, adjusted for price mode (WITH_TAX → divide by (1 + GST%), WITHOUT_TAX → use as-is)
  3. Model Hint: Explicit amount_ex_tax_after_discount if provided
The engine picks the most reliable value using these heuristics:
  • If printed amount ≈ pre-discount gross and discount exists → use computed discounted value
  • Otherwise, prefer printed (but never exceed computed discounted value)
  • Fallback to model hint or computed value
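The selection heuristics above can be sketched as a single function. This is a minimal illustration, not the code in lib/invoice_v4.ts: the LineInput shape, the pickLineExTax name, and the use of the ₹0.05 line-matching tolerance for the "≈" comparison are all assumptions.

```typescript
// Hypothetical shapes for illustration; the real engine types may differ.
type PriceMode = "WITH_TAX" | "WITHOUT_TAX";

interface LineInput {
  qty: number;
  rateExTax: number;
  d1Pct: number;           // first percentage discount
  d2Pct: number;           // second percentage discount
  flatDiscount: number;    // flat amount taken off after percentages
  gstPct: number;
  printedAmount?: number;  // line amount as printed on the invoice
  modelHintExTax?: number; // amount_ex_tax_after_discount, if provided
}

const LINE_TOLERANCE = 0.05; // assumed meaning of "≈" in the heuristics

function pickLineExTax(line: LineInput, mode: PriceMode): number {
  const gross = line.qty * line.rateExTax;
  const computed =
    gross * (1 - line.d1Pct / 100) * (1 - line.d2Pct / 100) - line.flatDiscount;
  const hasDiscount =
    line.d1Pct > 0 || line.d2Pct > 0 || line.flatDiscount > 0;

  if (line.printedAmount !== undefined) {
    // Adjust the printed amount for price mode before comparing.
    const printedEx =
      mode === "WITH_TAX"
        ? line.printedAmount / (1 + line.gstPct / 100)
        : line.printedAmount;
    // Printed value is the pre-discount gross: trust the computed discounted value.
    if (hasDiscount && Math.abs(printedEx - gross) <= LINE_TOLERANCE) {
      return computed;
    }
    // Otherwise prefer printed, capped at the computed discounted value.
    return Math.min(printedEx, computed);
  }
  // No printed amount: fall back to the model hint, then to computed.
  return line.modelHintExTax ?? computed;
}
```

For a line of 10 units at ₹100 with a 10% discount, a printed amount of ₹1,000 (the gross) yields the computed ₹900, while a printed ₹850 is used as-is.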
Step 2: Normalize GST Rate

Snaps raw GST percentages to standard slabs:
const GST_SLABS = [0, 0.25, 3, 5, 12, 18, 28];
Examples:
  • 17.8 → 18 (within 0.75 tolerance)
  • 0.18 → 18 (auto-scale fractions)
  • 5.1 → 5
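A minimal sketch of slab snapping, assuming the 0.75 tolerance and a simple rule for fractions: any value strictly between 0 and 1, other than the 0.25 slab itself, is treated as a fraction and multiplied by 100. The snapGstRate name and the fraction rule are assumptions; the engine may distinguish fractions differently.

```typescript
const GST_SLABS = [0, 0.25, 3, 5, 12, 18, 28];
const SLAB_TOLERANCE = 0.75;

function snapGstRate(raw: number): number {
  // Assumed rule: 0.18 means 18%, but 0.25 is kept because it is a real slab.
  const value = raw > 0 && raw < 1 && raw !== 0.25 ? raw * 100 : raw;

  // Find the slab closest to the (possibly rescaled) value.
  const nearest = GST_SLABS.reduce((best, slab) =>
    Math.abs(slab - value) < Math.abs(best - value) ? slab : best
  );

  // Snap only when within tolerance; otherwise return the value unchanged.
  return Math.abs(nearest - value) <= SLAB_TOLERANCE ? nearest : value;
}
```

A rate such as 9 is more than 0.75 away from every slab, so this sketch returns it unchanged rather than guessing.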
Step 3: Split CGST/SGST/IGST

Based on supplier GSTIN and place of supply:
const supplierState = getStateCodeFromGstin(supplier_gstin); // e.g., 27 (Maharashtra)
const posCode = place_of_supply_state_code; // e.g., 27
const isIntra = supplierState === posCode;

if (isIntra) {
  cgst = gst_amount / 2;
  sgst = gst_amount / 2;
  igst = 0;
} else {
  cgst = 0;
  sgst = 0;
  igst = gst_amount;
}
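The split can be wrapped into a self-contained helper. In this sketch, stateCodeFromGstin and splitGst are hypothetical names; the only outside fact relied on is that a GSTIN begins with a two-digit state code. Treating an unreadable GSTIN as inter-state and giving any odd paisa to CGST are assumptions.

```typescript
interface GstSplit {
  cgst: number;
  sgst: number;
  igst: number;
}

// A GSTIN starts with a two-digit state code, e.g. "27" for Maharashtra.
function stateCodeFromGstin(gstin: string): string | null {
  const m = /^(\d{2})/.exec(gstin.trim());
  return m?.[1] ?? null;
}

function splitGst(gstAmount: number, supplierGstin: string, posCode: string): GstSplit {
  const supplierState = stateCodeFromGstin(supplierGstin);
  // Assumption: if the supplier state cannot be read, treat as inter-state.
  const isIntra =
    supplierState !== null && supplierState === posCode.padStart(2, "0");

  if (!isIntra) {
    return { cgst: 0, sgst: 0, igst: gstAmount };
  }
  // Round CGST to paise and give SGST the remainder so the halves sum exactly.
  const cgst = Math.round((gstAmount / 2) * 100) / 100;
  const sgst = Math.round((gstAmount - cgst) * 100) / 100;
  return { cgst, sgst, igst: 0 };
}
```

Comparing state codes as zero-padded strings avoids a mismatch between a numeric 7 and the GSTIN prefix "07".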
Step 4: Build GST Buckets

Groups items by GST rate for bucket-level operations:
gstBuckets = {
  "0": 5000.00,   // ₹5,000 @ 0%
  "5": 2000.00,   // ₹2,000 @ 5%
  "18": 50000.00  // ₹50,000 @ 18%
}
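A minimal sketch of how such buckets could be built from item lines. The BucketItem shape and buildGstBuckets name are illustrative assumptions; only the output shape (rate string mapped to ex-tax total) comes from the example above.

```typescript
interface BucketItem {
  gstRate: number;   // normalized slab, e.g. 18
  lineExTax: number; // ex-tax line amount after line discounts
}

function buildGstBuckets(items: BucketItem[]): Record<string, number> {
  const buckets: Record<string, number> = {};
  for (const item of items) {
    // Rates are used as string keys, matching the "18": 50000.00 layout.
    const key = String(item.gstRate);
    buckets[key] = (buckets[key] ?? 0) + item.lineExTax;
  }
  return buckets;
}
```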

Phase 2: Header Discounts

Purpose: Apply sequential voucher-level discounts (Trade Discount, Special Discount, etc.) before GST.
Applied sequentially with multiplicative effect:
// Example: 10% Trade Discount, then 5% Special Discount
const d1 = 10, d2 = 5;
const effective = d1 + d2 - (d1 * d2) / 100;
// = 10 + 5 - 0.5 = 14.5%

// Applied to each bucket:
bucketEx["18"] = 50000 * (1 - 0.145); // = 42750.00
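Order matters for the running amounts: [10%, 5%] and [5%, 10%] produce different intermediate values at each step, even though the combined rate for pure percentage discounts is 14.5% either way. Keep discounts in the order printed on the invoice.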
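A short sketch of applying a list of percentage discounts to every bucket. applyHeaderDiscounts is a hypothetical helper name, and rounding each bucket to paise at the end is an assumption.

```typescript
function applyHeaderDiscounts(
  bucketEx: Record<string, number>,
  discountPcts: number[]
): Record<string, number> {
  // Multiply the "keep" factors: 10% then 5% gives 0.90 * 0.95 = 0.855.
  const factor = discountPcts.reduce((f, pct) => f * (1 - pct / 100), 1);

  const out: Record<string, number> = {};
  for (const [rate, ex] of Object.entries(bucketEx)) {
    // Assumption: round each discounted bucket to two decimals.
    out[rate] = Math.round(ex * factor * 100) / 100;
  }
  return out;
}
```

With discounts [10, 5], a ₹50,000 bucket at 18% becomes ₹42,750, matching the 14.5% effective rate computed above.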

Phase 3: Printed Anchors

Purpose: Scale items to match printed taxable values when available.
Best case: Invoice includes a detailed HSN-wise tax breakdown.
"hsn_tax_table": [
  {
    "hsn": "8471",
    "taxable_value": 42750.00,
    "cgst_rate": 9,
    "sgst_rate": 9,
    "cgst_amount": 3847.50,
    "sgst_amount": 3847.50
  }
]
The engine:
  1. Groups table rows by total GST rate (e.g., 9% + 9% = 18%)
  2. Scales item lines within each bucket to match printed taxable value exactly:
    const printed = 42750.00;
    const computed = 40000.00;
    const scale = printed / computed; // 1.06875
    
    // Scale all items @ 18%:
    item1.line_ex_tax *= scale;
    item2.line_ex_tax *= scale;
    
  3. Recomputes discounts, GST, totals to maintain consistency
HSN table anchoring is the most accurate method. It eliminates cumulative rounding errors and handles multi-discount structures gracefully.
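The bucket scaling step can be sketched as a pure function. The ScalableItem shape and scaleBucketToPrinted name are assumptions, and the real engine also recomputes discounts and GST afterwards, which this sketch leaves out.

```typescript
interface ScalableItem {
  gstRate: number;
  lineExTax: number;
}

function scaleBucketToPrinted(
  items: ScalableItem[],
  gstRate: number,
  printedTaxable: number
): ScalableItem[] {
  const computed = items
    .filter((i) => i.gstRate === gstRate)
    .reduce((sum, i) => sum + i.lineExTax, 0);

  // Nothing to scale (or nothing to divide by): return copies unchanged.
  if (computed === 0) return items.map((i) => ({ ...i }));

  const scale = printedTaxable / computed;
  // Only items in the matching bucket are scaled; other rates are untouched.
  return items.map((i) =>
    i.gstRate === gstRate ? { ...i, lineExTax: i.lineExTax * scale } : { ...i }
  );
}
```

With items of ₹25,000 and ₹15,000 at 18% and a printed taxable value of ₹42,750, the scale is 1.06875 and the scaled lines sum to the printed value.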
Fallback: Invoice prints “Total Taxable Value” without per-HSN breakdown.
"printed": {
  "taxable_subtotal": 43250.00,
  "gst_total": 7785.00
}
The engine:
  1. Estimates whether taxable_subtotal includes taxable charges:
    • If charges exist → targetItemsOnly = taxable_subtotal - charges_ex_tax
    • Else → targetItemsOnly = taxable_subtotal
  2. Computes reduction needed: cut = current_items_ex_tax - targetItemsOnly
  3. Uses smart allocation (greedy bucket selection) guided by printed GST total
const approxChargesGst = charges.reduce((s, c) => {
  const rate = c.gst_rate_hint || weightedAvgRate;
  return s + c.ex_tax * (rate / 100);
}, 0);
const targetItemsGst = printedGstTotal - approxChargesGst;

allocateAbsoluteSmart(cut, targetItemsGst);
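The source does not show allocateAbsoluteSmart itself. One simple reading of "greedy bucket selection guided by printed GST total" is sketched here: try taking the whole cut from each bucket in turn and keep the choice whose resulting GST lands closest to the target. The function name and the single-bucket strategy are assumptions; the real allocator may split the cut across several buckets.

```typescript
function allocateCutToBestBucket(
  bucketEx: Record<string, number>,
  cut: number,
  targetGst: number
): Record<string, number> {
  // GST implied by a set of buckets: sum of ex-tax value times rate.
  const gstOf = (b: Record<string, number>): number =>
    Object.entries(b).reduce(
      (sum, [rate, ex]) => sum + ex * (parseFloat(rate) / 100),
      0
    );

  let best: Record<string, number> = { ...bucketEx };
  let bestErr = Infinity;

  for (const [rate, ex] of Object.entries(bucketEx)) {
    if (ex < cut) continue; // this bucket cannot absorb the whole cut
    const trial = { ...bucketEx, [rate]: ex - cut };
    const err = Math.abs(gstOf(trial) - targetGst);
    if (err < bestErr) {
      best = trial;
      bestErr = err;
    }
  }
  // If no bucket could absorb the cut, the input is returned unchanged.
  return best;
}
```

The idea is that the printed GST total disambiguates which rate the discount was applied to: removing ₹1,000 from an 18% bucket lowers GST by ₹180, but only by ₹50 from a 5% bucket.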
Rare: No printed taxable subtotal or HSN table. The engine skips anchor adjustments and relies on:
  • Line-level printed amounts (if price mode is WITHOUT_TAX)
  • Computed values from rate × qty × discounts
  • Final grand total matching via round-off

Phase 4: Charges

Purpose: Add freight, packing, insurance, etc. with inferred GST when needed.
charges.forEach(charge => {
  const ex = charge.ex_tax;
  const rate = charge.gst_rate_hint || weightedAvgItemGstRate;
  const gst = ex * (rate / 100);
  const inc = ex + gst;
  
  if (charge.taxable) {
    chargesEx += ex;
    // Add to GST bucket for proper subtotal computation
    bucketEx[String(rate)] = (bucketEx[String(rate)] || 0) + ex;
  } else {
    // Non-taxable charge (e.g., late fee)
    nonTaxableChargesEx += ex;
  }
});
If gst_rate_hint is missing, the engine uses the weighted average GST rate from items:
const totalEx = items.reduce((s, i) => s + i.totals.line_ex_tax, 0);
const totalGst = items.reduce((s, i) => s + i.gst.amount, 0);
const weightedRate = (totalGst / totalEx) * 100;
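The weighted average divides by the items' ex-tax total, so a document with no items (or a zero total) needs a guard. The sketch below uses a hypothetical RatedItem shape and returns 0 in that case, which is an assumption rather than documented engine behavior.

```typescript
interface RatedItem {
  lineExTax: number;
  gstAmount: number;
}

function weightedAvgGstRate(items: RatedItem[]): number {
  const totalEx = items.reduce((s, i) => s + i.lineExTax, 0);
  const totalGst = items.reduce((s, i) => s + i.gstAmount, 0);
  // Assumed fallback: 0% when there is nothing to average over.
  return totalEx > 0 ? (totalGst / totalEx) * 100 : 0;
}
```

For ₹2,000 at 5% (GST ₹100) and ₹50,000 at 18% (GST ₹9,000), the weighted rate is 9,100 / 52,000 = 17.5%. Note that a charge given this rate lands in a "17.5" bucket rather than a standard slab.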

Phase 5: Totals & TCS

Purpose: Compute final taxable, GST, and grand totals.
Step 1: Decide Non-Taxable Charges Inclusion

Some invoices include non-taxable charges (e.g., late fees) in the grand total, others don’t. The engine tests both options and picks the one with lower error:
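// items, charges, nonTaxableCharges, gst, tcs and round_off are the totals computed so far
const includeTaxableEx = items + charges + nonTaxableCharges;
const excludeTaxableEx = items + charges;

const includeMode = {
  taxable_ex: includeTaxableEx,
  grand: includeTaxableEx + gst + tcs + round_off
};
const excludeMode = {
  taxable_ex: excludeTaxableEx,
  grand: excludeTaxableEx + gst + tcs + round_off
};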

const bestMode = Math.abs(includeMode.grand - printedGrand) <
                 Math.abs(excludeMode.grand - printedGrand)
                 ? 'include' : 'exclude';
Step 2: Compute GST by Bucket

Ensures GST matches per-rate subtotals:
let gstTotal = 0;
for (const [rateStr, ex] of Object.entries(bucketEx)) {
  const rate = parseFloat(rateStr);
  gstTotal += ex * (rate / 100);
}
gstTotal = Math.round(gstTotal * 100) / 100;
Step 3: Apply TCS (if present)

Tax Collected at Source, applied after GST:
const grandBeforeTcs = taxable_ex + gstTotal;
const tcsAmount = tcs.rate > 0
  ? grandBeforeTcs * (tcs.rate / 100)
  : tcs.amount;
const grandAfterTcs = grandBeforeTcs + tcsAmount;
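A worked version of the TCS step as a standalone function. The TcsInput shape, the applyTcs name, and rounding to two decimals are assumptions added for illustration.

```typescript
interface TcsInput {
  rate: number;   // percent, e.g. 0.1; 0 when only an amount is printed
  amount: number; // printed TCS amount, used when rate is 0
}

const round2 = (n: number): number => Math.round(n * 100) / 100;

function applyTcs(taxableEx: number, gstTotal: number, tcs: TcsInput) {
  // TCS is charged on the total including GST.
  const grandBeforeTcs = taxableEx + gstTotal;
  const tcsAmount =
    tcs.rate > 0 ? round2(grandBeforeTcs * (tcs.rate / 100)) : tcs.amount;
  return {
    grandBeforeTcs,
    tcsAmount,
    grandAfterTcs: round2(grandBeforeTcs + tcsAmount),
  };
}
```

With a taxable value of ₹45,500, GST of ₹8,190, and a 0.1% TCS rate, the base is ₹53,690, TCS is ₹53.69, and the total after TCS is ₹53,743.69.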
Step 4: Add Round-Off

Final adjustment to match printed total:
const finalGrand = grandAfterTcs + round_off;
const error = Math.abs(finalGrand - printedGrand);
The engine does NOT force round-off to zero error. It respects the provided round_off value and reports the error. This prevents hiding genuine extraction issues.

Phase 6: Multi-Hypothesis Testing

Purpose: Try multiple interpretations and pick the best match. The reconcileV4() function tests four scenarios:
const candidates = [
  { name: "as_is", doc: recomputeDoc(input) },
  { name: "as_is_items_only_when_no_hsn", doc: recomputeDoc(input, { preferItemsOnlyWhenNoHSN: true }) },
  { name: "from_printed_with_tax", doc: rerateFromPrinted(input, "WITH_TAX") },
  { name: "from_printed_without_tax", doc: rerateFromPrinted(input, "WITHOUT_TAX") }
];
as_is: Uses rate_ex_tax and price mode hints from the model as-is. Best for: Invoices where the model correctly identified ex-tax rates.
as_is_items_only_when_no_hsn: When no HSN table exists, assumes printed taxable subtotal = items only (excludes taxable charges). Best for: Invoices that print “Taxable Value” before the “Add: Freight” line.
from_printed_with_tax: Reinterprets printed per-unit rate as including GST:
const rate_ex_tax = rate_printed / (1 + gst_rate / 100);
Best for: Retail invoices where MRP/selling price includes GST.
from_printed_without_tax: Treats printed rate as ex-tax directly:
const rate_ex_tax = rate_printed;
Best for: B2B invoices with clear “Rate” and “GST” columns.
The engine scores each candidate by:
const score = error_absolute + Math.max(0, Math.abs(implied_round_off) - 1);
Penalizes:
  • High absolute error vs. printed grand total
  • Large round-off values (> ₹1.00)
Picks the candidate with the lowest score.
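The scoring rule and selection can be sketched as follows. ScoredCandidate and the two function names are illustrative, not the engine's actual types, and breaking ties in favor of the earlier candidate is an assumption.

```typescript
interface ScoredCandidate {
  name: string;
  errorAbsolute: number;   // |computed grand total - printed grand total|
  impliedRoundOff: number; // round-off needed to hit the printed total
}

function scoreCandidate(c: ScoredCandidate): number {
  // Round-off up to ₹1 is free; anything beyond that is added as a penalty.
  return c.errorAbsolute + Math.max(0, Math.abs(c.impliedRoundOff) - 1);
}

function pickBestCandidate(candidates: ScoredCandidate[]): ScoredCandidate {
  if (candidates.length === 0) {
    throw new Error("no candidates to score");
  }
  // Strict "<" keeps the earlier candidate on ties (assumed tie-break).
  return candidates.reduce((best, c) =>
    scoreCandidate(c) < scoreCandidate(best) ? c : best
  );
}
```

Under this rule a candidate with an error of 0.50 and implied round-off of 0.50 scores 0.50, since the round-off stays under the ₹1 allowance.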

Tolerance Levels

| Tolerance Type | Value | Purpose |
| --- | --- | --- |
| Line Matching | ₹0.05 | Float precision tolerance when comparing printed vs. computed line amounts |
| GST Slab Snapping | 0.75% | Snap to nearest standard slab if within tolerance (e.g., 17.8% → 18%) |
| HSN Table Scaling | ₹0.75 | Scale printed bucket totals to match taxable subtotal if difference > threshold |
| Acceptable Round-Off | ≤ ₹1.02 | Maximum round-off automatically adopted during reconciliation |
| Matched Status | ≤ ₹0.05 | Error threshold for reconciliation.status = "matched" |
These tolerances are conservative by design. If your domain requires stricter matching, consider validating error_absolute < 0.01 in your application layer.

Validation Logic

The reconciliation engine exposes detailed metadata for validation:

1. Error Absolute

const doc = reconcileV4(input);

if (doc.reconciliation.error_absolute <= 0.05) {
  // Perfect or near-perfect match
  return { status: 'valid', confidence: 'high' };
} else if (doc.reconciliation.error_absolute <= 1.00) {
  // Acceptable, likely rounding differences
  return { status: 'valid', confidence: 'medium', warning: 'Minor discrepancy' };
} else {
  // Requires review
  return { status: 'review_required', confidence: 'low', error: doc.reconciliation.error_absolute };
}

2. Alternates Considered

Use this to debug why a specific interpretation was chosen:
console.log(doc.reconciliation.alternates_considered);
// [
//   "as_is:err=0.00,implied_round=0.00,score=0.00",
//   "as_is_items_only_when_no_hsn:err=125.50,implied_round=0.00,score=125.50",
//   "from_printed_with_tax:err=2500.00,implied_round=2500.00,score=4999.00",
//   "from_printed_without_tax:err=0.50,implied_round=0.50,score=0.50"
// ]

// The first one (score=0.00) was chosen
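Since each entry is a formatted string, a small parser makes the values easier to compare programmatically. This sketch assumes the exact name:err=…,implied_round=…,score=… layout shown in the sample output; parseAlternate and AlternateScore are hypothetical names.

```typescript
interface AlternateScore {
  name: string;
  err: number;
  impliedRound: number;
  score: number;
}

function parseAlternate(entry: string): AlternateScore | null {
  const m =
    /^([^:]+):err=(-?[\d.]+),implied_round=(-?[\d.]+),score=(-?[\d.]+)$/.exec(
      entry
    );
  if (!m) return null; // layout differs from the assumed format

  return {
    name: m[1] ?? "",
    err: parseFloat(m[2] ?? ""),
    impliedRound: parseFloat(m[3] ?? ""),
    score: parseFloat(m[4] ?? ""),
  };
}
```

Entries that do not match the assumed layout return null, so callers can skip them instead of reading NaN values.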

3. Warnings

Check for special cases:
if (doc.reconciliation.warnings.length > 0) {
  doc.reconciliation.warnings.forEach(w => {
    if (w.includes('Excluded non-taxable charges')) {
      // Non-taxable charges were not included in totals
      // Verify this matches invoice layout
    }
  });
}

Real-World Examples

{
  "totals": {
    "items_ex_tax": 50000.00,
    "header_discounts_ex_tax": 5000.00,
    "charges_ex_tax": 500.00,
    "taxable_ex_tax": 45500.00,
    "gst_total": 8190.00,
    "grand_total": 53690.00
  },
  "printed": {
    "taxable_subtotal": 45500.00,
    "gst_total": 8190.00,
    "grand_total": 53690.00
  },
  "reconciliation": {
    "error_absolute": 0.00,
    "alternates_considered": [
      "as_is:err=0.00,implied_round=0.00,score=0.00"
    ],
    "warnings": []
  }
}
✅ All anchors match, zero error, no warnings.

Integration Guide

Step 1: Call the API

const response = await fetch('/api/ocr-structured-v4', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    pdfBase64: pdfData,
    filename: 'invoice.pdf'
  })
});

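// Surface HTTP failures before attempting to parse the body
if (!response.ok) {
  throw new Error(`OCR request failed with status ${response.status}`);
}

const doc: V4Doc = await response.json();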

Step 2: Validate Reconciliation

function validateInvoice(doc: V4Doc): ValidationResult {
  const { error_absolute, warnings } = doc.reconciliation;
  
  if (error_absolute <= 0.05) {
    return {
      status: 'approved',
      confidence: 'high',
      message: 'All numbers match'
    };
  }
  
  if (error_absolute <= 1.00) {
    return {
      status: 'approved',
      confidence: 'medium',
      message: 'Minor rounding difference',
      warning: `Error: ₹${error_absolute.toFixed(2)}`
    };
  }
  
  return {
    status: 'review_required',
    confidence: 'low',
    message: 'Significant discrepancy detected',
    error: `₹${error_absolute.toFixed(2)}`,
    details: warnings
  };
}

Step 3: Handle Edge Cases

// Check for non-taxable charge exclusions
if (doc.reconciliation.warnings.some(w => w.includes('non-taxable charges'))) {
  // Verify against invoice layout
  const layoutHasChargesInTotal = checkInvoiceLayout(doc);
  if (!layoutHasChargesInTotal) {
    // Expected behavior, no action needed
  } else {
    // Flag for manual review
    flagForReview(doc, 'Charges exclusion mismatch');
  }
}

// Check alternate hypotheses
const alternates = doc.reconciliation.alternates_considered;
const scores = alternates.map(a => {
  const match = a.match(/score=([\d.]+)/);
  return match ? parseFloat(match[1]) : Infinity;
});

if (Math.min(...scores) > 50) {
  // All hypotheses had high errors
  flagForReview(doc, 'No good hypothesis found');
}

Debugging Tips

Common causes:
  1. Missing items on subsequent pages (multi-page PDF)
  2. Charges not extracted (Freight, Packing)
  3. TCS not detected
Debug steps:
  1. Check meta.pages_processed — should match PDF page count
  2. Compare totals.items_ex_tax with printed subtotal
  3. Look for charge keywords in raw OCR output
  4. Verify TCS line if invoice mentions “TCS”
Common causes:
  1. CGST/SGST vs. IGST confusion
  2. Charges using wrong GST rate
  3. HSN table not detected
Debug steps:
  1. Check gst.cgst + gst.sgst + gst.igst per item
  2. Verify doc_level.place_of_supply_state_code matches supplier state for intra-state
  3. Look at printed.hsn_tax_table — should be non-empty if table exists
Common causes:
  1. Incorrect price mode (WITH_TAX vs. WITHOUT_TAX)
  2. Sequential discounts not captured
  3. Printed subtotal includes/excludes charges ambiguity
Debug steps:
  1. Check alternates_considered — if from_printed_with_tax has lower error, price mode might be wrong
  2. Verify header_discounts array — should have all printed discounts with correct order
  3. Review charges[].taxable — might need manual override
Navigate to /review in the UI and paste the LangFuse trace or API response. The tool will:
  • Auto-detect invoice objects in nested JSON
  • Display reconciliation breakdown with color-coded errors
  • Show line-by-line math (qty × rate, discounts, tax)
  • Highlight which candidate was chosen and why

