Reconciliation Engine

Overview

The V4 reconciliation engine (lib/invoice_v4.ts) transforms raw OCR output into mathematically self-consistent invoice data. It respects printed anchors (HSN tax tables, subtotals, grand total) and tries multiple interpretations to find the best match.

Goal: Take the model’s structured JSON and make the numbers add up like a real invoice, with minimal error vs. the printed grand total.

Core Principles

Anchor-First

Printed values (HSN tax table, taxable subtotal, grand total) are treated as ground truth

Multi-Pass

Tests multiple price mode hypotheses (WITH_TAX, WITHOUT_TAX) and picks the best

Smart Allocation

Distributes header discounts across GST buckets to match printed GST amounts

Tolerance-Based

Accepts small round-off errors (≤ ₹1.02) as valid matches

Reconciliation Phases

The reconcileV4() function runs in six phases:

Phase 1: Item Line Recomputation

Purpose: Derive line-level ex-tax amounts from rate, quantity, and discounts.

Choose Best Source of Truth

For each item, the engine evaluates three candidates:

Computed: qty × rate_ex_tax × (1 - d1%) × (1 - d2%) - flat_discount
Printed: Line amount from OCR, adjusted for price mode (WITH_TAX → divide by (1 + GST%), WITHOUT_TAX → use as-is)
Model Hint: Explicit amount_ex_tax_after_discount if provided

The engine picks the most reliable value using these heuristics:

If printed amount ≈ pre-discount gross and discount exists → use computed discounted value
Otherwise, prefer printed (but never exceed computed discounted value)
Fallback to model hint or computed value
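The selection rules above can be condensed into one function. This is a minimal sketch, not the code in lib/invoice_v4.ts. The LineInput field names are hypothetical, and reading "≈" as the ₹0.05 line-matching tolerance is an assumption.

```typescript
type PriceMode = "WITH_TAX" | "WITHOUT_TAX";

// Hypothetical line shape; field names are illustrative, not the engine's schema.
interface LineInput {
  qty: number;
  rate_ex_tax: number;
  gst_rate: number;                      // percent, e.g. 18
  d1_pct?: number;                       // first percent discount
  d2_pct?: number;                       // second percent discount
  flat_discount?: number;                // absolute line discount (₹)
  printed_amount?: number;               // line amount read by OCR
  amount_ex_tax_after_discount?: number; // optional model hint
}

const LINE_TOL = 0.05; // assumed: the ₹0.05 line-matching tolerance

function chooseLineEx(line: LineInput, mode: PriceMode): number {
  const gross = line.qty * line.rate_ex_tax;
  const computed =
    gross * (1 - (line.d1_pct ?? 0) / 100) * (1 - (line.d2_pct ?? 0) / 100) -
    (line.flat_discount ?? 0);
  const hasDiscount = gross - computed > LINE_TOL;

  if (line.printed_amount !== undefined) {
    // Normalize the printed amount to ex-tax according to the price mode
    const printedEx =
      mode === "WITH_TAX"
        ? line.printed_amount / (1 + line.gst_rate / 100)
        : line.printed_amount;
    // Printed value is really the pre-discount gross: use the discounted computation
    if (hasDiscount && Math.abs(printedEx - gross) <= LINE_TOL) return computed;
    // Otherwise prefer the printed value, but never above the computed discounted value
    return Math.min(printedEx, computed);
  }
  // No printed amount: fall back to the model hint, then the computed value
  return line.amount_ex_tax_after_discount ?? computed;
}
```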

Normalize GST Rate

Snaps raw GST percentages to standard slabs:

const GST_SLABS = [0, 0.25, 3, 5, 12, 18, 28];

Examples:

17.8 → 18 (within 0.75 tolerance)
0.18 → 18 (auto-scale fractions)
5.1 → 5
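A minimal sketch of the snapping step, assuming the 0.75-point tolerance from the Tolerance Levels table. How the engine tells a fraction (0.18) from the genuine 0.25% slab is not documented here; this sketch assumes a value below 1 is first tried ×100 and kept only if that lands near a slab, and that rates with no nearby slab pass through unchanged.

```typescript
const GST_SLABS = [0, 0.25, 3, 5, 12, 18, 28];
const SLAB_TOL = 0.75; // percentage points

// Closest slab within tolerance, or null if none qualifies
function nearestSlab(rate: number): number | null {
  let best: number | null = null;
  for (const slab of GST_SLABS) {
    const dist = Math.abs(rate - slab);
    if (dist <= SLAB_TOL && (best === null || dist < Math.abs(rate - best))) {
      best = slab;
    }
  }
  return best;
}

function normalizeGstRate(raw: number): number {
  // Assumption: try values below 1 as fractions first (0.18 -> 18)
  if (raw > 0 && raw < 1) {
    const scaled = nearestSlab(raw * 100);
    if (scaled !== null) return scaled;
  }
  // Otherwise snap directly; keep the raw rate if no slab is close enough
  return nearestSlab(raw) ?? raw;
}
```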

Split CGST/SGST/IGST

Based on supplier GSTIN and place of supply:

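const supplierState = getStateCodeFromGstin(supplier_gstin); // e.g., "27" (Maharashtra)
const posCode = place_of_supply_state_code;                  // e.g., "27" or 27
// Normalize both to two-digit strings so "27" and 27 (or "07" and 7) compare equal
const isIntra = supplierState != null &&
  String(supplierState).padStart(2, "0") === String(posCode).padStart(2, "0");

let cgst: number, sgst: number, igst: number;
if (isIntra) {
  cgst = gst_amount / 2;
  sgst = gst_amount / 2;
  igst = 0;
} else {
  cgst = 0;
  sgst = 0;
  igst = gst_amount;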
}
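getStateCodeFromGstin is used above but not shown. A GSTIN begins with the two-digit state code, so a hypothetical version only needs the first two characters. The splitGst wrapper below is likewise illustrative; routing an unknown supplier state to IGST and giving an odd paisa to SGST are assumptions, not documented engine behavior.

```typescript
// Hypothetical helper: a GSTIN starts with the 2-digit state code.
function getStateCodeFromGstin(gstin: string | null | undefined): string | null {
  const code = (gstin ?? "").trim().slice(0, 2);
  return /^\d{2}$/.test(code) ? code : null;
}

interface GstSplit { cgst: number; sgst: number; igst: number; }

function splitGst(gstAmount: number, supplierGstin: string, posCode: string | number): GstSplit {
  const supplierState = getStateCodeFromGstin(supplierGstin);
  const pos = String(posCode).padStart(2, "0"); // accept 7, "7" or "07"
  if (supplierState !== null && supplierState === pos) {
    // Intra-state: halve in whole paisa; assumed: any odd paisa goes to SGST
    const paisa = Math.round(gstAmount * 100);
    const cgstPaisa = Math.floor(paisa / 2);
    return { cgst: cgstPaisa / 100, sgst: (paisa - cgstPaisa) / 100, igst: 0 };
  }
  // Inter-state (assumed also for an unknown supplier state): all IGST
  return { cgst: 0, sgst: 0, igst: gstAmount };
}
```

Working in whole paisa keeps CGST + SGST exactly equal to the GST amount even when it has an odd number of paisa.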

Build GST Buckets

Groups items by GST rate for bucket-level operations:

gstBuckets = {
  "0": 5000.00,   // ₹5,000 @ 0%
  "5": 2000.00,   // ₹2,000 @ 5%
  "18": 50000.00  // ₹50,000 @ 18%
}
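Bucket construction is a group-by on the GST rate. A sketch, assuming a hypothetical flat item shape with gst_rate and line_ex_tax fields and paisa rounding per bucket:

```typescript
interface BucketItem { gst_rate: number; line_ex_tax: number; }

// Group ex-tax line amounts by GST rate; keys are the rate as a string, as in gstBuckets above
function buildGstBuckets(items: BucketItem[]): Record<string, number> {
  const buckets: Record<string, number> = {};
  for (const item of items) {
    const key = String(item.gst_rate);
    buckets[key] = (buckets[key] ?? 0) + item.line_ex_tax;
  }
  // Round each bucket to paisa
  for (const [key, value] of Object.entries(buckets)) {
    buckets[key] = Math.round(value * 100) / 100;
  }
  return buckets;
}
```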

Phase 2: Header Discounts

Purpose: Apply sequential voucher-level discounts (Trade Discount, Special Discount, etc.) before GST.

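Percent Discounts

Applied sequentially with multiplicative effect:

// Example: 10% Trade Discount, then 5% Special Discount
const d1 = 10, d2 = 5;
const effective = d1 + d2 - (d1 * d2) / 100;
// = 10 + 5 - 0.5 = 14.5%

// Applied to each bucket:
bucketEx["18"] = 50000 * (1 - effective / 100); // = 42750.00

Because (1 - d1%) × (1 - d2%) is commutative, two percent discounts give the same result in either order: [10%, 5%] and [5%, 10%] both come to 14.5%. Order matters once percent and absolute discounts are mixed: on ₹50,000, 10% then ₹1,000 leaves ₹44,000, while ₹1,000 then 10% leaves ₹44,100.

Absolute Discounts

Distributed across buckets in proportion to each bucket's current ex-tax value:

// Example: ₹1,000 flat discount, applied after the percent discounts above
const discount = 1000;
const totalEx = Object.values(bucketEx).reduce((s, v) => s + v, 0); // current bucket total, e.g. 42750

for (const rate of Object.keys(bucketEx)) {
  const share = bucketEx[rate] / totalEx; // single 18% bucket: 42750 / 42750 = 1.0
  bucketEx[rate] -= discount * share;     // 18% bucket: 42750 - 1000 = 41750.00
}

Smart Allocation (with GST target)

When the printed GST total is known, the engine allocates absolute discounts greedily to match the target:

// Goal: reduce the taxable base so that items GST matches the printed GST
const targetItemsGst = printedGstTotal - approxChargesGst;
const currentItemsGst = Object.entries(bucketEx)
  .reduce((s, [rate, ex]) => s + ex * (parseFloat(rate) / 100), 0);

let remainingAmt = discountAmount;                 // absolute discount still to allocate (ex-tax)
let remainingW = currentItemsGst - targetItemsGst; // GST reduction still needed
const available = new Set(Object.keys(bucketEx));

// Iteratively take from the bucket whose rate is closest to the effective rate still needed
while (remainingAmt > 0.005 && available.size > 0) {
  const k = (remainingW / remainingAmt) * 100;     // desired rate, in percent
  const closest = [...available].reduce((a, b) =>
    Math.abs(parseFloat(a) - k) <= Math.abs(parseFloat(b) - k) ? a : b);
  const take = Math.min(bucketEx[closest], remainingAmt); // capped by bucket capacity
  bucketEx[closest] -= take;
  remainingAmt -= take;
  remainingW -= take * (parseFloat(closest) / 100);
  available.delete(closest);
}
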
This ensures the GST total matches the printed value even with tricky discount structures.
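The difference between percent and absolute header discounts is easiest to see by running both in either order. A sketch with a hypothetical HeaderDiscount type; it applies percent discounts to every bucket and splits absolute discounts pro rata, but does not attempt the GST-targeted allocation:

```typescript
type HeaderDiscount =
  | { kind: "percent"; value: number }   // e.g. 10 for 10%
  | { kind: "absolute"; value: number }; // rupees, ex-tax

// Apply header discounts in printed order to a copy of the GST buckets
function applyHeaderDiscounts(
  buckets: Record<string, number>,
  discounts: HeaderDiscount[],
): Record<string, number> {
  const out: Record<string, number> = { ...buckets };
  for (const d of discounts) {
    // Total before this discount, used for the pro-rata share
    const total = Object.values(out).reduce((s, v) => s + v, 0);
    for (const [key, ex] of Object.entries(out)) {
      if (d.kind === "percent") {
        out[key] = ex * (1 - d.value / 100);
      } else if (total > 0) {
        out[key] = ex - d.value * (ex / total);
      }
    }
  }
  return out;
}
```

Swapping two percent discounts leaves the result unchanged, while moving the ₹1,000 discount ahead of the 10% changes ₹44,000 to ₹44,100.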

Phase 3: Printed Anchors

Purpose: Scale items to match printed taxable values when available.

Scenario A: HSN Tax Table Exists

Best case: Invoice includes a detailed HSN-wise tax breakdown.

"hsn_tax_table": [
  {
    "hsn": "8471",
    "taxable_value": 42750.00,
    "cgst_rate": 9,
    "sgst_rate": 9,
    "cgst_amount": 3847.50,
    "sgst_amount": 3847.50
  }
]

The engine:

Groups table rows by total GST rate (e.g., 9% + 9% = 18%)

Scales item lines within each bucket to match printed taxable value exactly:

const printed = 42750.00;
const computed = 40000.00;
const scale = printed / computed; // 1.06875

// Scale all items @ 18%:
item1.line_ex_tax *= scale;
item2.line_ex_tax *= scale;

Recomputes discounts, GST, totals to maintain consistency

HSN table anchoring is the most accurate method. It eliminates cumulative rounding errors and handles multi-discount structures gracefully.
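A sketch of the HSN anchoring step under stated assumptions: the row and item shapes are hypothetical, rows are grouped by cgst_rate + sgst_rate + igst_rate, and each scaled line is rounded to paisa independently, so a paisa-level residual can remain where the real engine may redistribute it.

```typescript
interface HsnRow {
  taxable_value: number;
  cgst_rate?: number;
  sgst_rate?: number;
  igst_rate?: number;
}
interface AnchorItem { gst_rate: number; line_ex_tax: number; }

function anchorToHsnTable(items: AnchorItem[], table: HsnRow[]): AnchorItem[] {
  // Printed taxable value per total GST rate (e.g. 9% + 9% = 18%)
  const printedByRate = new Map<number, number>();
  for (const row of table) {
    const rate = (row.cgst_rate ?? 0) + (row.sgst_rate ?? 0) + (row.igst_rate ?? 0);
    printedByRate.set(rate, (printedByRate.get(rate) ?? 0) + row.taxable_value);
  }
  // Computed ex-tax total per GST rate
  const computedByRate = new Map<number, number>();
  for (const it of items) {
    computedByRate.set(it.gst_rate, (computedByRate.get(it.gst_rate) ?? 0) + it.line_ex_tax);
  }
  // Scale each item by its bucket's printed/computed ratio; returns new objects
  return items.map((it) => {
    const printed = printedByRate.get(it.gst_rate);
    const computed = computedByRate.get(it.gst_rate) ?? 0;
    if (printed === undefined || computed === 0) return { ...it }; // no anchor for this rate
    const scaled = it.line_ex_tax * (printed / computed);
    return { ...it, line_ex_tax: Math.round(scaled * 100) / 100 };
  });
}
```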

Scenario B: Printed Taxable Subtotal Only

Fallback: Invoice prints “Total Taxable Value” without per-HSN breakdown.

"printed": {
  "taxable_subtotal": 43250.00,
  "gst_total": 7785.00
}

The engine:

Estimates whether taxable_subtotal includes taxable charges:
- If charges exist → targetItemsOnly = taxable_subtotal - charges_ex_tax
- Else → targetItemsOnly = taxable_subtotal
Computes reduction needed: cut = current_items_ex_tax - targetItemsOnly
Uses smart allocation (greedy bucket selection) guided by printed GST total

const approxChargesGst = charges
  .filter(c => c.taxable)                            // only taxable charges carry GST
  .reduce((s, c) => {
    const rate = c.gst_rate_hint ?? weightedAvgRate; // ?? keeps an explicit 0% hint
    return s + c.ex_tax * (rate / 100);
  }, 0);
const targetItemsGst = printedGstTotal - approxChargesGst;

allocateAbsoluteSmart(cut, targetItemsGst);
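The Scenario B arithmetic can be checked end to end. This sketch uses a hypothetical subtotalTargets function; like the rules above, it assumes that the printed subtotal includes charges whenever taxable charges exist.

```typescript
interface ChargeInput { ex_tax: number; taxable: boolean; gst_rate_hint?: number; }
interface SubtotalTargets { targetItemsOnly: number; cut: number; targetItemsGst: number; }

function subtotalTargets(
  taxableSubtotal: number,
  printedGstTotal: number,
  charges: ChargeInput[],
  weightedAvgRate: number,
  currentItemsEx: number,
): SubtotalTargets {
  const taxable = charges.filter((c) => c.taxable);
  const chargesEx = taxable.reduce((s, c) => s + c.ex_tax, 0);
  // If taxable charges exist, assume the printed subtotal includes them
  const targetItemsOnly = chargesEx > 0 ? taxableSubtotal - chargesEx : taxableSubtotal;
  // Estimated GST on charges; the rest of the printed GST belongs to items
  const approxChargesGst = taxable.reduce(
    (s, c) => s + c.ex_tax * ((c.gst_rate_hint ?? weightedAvgRate) / 100),
    0,
  );
  return {
    targetItemsOnly,
    cut: currentItemsEx - targetItemsOnly,
    targetItemsGst: printedGstTotal - approxChargesGst,
  };
}
```

With the printed figures above (₹43,250 taxable, ₹7,785 GST), ₹500 of freight at 18% and ₹50,000 of current item value, the items-only target is ₹42,750, the cut is ₹7,250 and the items GST target is ₹7,695.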

Scenario C: No Anchors

Rare: No printed taxable subtotal or HSN table. The engine skips anchor adjustments and relies on:

Line-level printed amounts (if price mode is WITHOUT_TAX)
Computed values from rate × qty × discounts
Final grand total matching via round-off

Phase 4: Charges

Purpose: Add freight, packing, insurance, etc. with inferred GST when needed.

charges.forEach(charge => {
  const ex = charge.ex_tax;

  if (charge.taxable) {
    // ?? (not ||) so that an explicit 0% hint is respected
    const rate = charge.gst_rate_hint ?? weightedAvgItemGstRate;
    chargesEx += ex;
    // Add to GST bucket for proper subtotal computation
    bucketEx[String(rate)] = (bucketEx[String(rate)] || 0) + ex;
  } else {
    // Non-taxable charge (e.g., late fee)
    nonTaxableChargesEx += ex;
  }
});

If gst_rate_hint is missing, the engine uses the weighted average GST rate from items:

const totalEx = items.reduce((s, i) => s + i.totals.line_ex_tax, 0);
const totalGst = items.reduce((s, i) => s + i.gst.amount, 0);
const weightedRate = (totalGst / totalEx) * 100;
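The fallback rate and the per-charge GST estimate, as a sketch with hypothetical names. It guards against an invoice with no item value (returning 0 there is an assumption) and uses ?? so an explicit 0% hint is not replaced by the average.

```typescript
interface PricedItem { line_ex_tax: number; gst_amount: number; }
interface ChargeLine { ex_tax: number; taxable: boolean; gst_rate_hint?: number; }

// Weighted average item GST rate, in percent
function weightedItemGstRate(items: PricedItem[]): number {
  const totalEx = items.reduce((s, i) => s + i.line_ex_tax, 0);
  const totalGst = items.reduce((s, i) => s + i.gst_amount, 0);
  return totalEx > 0 ? (totalGst / totalEx) * 100 : 0; // assumed fallback: 0
}

// GST on one charge, rounded to paisa; non-taxable charges carry none
function chargeGst(charge: ChargeLine, fallbackRate: number): number {
  if (!charge.taxable) return 0;
  const rate = charge.gst_rate_hint ?? fallbackRate;
  return Math.round(charge.ex_tax * rate) / 100; // ex * rate / 100, in whole paisa
}
```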

Phase 5: Totals & TCS

Purpose: Compute final taxable, GST, and grand totals.

Decide Non-Taxable Charges Inclusion

Some invoices include non-taxable charges (e.g., late fees) in the grand total; others don’t. The engine tests both options and picks the one with the lower error:

// For this comparison, gst, tcs and round_off are treated as the same in both modes
// (non-taxable charges add no GST)
const includeTaxableEx = items + charges + nonTaxableCharges;
const excludeTaxableEx = items + charges;

const includeMode = { taxable_ex: includeTaxableEx, grand: includeTaxableEx + gst + tcs + round_off };
const excludeMode = { taxable_ex: excludeTaxableEx, grand: excludeTaxableEx + gst + tcs + round_off };

const bestMode = Math.abs(includeMode.grand - printedGrand) <
                 Math.abs(excludeMode.grand - printedGrand)
                 ? 'include' : 'exclude';

Compute GST by Bucket

Ensures GST matches per-rate subtotals:

let gstTotal = 0;
for (const [rateStr, ex] of Object.entries(bucketEx)) {
  const rate = parseFloat(rateStr);
  gstTotal += ex * (rate / 100);
}
gstTotal = Math.round(gstTotal * 100) / 100;

Apply TCS (if present)

Tax Collected at Source, applied after GST:

const grandBeforeTcs = taxable_ex + gstTotal;
const tcsAmount = tcs.rate > 0
  ? grandBeforeTcs * (tcs.rate / 100)
  : tcs.amount;
const grandAfterTcs = grandBeforeTcs + tcsAmount;

Add Round-Off

Final adjustment to match printed total:

const finalGrand = grandAfterTcs + round_off;
const error = Math.abs(finalGrand - printedGrand);

The engine does NOT force round-off to zero error. It respects the provided round_off value and reports the error. This prevents hiding genuine extraction issues.
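Phase 5 end to end, as a sketch: GST per bucket, optional percentage TCS, then the supplied round-off, with the error reported rather than absorbed. Function and parameter names are hypothetical, and rounding to paisa after each step is an assumption (the snippets above only show rounding of the GST total).

```typescript
interface TcsInput { rate: number; amount: number; }

const round2 = (x: number): number => Math.round(x * 100) / 100;

function finalizeTotals(
  bucketEx: Record<string, number>, // taxable ex-tax amounts keyed by GST rate
  otherEx: number,                  // ex-tax amounts with no GST (e.g. included non-taxable charges)
  tcs: TcsInput,
  roundOff: number,
  printedGrand: number,
) {
  let taxableEx = otherEx;
  let gstTotal = 0;
  for (const [rateStr, ex] of Object.entries(bucketEx)) {
    taxableEx += ex;
    gstTotal += ex * (parseFloat(rateStr) / 100);
  }
  gstTotal = round2(gstTotal);
  const grandBeforeTcs = round2(taxableEx + gstTotal);
  // Percentage TCS is charged on the GST-inclusive total; otherwise use the fixed amount
  const tcsAmount = tcs.rate > 0 ? round2(grandBeforeTcs * (tcs.rate / 100)) : tcs.amount;
  // The provided round-off is applied as-is and the remaining error is reported
  const finalGrand = round2(grandBeforeTcs + tcsAmount + roundOff);
  return {
    taxableEx: round2(taxableEx),
    gstTotal,
    tcsAmount,
    finalGrand,
    error: round2(Math.abs(finalGrand - printedGrand)),
  };
}
```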

Phase 6: Multi-Hypothesis Testing

Purpose: Try multiple interpretations and pick the best match. The reconcileV4() function tests four scenarios:

const candidates = [
  { name: "as_is", doc: recomputeDoc(input) },
  { name: "as_is_items_only_when_no_hsn", doc: recomputeDoc(input, { preferItemsOnlyWhenNoHSN: true }) },
  { name: "from_printed_with_tax", doc: rerateFromPrinted(input, "WITH_TAX") },
  { name: "from_printed_without_tax", doc: rerateFromPrinted(input, "WITHOUT_TAX") }
];

Candidate 1: as_is

Uses rate_ex_tax and price mode hints from the model as-is. Best for: invoices where the model correctly identified ex-tax rates.

Candidate 2: as_is_items_only_when_no_hsn

When no HSN table exists, assumes the printed taxable subtotal covers items only (excludes taxable charges). Best for: invoices that print “Taxable Value” before an “Add: Freight” line.

Candidate 3: from_printed_with_tax

Reinterprets printed per-unit rate as including GST:

const rate_ex_tax = rate_printed / (1 + gst_rate / 100);

Best for: Retail invoices where MRP/selling price includes GST.

Candidate 4: from_printed_without_tax

Treats printed rate as ex-tax directly:

const rate_ex_tax = rate_printed;

Best for: B2B invoices with clear “Rate” and “GST” columns.

The engine scores each candidate by:

const score = error_absolute + Math.max(0, Math.abs(implied_round_off) - 1);

Penalizes:

High absolute error vs. printed grand total
Large round-off values (> ₹1.00)

Picks the candidate with the lowest score.
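The scoring rule and selection, as a sketch. The Candidate shape is hypothetical, and breaking ties in favor of the earlier candidate is an assumption.

```typescript
interface Candidate { name: string; errorAbsolute: number; impliedRoundOff: number; }

// Error plus any implied round-off beyond ₹1.00
function scoreCandidate(c: Candidate): number {
  return c.errorAbsolute + Math.max(0, Math.abs(c.impliedRoundOff) - 1);
}

// Lowest score wins; on a tie the earlier candidate is kept
function pickBest(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates to score");
  return candidates.reduce((best, c) => (scoreCandidate(c) < scoreCandidate(best) ? c : best));
}
```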

Tolerance Levels

Tolerance Type	Value	Purpose
Line Matching	₹0.05	Float precision tolerance when comparing printed vs. computed line amounts
GST Slab Snapping	0.75 percentage points	Snap to nearest standard slab if within tolerance (e.g., 17.8% → 18%)
HSN Table Scaling	₹0.75	Scale printed bucket totals to match taxable subtotal if difference > threshold
Acceptable Round-Off	≤ ₹1.02	Maximum round-off automatically adopted during reconciliation
Matched Status	≤ ₹0.05	Error threshold for `reconciliation.status = "matched"`

These tolerances are deliberately loose enough to absorb ordinary rounding differences. If your domain requires stricter matching, consider validating error_absolute < 0.01 in your application layer.

Validation Logic

The reconciliation engine exposes detailed metadata for validation:

1. Error Absolute

const doc = reconcileV4(input);

// Map the absolute error to a validation result
function classify(errorAbsolute: number) {
  if (errorAbsolute <= 0.05) {
    // Perfect or near-perfect match
    return { status: 'valid', confidence: 'high' };
  }
  if (errorAbsolute <= 1.00) {
    // Acceptable, likely rounding differences
    return { status: 'valid', confidence: 'medium', warning: 'Minor discrepancy' };
  }
  // Requires review
  return { status: 'review_required', confidence: 'low', error: errorAbsolute };
}

const result = classify(doc.reconciliation.error_absolute);

2. Alternates Considered

Use this to debug why a specific interpretation was chosen:

console.log(doc.reconciliation.alternates_considered);
// [
//   "as_is:err=0.00,implied_round=0.00,score=0.00",
//   "as_is_items_only_when_no_hsn:err=125.50,implied_round=0.00,score=125.50",
//   "from_printed_with_tax:err=2500.00,implied_round=2500.00,score=4999.00",
//   "from_printed_without_tax:err=0.50,implied_round=0.50,score=0.50"
// ]

// The first one (score=0.00) was chosen
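Because alternates_considered is a list of strings, programmatic inspection needs a parser. A sketch, assuming every entry follows the name:err=X,implied_round=Y,score=Z format shown above; the parser and the Alternate type are hypothetical.

```typescript
interface Alternate { name: string; err: number; impliedRound: number; score: number; }

// Parse one entry; returns null if it does not match the expected format
function parseAlternate(s: string): Alternate | null {
  const m = /^([^:]+):err=(-?[\d.]+),implied_round=(-?[\d.]+),score=(-?[\d.]+)$/.exec(s);
  if (!m) return null;
  return {
    name: m[1],
    err: parseFloat(m[2]),
    impliedRound: parseFloat(m[3]),
    score: parseFloat(m[4]),
  };
}

// Lowest-scoring parsable entry, or null if none parse
function bestAlternate(lines: string[]): Alternate | null {
  const parsed = lines.map(parseAlternate).filter((a): a is Alternate => a !== null);
  if (parsed.length === 0) return null;
  return parsed.reduce((best, a) => (a.score < best.score ? a : best));
}
```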

3. Warnings

Check for special cases:

if (doc.reconciliation.warnings.length > 0) {
  doc.reconciliation.warnings.forEach(w => {
    if (w.includes('Excluded non-taxable charges')) {
      // Non-taxable charges were not included in totals
      // Verify this matches invoice layout
    }
  });
}

Real-World Examples

Perfect Match

{
  "totals": {
    "items_ex_tax": 50000.00,
    "header_discounts_ex_tax": 5000.00,
    "charges_ex_tax": 500.00,
    "taxable_ex_tax": 45500.00,
    "gst_total": 8190.00,
    "grand_total": 53690.00
  },
  "printed": {
    "taxable_subtotal": 45500.00,
    "gst_total": 8190.00,
    "grand_total": 53690.00
  },
  "reconciliation": {
    "error_absolute": 0.00,
    "alternates_considered": [
      "as_is:err=0.00,implied_round=0.00,score=0.00"
    ],
    "warnings": []
  }
}

✅ All anchors match, zero error, no warnings.

HSN Scaling

{
  "items": [
    {
      "name": "Item A",
      "totals": { "line_ex_tax": 22750.00 }, // Scaled from 25000
      "gst": { "rate": 18, "amount": 4095.00 }
    },
    {
      "name": "Item B",
      "totals": { "line_ex_tax": 22750.00 }, // Scaled from 25000
      "gst": { "rate": 18, "amount": 4095.00 }
    }
  ],
  "printed": {
    "hsn_tax_table": [
      {
        "hsn": "8471",
        "taxable_value": 45500.00,
        "cgst_rate": 9,
        "sgst_rate": 9,
        "cgst_amount": 4095.00,
        "sgst_amount": 4095.00
      }
    ]
  },
  "reconciliation": {
    "error_absolute": 0.00,
    "warnings": []
  }
}

✅ Items were scaled down by 45500 / 50000 = 0.91 to match the HSN table.

With Minor Error

{
  "totals": {
    "grand_total": 53690.00
  },
  "printed": {
    "grand_total": 53690.00
  },
  "round_off": 0.00,
  "reconciliation": {
    "error_absolute": 0.00,
    "alternates_considered": [
      "as_is:err=0.50,implied_round=0.50,score=0.50",
      "from_printed_without_tax:err=0.00,implied_round=0.00,score=0.00"
    ]
  }
}

⚠️ The as_is candidate was off by ₹0.50, but the from_printed_without_tax candidate matched exactly and was chosen, so the reported error is ₹0.00.

Requires Review

{
  "totals": {
    "grand_total": 53815.50
  },
  "printed": {
    "grand_total": 53690.00
  },
  "reconciliation": {
    "error_absolute": 125.50,
    "alternates_considered": [
      "as_is:err=510.00,implied_round=510.00,score=1019.00",
      "from_printed_with_tax:err=125.50,implied_round=125.50,score=250.00",
      "from_printed_without_tax:err=510.00,implied_round=510.00,score=1019.00"
    ],
    "warnings": []
  }
}

❌ Large error even after trying all candidates. Possible causes:

Missing charges in OCR output
Incorrect discount interpretation
Multi-page invoice with items on different pages

Integration Guide

Step 1: Call the API

const response = await fetch('/api/ocr-structured-v4', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    pdfBase64: pdfData,
    filename: 'invoice.pdf'
  })
});

if (!response.ok) {
  throw new Error(`OCR request failed with HTTP ${response.status}`);
}

const doc: V4Doc = await response.json();

Step 2: Validate Reconciliation

function validateInvoice(doc: V4Doc): ValidationResult {
  const { error_absolute, warnings } = doc.reconciliation;
  
  if (error_absolute <= 0.05) {
    return {
      status: 'approved',
      confidence: 'high',
      message: 'All numbers match'
    };
  }
  
  if (error_absolute <= 1.00) {
    return {
      status: 'approved',
      confidence: 'medium',
      message: 'Minor rounding difference',
      warning: `Error: ₹${error_absolute.toFixed(2)}`
    };
  }
  
  return {
    status: 'review_required',
    confidence: 'low',
    message: 'Significant discrepancy detected',
    error: `₹${error_absolute.toFixed(2)}`,
    details: warnings
  };
}

Step 3: Handle Edge Cases

// Check for non-taxable charge exclusions
if (doc.reconciliation.warnings.some(w => w.includes('non-taxable charges'))) {
  // Verify against invoice layout
  const layoutHasChargesInTotal = checkInvoiceLayout(doc);
  if (!layoutHasChargesInTotal) {
    // Expected behavior, no action needed
  } else {
    // Flag for manual review
    flagForReview(doc, 'Charges exclusion mismatch');
  }
}

// Check alternate hypotheses
const alternates = doc.reconciliation.alternates_considered;
const scores = alternates.map(a => {
  const match = a.match(/score=([\d.]+)/);
  return match ? parseFloat(match[1]) : Infinity;
});

if (Math.min(...scores) > 50) {
  // All hypotheses had high errors
  flagForReview(doc, 'No good hypothesis found');
}

Debugging Tips

Large Error (> ₹10)

Common causes:

Missing items on subsequent pages (multi-page PDF)
Charges not extracted (Freight, Packing)
TCS not detected

Debug steps:

Check meta.pages_processed — should match PDF page count
Compare totals.items_ex_tax with printed subtotal
Look for charge keywords in raw OCR output
Verify TCS line if invoice mentions “TCS”

GST Mismatch

Common causes:

CGST/SGST vs. IGST confusion
Charges using wrong GST rate
HSN table not detected

Debug steps:

Check that gst.cgst + gst.sgst + gst.igst equals gst.amount for each item
Verify doc_level.place_of_supply_state_code matches supplier state for intra-state
Look at printed.hsn_tax_table — should be non-empty if table exists

Round-Off > ₹1

Common causes:

Incorrect price mode (WITH_TAX vs. WITHOUT_TAX)
Sequential discounts not captured
Printed subtotal includes/excludes charges ambiguity

Debug steps:

Check alternates_considered — if from_printed_with_tax has lower error, price mode might be wrong
Verify header_discounts array — should have all printed discounts with correct order
Review charges[].taxable — might need manual override

Use the Review Tool

Navigate to /review in the UI and paste the LangFuse trace or API response. The tool will:

Auto-detect invoice objects in nested JSON
Display reconciliation breakdown with color-coded errors
Show line-by-line math (qty × rate, discounts, tax)
Highlight which candidate was chosen and why

Related Topics

OCR Modes: Compare Raw, Structured, and V4 modes

Review Tool: Debug reconciliation with the LangFuse review interface
