
Overview

Reconciliation is the process of checking that extracted invoice data adds up. The Invoice OCR system recomputes line, tax, and total arithmetic, tests alternative interpretations of the extracted values, and selects the one whose totals best match the printed invoice.

Why Reconciliation Matters

AI models can extract text and numbers from invoices, but they don’t inherently understand invoice arithmetic:

  • Discount logic: sequential vs flat, before vs after tax
  • Tax calculation: CGST/SGST splits, rate slabs, rounding
  • Header vs line: invoice-level vs item-level discounts

Without reconciliation, you might get:
  • Grand totals that don’t match the printed amount
  • Tax calculations that are off by several rupees
  • Discounts applied in the wrong order

Reconciliation Architecture

The system uses a two-phase approach:

Phase 1: Extraction

// Source: app/api/ocr-structured-v4/route.ts:136-194
const SYSTEM_PROMPT = `# Role
You are an invoice OCR normalizer for India GST (v4)...

# Decision Rules
1. Price Mode
   - Prefer ex‑tax lines when a separate GST summary exists
2. Discounts
   - Apply sequentially: d_eq = d1 + d2 − d1*d2
3. Line Math (normalized to ex‑tax)
   - WITH_TAX → base_ex = rate_printed / (1 + t)
   - WITHOUT_TAX → base_ex = rate_printed
...
`;
The AI model extracts data following these rules, but its output may still contain ambiguities or errors.
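The two formulas in the prompt can be checked with a small sketch. The helper names below (`round2`, `equivalentDiscount`, `baseExTax`) are hypothetical and not taken from `route.ts`; discounts and the GST rate `t` are assumed to be fractions (0.10 for 10%).

```typescript
// Illustrative helpers; the names are hypothetical and not taken from route.ts.
type PriceMode = "WITH_TAX" | "WITHOUT_TAX";

// Round to two decimals (paise).
const round2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// Single discount equivalent to applying d1 and then d2 (fractions, 0.10 = 10%).
function equivalentDiscount(d1: number, d2: number): number {
  return d1 + d2 - d1 * d2;
}

// Normalize a printed unit rate to ex-tax; t is the GST rate as a fraction.
function baseExTax(ratePrinted: number, t: number, mode: PriceMode): number {
  return mode === "WITH_TAX" ? ratePrinted / (1 + t) : ratePrinted;
}

// 10% followed by 5% removes 14.5% of the original amount, not 15%.
const combined = round2(equivalentDiscount(0.10, 0.05) * 100); // 14.5

// A printed rate of 118 that includes 18% GST is 100 ex-tax.
const exTaxRate = round2(baseExTax(118, 0.18, "WITH_TAX")); // 100
```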

Phase 2: Reconciliation

// Source: lib/invoice_v4.ts:587-641
export function reconcileV4(input: V4Doc): V4Doc {
  // Try multiple hypotheses and score by:
  // 1. Error vs printed grand total
  // 2. Implied round‑off reasonableness
  const candidates: Candidate[] = [
    { name: "as_is", doc: recomputeDoc(input, ...), ... },
    { name: "from_printed_with_tax", doc: rerateFromPrinted(input, "WITH_TAX", ...), ... },
    { name: "from_printed_without_tax", doc: rerateFromPrinted(input, "WITHOUT_TAX", ...), ... },
  ];
  
  // Pick best by lowest score (error + round_off penalty)
  let best = candidates[0];
  ...
  return out;
}

Line-Level Reconciliation

Each invoice line item goes through multi-step reconciliation:
1. Choose Best Ex-Tax Value

The system evaluates three sources:
// Source: lib/invoice_v4.ts:160-206
// Candidate 1: compute from rate/discounts
const computedLineEx = afterFlat * qty;

// Candidate 2: trust printed amount if price mode suggests ex‑tax
const printedLineEx = priceMode === "WITHOUT_TAX" ? printedAmt : ...

// Candidate 3: model-provided explicit ex-tax after discount
const modelLineEx = n(it.raw?.amount_ex_tax_after_discount);
The system prefers printed amounts when available, but falls back to computed values if discounts indicate the printed amount is pre-discount.
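The preference rule can be sketched as follows. This is only an illustration of the stated behavior, not the logic in `invoice_v4.ts`: the `LineCandidates` shape, the `chooseLineEx` name, and the 0.05 tolerance are assumptions, and the model-provided candidate is left out because the excerpt does not show where it ranks.

```typescript
// Hypothetical input shape for one line item.
interface LineCandidates {
  grossEx: number;        // qty × ex-tax rate, before item discounts
  computedLineEx: number; // qty × ex-tax rate, after item discounts
  printedLineEx?: number; // printed amount read as ex-tax, if present
}

// Prefer the printed amount unless it looks like the pre-discount gross.
function chooseLineEx(c: LineCandidates, tol: number = 0.05): number {
  const hasDiscount = c.grossEx - c.computedLineEx > tol;
  const printed = c.printedLineEx;
  if (printed !== undefined && printed > 0) {
    const looksPreDiscount = hasDiscount && Math.abs(printed - c.grossEx) <= tol;
    if (!looksPreDiscount) return printed;
  }
  return c.computedLineEx;
}

// Printed amount equals the undiscounted gross, so the computed value wins.
const preDiscountCase = chooseLineEx({ grossEx: 1000, computedLineEx: 900, printedLineEx: 1000 }); // 900

// Printed amount already reflects the discount, so it is trusted.
const postDiscountCase = chooseLineEx({ grossEx: 1000, computedLineEx: 900, printedLineEx: 899.98 }); // 899.98
```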
2. Normalize GST Rate

Snap noisy model outputs to real GST slabs:
// Source: lib/standards.ts:15-38
export const GST_SLABS = [0, 0.25, 3, 5, 12, 18, 28] as const;

export function normalizeGstRate(input: unknown): number {
  // If model returns 0.18 for 18%, scale to percent
  if (rate > 0 && rate <= 1.5) rate = rate * 100;
  
  // Snap to nearest known slab within tolerance
  const nearest = GST_SLABS.reduce((prev, s) =>
    Math.abs(rate - s) < Math.abs(rate - prev) ? s : prev
  );
  if (Math.abs(rate - nearest) <= 0.75) return nearest;
  ...
}
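A self-contained sketch of the snapping step, for experimentation. The slab list, the fraction heuristic, and the 0.75 tolerance come from the excerpt above; the input parsing and the behavior outside the tolerance (returning the rate unchanged) are assumptions, since the source elides them. `snapGstRate` is a hypothetical name.

```typescript
const SLABS: readonly number[] = [0, 0.25, 3, 5, 12, 18, 28];

function snapGstRate(input: unknown, tolerance: number = 0.75): number {
  let rate = Number(input);
  // Assumption: unparseable or negative input is treated as 0%.
  if (!isFinite(rate) || rate < 0) return 0;
  // Fraction heuristic from the excerpt: 0.18 is read as 18%.
  if (rate > 0 && rate <= 1.5) rate = rate * 100;
  // Find the nearest known slab.
  let nearest = SLABS[0];
  for (let i = 1; i < SLABS.length; i++) {
    if (Math.abs(rate - SLABS[i]) < Math.abs(rate - nearest)) nearest = SLABS[i];
  }
  // Assumption: outside the tolerance the rate is returned unchanged.
  return Math.abs(rate - nearest) <= tolerance ? nearest : rate;
}

const fromFraction = snapGstRate(0.18); // 18
const fromNoisy = snapGstRate(17.6);    // 18
const unsnapped = snapGstRate(22);      // 22, no slab within 0.75
```

Under this heuristic an input of 0.25 is read as 25% rather than as the 0.25% slab; whether the elided part of the source handles that case is not shown.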
3. Calculate Tax Split

Determine CGST/SGST vs IGST based on state codes:
// Source: lib/invoice_v4.ts:140-147
const supplierState = getStateCodeFromGstin(out.doc_level?.supplier_gstin);
const posCode = parseInt(out.doc_level?.place_of_supply_state_code, 10);
const isIntra = supplierState === posCode;

// Source: lib/invoice_v4.ts:227-232
gst: {
  rate: tRate,
  cgst: r2(tRate > 0 && isIntra === true ? (gstAmt / 2) : 0),
  sgst: r2(tRate > 0 && isIntra === true ? (gstAmt / 2) : 0),
  igst: r2(tRate > 0 && isIntra === false ? gstAmt : 0),
  amount: r2(gstAmt),
}
Intra-state transactions split GST equally into CGST and SGST. Inter-state uses IGST only.
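A minimal sketch of the split, assuming the state code is read from the first two digits of the supplier GSTIN. The function names and the `null` case (state unknown, so no split is assigned) are assumptions modeled on the `=== true` / `=== false` checks in the excerpt.

```typescript
interface GstSplit { cgst: number; sgst: number; igst: number; amount: number; }

const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// The first two characters of a GSTIN are the numeric state code.
function stateCodeFromGstin(gstin?: string): number | null {
  if (!gstin || !/^\d{2}/.test(gstin)) return null;
  return parseInt(gstin.slice(0, 2), 10);
}

function splitGst(gstAmt: number, supplierGstin: string | undefined, posStateCode: number | null): GstSplit {
  const supplierState = stateCodeFromGstin(supplierGstin);
  // null means "cannot tell"; in that case neither split is filled in.
  const isIntra = supplierState === null || posStateCode === null
    ? null
    : supplierState === posStateCode;
  return {
    cgst: r2(isIntra === true ? gstAmt / 2 : 0),
    sgst: r2(isIntra === true ? gstAmt / 2 : 0),
    igst: r2(isIntra === false ? gstAmt : 0),
    amount: r2(gstAmt),
  };
}

// Supplier in state 27, place of supply 27: intra-state.
const intra = splitGst(180, "27ABCDE1234F1Z5", 27); // cgst 90, sgst 90, igst 0
// Same supplier, place of supply 29: inter-state.
const inter = splitGst(180, "27ABCDE1234F1Z5", 29); // cgst 0, sgst 0, igst 180
```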

Header-Level Reconciliation

After line items are reconciled, the system applies header-level adjustments:

Sequential Discounts

// Source: lib/invoice_v4.ts:246-265
const ordered = [...(out.header_discounts || [])].sort((a, b) => n(a.order) - n(b.order));

const applyPercent = (pct: number) => {
  const f = 1 - pct / 100;
  for (const k of Object.keys(bucketEx)) bucketEx[k] = r2(bucketEx[k] * f);
  const cut = r2(baseEx * (pct / 100));
  baseEx = r2(baseEx - cut);
  headerDiscEx = r2(headerDiscEx + cut);
};

const applyAbsolute = (amt: number) => {
  const total = Object.values(bucketEx).reduce((s, v) => s + v, 0) || 1;
  for (const k of Object.keys(bucketEx)) {
    const share = bucketEx[k] / total;
    bucketEx[k] = r2(Math.max(0, bucketEx[k] - amt * share));
  }
  baseEx = r2(Math.max(0, baseEx - amt));
  headerDiscEx = r2(headerDiscEx + amt);
};

for (const hd of ordered) {
  const type = String(hd.type || "").toUpperCase();
  if (type === "PERCENT") applyPercent(n(hd.value));
  else if (type === "ABSOLUTE") applyAbsolute(Math.min(n(hd.value), baseEx));
}
Key behaviors:
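  • Percent discounts: Applied sequentially, each to the base left by the previous one; for two discounts d1 and d2 expressed as fractions, the combined effect is d_eq = d1 + d2 − d1*d2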
  • Absolute discounts: Allocated proportionally across tax buckets
  • Order matters: order field determines application sequence
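The behaviors listed here can be reproduced with a self-contained sketch. It mirrors the excerpt's percent and absolute handling over per-rate buckets, but the wrapper function, its return shape, and the `HeaderDiscount` type are assumptions for illustration.

```typescript
type Buckets = Record<string, number>;
interface HeaderDiscount { type: "PERCENT" | "ABSOLUTE"; value: number; order: number; }

const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

function applyHeaderDiscounts(buckets: Buckets, discounts: HeaderDiscount[]) {
  const out: Buckets = { ...buckets };
  const keys = Object.keys(out);
  let baseEx = r2(keys.reduce((s, k) => s + out[k], 0));
  let discountEx = 0;
  // The order field, not array position, decides the sequence.
  const ordered = [...discounts].sort((a, b) => a.order - b.order);
  for (const d of ordered) {
    if (d.type === "PERCENT") {
      const f = 1 - d.value / 100;
      for (const k of keys) out[k] = r2(out[k] * f);
      const cut = r2(baseEx * (d.value / 100));
      baseEx = r2(baseEx - cut);
      discountEx = r2(discountEx + cut);
    } else {
      // Cap at the remaining base, then spread across buckets by share.
      const amt = Math.min(d.value, baseEx);
      const total = keys.reduce((s, k) => s + out[k], 0) || 1;
      for (const k of keys) out[k] = r2(Math.max(0, out[k] - amt * (out[k] / total)));
      baseEx = r2(baseEx - amt);
      discountEx = r2(discountEx + amt);
    }
  }
  return { buckets: out, baseEx, discountEx };
}

// 10% then 5% on ₹1000 split across the 5% and 18% buckets.
const discounted = applyHeaderDiscounts(
  { "5": 400, "18": 600 },
  [{ type: "PERCENT", value: 5, order: 2 }, { type: "PERCENT", value: 10, order: 1 }],
);
// discounted.baseEx is 855 and discounted.discountEx is 145 (14.5% of 1000).
```

The 14.5% total agrees with d_eq = 0.10 + 0.05 − 0.10 × 0.05 = 0.145.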

HSN Table Anchoring

When a printed HSN tax table exists, it becomes the source of truth:
// Source: lib/invoice_v4.ts:334-383
const printedBucketByRate: Record<string, number> = {};
for (const row of printedTable) {
  let r = normalizeGstRate(n(row?.cgst_rate) + n(row?.sgst_rate) + n(row?.igst_rate));
  const ex = n(row?.taxable_value);
  if (ex > 0 && Number.isFinite(r) && r > 0) {
    const key = String(r);
    printedBucketByRate[key] = r2((printedBucketByRate[key] || 0) + ex);
  }
}

if (printedBucketTotal > 0) {
  // Scale items within each rate bucket to match printed taxable value exactly
  for (const [rateStr, target] of Object.entries(printedBucketByRate)) {
    const idxs = rateToIdx[rateStr] || [];
    const current = idxs.reduce((s, i) => s + n(out.items[i]?.totals?.line_ex_tax), 0);
    if (idxs.length === 0 || current <= 0) continue;
    const scale = target / current;
    for (const i of idxs) {
      // Scale each item proportionally
      ...
    }
  }
}
HSN tables are the most reliable anchor because they’re typically printed by accounting software and represent validated totals per tax rate.
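The per-item scaling loop is elided in the excerpt. One way to make a bucket match its printed taxable value exactly is sketched below; placing the rounding residue on the last item is an assumption, and `scaleToTarget` is a hypothetical name.

```typescript
const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// Scale the ex-tax amounts of one rate bucket so they sum to the printed target.
function scaleToTarget(lineEx: number[], target: number): number[] {
  const current = lineEx.reduce((s, v) => s + v, 0);
  if (lineEx.length === 0 || current <= 0) return lineEx.slice();
  const scale = target / current;
  const scaled = lineEx.map((v) => r2(v * scale));
  // Rounding each item can leave a paisa or two unallocated; assign it to the last item.
  const residue = r2(target - scaled.reduce((s, v) => s + v, 0));
  scaled[scaled.length - 1] = r2(scaled[scaled.length - 1] + residue);
  return scaled;
}

// Three items summing to 999.99 are anchored to a printed taxable value of 1000.00.
const anchored = scaleToTarget([333.33, 333.33, 333.33], 1000);
// The first two items stay at 333.33; the last one absorbs the 0.01 residue.
```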

Charges and TCS

Additional charges and TCS are reconciled after items:
// Source: lib/invoice_v4.ts:453-477
out.charges = (out.charges || []).map((c) => {
  const ex = r2(n(c.ex_tax));
  const isTaxable = !!c.taxable;
  const rate = isTaxable ? 
    (c.gst_rate_hint != null ? n(c.gst_rate_hint) : weightedRate) : 
    0;
  const gst = r2(ex * (rate / 100));
  if (isTaxable) {
    chargesEx += ex;
    const key = String(r2(rate));
    if (rate > 0) {
      bucketEx[key] = (bucketEx[key] || 0) + ex;
    }
  }
  return { ...c, gst_rate_hint: c.gst_rate_hint, gst_amount: gst, inc_tax: r2(ex + gst) };
});
GST rate inference: If a charge is marked taxable but no rate is provided, the system uses the weighted average GST rate from items.
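The excerpt does not show how `weightedRate` is derived. A plausible reading, weighting each rate by its ex-tax bucket value, is sketched here; both function names and the weighting itself are assumptions.

```typescript
const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// Average GST rate (in percent) across buckets, weighted by ex-tax value.
function weightedGstRate(bucketEx: Record<string, number>): number {
  let totalEx = 0;
  let totalTax = 0;
  for (const key of Object.keys(bucketEx)) {
    totalEx += bucketEx[key];
    totalTax += bucketEx[key] * (Number(key) / 100);
  }
  return totalEx > 0 ? (totalTax / totalEx) * 100 : 0;
}

// GST on a charge: use the hint if present, else the weighted rate; 0 if not taxable.
function chargeGst(exTax: number, taxable: boolean, rateHint: number | null, weightedRate: number): number {
  const rate = taxable ? (rateHint != null ? rateHint : weightedRate) : 0;
  return r2(exTax * (rate / 100));
}

// ₹400 at 5% and ₹600 at 18% give a weighted rate of 12.8%.
const avgRate = r2(weightedGstRate({ "5": 400, "18": 600 })); // 12.8
// A taxable ₹100 freight charge with no rate hint then carries ₹12.80 GST.
const freightGst = chargeGst(100, true, null, avgRate); // 12.8
```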

Candidate Scoring

The reconciliation engine evaluates multiple hypotheses:
// Source: lib/invoice_v4.ts:587-622
const candidates: Candidate[] = [
  { name: "as_is", doc: recomputeDoc(input, { preferItemsOnlyWhenNoHSN: false }), ... },
  { name: "as_is_items_only_when_no_hsn", doc: recomputeDoc(input, { preferItemsOnlyWhenNoHSN: true }), ... },
  { name: "from_printed_with_tax", doc: rerateFromPrinted(input, "WITH_TAX", ...), ... },
  { name: "from_printed_without_tax", doc: rerateFromPrinted(input, "WITHOUT_TAX", ...), ... },
];

const scoreOf = (c: Candidate) => {
  const computedNoRound = r2((n(d.totals?.grand_total) - n(d.round_off)));
  const impliedRound = printedGrand > 0 ? r2(printedGrand - computedNoRound) : n(d.round_off);
  const err = c.errorAbs;
  const roundPenalty = Math.max(0, Math.abs(impliedRound) - 1); // prefer |round_off| <= 1
  const score = r2(err + roundPenalty);
  return { score, impliedRound };
};

// Pick best by lowest score
let best = candidates[0];
let bestMeta = scoreOf(best);
for (const c of candidates.slice(1)) {
  const meta = scoreOf(c);
  if (meta.score < bestMeta.score || ...) {
    best = c;
    bestMeta = meta;
  }
}
Scoring criteria:
  1. Error vs printed grand total: Lower is better
  2. Implied round-off: |round-off| should be ≤ ₹1.00; any excess over ₹1.00 is added to the score as a penalty
  3. Tie-breaking: Prefer lower absolute round-off, then lower error
A large round-off (> ₹1.00) often indicates incorrect discount logic or tax calculation mode. The system penalizes such candidates.
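The scoring rule is simple enough to run by hand. The sketch below reuses the formula from the excerpt; the tie-breaking condition is elided in the source, so the version here is one interpretation of the criteria listed above, and the `Scored` type and function names are hypothetical.

```typescript
interface Scored { name: string; errorAbs: number; impliedRound: number; }

const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// score = error + amount by which |round-off| exceeds ₹1.00
function scoreCandidate(c: Scored): number {
  return r2(c.errorAbs + Math.max(0, Math.abs(c.impliedRound) - 1));
}

function pickBest(cands: Scored[]): Scored {
  if (cands.length === 0) throw new Error("no candidates");
  let best = cands[0];
  for (const c of cands.slice(1)) {
    const s = scoreCandidate(c);
    const b = scoreCandidate(best);
    const cRound = Math.abs(c.impliedRound);
    const bRound = Math.abs(best.impliedRound);
    // Assumed tie-break: lower |round-off| first, then lower error.
    const better = s < b ||
      (s === b && (cRound < bRound || (cRound === bRound && c.errorAbs < best.errorAbs)));
    if (better) best = c;
  }
  return best;
}

const winner = pickBest([
  { name: "from_printed_with_tax", errorAbs: 2.5, impliedRound: 2.5 }, // score 4.00
  { name: "as_is", errorAbs: 0.1, impliedRound: 0.1 },                 // score 0.10
]);
// winner.name is "as_is"
```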

Reading Reconciliation Output

Reconciliation Object

{
  "reconciliation": {
    "error_absolute": 0.10,
    "alternates_considered": [
      "as_is:err=0.10,implied_round=0.10,score=0.10",
      "from_printed_with_tax:err=2.50,implied_round=2.50,score=4.00",
      "from_printed_without_tax:err=15.30,implied_round=15.30,score=29.60"
    ],
    "warnings": [
      "Excluded non-taxable charges (₹50.00) from totals to match printed amount."
    ]
  }
}
Fields explained:
Field | Description | Ideal Value
error_absolute | Difference between computed and printed grand total | ≤ 0.05 (5 paise)
alternates_considered | List of hypotheses tried with their scores | Multiple options logged
warnings | Non-critical adjustments made during reconciliation | Empty or minimal
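Entries in `alternates_considered` are plain strings. If you want to sort or filter them, a small parser helps; the format below is inferred from the sample output only and may not cover every variant the system emits. `parseAlternate` and the `Alternate` type are hypothetical.

```typescript
interface Alternate { hypothesis: string; err: number; impliedRound: number; score: number; }

// Matches "name:err=<n>,implied_round=<n>,score=<n>" with optional sign and decimals.
const ALTERNATE_RE =
  /^([^:]+):err=(-?\d+(?:\.\d+)?),implied_round=(-?\d+(?:\.\d+)?),score=(-?\d+(?:\.\d+)?)$/;

function parseAlternate(entry: string): Alternate | null {
  const m = ALTERNATE_RE.exec(entry);
  if (!m) return null; // unrecognized format
  return {
    hypothesis: m[1],
    err: Number(m[2]),
    impliedRound: Number(m[3]),
    score: Number(m[4]),
  };
}

const parsed = parseAlternate("as_is:err=0.10,implied_round=0.10,score=0.10");
// parsed is { hypothesis: "as_is", err: 0.1, impliedRound: 0.1, score: 0.1 }
```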

Totals Object

{
  "totals": {
    "items_ex_tax": 1000.00,
    "header_discounts_ex_tax": 50.00,
    "charges_ex_tax": 100.00,
    "taxable_ex_tax": 1050.00,
    "gst_total": 189.00,
    "grand_total": 1239.10
  }
}
Calculation flow:
  1. Items ex-tax: sum of all line items after item-level discounts (₹1000.00)
  2. Header discounts: invoice-level discounts applied (-₹50.00)
  3. Charges: additional charges such as freight (+₹100.00)
  4. Taxable subtotal: base for GST calculation (₹1050.00)
  5. GST total: tax calculated on the taxable subtotal (+₹189.00)
  6. Grand total: final amount including round-off (₹1239.10)
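The flow can be verified against the sample totals with a short check. The field names come from the totals object above; `checkTotals`, its return shape, and the half-paisa tolerance are assumptions.

```typescript
interface Totals {
  items_ex_tax: number;
  header_discounts_ex_tax: number;
  charges_ex_tax: number;
  taxable_ex_tax: number;
  gst_total: number;
  grand_total: number;
}

const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

function checkTotals(t: Totals) {
  // items - header discounts + charges should equal the taxable subtotal
  const taxable = r2(t.items_ex_tax - t.header_discounts_ex_tax + t.charges_ex_tax);
  const beforeRound = r2(t.taxable_ex_tax + t.gst_total);
  return {
    taxableOk: Math.abs(taxable - t.taxable_ex_tax) < 0.005,
    effectiveGstPct: r2((t.gst_total / t.taxable_ex_tax) * 100),
    impliedRoundOff: r2(t.grand_total - beforeRound),
  };
}

const sampleCheck = checkTotals({
  items_ex_tax: 1000.0,
  header_discounts_ex_tax: 50.0,
  charges_ex_tax: 100.0,
  taxable_ex_tax: 1050.0,
  gst_total: 189.0,
  grand_total: 1239.1,
});
// taxableOk is true, effectiveGstPct is 18, impliedRoundOff is 0.1
```

For the sample, 1000 − 50 + 100 = 1050, GST of 189 corresponds to an effective 18%, and the remaining ₹0.10 is the round-off.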

Debugging Reconciliation

Use the Review Tool to inspect reconciliation in detail:

Compact Schema Breakdown

// Source: app/review/page.tsx:179-339
function CompactBreakdown({ doc }: { doc: InvoiceDoc }) {
  const reconciliation = React.useMemo(() => reconcile(doc), [doc]);
  const { lines, charges, items_taxable_total, items_tax_total, 
          difference } = reconciliation;
  
  return (
    <div className="space-y-4">
      <Card>
        <CardHeader>
          <CardTitle>Line-Level Math</CardTitle>
          <CardDescription>
            Qty × rate, item discounts, invoice-level discounts, and tax per line.
          </CardDescription>
        </CardHeader>
        ...
      </Card>
    </div>
  );
}
What you’ll see:
  • Per-line calculations with intermediate values
  • Discount allocations (item-level vs invoice-level)
  • Tax calculation breakdown by rate
  • Final difference vs printed total
The Review Tool at /review accepts JSON from LangFuse traces or API responses. Paste the full payload to see detailed reconciliation breakdowns.

Common Reconciliation Scenarios

Scenario 1: With-Tax vs Without-Tax

Problem: Model extracts rates but doesn’t know if they include GST.
Solution: Reconciliation tries both interpretations:
function rerateFromPrinted(doc: V4Doc, mode: "WITH_TAX" | "WITHOUT_TAX", ...): V4Doc {
  const out = clone(doc);
  out.items = (out.items || []).map((it) => {
    const t = n(it.gst?.rate) / 100;
    const printedRate = n(it.raw?.rate_printed) || n(it.rate_ex_tax);
    const baseEx = mode === "WITH_TAX" ? (printedRate / (1 + t)) : printedRate;
    return { ...it, rate_ex_tax: r2(baseEx) };
  });
  return recomputeDoc(out, opts);
}
The candidate with the lowest score (error plus round-off penalty) is selected.
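A single-line worked example shows why the two interpretations separate cleanly. This sketch compares only the error against the printed grand total and ignores the round-off penalty; the function names are hypothetical.

```typescript
type PriceMode = "WITH_TAX" | "WITHOUT_TAX";

const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// Grand total of a one-line invoice under a given price-mode interpretation.
function grandTotalUnder(mode: PriceMode, printedRate: number, qty: number, gstPct: number): number {
  const t = gstPct / 100;
  const rateEx = r2(mode === "WITH_TAX" ? printedRate / (1 + t) : printedRate);
  const lineEx = r2(rateEx * qty);
  return r2(lineEx + r2(lineEx * t));
}

// Pick the interpretation whose total is closest to the printed grand total.
function closestMode(printedGrand: number, printedRate: number, qty: number, gstPct: number): PriceMode {
  const errWith = Math.abs(printedGrand - grandTotalUnder("WITH_TAX", printedRate, qty, gstPct));
  const errWithout = Math.abs(printedGrand - grandTotalUnder("WITHOUT_TAX", printedRate, qty, gstPct));
  return errWith <= errWithout ? "WITH_TAX" : "WITHOUT_TAX";
}

// 10 units at a printed rate of 118 with 18% GST:
const totalIfInclusive = grandTotalUnder("WITH_TAX", 118, 10, 18);    // 1180
const totalIfExclusive = grandTotalUnder("WITHOUT_TAX", 118, 10, 18); // 1392.4
const chosenMode = closestMode(1180, 118, 10, 18);                    // "WITH_TAX"
```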

Scenario 2: Missing HSN Table

Problem: No HSN table printed; only a single taxable subtotal.
Solution: System infers whether charges are included:
// Source: lib/invoice_v4.ts:419-448
const printedTaxable = n(out.printed?.taxable_subtotal);
if (printedTaxable > 0) {
  const draftChargesTaxable = (out.charges || []).reduce(...);
  const targetItemsOnly = r2(Math.max(0, printedTaxable - draftChargesTaxable));
  const preferItemsOnly = (draftChargesTaxable > 0);
  const chosenTarget = preferItemsOnly ? targetItemsOnly : printedTaxable;
  const cut = r2(baseEx - chosenTarget);
  if (cut > 0.75) {
    // Allocate reduction guided by target items GST
    allocateAbsoluteSmart(cut, targetItemsGst);
  }
}
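The target selection in this excerpt can be isolated into a small function. The 0.75 threshold and the items-only preference come from the excerpt; the function name, its return shape, and the assumption that `baseEx` holds the items-only ex-tax base at this point are illustrative. The allocation step (`allocateAbsoluteSmart`) is not reproduced because its body is not shown.

```typescript
const r2 = (x: number): number => Math.round((x + Number.EPSILON) * 100) / 100;

// Decide how much to cut from the items base so it matches the printed taxable subtotal.
function headerCut(baseEx: number, printedTaxable: number, chargesTaxable: number, threshold: number = 0.75) {
  if (printedTaxable <= 0) return { target: baseEx, cut: 0, apply: false };
  // If taxable charges exist, assume the printed subtotal includes them.
  const itemsOnly = r2(Math.max(0, printedTaxable - chargesTaxable));
  const target = chargesTaxable > 0 ? itemsOnly : printedTaxable;
  const cut = r2(baseEx - target);
  return { target, cut, apply: cut > threshold };
}

// Items total 1000, printed taxable subtotal 1050, taxable charges 100:
const cutDecision = headerCut(1000, 1050, 100);
// target 950, cut 50, apply true: an implied ₹50 invoice-level reduction.
```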

Scenario 3: Large Round-Off

Problem: Computed total differs from printed by several rupees.
Solution: Review warnings for clues:
{
  "reconciliation": {
    "error_absolute": 5.50,
    "warnings": [
      "Excluded non-taxable charges (₹5.50) from totals to match printed amount."
    ]
  }
}
Non-taxable charges might not be included in the printed grand total.

Next Steps

Debugging with Review Tool

Use the /review page to debug reconciliation issues

API Reference

Explore the full reconciliation API schema
