Overview
Hybrid extraction combines vision-based processing with OCR text to provide the best of both worlds: the accuracy of text extraction with the context of visual information. This is particularly useful for complex documents with tables, charts, or mixed content.
When enableHybridExtraction is enabled, Zerox:
1. First performs OCR to extract text from each page
2. Then sends both the page images and the OCR text to the extraction model
3. The model uses the text for accuracy and the images for visual context
Hybrid extraction requires a schema and cannot be used with extractOnly or directImageExtraction modes.
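These constraints can be checked up front before calling the API. A minimal pre-flight sketch (the helper name and option shape are illustrative, not part of the zerox API; zerox also validates these combinations itself):

```typescript
// Pre-flight check mirroring the constraints above. Illustrative only --
// zerox performs its own validation and raises its own errors.
interface HybridOptions {
  enableHybridExtraction?: boolean;
  extractOnly?: boolean;
  directImageExtraction?: boolean;
  schema?: object;
}

function assertValidHybridOptions(opts: HybridOptions): void {
  if (!opts.enableHybridExtraction) return;
  if (opts.extractOnly) {
    throw new Error('enableHybridExtraction is incompatible with extractOnly');
  }
  if (opts.directImageExtraction) {
    throw new Error('enableHybridExtraction is incompatible with directImageExtraction');
  }
  if (!opts.schema) {
    throw new Error('enableHybridExtraction requires a schema');
  }
}

// Passes silently: hybrid mode with a schema and no conflicting flags
assertValidHybridOptions({ enableHybridExtraction: true, schema: { type: 'object' } });
```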
import { zerox } from 'zerox';

const result = await zerox({
  filePath: './financial-report.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      revenue: {
        type: 'number',
        description: 'Total revenue from the financial table'
      },
      expenses: {
        type: 'number',
        description: 'Total expenses'
      },
      chartTitle: {
        type: 'string',
        description: 'Title of the chart or graph'
      },
      trends: {
        type: 'array',
        description: 'Key trends visible in charts',
        items: { type: 'string' }
      }
    }
  }
});

console.log(result.extracted);
// {
//   revenue: 1250000,
//   expenses: 890000,
//   chartTitle: 'Quarterly Revenue Growth',
//   trends: ['Upward trend in Q3', 'Peak in December']
// }

// You also get the OCR text
console.log(result.pages[0].content);
Hybrid extraction is ideal for documents with:
Complex Tables
Tables with merged cells, multiple headers, or nested structures benefit from both visual layout and text content:
const result = await zerox({
  filePath: './complex-table.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      quarterlyData: {
        type: 'array',
        description: 'Data from the quarterly comparison table',
        items: {
          type: 'object',
          properties: {
            quarter: { type: 'string' },
            region: { type: 'string' },
            sales: { type: 'number' },
            growth: { type: 'string' }
          }
        }
      }
    }
  }
});
Charts and Graphs
Visual data representations that include both text labels and graphical elements:
const result = await zerox({
  filePath: './presentation.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      chartType: {
        type: 'string',
        description: 'Type of chart (bar, line, pie, etc.)'
      },
      dataPoints: {
        type: 'array',
        description: 'Values from the chart',
        items: {
          type: 'object',
          properties: {
            label: { type: 'string' },
            value: { type: 'number' }
          }
        }
      },
      visualInsights: {
        type: 'array',
        description: 'Observations from the visual representation',
        items: { type: 'string' }
      }
    }
  }
});
Forms
Forms containing checkboxes, signatures, stamps, or highlighted sections:
const result = await zerox({
  filePath: './application-form.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      applicantName: { type: 'string' },
      selectedOptions: {
        type: 'array',
        description: 'Checked boxes in the form',
        items: { type: 'string' }
      },
      hasSignature: {
        type: 'boolean',
        description: 'Whether the form is signed'
      },
      highlightedSections: {
        type: 'array',
        description: 'Sections that are highlighted or marked',
        items: { type: 'string' }
      }
    }
  }
});
Hybrid extraction works seamlessly with per-page extraction:
import { zerox } from 'zerox';

const result = await zerox({
  filePath: './multi-page-report.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      // Document-level: extracted from all pages
      reportTitle: { type: 'string' },
      author: { type: 'string' },
      // Page-level: extracted from each page
      charts: {
        type: 'array',
        description: 'Charts found on this page',
        items: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            type: { type: 'string' },
            keyFinding: { type: 'string' }
          }
        }
      }
    }
  },
  extractPerPage: ['charts']
});

// Result includes page numbers for per-page extractions
console.log(result.extracted);
// {
//   reportTitle: 'Annual Analysis 2024',
//   author: 'Data Team',
//   charts: [
//     { page: 1, value: [{ title: 'Revenue', type: 'bar', ... }] },
//     { page: 2, value: [{ title: 'Growth', type: 'line', ... }] }
//   ]
// }
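Per-page results in the `{ page, value }` shape shown above can be flattened into a single array for downstream use. A small sketch (the helper is illustrative, not part of zerox):

```typescript
// Flatten per-page extractions, tagging each item with its page number.
// Assumes the { page, value } entry shape shown in the output above.
type PageValue<T> = { page: number; value: T[] };

function flattenPerPage<T extends object>(
  entries: PageValue<T>[]
): (T & { page: number })[] {
  return entries.flatMap(({ page, value }) =>
    value.map((item) => ({ ...item, page }))
  );
}

const charts: PageValue<{ title: string; type: string }>[] = [
  { page: 1, value: [{ title: 'Revenue', type: 'bar' }] },
  { page: 2, value: [{ title: 'Growth', type: 'line' }] }
];

console.log(flattenPerPage(charts));
// [ { title: 'Revenue', type: 'bar', page: 1 },
//   { title: 'Growth', type: 'line', page: 2 } ]
```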
Token Usage
Hybrid extraction uses more tokens because it processes both images and text:
const result = await zerox({
  filePath: './document.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: mySchema
});

console.log(`Input tokens: ${result.inputTokens}`);
console.log(`Output tokens: ${result.outputTokens}`);
// Hybrid extraction typically uses 2-3x more input tokens
// than text-only extraction due to image processing
Hybrid extraction increases token usage and cost compared to text-only extraction. Use it when visual context significantly improves accuracy.
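The token counts above can feed a rough cost estimate. A sketch with placeholder prices (the rates below are assumptions; substitute your model's actual per-token pricing):

```typescript
// Rough cost estimate from token counts. These prices are placeholders,
// not real rates -- look up your model's current pricing.
const INPUT_USD_PER_1M = 2.5;   // assumed input price per 1M tokens
const OUTPUT_USD_PER_1M = 10.0; // assumed output price per 1M tokens

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_1M +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_1M
  );
}

// e.g. a hybrid run that used 150k input and 2k output tokens:
console.log(estimateCostUSD(150_000, 2_000).toFixed(4)); // "0.3950"
```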
Processing Time
Hybrid extraction requires both OCR and extraction steps:
const start = Date.now();

const result = await zerox({
  filePath: './document.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: mySchema,
  concurrency: 5 // Process multiple pages in parallel
});

console.log(`Elapsed: ${Date.now() - start}ms`);
console.log(`Completed in ${result.completionTime}ms`);
Guide the model to leverage both visual and textual information:
const result = await zerox({
  filePath: './infographic.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      mainStatistics: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            label: { type: 'string' },
            value: { type: 'string' },
            visualEmphasis: {
              type: 'string',
              description: 'How this stat is visually emphasized (color, size, icons, etc.)'
            }
          }
        }
      }
    }
  },
  extractionPrompt: `Extract statistics from this infographic.
Use the text for accurate values and the visual elements to understand emphasis and hierarchy.
Note any visual indicators like colors, icons, or size differences that highlight important data.`
});
Limitations
Cannot Be Used With
Hybrid extraction has some restrictions:
// ❌ Cannot use with extractOnly
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true,
  extractOnly: true, // Error: incompatible
  schema: mySchema
});

// ❌ Cannot use with directImageExtraction
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true,
  directImageExtraction: true, // Error: incompatible
  schema: mySchema
});

// ❌ Requires a schema
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true // Error: schema required
});

// ✅ Correct usage
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true,
  schema: mySchema // Schema is required
});
| Feature           | Text-Only        | Hybrid          |
| ----------------- | ---------------- | --------------- |
| Input             | OCR text only    | Text + images   |
| Token usage       | Lower            | Higher (2-3x)   |
| Processing time   | Faster           | Slower          |
| Accuracy for text | High             | High            |
| Visual context    | None             | Full            |
| Best for          | Simple documents | Complex layouts |
| Cost              | Lower            | Higher          |
Real-World Example: Invoice Processing
A complete example showing the benefits of hybrid extraction:
import { zerox } from 'zerox';

const result = await zerox({
  filePath: './invoice-with-logo.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      // Text extraction
      invoiceNumber: { type: 'string' },
      date: { type: 'string' },
      total: { type: 'number' },
      // Visual extraction
      hasCompanyLogo: {
        type: 'boolean',
        description: 'Whether company logo is present'
      },
      isStamped: {
        type: 'boolean',
        description: 'Whether invoice has an approval stamp'
      },
      // Complex table extraction
      lineItems: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unitPrice: { type: 'number' },
            total: { type: 'number' }
          }
        }
      }
    }
  },
  extractionPrompt: `Extract invoice data. Use visual cues to identify logos, stamps, and table structure.`
});

console.log('Invoice validated:', result.extracted.hasCompanyLogo && result.extracted.isStamped);
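Extracted fields can also be cross-checked against each other. For instance, a sketch that verifies the line items sum to the invoice total (the helper name and tolerance are illustrative, not part of zerox):

```typescript
// Verify that extracted line items add up to the extracted invoice total.
// A small tolerance absorbs rounding in the extracted numbers.
type LineItem = {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
};

function lineItemsMatchTotal(
  items: LineItem[],
  invoiceTotal: number,
  tolerance = 0.01
): boolean {
  const sum = items.reduce((acc, item) => acc + item.total, 0);
  return Math.abs(sum - invoiceTotal) <= tolerance;
}

const items: LineItem[] = [
  { description: 'Design work', quantity: 10, unitPrice: 50, total: 500 },
  { description: 'Hosting', quantity: 1, unitPrice: 25, total: 25 }
];

console.log(lineItemsMatchTotal(items, 525)); // true
```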
Next Steps
Schema Extraction: Learn more about defining extraction schemas
Custom Models: Use custom model implementations