
Overview

Hybrid extraction combines vision-based processing with OCR text to provide the best of both worlds: the accuracy of text extraction with the context of visual information. This is particularly useful for complex documents with tables, charts, or mixed content.

How Hybrid Extraction Works

When enableHybridExtraction is enabled, Zerox:
  1. Runs OCR to extract text from each page
  2. Sends both the page images and the OCR text to the extraction model
  3. Has the model use the text for accuracy and the images for visual context
Hybrid extraction requires a schema and cannot be used with extractOnly or directImageExtraction modes.

Basic Hybrid Extraction

import { zerox } from 'zerox';

const result = await zerox({
  filePath: './financial-report.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      revenue: {
        type: 'number',
        description: 'Total revenue from the financial table'
      },
      expenses: {
        type: 'number',
        description: 'Total expenses'
      },
      chartTitle: {
        type: 'string',
        description: 'Title of the chart or graph'
      },
      trends: {
        type: 'array',
        description: 'Key trends visible in charts',
        items: { type: 'string' }
      }
    }
  }
});

console.log(result.extracted);
// {
//   revenue: 1250000,
//   expenses: 890000,
//   chartTitle: 'Quarterly Revenue Growth',
//   trends: ['Upward trend in Q3', 'Peak in December']
// }

// You also get the OCR text
console.log(result.pages[0].content);
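Because both outputs are available, you can spot-check a structured field against the OCR text it came from. A minimal sketch, assuming each entry in result.pages carries a page number and its OCR content (as the result.pages[0].content access above suggests):

```typescript
// Assumed per-page shape, based on the result.pages[0].content access above.
interface OcrPage {
  page: number;
  content: string;
}

// Return the page numbers whose OCR text contains the given substring --
// useful for tracing where an extracted value appears in the document.
function pagesMentioning(pages: OcrPage[], needle: string): number[] {
  return pages.filter((p) => p.content.includes(needle)).map((p) => p.page);
}
```

For example, pagesMentioning(result.pages, 'Quarterly Revenue Growth') would point you at the page containing the chart title.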

When to Use Hybrid Extraction

Hybrid extraction is ideal for documents with:

Complex Tables

Tables with merged cells, multiple headers, or nested structures benefit from both visual layout and text content:
const result = await zerox({
  filePath: './complex-table.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      quarterlyData: {
        type: 'array',
        description: 'Data from the quarterly comparison table',
        items: {
          type: 'object',
          properties: {
            quarter: { type: 'string' },
            region: { type: 'string' },
            sales: { type: 'number' },
            growth: { type: 'string' }
          }
        }
      }
    }
  }
});

Charts and Graphs

Visual data representations that include both text labels and graphical elements:
const result = await zerox({
  filePath: './presentation.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      chartType: {
        type: 'string',
        description: 'Type of chart (bar, line, pie, etc.)'
      },
      dataPoints: {
        type: 'array',
        description: 'Values from the chart',
        items: {
          type: 'object',
          properties: {
            label: { type: 'string' },
            value: { type: 'number' }
          }
        }
      },
      visualInsights: {
        type: 'array',
        description: 'Observations from the visual representation',
        items: { type: 'string' }
      }
    }
  }
});

Forms with Visual Elements

Forms containing checkboxes, signatures, stamps, or highlighted sections:
const result = await zerox({
  filePath: './application-form.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      applicantName: { type: 'string' },
      selectedOptions: {
        type: 'array',
        description: 'Checked boxes in the form',
        items: { type: 'string' }
      },
      hasSignature: {
        type: 'boolean',
        description: 'Whether the form is signed'
      },
      highlightedSections: {
        type: 'array',
        description: 'Sections that are highlighted or marked',
        items: { type: 'string' }
      }
    }
  }
});

Combining with extractPerPage

Hybrid extraction works seamlessly with per-page extraction:
import { zerox } from 'zerox';

const result = await zerox({
  filePath: './multi-page-report.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      // Document-level: extracted from all pages
      reportTitle: { type: 'string' },
      author: { type: 'string' },
      // Page-level: extracted from each page
      charts: {
        type: 'array',
        description: 'Charts found on this page',
        items: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            type: { type: 'string' },
            keyFinding: { type: 'string' }
          }
        }
      }
    }
  },
  extractPerPage: ['charts']
});

// Result includes page numbers for per-page extractions
console.log(result.extracted);
// {
//   reportTitle: 'Annual Analysis 2024',
//   author: 'Data Team',
//   charts: [
//     { page: 1, value: [{ title: 'Revenue', type: 'bar', ... }] },
//     { page: 2, value: [{ title: 'Growth', type: 'line', ... }] }
//   ]
// }
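If you prefer one flat list instead of the { page, value } wrappers, the per-page entries can be flattened back, stamping each item with its page number. A sketch assuming the result shape shown above:

```typescript
// Each per-page entry wraps the items extracted from one page,
// matching the { page, value } shape shown above.
interface PerPageEntry<T> {
  page: number;
  value: T[];
}

// Flatten per-page results into a single array, tagging each item
// with the page it was extracted from.
function flattenPerPage<T>(entries: PerPageEntry<T>[]): (T & { page: number })[] {
  return entries.flatMap(({ page, value }) => value.map((v) => ({ ...v, page })));
}
```

Calling flattenPerPage(result.extracted.charts) would yield one array of charts, each carrying its source page.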

Performance Considerations

Token Usage

Hybrid extraction uses more tokens because it processes both images and text:
const result = await zerox({
  filePath: './document.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: mySchema
});

console.log(`Input tokens: ${result.inputTokens}`);
console.log(`Output tokens: ${result.outputTokens}`);

// Hybrid extraction typically uses 2-3x more input tokens
// than text-only extraction due to image processing
Hybrid extraction increases token usage and cost compared to text-only extraction. Use it when visual context significantly improves accuracy.
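To decide whether that extra cost is worth it, you can turn the token counts into a rough dollar estimate. The per-million-token rates below are illustrative placeholders, not real pricing; substitute the actual rates for your model:

```typescript
// Rough cost estimate from token counts. The default rates are
// placeholders only -- check your provider's current pricing.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputRatePerMillion = 2.5,
  outputRatePerMillion = 10
): number {
  return (
    (inputTokens / 1_000_000) * inputRatePerMillion +
    (outputTokens / 1_000_000) * outputRatePerMillion
  );
}
```

Comparing estimateCost(result.inputTokens, result.outputTokens) for a hybrid run against a text-only run on the same document makes the 2-3x difference concrete.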

Processing Time

Hybrid extraction requires both OCR and extraction steps:
const start = Date.now();

const result = await zerox({
  filePath: './document.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: mySchema,
  concurrency: 5  // Process multiple pages in parallel
});

console.log(`Elapsed: ${Date.now() - start}ms`);
console.log(`Completed in ${result.completionTime}ms`);

Custom Extraction Prompts

Guide the model to leverage both visual and textual information:
const result = await zerox({
  filePath: './infographic.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      mainStatistics: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            label: { type: 'string' },
            value: { type: 'string' },
            visualEmphasis: {
              type: 'string',
              description: 'How this stat is visually emphasized (color, size, icons, etc.)'
            }
          }
        }
      }
    }
  },
  extractionPrompt: `Extract statistics from this infographic. 
    Use the text for accurate values and the visual elements to understand emphasis and hierarchy. 
    Note any visual indicators like colors, icons, or size differences that highlight important data.`
});

Limitations

Cannot Be Used With

Hybrid extraction has some restrictions:
// ❌ Cannot use with extractOnly
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true,
  extractOnly: true,  // Error: incompatible
  schema: mySchema
});

// ❌ Cannot use with directImageExtraction
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true,
  directImageExtraction: true,  // Error: incompatible
  schema: mySchema
});

// ❌ Requires a schema
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true  // Error: schema required
});

// ✅ Correct usage
await zerox({
  filePath: './doc.pdf',
  enableHybridExtraction: true,
  schema: mySchema  // Schema is required
});

Comparison: Text vs. Hybrid Extraction

| Feature           | Text-Only        | Hybrid          |
|-------------------|------------------|-----------------|
| Input             | OCR text only    | Text + images   |
| Token usage       | Lower            | Higher (2-3x)   |
| Processing time   | Faster           | Slower          |
| Accuracy for text | High             | High            |
| Visual context    | None             | Full            |
| Best for          | Simple documents | Complex layouts |
| Cost              | Lower            | Higher          |

Real-World Example: Invoice Processing

A complete example showing the benefits of hybrid extraction:
import { zerox } from 'zerox';

const result = await zerox({
  filePath: './invoice-with-logo.pdf',
  openaiAPIKey: process.env.OPENAI_API_KEY,
  enableHybridExtraction: true,
  schema: {
    type: 'object',
    properties: {
      // Text extraction
      invoiceNumber: { type: 'string' },
      date: { type: 'string' },
      total: { type: 'number' },
      // Visual extraction
      hasCompanyLogo: {
        type: 'boolean',
        description: 'Whether company logo is present'
      },
      isStamped: {
        type: 'boolean',
        description: 'Whether invoice has an approval stamp'
      },
      // Complex table extraction
      lineItems: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unitPrice: { type: 'number' },
            total: { type: 'number' }
          }
        }
      }
    }
  },
  extractionPrompt: `Extract invoice data. Use visual cues to identify logos, stamps, and table structure.`
});

console.log('Invoice validated:', result.extracted.hasCompanyLogo && result.extracted.isStamped);
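A follow-up consistency check is a natural fit here: since hybrid extraction reads the line-item table, you can verify that the extracted items add up to the extracted total. A hypothetical helper, assuming the lineItems shape from the schema above:

```typescript
// Line-item shape assumed from the schema above.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
}

// Hypothetical validation helper: each line's total should equal
// quantity * unitPrice, and the lines should sum to the invoice total
// (within a cent, to absorb rounding).
function lineItemsConsistent(items: LineItem[], invoiceTotal: number): boolean {
  const linesOk = items.every(
    (item) => Math.abs(item.quantity * item.unitPrice - item.total) < 0.01
  );
  const sum = items.reduce((acc, item) => acc + item.total, 0);
  return linesOk && Math.abs(sum - invoiceTotal) < 0.01;
}
```

Running lineItemsConsistent(result.extracted.lineItems, result.extracted.total) flags invoices where a digit was misread during extraction.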

Next Steps

Schema Extraction

Learn more about defining extraction schemas

Custom Models

Use custom model implementations
