Required Parameters

file_path
str
required
The path or URL to the document file to process. Supports:
  • Local file paths (e.g., "./document.pdf")
  • HTTP/HTTPS URLs (e.g., "https://example.com/doc.pdf")
  • Supported formats: PDF, DOCX, images, and more
result = await zerox(file_path="https://example.com/document.pdf")
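For illustration, distinguishing the two kinds of input could look like the following sketch (`is_remote` is a hypothetical helper, not part of the SDK):

```python
from urllib.parse import urlparse

def is_remote(file_path: str) -> bool:
    """Return True when file_path looks like an HTTP/HTTPS URL."""
    return urlparse(file_path).scheme in ("http", "https")
```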

Model Configuration

model
str
default:"gpt-4o-mini"
The vision model to use for OCR processing. The model name format depends on the provider. Refer to LiteLLM Providers for correct model names. Examples:
  • OpenAI: "gpt-4o-mini", "gpt-4o"
  • Azure: "azure/gpt-4o-mini" (format: azure/<deployment_name>)
  • Gemini: "gemini/gemini-1.5-flash" (format: gemini/<model_name>)
  • Anthropic: "claude-3-opus-20240229"
  • Bedrock: "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
  • Vertex AI: "vertex_ai/gemini-1.5-flash-001"
result = await zerox(
    file_path="document.pdf",
    model="gpt-4o"  # Use GPT-4o's vision capabilities
)
custom_system_prompt
str
default:"None"
Override the default system prompt used for OCR processing.
This parameter is unique to the Python SDK. The Node.js SDK does not support custom system prompts.
When set, a warning will be raised to inform you that the default prompt has been overridden. Default prompt: instructs the model to convert documents to markdown, include all information, format tables as HTML, and use specific markers for logos, watermarks, and page numbers.
custom_prompt = """
Convert this document to markdown.
Focus on extracting only tables and numerical data.
Return only markdown with no explanations.
"""

result = await zerox(
    file_path="document.pdf",
    custom_system_prompt=custom_prompt
)

Processing Options

concurrency
int
default:"10"
The number of pages to process concurrently. Higher values speed up processing but increase memory and API usage. Set to 1 for sequential processing (useful when combined with maintain_format).
result = await zerox(
    file_path="large-document.pdf",
    concurrency=5  # Process 5 pages at a time
)
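Bounded concurrency of this kind is typically implemented with a semaphore. A minimal sketch of the pattern (not the SDK's actual implementation; `process_page` stands in for one OCR request):

```python
import asyncio

async def process_page(page_num: int, sem: asyncio.Semaphore) -> str:
    # Acquire one of the available slots before doing the (simulated) model call
    async with sem:
        await asyncio.sleep(0)  # stand-in for the OCR request
        return f"page {page_num} done"

async def run_all(num_pages: int, concurrency: int) -> list[str]:
    sem = asyncio.Semaphore(concurrency)  # at most `concurrency` pages in flight
    tasks = [process_page(n, sem) for n in range(1, num_pages + 1)]
    return await asyncio.gather(*tasks)  # results stay in page order
```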
maintain_format
bool
default:"False"
Whether to maintain consistent formatting across pages by passing the previous page's output as context to the next page. When enabled:
  • Pages are processed sequentially (slower)
  • Each page receives the previous page’s markdown as context
  • Useful for documents with tables spanning multiple pages
  • Helps maintain consistent table formatting
Process flow:
Request #1 => page_1_image
Request #2 => page_1_markdown + page_2_image
Request #3 => page_2_markdown + page_3_image
result = await zerox(
    file_path="document.pdf",
    maintain_format=True  # Better for tabular data
)
If both maintain_format and select_pages are set, a warning will be raised as this combination may not produce expected results.
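The sequential flow above can be sketched as follows (`ocr_page` is a hypothetical stand-in for a single vision-model call; the real call would send the prior markdown as context alongside the page image):

```python
import asyncio

async def ocr_page(image: str, prior_markdown: str) -> str:
    # Stand-in for one model call: tag the output with the context it received
    return f"{image}:formatted-like[{prior_markdown or 'none'}]"

async def process_sequentially(images: list) -> list:
    pages, prior = [], ""
    for image in images:
        # Each request carries the previous page's markdown as context
        prior = await ocr_page(image, prior)
        pages.append(prior)
    return pages
```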
select_pages
Union[int, Iterable[int]]
default:"None"
Process only specific pages from the document. Can be:
  • A single page number (int): 3
  • A list of page numbers: [1, 3, 5]
  • Any iterable of page numbers
Page numbers are 1-indexed. When None, all pages are processed.
# Process only page 1
result = await zerox(
    file_path="document.pdf",
    select_pages=1
)

# Process pages 1, 3, and 5
result = await zerox(
    file_path="document.pdf",
    select_pages=[1, 3, 5]
)
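A plausible normalization of the three accepted shapes into a page list might look like this (`normalize_pages` is illustrative, not an SDK function):

```python
from typing import Iterable, Optional, Union

def normalize_pages(select_pages: Union[int, Iterable[int], None]) -> Optional[list]:
    """Normalize select_pages into a sorted list of 1-indexed page numbers."""
    if select_pages is None:
        return None  # process every page
    if isinstance(select_pages, int):
        return [select_pages]  # single page
    return sorted(select_pages)  # any iterable of page numbers
```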

Image Conversion Options

image_density
int
default:"300"
DPI (dots per inch) for converting PDF pages to images. Higher values produce better quality but larger file sizes. Recommended values:
  • 150: Fast, lower quality
  • 300: Good balance (default)
  • 600: High quality, slower
result = await zerox(
    file_path="document.pdf",
    image_density=300  # Standard quality
)
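For intuition on how DPI maps to image size: a US Letter page (8.5 × 11 in) rendered at 300 DPI yields a 2550 × 3300 pixel image, and halving the DPI halves each dimension. A minimal sketch of the arithmetic:

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)
```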
image_height
tuple[Optional[int], int]
default:"(None, 1056)"
Maximum height for converted images as a tuple (width, height). The first element (width) is typically None to maintain aspect ratio. Images larger than this height will be scaled down while preserving aspect ratio.
result = await zerox(
    file_path="document.pdf",
    image_height=(None, 2048)  # Larger images for better quality
)
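Aspect-ratio-preserving downscaling to a height cap can be sketched as follows (illustrative only, not the SDK's internal code):

```python
def scale_to_max_height(width: int, height: int, max_height: int) -> tuple:
    """Scale (width, height) down so height <= max_height, keeping aspect ratio."""
    if height <= max_height:
        return width, height  # already small enough; leave untouched
    scale = max_height / height
    return round(width * scale), max_height
```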

File Management

output_dir
str
default:"None"
Directory to save the aggregated markdown output as a .md file. When set:
  • The directory will be created if it doesn’t exist
  • Output file will be named {file_name}.md
  • File name is sanitized (special characters replaced with underscores)
When None, no file is written (markdown is only returned in the response).
result = await zerox(
    file_path="document.pdf",
    output_dir="./output"  # Saves to ./output/document.md
)
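A sketch of how the output path might be derived (the exact sanitization rule, replacing anything outside word characters and hyphens with underscores, is an assumption):

```python
import re
from pathlib import Path

def output_path(output_dir: str, file_name: str) -> str:
    """Build the {file_name}.md output path with special characters sanitized."""
    stem = Path(file_name).stem  # drop the original extension
    safe = re.sub(r"[^\w\-]", "_", stem)  # hypothetical sanitization rule
    return str(Path(output_dir) / f"{safe}.md")
```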
temp_dir
str
default:"None"
Directory for storing temporary files during processing (downloaded PDFs, converted images, etc.). Behavior:
  • When None: Uses system temp directory (automatically cleaned up)
  • When set: Uses specified directory
    • If directory exists, contents are deleted before use
    • If cleanup=True, directory is deleted after processing
    • If cleanup=False, files remain for inspection
result = await zerox(
    file_path="document.pdf",
    temp_dir="./temp",
    cleanup=False  # Keep temp files for debugging
)
cleanup
bool
default:"True"
Whether to delete temporary files after processing completes. Set to False to keep temporary files for debugging or inspection.
result = await zerox(
    file_path="document.pdf",
    cleanup=False  # Keep temporary images
)

Additional LiteLLM Parameters

**kwargs
dict
Additional keyword arguments passed directly to the litellm.completion() method. Commonly used parameters:
  • temperature: Controls randomness (0.0 to 1.0)
  • max_tokens: Maximum tokens in response
  • top_p: Nucleus sampling parameter
  • frequency_penalty: Penalize frequent tokens
  • presence_penalty: Penalize present tokens
  • Provider-specific credentials (e.g., vertex_credentials)
Refer to the LiteLLM documentation for the full list of supported parameters.
result = await zerox(
    file_path="document.pdf",
    model="gpt-4o-mini",
    temperature=0.1,        # More deterministic
    max_tokens=4096,        # Limit response length
    top_p=0.95             # Nucleus sampling
)
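The pass-through behavior itself is ordinary Python **kwargs forwarding. A toy illustration (`completion_stub` and `zerox_like` are hypothetical stand-ins for litellm.completion() and zerox()):

```python
def completion_stub(**kwargs):
    # Stand-in for litellm.completion(): just echoes what it received
    return kwargs

def zerox_like(file_path: str, model: str = "gpt-4o-mini", **kwargs):
    # Any extra keyword arguments pass straight through to the model call
    return completion_stub(model=model, **kwargs)
```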
Vertex AI Example:
import json

with open('service_account.json', 'r') as file:
    vertex_credentials = json.load(file)

result = await zerox(
    file_path="document.pdf",
    model="vertex_ai/gemini-1.5-flash-001",
    vertex_credentials=json.dumps(vertex_credentials)
)

Parameter Comparison: Python vs Node.js

Key differences between Python and Node.js SDKs:
| Feature            | Python Parameter     | Node.js Parameter      | Notes                    |
| ------------------ | -------------------- | ---------------------- | ------------------------ |
| Custom prompts     | custom_system_prompt | Not available          | Python only              |
| Page selection     | select_pages         | pagesToConvertAsImages | Different naming         |
| Format maintenance | maintain_format      | maintainFormat         | Snake case vs camel case |
| Temp directory     | temp_dir             | tempDir                | Snake case vs camel case |
| Data extraction    | Not available        | schema, extractPerPage | Node.js only             |
| Error handling     | Not available        | errorMode              | Node.js only             |
| Orientation fix    | Not available        | correctOrientation     | Node.js only             |
| Edge trimming      | Not available        | trimEdges              | Node.js only             |

Complete Example

import asyncio
import os
from pyzerox import zerox

os.environ["OPENAI_API_KEY"] = "your-api-key"

async def main():
    result = await zerox(
        # Required
        file_path="https://example.com/document.pdf",
        
        # Model configuration
        model="gpt-4o-mini",
        custom_system_prompt=None,
        
        # Processing options
        concurrency=10,
        maintain_format=False,
        select_pages=None,
        
        # Image conversion
        image_density=300,
        image_height=(None, 1056),
        
        # File management
        output_dir="./output",
        temp_dir=None,
        cleanup=True,
        
        # Additional LiteLLM parameters
        temperature=0.1,
        max_tokens=4096
    )
    
    print(f"Processed {len(result.pages)} pages")
    print(f"Total tokens: {result.input_tokens + result.output_tokens}")
    return result

result = asyncio.run(main())