Response Type
The zerox() function returns a ZeroxOutput object containing the processed markdown content and metadata.
from pyzerox import zerox
result = await zerox(file_path="document.pdf")
# result is a ZeroxOutput object
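The snippets on this page use a bare `await`, which works in a notebook or an asyncio REPL but not in a plain script. The following is a minimal sketch of the script form. `fake_zerox` and `FakeOutput` are hypothetical stand-ins so the example runs without an API key; in real code you would await `zerox` from `pyzerox` instead.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeOutput:
    # Hypothetical stand-in carrying a subset of the ZeroxOutput fields.
    file_name: str
    pages: List[str] = field(default_factory=list)


async def fake_zerox(file_path: str) -> FakeOutput:
    # Hypothetical stand-in for `zerox`; it performs no real processing.
    await asyncio.sleep(0)
    return FakeOutput(file_name="document")


async def main() -> FakeOutput:
    # In real code: result = await zerox(file_path="document.pdf")
    return await fake_zerox(file_path="document.pdf")


# asyncio.run creates an event loop, runs main() to completion, then closes the loop.
result = asyncio.run(main())
print(result.file_name)
```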
ZeroxOutput Structure
@dataclass
class ZeroxOutput:
    completion_time: float
    file_name: str
    input_tokens: int
    output_tokens: int
    pages: List[Page]
Response Fields
completion_time (float): Total processing time in milliseconds, measured from start to finish. This includes time for:
- File download (if URL)
- PDF to image conversion
- API calls to vision model
- Markdown aggregation
result = await zerox(file_path="document.pdf")
print(f"Completed in {result.completion_time:.2f}ms")
# Output: Completed in 9432.98ms
file_name (str): Sanitized name of the processed file, without its extension. Sanitization rules:
- Special characters replaced with underscores
- Converted to lowercase
- Alphanumeric characters preserved
- Truncated to 255 characters
Example:
- Input: "My Document (Final).pdf"
- Output: "my_document__final_"
result = await zerox(file_path="/path/to/Invoice #12345.pdf")
print(result.file_name)
# Output: invoice__12345
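The sanitization rules above can be sketched in a few lines. This is an illustration that reproduces the documented examples, not the library's actual implementation; the function name `sanitize_file_name` is hypothetical.

```python
import os
import re


def sanitize_file_name(path: str, max_length: int = 255) -> str:
    """Approximate the documented rules; not the library's own code."""
    # Drop the directory part and the extension.
    stem = os.path.splitext(os.path.basename(path))[0]
    # Replace every non-alphanumeric character with an underscore, then lowercase.
    cleaned = re.sub(r"[^a-zA-Z0-9]", "_", stem).lower()
    # Truncate to the documented maximum length.
    return cleaned[:max_length]


print(sanitize_file_name("My Document (Final).pdf"))       # my_document__final_
print(sanitize_file_name("/path/to/Invoice #12345.pdf"))   # invoice__12345
```

Because several characters can map to the same underscore, two different input names may produce the same sanitized name.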
input_tokens (int): Total number of input tokens consumed across all API calls. Input tokens include:
- System prompt
- User message (image + instructions)
- Previous page context (if maintain_format=True)
result = await zerox(file_path="document.pdf")
print(f"Input tokens: {result.input_tokens}")
# Output: Input tokens: 36877
output_tokens (int): Total number of output tokens generated across all API calls. Output tokens represent the markdown content generated by the model.
result = await zerox(file_path="document.pdf")
print(f"Output tokens: {result.output_tokens}")
total_tokens = result.input_tokens + result.output_tokens
print(f"Total tokens: {total_tokens}")
pages (List[Page]): List of Page objects containing the markdown content for each processed page. Pages are ordered sequentially. When using select_pages, only the selected pages are included, with their original page numbers preserved. See Page Structure below for details.
result = await zerox(file_path="document.pdf")
# Iterate through pages
for page in result.pages:
    print(f"Page {page.page}: {page.content_length} characters")
# Access specific page
first_page = result.pages[0]
print(first_page.content)
Page Structure
Each page in the pages list is a Page object with the following structure:
@dataclass
class Page:
    content: str
    content_length: int
    page: int
Page Fields
content (str): The markdown content extracted from the page. Content includes:
- Headers and body text
- Tables (formatted as HTML)
- Lists and formatting
- Special markers for logos, watermarks, page numbers
result = await zerox(file_path="document.pdf")
page = result.pages[0]
print(page.content)
# Output:
# # INVOICE # 36258
# **Date:** Mar 06 2012
# ...
content_length (int): The length of the markdown content in characters. Equivalent to len(content).
result = await zerox(file_path="document.pdf")
page = result.pages[0]
print(f"Content length: {page.content_length} characters")
# Output: Content length: 2333 characters
page (int): The page number (1-indexed) from the original document. When using select_pages, this reflects the original page number, not the sequential index.
# Process only pages 2, 5, and 10
result = await zerox(
    file_path="document.pdf",
    select_pages=[2, 5, 10]
)
# Pages preserve original numbering
print(result.pages[0].page) # Output: 2
print(result.pages[1].page) # Output: 5
print(result.pages[2].page) # Output: 10
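Because list positions and original page numbers diverge when select_pages is used, it can help to index pages by their page field. The sketch below uses a stand-in `Page` dataclass that mirrors the fields documented on this page; `index_by_page_number` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    # Stand-in mirroring the documented Page fields.
    content: str
    content_length: int
    page: int


def index_by_page_number(pages: List[Page]) -> Dict[int, Page]:
    """Map each original (1-indexed) page number to its Page object."""
    return {p.page: p for p in pages}


# Simulated result of select_pages=[2, 5, 10]
pages = [
    Page(content="second", content_length=6, page=2),
    Page(content="fifth", content_length=5, page=5),
    Page(content="tenth", content_length=5, page=10),
]
by_number = index_by_page_number(pages)
print(by_number[5].content)  # fifth
print(by_number.get(3))      # None, because page 3 was not selected
```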
Example Output
This example shows output from processing a single-page PDF (a textbook page on Java primitive types) using Azure OpenAI's gpt-4o-mini model.
ZeroxOutput(
    completion_time=9432.975,
    file_name='cs101',
    input_tokens=36877,
    output_tokens=515,
    pages=[
        Page(
            content='| Type | Description | Wrapper Class |\n' +
                '|---------|--------------------------------------|---------------|\n' +
                '| byte | 8-bit signed 2s complement integer | Byte |\n' +
                '| short | 16-bit signed 2s complement integer | Short |\n' +
                '| int | 32-bit signed 2s complement integer | Integer |\n' +
                '| long | 64-bit signed 2s complement integer | Long |\n' +
                '| float | 32-bit IEEE 754 floating point number| Float |\n' +
                '| double | 64-bit floating point number | Double |\n' +
                '| boolean | may be set to true or false | Boolean |\n' +
                '| char | 16-bit Unicode (UTF-16) character | Character |\n\n' +
                'Table 26.2.: Primitive types in Java\n\n' +
                '### 26.3.1. Declaration & Assignment\n\n' +
                'Java is a statically typed language meaning that all variables must be declared before you can use ' +
                'them or refer to them. In addition, when declaring a variable, you must specify both its type and ' +
                'its identifier. For example:\n\n' +
                '```java\n' +
                'int numUnits;\n' +
                'double costPerUnit;\n' +
                'char firstInitial;\n' +
                'boolean isStudent;\n' +
                '```\n\n' +
                'Each declaration specifies the variable\'s type followed by the identifier and ending with a ' +
                'semicolon. The identifier rules are fairly standard: a name can consist of lowercase and ' +
                'uppercase alphabetic characters, numbers, and underscores but may not begin with a numeric ' +
                'character. We adopt the modern camelCasing naming convention for variables in our code. In ' +
                'general, variables must be assigned a value before you can use them in an expression. You do not ' +
                'have to immediately assign a value when you declare them (though it is good practice), but some ' +
                'value must be assigned before they can be used or the compiler will issue an error.\n\n' +
                'The assignment operator is a single equal sign, `=` and is a right-to-left assignment. That is, ' +
                'the variable that we wish to assign the value to appears on the left-hand-side while the value ' +
                '(literal, variable or expression) is on the right-hand-side. Using our variables from before, ' +
                'we can assign them values:\n\n' +
                '> 2 Instance variables, that is variables declared as part of an object do have default values. ' +
                'For objects, the default is `null`, for all numeric types, zero is the default value. For the ' +
                'boolean type, `false` is the default, and the default char value is `\\0`, the null-terminating ' +
                'character (zero in the ASCII table).',
            content_length=2333,
            page=1
        )
    ]
)
Working with the Response
Accessing Page Content
result = await zerox(file_path="document.pdf")
# Get all page content as a single string
full_markdown = "\n\n".join(page.content for page in result.pages)
# Process each page individually
for page in result.pages:
    print(f"\n--- Page {page.page} ---")
    print(page.content[:200])  # Print first 200 characters
    print(f"Total length: {page.content_length} characters")
Calculating Token Costs
result = await zerox(file_path="document.pdf")
# Example pricing for gpt-4o-mini (as of 2024)
INPUT_COST_PER_1K = 0.00015 # $0.15 per 1M tokens
OUTPUT_COST_PER_1K = 0.0006 # $0.60 per 1M tokens
input_cost = (result.input_tokens / 1000) * INPUT_COST_PER_1K
output_cost = (result.output_tokens / 1000) * OUTPUT_COST_PER_1K
total_cost = input_cost + output_cost
print(f"Input tokens: {result.input_tokens:,} (${input_cost:.4f})")
print(f"Output tokens: {result.output_tokens:,} (${output_cost:.4f})")
print(f"Total cost: ${total_cost:.4f}")
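The same calculation can be wrapped in a reusable function with prices passed as per-million-token rates, so they are easy to update. This is a sketch; `estimate_cost` is a hypothetical helper, and the rates below are the 2024 example figures quoted above, which may no longer be current.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_million: float,
    output_usd_per_million: float,
) -> float:
    """Return the estimated cost in USD for one zerox() run."""
    input_cost = (input_tokens / 1_000_000) * input_usd_per_million
    output_cost = (output_tokens / 1_000_000) * output_usd_per_million
    return input_cost + output_cost


# Token counts taken from the example output on this page.
cost = estimate_cost(
    input_tokens=36877,
    output_tokens=515,
    input_usd_per_million=0.15,
    output_usd_per_million=0.60,
)
print(f"Estimated cost: ${cost:.6f}")
```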
Processing Time Analysis
result = await zerox(
file_path="large-document.pdf",
concurrency=10
)
time_in_seconds = result.completion_time / 1000
time_per_page = result.completion_time / len(result.pages)
print(f"Total time: {time_in_seconds:.2f}s")
print(f"Pages processed: {len(result.pages)}")
print(f"Average time per page: {time_per_page:.0f}ms")
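The division above fails if no pages were returned. A guarded version might look like the sketch below; `average_ms_per_page` is a hypothetical helper. Note that when pages are processed concurrently, this figure is total wall-clock time divided by page count, which is a throughput measure and not the latency of any single page.

```python
from typing import Optional


def average_ms_per_page(completion_time_ms: float, page_count: int) -> Optional[float]:
    """Return wall-clock milliseconds per page, or None when there are no pages."""
    if page_count <= 0:
        return None
    return completion_time_ms / page_count


print(average_ms_per_page(9000.0, 3))  # 3000.0
print(average_ms_per_page(1000.0, 0))  # None
```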
Filtering Pages by Content
result = await zerox(file_path="document.pdf")
# Find pages containing specific keywords
search_term = "invoice"
matching_pages = [
    page for page in result.pages
    if search_term.lower() in page.content.lower()
]
print(f"Found '{search_term}' on {len(matching_pages)} pages:")
for page in matching_pages:
    print(f" - Page {page.page}")
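Substring matching also matches longer words, so searching for "invoice" finds "invoices" too. If whole-word matches are wanted, a regular expression with word boundaries is one option. The sketch below uses a stand-in `Page` dataclass mirroring the documented fields; `pages_matching_word` and `make_page` are hypothetical helpers.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    # Stand-in mirroring the documented Page fields.
    content: str
    content_length: int
    page: int


def make_page(content: str, number: int) -> Page:
    # Test helper: derive content_length from the content itself.
    return Page(content=content, content_length=len(content), page=number)


def pages_matching_word(pages: List[Page], word: str) -> List[int]:
    """Return page numbers whose content contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [p.page for p in pages if pattern.search(p.content)]


pages = [
    make_page("# INVOICE # 36258", 1),
    make_page("Invoices are listed below", 2),
    make_page("Shipping details", 3),
]
print(pages_matching_word(pages, "invoice"))  # [1]
```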
Saving Individual Pages
import aiofiles
import os
result = await zerox(file_path="document.pdf")
# Save each page as a separate markdown file
output_dir = "./output/pages"
os.makedirs(output_dir, exist_ok=True)
for page in result.pages:
    file_path = os.path.join(output_dir, f"page_{page.page}.md")
    async with aiofiles.open(file_path, "w") as f:
        await f.write(page.content)
print(f"Saved {len(result.pages)} pages to {output_dir}")
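If adding the aiofiles dependency is not worthwhile, a synchronous version using only the standard library is a reasonable alternative for small documents. The sketch below also zero-pads page numbers so the files sort in page order. `save_pages` is a hypothetical helper, and `Page` is a stand-in mirroring the documented fields.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Page:
    # Stand-in mirroring the documented Page fields.
    content: str
    content_length: int
    page: int


def save_pages(pages: List[Page], output_dir: Path) -> List[Path]:
    """Write each page to its own markdown file and return the paths written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # Pad to the width of the largest page number, e.g. page_02.md, page_10.md.
    width = len(str(max((p.page for p in pages), default=0)))
    written = []
    for p in pages:
        target = output_dir / f"page_{p.page:0{width}d}.md"
        target.write_text(p.content, encoding="utf-8")
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_pages(
        [Page("# Two", 5, 2), Page("# Ten", 5, 10)],
        Path(tmp) / "pages",
    )
    names = [path.name for path in paths]
    first_content = paths[0].read_text(encoding="utf-8")

print(names)  # ['page_02.md', 'page_10.md']
```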
Response Comparison: Python vs Node.js
Key differences between Python and Node.js response structures:
| Field | Python | Node.js | Notes |
|---|---|---|---|
| Completion time | completion_time | completionTime | Snake case vs camel case |
| File name | file_name | fileName | Snake case vs camel case |
| Input tokens | input_tokens | inputTokens | Snake case vs camel case |
| Output tokens | output_tokens | outputTokens | Snake case vs camel case |
| Pages | pages | pages | Same |
| Page content | content | content | Same |
| Content length | content_length | contentLength | Snake case vs camel case |
| Extracted data | Not available | extracted | Node.js only |
| Summary | Not available | summary | Node.js only |
Node.js exclusive fields:
- extracted: Structured data extraction results (when using schema parameter)
- summary: Processing summary with success/failure counts
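When Python output has to feed a consumer that expects the Node.js field names, the shared fields can be renamed mechanically. The sketch below uses stand-in dataclasses mirroring the structures documented on this page; `to_camel` and `camelize` are hypothetical helpers. It covers only the fields both versions share, since extracted and summary have no Python counterpart.

```python
from dataclasses import asdict, dataclass
from typing import Any, List


@dataclass
class Page:
    # Stand-in mirroring the documented Page fields.
    content: str
    content_length: int
    page: int


@dataclass
class ZeroxOutput:
    # Stand-in mirroring the documented ZeroxOutput fields.
    completion_time: float
    file_name: str
    input_tokens: int
    output_tokens: int
    pages: List[Page]


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. content_length -> contentLength."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize(value: Any) -> Any:
    """Recursively rename dictionary keys to camelCase."""
    if isinstance(value, dict):
        return {to_camel(k): camelize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize(v) for v in value]
    return value


output = ZeroxOutput(9432.975, "cs101", 36877, 515, [Page("# Title", 7, 1)])
# asdict converts nested dataclasses (including those inside lists) to dicts.
node_style = camelize(asdict(output))
print(sorted(node_style))
```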
Type Annotations
For type hints in your Python code:
from typing import List
from pyzerox import zerox
from pyzerox.core.types import ZeroxOutput, Page
async def process_document(file_path: str) -> ZeroxOutput:
    result: ZeroxOutput = await zerox(file_path=file_path)
    return result
def analyze_pages(pages: List[Page]) -> None:
    for page in pages:
        print(f"Page {page.page}: {page.content_length} chars")
# Usage
result = await process_document("document.pdf")
analyze_pages(result.pages)