Prerequisites
Before installing py-zerox, ensure you have:
- Python 3.11 or higher
- pip package manager
- System access to install Poppler utilities
Installation
Install system dependencies first
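Install Poppler on your system:
- Linux (Ubuntu/Debian): sudo apt-get install -y poppler-utils
- macOS: brew install poppler
- Windows: download the Poppler binaries and add them to PATH
- Conda: conda install -c conda-forge poppler
Verify installation: pdftoppm -h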
See the pdf2image documentation for detailed platform-specific instructions.
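The Poppler check can also be done from Python, which is handy when the shell and the interpreter see different PATH values. The following is a minimal sketch using only the standard library; the helper names `find_pdftoppm` and `poppler_is_usable` are illustrative and not part of py-zerox.

```python
import shutil
import subprocess


def find_pdftoppm():
    """Return the full path to Poppler's pdftoppm binary, or None if it is not on PATH."""
    return shutil.which("pdftoppm")


def poppler_is_usable():
    """Return True if pdftoppm can be located and started."""
    path = find_pdftoppm()
    if path is None:
        return False
    try:
        # Ask the binary for its version; the output itself is discarded.
        subprocess.run([path, "-v"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return True


if __name__ == "__main__":
    print("pdftoppm found at:", find_pdftoppm())
    print("Poppler usable:", poppler_is_usable())
```

If the first line prints `None`, Poppler is either not installed or not visible to the interpreter that will run py-zerox.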
Install py-zerox
Install the package using pip:
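When several Python versions are installed, running pip through the target interpreter avoids installing into the wrong environment. This is a small sketch of that idea; `pip_install_command` and `install` are hypothetical helper names, and the package name is the one used throughout this page.

```python
import subprocess
import sys


def pip_install_command(package="py-zerox"):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package="py-zerox"):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(pip_install_command(package), check=True)
```

Calling `install()` once from the interpreter you intend to use is equivalent to running `python -m pip install py-zerox` with that same interpreter.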
The package automatically installs all Python dependencies, including pdf2image, litellm, aiofiles, aiohttp, and others.
Verification Commands
Verify that Poppler is correctly installed and available: pdftoppm -h
Troubleshooting
Error: 'pdftoppm' not found or 'Unable to find pdftoppm'
Poppler is not installed or not in your system PATH.
Solution:
- Linux: sudo apt-get install -y poppler-utils
- macOS: brew install poppler
- Windows: Download binaries and add to PATH (see installation steps)
- Conda: conda install -c conda-forge poppler
Then verify that the command is available: pdftoppm -h
ImportError: No module named 'pyzerox'
The package is not installed or Python can't find it.
Solution: Install the package with pip. If using a virtual environment, ensure it's activated before installing and before running your script.
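A quick way to see which of these two causes applies is to ask the interpreter itself. The sketch below uses only the standard library; `module_available` and `in_virtualenv` are illustrative helper names.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
    print("pyzerox importable:", module_available("pyzerox"))
```

If the interpreter path is not the one you installed into, the package is most likely present in a different environment.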
Error: Python version requirement not satisfied
py-zerox requires Python 3.11 or higher.
Solution:
- Check your Python version (python --version).
- If you have Python 3.11+ installed but not as the default, invoke that interpreter explicitly.
- Consider using pyenv or conda to manage Python versions.
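The version requirement can be checked programmatically, for example at the top of a script. This is a minimal sketch; `meets_requirement` is a hypothetical helper, and the `(3, 11)` minimum is taken from the requirement stated above.

```python
import sys

MINIMUM_PYTHON = (3, 11)  # requirement stated in this guide


def meets_requirement(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version satisfies the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_requirement() else "too old for py-zerox"
    print(f"Python {current}: {status}")
```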
API key or authentication errors
Missing or incorrect API credentials for your LLM provider.
Solution: Set the appropriate environment variables for your provider. Refer to the LiteLLM documentation for provider-specific setup.
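Checking for credentials before starting a long job gives a clearer error than a failed API call. The sketch below is one way to do that; `missing_env_vars` is a hypothetical helper, and `OPENAI_API_KEY` is used only as an example, since the variable name depends on the provider you configure.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or blank in `environ`."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    # OPENAI_API_KEY is only an example; the variable name depends on your provider.
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All required variables are set.")
```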
Memory errors with large PDFs
Processing large PDFs with high concurrency can cause memory issues.
Solution: Reduce the concurrency parameter, or process specific pages only.
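Processing specific pages only can be taken one step further by walking through a large PDF in small batches. The helper below is a sketch of the batching arithmetic only; `page_batches` is a hypothetical name, and how each batch is passed to py-zerox (the page-selection and concurrency arguments) is not shown here and should be taken from the Configuration page.

```python
def page_batches(total_pages, batch_size):
    """Yield lists of 1-indexed page numbers, at most `batch_size` pages each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        yield list(range(start, end + 1))


if __name__ == "__main__":
    # A 5-page document split into batches of 2 pages.
    for batch in page_batches(5, 2):
        print(batch)
```

Smaller batches keep fewer page images in memory at once, at the cost of more calls.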
SSL certificate verification errors
Network issues or corporate firewalls may be blocking API requests.
Solution: For corporate proxies, set proxy environment variables.
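Before changing anything, it can help to see which proxy variables the current process actually inherits. This is a diagnostic sketch only; `proxy_settings` is a hypothetical helper, and whether a given HTTP client honors these variables depends on that client.

```python
import os

PROXY_VARIABLES = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(environ=None):
    """Return the proxy variables that are set, checking upper- and lower-case names."""
    if environ is None:
        environ = os.environ
    found = {}
    for name in PROXY_VARIABLES:
        value = environ.get(name) or environ.get(name.lower())
        if value:
            found[name] = value
    return found


if __name__ == "__main__":
    settings = proxy_settings()
    if settings:
        for name, value in settings.items():
            print(f"{name}={value}")
    else:
        print("No proxy variables are set in this process.")
```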
Async/await errors or event loop issues
Common when mixing sync and async code incorrectly.
Solution:
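One pattern that keeps sync and async code cleanly separated is a single `asyncio.run()` call at the script's entry point, with everything below it using `await`. The sketch uses a hypothetical stand-in coroutine, `process_document`, in place of a real py-zerox call.

```python
import asyncio


async def process_document(path):
    """Hypothetical stand-in for an asynchronous OCR call."""
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return f"processed {path}"


async def main():
    # Inside a coroutine, call other coroutines with `await`.
    return await process_document("example.pdf")


def run_sync():
    """Entry point for ordinary scripts: start one event loop and run main() in it."""
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_sync())
```

In a Jupyter cell an event loop is already running, so `asyncio.run()` raises a `RuntimeError` there; writing `await main()` directly in the cell is the usual alternative.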
Always use asyncio.run() for the main entry point. In Jupyter notebooks, where an event loop is already running, await the coroutine directly instead.
Dependencies Reference
py-zerox uses the following dependencies:

| Dependency | Purpose | Installation |
|---|---|---|
| poppler-utils | PDF to image conversion | System package |
| pdf2image | Python wrapper for Poppler | Installed with pip |
| litellm | Unified API for LLM providers | Installed with pip |
| aiofiles | Async file operations | Installed with pip |
| aiohttp | Async HTTP client | Installed with pip |
| aioshutil | Async file utilities | Installed with pip |
| pypdf2 | PDF metadata reading | Installed with pip |
Virtual Environment (Recommended)
It's recommended to use a virtual environment (for example, one created with python -m venv) to avoid dependency conflicts.
Next Steps
Quick Start
Start using py-zerox with your first OCR document
Configuration
Configure models, providers, and processing options

