Prerequisites
MarkItDown requires Python 3.10 or higher. Before installing, verify your Python version:
Virtual environment setup
We strongly recommend using a virtual environment to avoid dependency conflicts with other Python projects. You can set one up in any of these ways:
- Standard Python
- uv
- Anaconda
Create and activate a virtual environment using Python’s built-in venv module:
When activated, your terminal prompt will show (.venv) at the beginning.
Basic installation
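Before installing, you can check both prerequisites from the interpreter you plan to use: Python 3.10 or higher and an active virtual environment. The sketch below uses only the standard library, and the helper name check_prerequisites is hypothetical. It detects a virtual environment by comparing sys.prefix with sys.base_prefix. That test works for environments created with venv or uv. Conda environments are full interpreter installs, so the test reports False inside them even though they provide similar isolation.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version stated for MarkItDown


def check_prerequisites() -> dict:
    """Report whether this interpreter meets the prerequisites (hypothetical helper)."""
    # In a venv-style environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    in_venv = sys.prefix != sys.base_prefix
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "version_ok": sys.version_info >= MIN_VERSION,
        "in_virtual_env": in_venv,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If version_ok is False, create the environment with a newer interpreter instead of upgrading packages inside it.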
Once your virtual environment is activated, install MarkItDown.
Install all features
For complete format support, install with all optional dependencies using the [all] extra. This adds support for:
- PDF files
- Word documents (.docx)
- PowerPoint presentations (.pptx)
- Excel spreadsheets (.xlsx, .xls)
- Outlook messages (.msg)
- Audio transcription
- YouTube transcripts
- Azure Document Intelligence
- And more
The [all] extra provides backward-compatible behavior with MarkItDown versions prior to 0.1.0.
Minimal installation
For a minimal installation with only core dependencies, install the package without any extras. The core installation supports:
- HTML
- Plain text formats (CSV, JSON, XML)
- Jupyter notebooks
- EPUB files
- ZIP archives
- Basic image metadata (without OCR)
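HTML is handled by the core dependencies, so a short HTML conversion is one way to exercise a minimal installation from Python. This sketch assumes the Python API exposes a MarkItDown class whose convert() method accepts a file path and returns a result with a text_content attribute. The helper names write_sample_html and convert_html_snippet are hypothetical. If the package cannot be imported, the helper returns None instead of raising.

```python
import tempfile
from pathlib import Path
from typing import Optional

try:
    # Assumption: MarkItDown is importable from the top-level package.
    from markitdown import MarkItDown
except ImportError:  # MarkItDown is not installed in this environment
    MarkItDown = None

SAMPLE_HTML = "<html><body><h1>Hello</h1><p>Converted by MarkItDown.</p></body></html>"


def write_sample_html(directory: Path) -> Path:
    """Write a tiny HTML document into directory and return its path."""
    path = directory / "sample.html"
    path.write_text(SAMPLE_HTML, encoding="utf-8")
    return path


def convert_html_snippet() -> Optional[str]:
    """Convert SAMPLE_HTML to Markdown, or return None if MarkItDown is missing."""
    if MarkItDown is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        html_path = write_sample_html(Path(tmp))
        # Convert while the temporary file still exists.
        result = MarkItDown().convert(str(html_path))
        return result.text_content


if __name__ == "__main__":
    markdown = convert_html_snippet()
    print(markdown if markdown is not None else "MarkItDown is not installed")
```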
Selective installation
Starting with version 0.1.0, MarkItDown organizes dependencies into optional feature groups. Install only what you need to keep your environment lean.
Available feature groups
PDF support
- pdfminer.six: text extraction
- pdfplumber: table and layout detection
Office documents
Word documents:
PowerPoint presentations:
Excel spreadsheets (modern):
Excel spreadsheets (legacy .xls):
Outlook messages
Audio transcription
- pydub: audio processing
- SpeechRecognition: speech-to-text
YouTube transcripts
Azure Document Intelligence
- azure-ai-documentintelligence
- azure-identity
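To see which of the optional packages named in this section are already present in the active environment, you can probe their import names with importlib.util.find_spec. The mapping below covers only the packages listed above. The import names are an assumption; for example, SpeechRecognition is imported as speech_recognition. The check does not ask MarkItDown itself which feature groups are enabled.

```python
import importlib.util

# Distribution name (as listed above) -> import name (assumed).
OPTIONAL_PACKAGES = {
    "pdfminer.six": "pdfminer",
    "pdfplumber": "pdfplumber",
    "pydub": "pydub",
    "SpeechRecognition": "speech_recognition",
    "azure-ai-documentintelligence": "azure.ai.documentintelligence",
    "azure-identity": "azure.identity",
}


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be located by the import system."""
    try:
        # find_spec does not execute the target module, but for dotted names it
        # imports the parents, so a missing parent (e.g. "azure") raises an error.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def report_optional_packages(packages=OPTIONAL_PACKAGES) -> dict:
    """Map each distribution name to whether its module is importable."""
    return {dist: is_importable(module) for dist, module in packages.items()}


if __name__ == "__main__":
    for dist, present in report_optional_packages().items():
        print(f"{dist:32} {'installed' if present else 'missing'}")
```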
Combining feature groups
Install multiple feature groups by separating them with commas:
Install from source
For development or to use the latest unreleased features, install from the GitHub repository:
Install in development mode
The -e flag installs the package in editable mode, so changes to the source code take effect without reinstalling.
Docker installation
Run MarkItDown in a Docker container for an isolated environment:
Docker with volume mounting
For easier file access, mount a local directory:
Verify installation
Confirm that MarkItDown is installed correctly:
Test with a sample conversion
Create a simple HTML file and convert it:
Optional dependencies
ExifTool for image metadata
For enhanced image metadata extraction, install ExifTool:
- macOS
- Linux (Debian/Ubuntu)
- Windows
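After installing ExifTool on any platform, you can confirm from Python that it is reachable on PATH. This sketch uses shutil.which and runs exiftool -ver, which prints the version number. The helper name find_exiftool is hypothetical. The sketch does not cover how MarkItDown itself locates ExifTool.

```python
import shutil
import subprocess
from typing import Optional


def find_exiftool(executable: str = "exiftool") -> Optional[str]:
    """Return the ExifTool version string if the executable is on PATH, else None."""
    path = shutil.which(executable)
    if path is None:
        return None  # not installed, or not on PATH
    try:
        # `exiftool -ver` prints only the version number.
        completed = subprocess.run(
            [path, "-ver"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if completed.returncode != 0:
        return None
    return completed.stdout.strip() or None


if __name__ == "__main__":
    version = find_exiftool()
    print(f"ExifTool {version}" if version else "ExifTool not found on PATH")
```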
LLM integration
To use LLM-powered image descriptions, install an OpenAI-compatible client:
Troubleshooting
ModuleNotFoundError for a specific format
If you get an error such as MissingDependencyException when converting a file, install the feature group for that file format:
Permission denied errors
On Unix systems, if you encounter permission errors, avoid using sudo. Instead, make sure you’re working in an activated virtual environment:
Python version conflicts
If you have multiple Python versions installed, be explicit about which interpreter you use:
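One way to be explicit is to run pip through the exact interpreter you intend to use (python -m pip). The package then lands in that interpreter's site-packages. The sketch below prints the current interpreter and builds such a command for the [all] extra. The helper name pip_install_command is hypothetical. shlex produces POSIX shell quoting, so Windows cmd.exe may need different quoting.

```python
import shlex
import sys


def pip_install_command(requirement: str = "markitdown[all]", python: str = sys.executable) -> str:
    """Build a shell-quoted `python -m pip install` command bound to one interpreter."""
    # Quoting matters: shells such as zsh treat square brackets as glob patterns.
    return shlex.join([python, "-m", "pip", "install", requirement])


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Version:     {sys.version.split()[0]}")
    print(pip_install_command())
```

For example, pip_install_command(python="/usr/bin/python3.12") returns /usr/bin/python3.12 -m pip install 'markitdown[all]'.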
SSL certificate errors
If you encounter SSL errors when downloading packages:
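When diagnosing SSL errors, it can help to know where the interpreter looks for trusted CA certificates. This standard-library sketch reports the paths from ssl.get_default_verify_paths() and any SSL_CERT_FILE or SSL_CERT_DIR overrides. pip may use its own certificate bundle, so this output describes only the interpreter's defaults. The helper name describe_ca_locations is hypothetical.

```python
import os
import ssl


def describe_ca_locations() -> dict:
    """Collect where this interpreter's ssl module looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved paths (environment overrides applied); None means missing.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # OpenSSL's compiled-in defaults, reported without an existence check.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Environment variables that override the defaults, if set.
        paths.openssl_cafile_env: os.environ.get(paths.openssl_cafile_env),
        paths.openssl_capath_env: os.environ.get(paths.openssl_capath_env),
    }


if __name__ == "__main__":
    print(f"OpenSSL: {ssl.OPENSSL_VERSION}")
    for key, value in describe_ca_locations().items():
        print(f"{key}: {value}")
```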
Upgrading
To upgrade to the latest version:
Breaking changes in 0.1.0
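To tell whether these changes affect your environment, first check which MarkItDown version is installed. The sketch below reads the version with importlib.metadata, assuming the distribution name is markitdown. It then compares the numeric release part against 0.1.0. The helper names are hypothetical. The simple parser ignores pre-release and post-release suffixes and does not implement full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

BREAKING_RELEASE = (0, 1, 0)


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Parse the leading numeric release, e.g. "0.1.2rc1" -> (0, 1, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions so that "0.1" compares equal to "0.1.0".
    return parts + (0,) * max(0, 3 - len(parts))


def installed_markitdown_version() -> Optional[str]:
    """Return the installed MarkItDown version, or None if it is not installed."""
    try:
        return version("markitdown")
    except PackageNotFoundError:
        return None


def predates_breaking_release(version_string: str) -> bool:
    """True if version_string is older than 0.1.0 (suffixes ignored)."""
    return release_tuple(version_string) < BREAKING_RELEASE


if __name__ == "__main__":
    current = installed_markitdown_version()
    if current is None:
        print("MarkItDown is not installed")
    elif predates_breaking_release(current):
        print(f"{current}: review the 0.1.0 breaking changes before upgrading")
    else:
        print(f"{current}: already on 0.1.0 or later")
```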
If you’re upgrading from version 0.0.1, be aware of these breaking changes:
Next steps
- Quickstart guide: Convert your first document with MarkItDown
- Python API: Explore the complete API reference
- CLI reference: Learn all command-line options
- Plugins: Extend MarkItDown with custom converters