MarkItDown uses optional dependency groups to minimize installation size and avoid unnecessary dependencies. Install only what you need for your specific use case.
Installation Groups
MarkItDown defines the following optional dependency groups in pyproject.toml:
all - Complete Installation
pip install markitdown[all]
Installs all optional dependencies for maximum file format support.
Includes:
python-pptx - PowerPoint support
mammoth~=1.11.0 - Word document support
pandas - Excel support
openpyxl - Excel (.xlsx) support
xlrd - Excel (.xls) support
lxml - Enhanced XML/HTML parsing
pdfminer.six>=20251230 - PDF text extraction
pdfplumber>=0.11.9 - Advanced PDF parsing
olefile - Outlook MSG support
pydub - Audio processing
SpeechRecognition - Audio transcription
youtube-transcript-api~=1.0.0 - YouTube transcript support
azure-ai-documentintelligence - Azure Document Intelligence
azure-identity - Azure authentication
Recommended for most users to ensure all file types are supported.
pptx - PowerPoint Support
pip install markitdown[pptx]
Includes:
python-pptx
Supports:
.pptx files (PowerPoint presentations)
docx - Word Document Support
pip install markitdown[docx]
Includes:
mammoth~=1.11.0
lxml
Supports:
.docx files (Word documents)
xlsx - Excel XLSX Support
pip install markitdown[xlsx]
Includes:
pandas
openpyxl
Supports:
.xlsx files (Excel spreadsheets, modern format)
xls - Excel XLS Support
pip install markitdown[xls]
Includes:
pandas
xlrd
Supports:
.xls files (Excel spreadsheets, legacy format)
pdf - PDF Support
pip install markitdown[pdf]
Includes:
pdfminer.six>=20251230
pdfplumber>=0.11.9
Supports:
.pdf files (PDF documents)
PDF support uses both pdfminer.six and pdfplumber for robust text extraction and table parsing.
outlook - Outlook MSG Support
pip install markitdown[outlook]
Includes:
olefile
Supports:
.msg files (Outlook messages)
audio-transcription - Audio Support
pip install markitdown[audio-transcription]
Includes:
pydub
SpeechRecognition
Supports:
- Audio file transcription
- Requires FFmpeg installed on system
Audio transcription requires FFmpeg to be installed separately:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Windows
# Download from https://ffmpeg.org
youtube-transcription - YouTube Support
pip install markitdown[youtube-transcription]
Includes:
youtube-transcript-api~=1.0.0
Supports:
- YouTube video transcript extraction
az-doc-intel - Azure Document Intelligence
pip install markitdown[az-doc-intel]
Includes:
azure-ai-documentintelligence
azure-identity
Supports:
- Cloud-based document conversion via Azure Document Intelligence service
- Enhanced OCR and layout analysis
Core Dependencies
These dependencies are always installed:
beautifulsoup4 - HTML parsing
requests - HTTP requests
markdownify - HTML to Markdown conversion
magika~=0.6.1 - File type detection
charset-normalizer - Character encoding detection
defusedxml - Safe XML parsing
Combining Dependency Groups
Install multiple groups by combining them:
# PDF and Office support
pip install markitdown[pdf,docx,xlsx,pptx]
# PDF and Azure
pip install markitdown[pdf,az-doc-intel]
# Office formats only
pip install markitdown[docx,xlsx,pptx]
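When the set of input formats is known ahead of time, the combined install command can be derived from the file extensions. The following is a minimal sketch based on the extension-to-group pairs listed above; `GROUP_FOR_EXTENSION` and `install_command` are hypothetical names used for illustration and are not part of MarkItDown.

```python
# Extension -> optional dependency group, taken from the group descriptions above.
GROUP_FOR_EXTENSION = {
    ".pptx": "pptx",
    ".docx": "docx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".pdf": "pdf",
    ".msg": "outlook",
}


def install_command(extensions):
    """Build a pip command covering the given file extensions.

    Extensions that need no optional group (for example .csv) are ignored.
    """
    groups = sorted(
        {
            GROUP_FOR_EXTENSION[ext.lower()]
            for ext in extensions
            if ext.lower() in GROUP_FOR_EXTENSION
        }
    )
    if not groups:
        return "pip install markitdown"
    return f"pip install markitdown[{','.join(groups)}]"


print(install_command([".pdf", ".docx", ".csv"]))
# prints: pip install markitdown[docx,pdf]
```

Groups are sorted so the same inputs always yield the same command, which keeps build scripts stable.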
Use Case Examples
Minimal Installation
For basic text, HTML, CSV, and simple formats:
pip install markitdown
Supports without optional dependencies:
- Plain text files
- HTML files
- CSV files
- RSS feeds
- Jupyter notebooks (.ipynb)
- Wikipedia pages
- Bing search results
- YouTube pages (transcripts require the youtube-transcription group)
Document Processing Server
For a server processing office documents and PDFs:
pip install markitdown[pdf,docx,xlsx,pptx]
Web Scraping Application
Minimal installation is sufficient:
pip install markitdown
Media Processing
For audio and image processing:
pip install markitdown[audio-transcription]
# Also install system dependencies
sudo apt-get install ffmpeg libimage-exiftool-perl # Ubuntu/Debian
brew install ffmpeg exiftool # macOS
Cloud Document Service
Using Azure Document Intelligence:
pip install markitdown[az-doc-intel]
Development Environment
Install everything for testing:
pip install markitdown[all]
Checking Installed Dependencies
Verify which dependencies are installed:
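import importlib.util

# Map each optional dependency group to one module it installs.
dependencies = {
    'pptx': 'pptx',
    'docx': 'mammoth',
    'xlsx': 'openpyxl',
    'xls': 'xlrd',
    'pdf': 'pdfplumber',
    'outlook': 'olefile',
    'audio': 'speech_recognition',
    'youtube': 'youtube_transcript_api',
    'azure': 'azure.ai.documentintelligence',
}

for name, module in dependencies.items():
    try:
        # find_spec imports parent packages first, so a dotted name raises
        # ModuleNotFoundError when the parent (e.g. azure) is not installed.
        spec = importlib.util.find_spec(module)
    except ModuleNotFoundError:
        spec = None
    status = "✓" if spec else "✗"
    print(f"{status} {name}: {module}")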
Error Messages
When a required dependency is missing, MarkItDown will raise a MissingDependencyException:
from markitdown import MarkItDown, MissingDependencyException

md = MarkItDown()
try:
    result = md.convert("document.pdf")
except MissingDependencyException as e:
    print(e)
    # Output: "Conversion requires the optional dependency [pdf] to be installed.
    # E.g., `pip install markitdown[pdf]`"
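A missing group can also be detected before a conversion is attempted. The sketch below is one possible pre-flight check, assuming the extension, group, and module pairings described in this document; `REQUIREMENTS`, `is_importable`, and `missing_group` are hypothetical helper names and are not part of the MarkItDown API.

```python
import importlib.util
from pathlib import Path

# Extension -> (optional dependency group, module that group provides).
# Pairings are taken from this document and are assumptions of the sketch.
REQUIREMENTS = {
    ".pptx": ("pptx", "pptx"),
    ".docx": ("docx", "mammoth"),
    ".xlsx": ("xlsx", "openpyxl"),
    ".xls": ("xls", "xlrd"),
    ".pdf": ("pdf", "pdfplumber"),
    ".msg": ("outlook", "olefile"),
}


def is_importable(module):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is absent.
        return False


def missing_group(path, importable=is_importable):
    """Return the group to install for this file, or None if nothing is missing."""
    entry = REQUIREMENTS.get(Path(path).suffix.lower())
    if entry is None:
        return None  # Format needs no optional group, or is unknown to this sketch.
    group, module = entry
    return None if importable(module) else group


group = missing_group("document.pdf")
if group:
    print(f"Install with: pip install markitdown[{group}]")
```

The `importable` parameter exists only so the check can be exercised without changing the environment.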
Upgrade Existing Installation
Add optional dependencies to an existing installation:
# Already have markitdown installed, add PDF support
pip install markitdown[pdf]
# Upgrade to full installation
pip install markitdown[all]
Dependencies in Docker
The official Docker image includes all dependencies:
RUN pip --no-cache-dir install \
/app/packages/markitdown[all]
For a minimal Docker image, customize the Dockerfile:
# Minimal image with only PDF support
RUN pip --no-cache-dir install \
/app/packages/markitdown[pdf]
Requirements Files
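For reproducible environments, list MarkItDown and the groups you need in a requirements.txt file, for example:
markitdown[pdf,docx,xlsx,pptx]
Or specify exact versions: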
markitdown==0.1.0
python-pptx==0.6.21
mammoth==1.11.0
pandas==2.0.0
# ... etc
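Pinned lines like these can be generated from the versions installed in the current environment. This is a minimal sketch using the standard library's `importlib.metadata`; `pinned_lines` is a hypothetical helper name, and packages that are not installed are skipped rather than reported.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_lines(packages):
    """Return 'name==version' lines for each package that is installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin.
            continue
    return lines


# Distribution names as they appear in the dependency lists above.
print("\n".join(pinned_lines(["markitdown", "python-pptx", "mammoth", "pandas"])))
```

Running `pip freeze` gives a complete list of the environment; the sketch is useful when only a chosen subset should be pinned.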
System Dependencies
Some features require system-level packages:
ExifTool (for image and audio metadata)
# macOS
brew install exiftool
# Ubuntu/Debian
sudo apt-get install libimage-exiftool-perl
# Set path if needed
export EXIFTOOL_PATH=/usr/bin/exiftool
FFmpeg (for audio processing)
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Windows
# Download from https://ffmpeg.org
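Whether these tools are visible to Python can be checked with the standard library. The sketch below looks each tool up on `PATH` with `shutil.which` and, for ExifTool, first honors the `EXIFTOOL_PATH` variable shown above. `find_tool` and `system_tool_report` are hypothetical names, and the lookup order is an assumption of this sketch rather than a description of how MarkItDown resolves the tools.

```python
import os
import shutil


def find_tool(name, env_var=None):
    """Return the tool's path from env_var if it is set, else from PATH, else None."""
    if env_var:
        override = os.environ.get(env_var)
        if override:
            return override
    return shutil.which(name)


def system_tool_report():
    """Map each system dependency to its resolved path (None if not found)."""
    return {
        "exiftool": find_tool("exiftool", env_var="EXIFTOOL_PATH"),
        "ffmpeg": find_tool("ffmpeg"),
    }


for tool, path in system_tool_report().items():
    print(f"{tool}: {path or 'not found'}")
```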
Troubleshooting
Version Conflicts
If you encounter version conflicts:
# Create a fresh virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
pip install markitdown[all]
Installation Failures
Some packages require build tools:
# Ubuntu/Debian
sudo apt-get install build-essential python3-dev
# macOS (install Xcode Command Line Tools)
xcode-select --install
Check Installed Version
pip show markitdown
pip list | grep markitdown