Optional Dependencies

MarkItDown uses optional dependency groups to minimize installation size and avoid unnecessary dependencies. Install only what you need for your specific use case.

Installation Groups

MarkItDown defines the following optional dependency groups in pyproject.toml:

all - Complete Installation

pip install markitdown[all]

Installs all optional dependencies for maximum file format support. Includes:

python-pptx - PowerPoint support
mammoth~=1.11.0 - Word document support
pandas - Excel support
openpyxl - Excel (.xlsx) support
xlrd - Excel (.xls) support
lxml - Enhanced XML/HTML parsing
pdfminer.six>=20251230 - PDF text extraction
pdfplumber>=0.11.9 - Advanced PDF parsing
olefile - Outlook MSG support
pydub - Audio processing
SpeechRecognition - Audio transcription
youtube-transcript-api~=1.0.0 - YouTube transcript support
azure-ai-documentintelligence - Azure Document Intelligence
azure-identity - Azure authentication

Recommended for most users to ensure all file types are supported.

pptx - PowerPoint Support

pip install markitdown[pptx]

Includes:

python-pptx

Supports:

.pptx files (PowerPoint presentations)

docx - Word Document Support

pip install markitdown[docx]

Includes:

mammoth~=1.11.0
lxml

Supports:

.docx files (Word documents)

xlsx - Excel XLSX Support

pip install markitdown[xlsx]

Includes:

pandas
openpyxl

Supports:

.xlsx files (Excel spreadsheets, modern format)

xls - Excel XLS Support

pip install markitdown[xls]

Includes:

pandas
xlrd

Supports:

.xls files (Excel spreadsheets, legacy format)

pdf - PDF Support

pip install markitdown[pdf]

Includes:

pdfminer.six>=20251230
pdfplumber>=0.11.9

Supports:

.pdf files (PDF documents)

PDF support uses both pdfminer.six and pdfplumber for robust text extraction and table parsing.

outlook - Outlook MSG Support

pip install markitdown[outlook]

Includes:

olefile

Supports:

.msg files (Outlook messages)

audio-transcription - Audio Support

pip install markitdown[audio-transcription]

Includes:

pydub
SpeechRecognition

Supports:

Audio file transcription
Requires FFmpeg installed on system

Audio transcription requires FFmpeg to be installed separately:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org

youtube-transcription - YouTube Support

pip install markitdown[youtube-transcription]

Includes:

youtube-transcript-api~=1.0.0

Supports:

YouTube video transcript extraction

az-doc-intel - Azure Document Intelligence

pip install markitdown[az-doc-intel]

Includes:

azure-ai-documentintelligence
azure-identity

Supports:

Cloud-based document conversion via Azure Document Intelligence service
Enhanced OCR and layout analysis

Requires an Azure Document Intelligence endpoint. See the Azure Document Intelligence guide for setup.

Core Dependencies

These dependencies are always installed:

beautifulsoup4 - HTML parsing
requests - HTTP requests
markdownify - HTML to Markdown conversion
magika~=0.6.1 - File type detection
charset-normalizer - Character encoding detection
defusedxml - Safe XML parsing

Combining Dependency Groups

Install multiple groups by combining them:

# PDF and Office support
pip install markitdown[pdf,docx,xlsx,pptx]

# PDF and Azure
pip install markitdown[pdf,az-doc-intel]

# Office formats only
pip install markitdown[docx,xlsx,pptx]

Use Case Examples

Minimal Installation

For basic text, HTML, CSV, and simple formats:

pip install markitdown

Supports without optional dependencies:

Plain text files
HTML files
CSV files
RSS feeds
Jupyter notebooks (.ipynb)
Wikipedia pages
Bing search results
YouTube transcripts (if available)

Document Processing Server

For a server processing office documents and PDFs:

pip install markitdown[pdf,docx,xlsx,pptx]

Web Scraping Application

Minimal installation is sufficient:

pip install markitdown

Media Processing Pipeline

For audio and image processing:

pip install markitdown[audio-transcription]

# Also install system dependencies
sudo apt-get install ffmpeg exiftool  # Linux
brew install ffmpeg exiftool          # macOS

Cloud Document Service

Using Azure Document Intelligence:

pip install markitdown[az-doc-intel]

Development Environment

Install everything for testing:

pip install markitdown[all]

Checking Installed Dependencies

Verify which dependencies are installed:

import importlib.util

dependencies = {
    'pptx': 'pptx',
    'docx': 'mammoth',
    'xlsx': 'openpyxl',
    'xls': 'xlrd',
    'pdf': 'pdfplumber',
    'outlook': 'olefile',
    'audio': 'speech_recognition',
    'youtube': 'youtube_transcript_api',
    'azure': 'azure.ai.documentintelligence'
}

for name, module in dependencies.items():
    spec = importlib.util.find_spec(module)
    status = "✓" if spec else "✗"
    print(f"{status} {name}: {module}")

Error Messages

When a required dependency is missing, MarkItDown will raise a MissingDependencyException:

from markitdown import MarkItDown, MissingDependencyException

md = MarkItDown()

try:
    result = md.convert("document.pdf")
except MissingDependencyException as e:
    print(e)
    # Output: "Conversion requires the optional dependency [pdf] to be installed.
    #          E.g., `pip install markitdown[pdf]`"

Upgrade Existing Installation

Add optional dependencies to an existing installation:

# Already have markitdown installed, add PDF support
pip install markitdown[pdf]

# Upgrade to full installation
pip install markitdown[all]

Dependencies in Docker

The official Docker image includes all dependencies:

RUN pip --no-cache-dir install \
    /app/packages/markitdown[all]

For a minimal Docker image, customize the Dockerfile:

# Minimal image with only PDF support
RUN pip --no-cache-dir install \
    /app/packages/markitdown[pdf]

Requirements Files

For reproducible environments:

requirements.txt

markitdown[all]==0.1.0

Or specify exact versions:

requirements.txt

markitdown==0.1.0
python-pptx==0.6.21
mammoth==1.11.0
pandas==2.0.0
# ... etc

System Dependencies

Some features require system-level packages:

ExifTool (for image metadata)

# macOS
brew install exiftool

# Ubuntu/Debian
sudo apt-get install libimage-exiftool-perl

# Set path if needed
export EXIFTOOL_PATH=/usr/bin/exiftool

FFmpeg (for audio processing)

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org

Troubleshooting

Version Conflicts

If you encounter version conflicts:

# Create a fresh virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

pip install markitdown[all]

Installation Failures

Some packages require build tools:

# Ubuntu/Debian
sudo apt-get install build-essential python3-dev

# macOS (install Xcode Command Line Tools)
xcode-select --install

Check Installed Version

pip show markitdown
pip list | grep markitdown

Get Started

Guides

File Formats

Advanced

Optional Dependencies

Installation Groups

all - Complete Installation

pptx - PowerPoint Support

docx - Word Document Support

xlsx - Excel XLSX Support

xls - Excel XLS Support

pdf - PDF Support

outlook - Outlook MSG Support

audio-transcription - Audio Support

youtube-transcription - YouTube Support

az-doc-intel - Azure Document Intelligence

Core Dependencies

Combining Dependency Groups

Use Case Examples

Minimal Installation

Document Processing Server

Web Scraping Application

Media Processing Pipeline

Cloud Document Service

Development Environment

Checking Installed Dependencies

Error Messages

Upgrade Existing Installation

Dependencies in Docker

Requirements Files

System Dependencies

ExifTool (for image metadata)

FFmpeg (for audio processing)

Troubleshooting

Version Conflicts

Installation Failures

Check Installed Version

Build docs developers (and LLMs) love

Get Started

Guides

File Formats

Advanced

​Installation Groups

​all - Complete Installation

​pptx - PowerPoint Support

​docx - Word Document Support

​xlsx - Excel XLSX Support

​xls - Excel XLS Support

​pdf - PDF Support

​outlook - Outlook MSG Support

​audio-transcription - Audio Support

​youtube-transcription - YouTube Support

​az-doc-intel - Azure Document Intelligence

​Core Dependencies

​Combining Dependency Groups

​Use Case Examples

​Minimal Installation

​Document Processing Server

​Web Scraping Application

​Media Processing Pipeline

​Cloud Document Service

​Development Environment

​Checking Installed Dependencies

​Error Messages

​Upgrade Existing Installation

​Dependencies in Docker

​Requirements Files

​System Dependencies

​ExifTool (for image metadata)

​FFmpeg (for audio processing)

​Troubleshooting

​Version Conflicts

​Installation Failures

​Check Installed Version

Build docs developers (and LLMs) love

Installation Groups

all - Complete Installation

pptx - PowerPoint Support

docx - Word Document Support

xlsx - Excel XLSX Support

xls - Excel XLS Support

pdf - PDF Support

outlook - Outlook MSG Support

audio-transcription - Audio Support

youtube-transcription - YouTube Support

az-doc-intel - Azure Document Intelligence

Core Dependencies

Combining Dependency Groups

Use Case Examples

Minimal Installation

Document Processing Server

Web Scraping Application

Media Processing Pipeline

Cloud Document Service

Development Environment

Checking Installed Dependencies

Error Messages

Upgrade Existing Installation

Dependencies in Docker

Requirements Files

System Dependencies

ExifTool (for image metadata)

FFmpeg (for audio processing)

Troubleshooting

Version Conflicts

Installation Failures

Check Installed Version