MarkItDown uses optional dependency groups to minimize installation size and avoid unnecessary dependencies. Install only what you need for your specific use case.

Installation Groups

MarkItDown defines the following optional dependency groups in pyproject.toml:
In shells that treat square brackets as glob patterns, such as zsh, quote the package argument, for example pip install 'markitdown[all]'.

all - Complete Installation

pip install markitdown[all]
Installs all optional dependencies for maximum file format support. Includes:
  • python-pptx - PowerPoint support
  • mammoth~=1.11.0 - Word document support
  • pandas - Excel support
  • openpyxl - Excel (.xlsx) support
  • xlrd - Excel (.xls) support
  • lxml - Enhanced XML/HTML parsing
  • pdfminer.six>=20251230 - PDF text extraction
  • pdfplumber>=0.11.9 - Advanced PDF parsing
  • olefile - Outlook MSG support
  • pydub - Audio processing
  • SpeechRecognition - Audio transcription
  • youtube-transcript-api~=1.0.0 - YouTube transcript support
  • azure-ai-documentintelligence - Azure Document Intelligence
  • azure-identity - Azure authentication
Recommended when you want every supported file type to work and installation size is not a concern.

pptx - PowerPoint Support

pip install markitdown[pptx]
Includes:
  • python-pptx
Supports:
  • .pptx files (PowerPoint presentations)

docx - Word Document Support

pip install markitdown[docx]
Includes:
  • mammoth~=1.11.0
  • lxml
Supports:
  • .docx files (Word documents)

xlsx - Excel XLSX Support

pip install markitdown[xlsx]
Includes:
  • pandas
  • openpyxl
Supports:
  • .xlsx files (Excel spreadsheets, modern format)

xls - Excel XLS Support

pip install markitdown[xls]
Includes:
  • pandas
  • xlrd
Supports:
  • .xls files (Excel spreadsheets, legacy format)

pdf - PDF Support

pip install markitdown[pdf]
Includes:
  • pdfminer.six>=20251230
  • pdfplumber>=0.11.9
Supports:
  • .pdf files (PDF documents)
PDF support uses both pdfminer.six and pdfplumber for robust text extraction and table parsing.

outlook - Outlook MSG Support

pip install markitdown[outlook]
Includes:
  • olefile
Supports:
  • .msg files (Outlook messages)

audio-transcription - Audio Support

pip install markitdown[audio-transcription]
Includes:
  • pydub
  • SpeechRecognition
Supports:
  • Audio file transcription
Audio transcription requires FFmpeg to be installed separately:
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org

youtube-transcription - YouTube Support

pip install markitdown[youtube-transcription]
Includes:
  • youtube-transcript-api~=1.0.0
Supports:
  • YouTube video transcript extraction

az-doc-intel - Azure Document Intelligence

pip install markitdown[az-doc-intel]
Includes:
  • azure-ai-documentintelligence
  • azure-identity
Supports:
  • Cloud-based document conversion via Azure Document Intelligence service
  • Enhanced OCR and layout analysis
Requires an Azure Document Intelligence endpoint. See the Azure Document Intelligence guide for setup.

Core Dependencies

These dependencies are always installed:
  • beautifulsoup4 - HTML parsing
  • requests - HTTP requests
  • markdownify - HTML to Markdown conversion
  • magika~=0.6.1 - File type detection
  • charset-normalizer - Character encoding detection
  • defusedxml - Safe XML parsing

Combining Dependency Groups

Install multiple groups by combining them:
# PDF and Office support
pip install markitdown[pdf,docx,xlsx,pptx]

# PDF and Azure
pip install markitdown[pdf,az-doc-intel]

# Office formats only
pip install markitdown[docx,xlsx,pptx]
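The combined form is just the package name followed by a comma-separated list of group names in square brackets. A minimal sketch of building that argument in Python follows; the helper name install_spec and the KNOWN_GROUPS set are illustrative and not part of MarkItDown, and the group names are copied from the list on this page.

```python
# Group names as listed on this page; adjust if your MarkItDown version differs.
KNOWN_GROUPS = {
    "all", "pptx", "docx", "xlsx", "xls", "pdf", "outlook",
    "audio-transcription", "youtube-transcription", "az-doc-intel",
}


def install_spec(groups, package="markitdown"):
    """Build a pip requirement string such as 'markitdown[pdf,docx]'."""
    unknown = sorted(set(groups) - KNOWN_GROUPS)
    if unknown:
        raise ValueError(f"unknown dependency group(s): {', '.join(unknown)}")
    if not groups:
        return package
    # dict.fromkeys drops duplicates while keeping the caller's order.
    extras = ",".join(dict.fromkeys(groups))
    return f"{package}[{extras}]"


print(install_spec(["pdf", "docx", "xlsx", "pptx"]))
```

When passing the result to pip from a shell, quote it so the brackets are not interpreted by the shell.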

Use Case Examples

Minimal Installation

For plain text, HTML, CSV, and other simple formats:
pip install markitdown
Supported without optional dependencies:
  • Plain text files
  • HTML files
  • CSV files
  • RSS feeds
  • Jupyter notebooks (.ipynb)
  • Wikipedia pages
  • Bing search results
  • YouTube video metadata (transcripts require the youtube-transcription group)

Document Processing Server

For a server processing office documents and PDFs:
pip install markitdown[pdf,docx,xlsx,pptx]

Web Scraping Application

Minimal installation is sufficient:
pip install markitdown

Media Processing Pipeline

For audio transcription and image metadata extraction:
pip install markitdown[audio-transcription]

# Also install system dependencies
sudo apt-get install ffmpeg libimage-exiftool-perl  # Ubuntu/Debian
brew install ffmpeg exiftool                         # macOS

Cloud Document Service

Using Azure Document Intelligence:
pip install markitdown[az-doc-intel]

Development Environment

Install everything for testing:
pip install markitdown[all]

Checking Installed Dependencies

Verify which dependencies are installed:
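import importlib.util

# Map each dependency group to one module that it provides
dependencies = {
    'pptx': 'pptx',
    'docx': 'mammoth',
    'xlsx': 'openpyxl',
    'xls': 'xlrd',
    'pdf': 'pdfplumber',
    'outlook': 'olefile',
    'audio': 'speech_recognition',
    'youtube': 'youtube_transcript_api',
    'azure': 'azure.ai.documentintelligence'
}

def is_installed(module):
    # find_spec imports parent packages first, so a dotted name such as
    # 'azure.ai.documentintelligence' raises ModuleNotFoundError when the
    # parent package is absent instead of returning None
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        return False

for name, module in dependencies.items():
    status = "✓" if is_installed(module) else "✗"
    print(f"{status} {name}: {module}")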
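A check like this can also be turned into a suggested install command. The sketch below is one way to do that, under two assumptions: probing a single module per group is a good enough signal (the pdf group, for instance, installs two packages), and the group names match the ones listed on this page. The names GROUP_MODULES, is_importable, and missing_groups are illustrative, not part of MarkItDown.

```python
import importlib.util

# One representative module per dependency group (a heuristic, not exhaustive).
GROUP_MODULES = {
    "pptx": "pptx",
    "docx": "mammoth",
    "xlsx": "openpyxl",
    "xls": "xlrd",
    "pdf": "pdfplumber",
    "outlook": "olefile",
    "audio-transcription": "speech_recognition",
    "youtube-transcription": "youtube_transcript_api",
    "az-doc-intel": "azure.ai.documentintelligence",
}


def is_importable(module):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module) is not None
    except ImportError:
        # Raised for dotted names when a parent package is missing.
        return False


def missing_groups(group_modules=GROUP_MODULES, check=is_importable):
    """Return the group names whose representative module is not importable."""
    return [group for group, module in group_modules.items() if not check(module)]


missing = missing_groups()
if missing:
    extras = ",".join(missing)
    print(f"pip install 'markitdown[{extras}]'")
else:
    print("All optional groups appear to be installed.")
```

The check parameter exists only so the lookup can be replaced in tests.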

Error Messages

When a required dependency is missing, MarkItDown will raise a MissingDependencyException:
from markitdown import MarkItDown, MissingDependencyException

md = MarkItDown()

try:
    result = md.convert("document.pdf")
except MissingDependencyException as e:
    print(e)
    # Output: "Conversion requires the optional dependency [pdf] to be installed.
    #          E.g., `pip install markitdown[pdf]`"

Upgrade Existing Installation

Add optional dependencies to an existing installation:
# Already have markitdown installed, add PDF support
pip install markitdown[pdf]

# Upgrade to full installation
pip install markitdown[all]

Dependencies in Docker

The official Docker image includes all dependencies:
RUN pip --no-cache-dir install \
    /app/packages/markitdown[all]
For a minimal Docker image, customize the Dockerfile:
# Minimal image with only PDF support
RUN pip --no-cache-dir install \
    /app/packages/markitdown[pdf]

Requirements Files

For reproducible environments, pin the MarkItDown version in a requirements file:
requirements.txt
markitdown[all]==0.1.0
Or pin each dependency to an exact version:
requirements.txt
markitdown==0.1.0
python-pptx==0.6.21
mammoth==1.11.0
pandas==2.0.0
# ... etc
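Exact pins are usually taken from a working environment rather than written by hand, and pip freeze is the standard tool for that. As a smaller illustration, the sketch below uses importlib.metadata to emit name==version lines for a chosen set of packages; the helper name pinned_lines and the package list are illustrative.

```python
from importlib import metadata


def pinned_lines(packages):
    """Return 'name==version' lines for the packages installed here."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Skip anything that is not installed in this environment.
            continue
    return lines


# Example package list; extend it with whichever groups you installed.
for line in pinned_lines(["markitdown", "python-pptx", "mammoth", "pandas"]):
    print(line)
```

Packages that are not installed are skipped silently, so an empty result means none of the listed packages were found.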

System Dependencies

Some features require system-level packages:

ExifTool (for image metadata)

# macOS
brew install exiftool

# Ubuntu/Debian
sudo apt-get install libimage-exiftool-perl

# Set path if needed
export EXIFTOOL_PATH=/usr/bin/exiftool

FFmpeg (for audio processing)

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org
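Before running a pipeline that depends on these tools, it can help to confirm they are reachable. The sketch below looks for each executable on PATH and, for ExifTool, first checks the EXIFTOOL_PATH variable shown above. The helper name find_tool is illustrative, and this is not necessarily how MarkItDown itself locates the tools.

```python
import os
import shutil


def find_tool(name, env_var=None):
    """Return the path to an executable, or None if it cannot be found."""
    if env_var:
        override = os.environ.get(env_var)
        # Honor an explicit override only if it points at a real file.
        if override and os.path.isfile(override):
            return override
    return shutil.which(name)


for tool, env_var in (("ffmpeg", None), ("exiftool", "EXIFTOOL_PATH")):
    path = find_tool(tool, env_var)
    print(f"{tool}: {path or 'not found'}")
```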

Troubleshooting

Version Conflicts

If you encounter version conflicts:
# Create a fresh virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

pip install markitdown[all]

Installation Failures

If a package has no prebuilt wheel for your platform, pip builds it from source, which requires build tools:
# Ubuntu/Debian
sudo apt-get install build-essential python3-dev

# macOS (install Xcode Command Line Tools)
xcode-select --install

Check Installed Version

pip show markitdown
pip list | grep markitdown
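The same information is available from Python through the standard library, which avoids the dependency on grep. The sketch below reads the installed version and the extras that the installed distribution declares in its metadata; the function name describe_distribution is illustrative.

```python
from importlib import metadata


def describe_distribution(name="markitdown"):
    """Return the installed version and declared extras, or None if absent."""
    try:
        version = metadata.version(name)
        # Provides-Extra lists the optional dependency groups a package declares.
        extras = metadata.metadata(name).get_all("Provides-Extra") or []
    except metadata.PackageNotFoundError:
        return None
    return {"version": version, "extras": sorted(extras)}


info = describe_distribution()
if info is None:
    print("markitdown is not installed in this environment")
else:
    print(f"markitdown {info['version']}")
    print("declared extras:", ", ".join(info["extras"]) or "(none)")
```

The declared extras show which groups the installed release offers, not which of them are currently installed.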
