Installation

Prerequisites

MarkItDown requires Python 3.10 or higher. Before installing, verify your Python version:

python --version

If you have Python 3.9 or earlier, you’ll need to upgrade to use MarkItDown.
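You can also check the requirement from Python, for example at the top of a setup script. This is a minimal sketch that compares `sys.version_info` against the 3.10 minimum stated above. `meets_minimum_python` is a hypothetical helper and is not part of MarkItDown.

```python
import sys

# Minimum version stated in the prerequisites above.
MINIMUM_PYTHON = (3, 10)


def meets_minimum_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if `version_info` (major, minor, ...) is at least `minimum`."""
    # Compare only the leading components; tuple comparison is lexicographic.
    return tuple(version_info[: len(minimum)]) >= minimum


if meets_minimum_python():
    print("Python version is compatible with MarkItDown.")
else:
    print("Please upgrade to Python 3.10 or higher.")
```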

Virtual environment setup

We strongly recommend using a virtual environment to avoid dependency conflicts with other Python projects.

Standard Python

Create and activate a virtual environment using Python’s built-in venv module:

python -m venv .venv
source .venv/bin/activate

When activated, your terminal prompt will show (.venv) at the beginning. On Windows, run .venv\Scripts\activate instead of the source command.

To deactivate the virtual environment later, simply run deactivate.

uv

uv is a fast Python package installer. Create a virtual environment with a specific Python version:

uv venv --python=3.12 .venv
source .venv/bin/activate

When using uv, be sure to use uv pip install rather than just pip install to install packages in the virtual environment.

Anaconda

If you’re using Anaconda or Miniconda, create an environment with:

conda create -n markitdown python=3.12
conda activate markitdown

To deactivate later:

conda deactivate
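To confirm from inside Python that one of the environments above is active before running pip, here is a heuristic sketch. It assumes two things: venv and uv environments make `sys.prefix` differ from `sys.base_prefix`, and `conda activate` sets the `CONDA_DEFAULT_ENV` variable. `detect_environment` is a hypothetical helper. It also reports "conda" when the conda base environment is active.

```python
import os
import sys


def detect_environment(prefix=sys.prefix, base_prefix=sys.base_prefix, environ=os.environ):
    """Best-effort guess: "venv", "conda", or None if no environment is detected."""
    # venv (and uv venv) environments point sys.prefix at the environment
    # directory while sys.base_prefix still points at the base interpreter.
    if prefix != base_prefix:
        return "venv"
    # `conda activate` sets CONDA_DEFAULT_ENV to the environment name.
    if environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return None


print(detect_environment() or "No virtual environment detected")
```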

Basic installation

Once your virtual environment is activated, install MarkItDown.

Install all features

For complete format support, install with all optional dependencies:

pip install 'markitdown[all]'

This installs support for:

PDF files
Word documents (.docx)
PowerPoint presentations (.pptx)
Excel spreadsheets (.xlsx, .xls)
Outlook messages (.msg)
Audio transcription
YouTube transcripts
Azure Document Intelligence
And more

The [all] extra provides backward-compatible behavior with MarkItDown versions prior to 0.1.0.

Minimal installation

For a minimal installation with only core dependencies:

pip install markitdown

This includes support for:

HTML
Plain text formats (CSV, JSON, XML)
Jupyter notebooks
EPub files
ZIP archives
Basic image metadata (without OCR)

Selective installation

Starting with version 0.1.0, MarkItDown organizes dependencies into optional feature groups. Install only what you need to keep your environment lean.

Available feature groups

PDF support

pip install 'markitdown[pdf]'

Dependencies:

pdfminer.six - Text extraction
pdfplumber - Table and layout detection

Office documents

Word documents:

pip install 'markitdown[docx]'

PowerPoint presentations:

pip install 'markitdown[pptx]'

Excel spreadsheets (modern):

pip install 'markitdown[xlsx]'

Excel spreadsheets (legacy .xls):

pip install 'markitdown[xls]'

Outlook messages

pip install 'markitdown[outlook]'

Enables conversion of Outlook .msg files.

Audio transcription

pip install 'markitdown[audio-transcription]'

Dependencies:

pydub - Audio processing
SpeechRecognition - Speech-to-text

Supports .wav and .mp3 files.

YouTube transcripts

pip install 'markitdown[youtube-transcription]'

Fetch transcripts from YouTube URLs.

Azure Document Intelligence

pip install 'markitdown[az-doc-intel]'

Dependencies:

azure-ai-documentintelligence
azure-identity

Sends documents to Microsoft’s cloud-based Azure Document Intelligence service for conversion, which can improve accuracy on complex documents compared with local extraction.

Combining feature groups

Install multiple feature groups by separating them with commas:

pip install 'markitdown[pdf,docx,pptx]'

This installs only PDF, Word, and PowerPoint support.

Start with a minimal installation and add features as needed to minimize dependencies.
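If you script your installs, you can build the extras syntax above programmatically. This is a minimal sketch that assumes the feature-group names listed on this page are the complete set. `markitdown_requirement` is a hypothetical helper.

```python
# Feature groups listed on this page (assumed complete for this sketch).
KNOWN_GROUPS = {
    "all", "pdf", "docx", "pptx", "xlsx", "xls", "outlook",
    "audio-transcription", "youtube-transcription", "az-doc-intel",
}


def markitdown_requirement(groups):
    """Build a pip requirement string such as "markitdown[pdf,docx]"."""
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(groups))
    unknown = [g for g in unique if g not in KNOWN_GROUPS]
    if unknown:
        raise ValueError(f"Unknown feature group(s): {', '.join(unknown)}")
    if not unique:
        return "markitdown"  # minimal installation
    return f"markitdown[{','.join(unique)}]"


print(markitdown_requirement(["pdf", "docx", "pptx"]))  # markitdown[pdf,docx,pptx]
```

Quote the resulting string when you pass it to pip in a shell. Some shells, such as zsh, treat square brackets as glob patterns.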

Install from source

For development or to use the latest unreleased features, install from the GitHub repository:

Clone the repository

git clone https://github.com/microsoft/markitdown.git
cd markitdown

Install in development mode

pip install -e 'packages/markitdown[all]'

The -e flag installs in editable mode, so changes to the source code are immediately reflected.

Verify installation

markitdown --version

Docker installation

Run MarkItDown in a Docker container for isolated environments:

Build the Docker image

docker build -t markitdown:latest .

Run conversions

docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md

The --rm flag automatically removes the container after it exits.
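If you drive the container from Python rather than a shell, `subprocess.run` can reproduce the stdin/stdout redirection. This sketch assumes the image was built with the tag `markitdown:latest` as above. `docker_stdin_command` and `convert_with_docker` are hypothetical helpers.

```python
import subprocess


def docker_stdin_command(image="markitdown:latest"):
    """Argument list equivalent to `docker run --rm -i <image>`."""
    return ["docker", "run", "--rm", "-i", image]


def convert_with_docker(input_path, output_path, image="markitdown:latest"):
    """Pipe input_path into the container and write its stdout to output_path."""
    # Binary mode on both ends: the container receives raw file bytes on stdin,
    # and its Markdown output is written back unchanged.
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(docker_stdin_command(image), stdin=src, stdout=dst, check=True)
```

With `check=True`, a non-zero exit from the container raises `subprocess.CalledProcessError` instead of silently leaving an empty output file.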

Docker with volume mounting

For easier file access, mount a local directory:

docker run --rm -v "$(pwd)":/data markitdown:latest /data/document.pdf > output.md
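With a volume mount, the file argument must be the path inside the container (under /data), not the host path. The hypothetical helper below builds the command as an argument list. This also avoids shell quoting problems with directories that contain spaces.

```python
import posixpath
from pathlib import Path


def docker_volume_command(host_dir, filename, image="markitdown:latest", mount_point="/data"):
    """Argument list equivalent to `docker run --rm -v <host_dir>:/data <image> /data/<filename>`."""
    # Docker bind mounts need an absolute host path.
    host_dir = str(Path(host_dir).resolve())
    # Inside the container the file lives under the mount point, so join with
    # POSIX separators regardless of the host operating system.
    container_path = posixpath.join(mount_point, filename)
    return ["docker", "run", "--rm", "-v", f"{host_dir}:{mount_point}", image, container_path]


print(docker_volume_command(".", "document.pdf"))
```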

Verify installation

Confirm MarkItDown is installed correctly:

markitdown --version

You should see output like:

markitdown 0.1.0
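From Python, you can also read the installed version from package metadata with the standard library's `importlib.metadata`. This works even if the CLI is not on your PATH. A minimal sketch follows. `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="markitdown"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if markitdown is not installed in this interpreter.
print(installed_version())
```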

Test with a sample conversion

Create a simple HTML file and convert it:

echo '<h1>Hello World</h1><p>This is a test.</p>' > test.html
markitdown test.html

Expected output:

# Hello World

This is a test.
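The same conversion through the Python API might look like the sketch below. It assumes the `MarkItDown().convert()` call shown under LLM integration and the `text_content` attribute of the returned result. `convert_sample` is a hypothetical helper that writes test.html to the current directory and returns None when markitdown is not installed in the running interpreter.

```python
from pathlib import Path

try:
    from markitdown import MarkItDown
except ImportError:  # markitdown is not installed in this interpreter
    MarkItDown = None


def convert_sample(path="test.html"):
    """Write the sample HTML file and convert it to Markdown."""
    Path(path).write_text("<h1>Hello World</h1><p>This is a test.</p>", encoding="utf-8")
    if MarkItDown is None:
        return None
    result = MarkItDown().convert(path)
    return result.text_content


print(convert_sample())
```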

Optional dependencies

ExifTool for image metadata

For enhanced image metadata extraction, install ExifTool:

macOS:

brew install exiftool

Linux (Debian/Ubuntu):

sudo apt-get install libimage-exiftool-perl
MarkItDown will automatically detect ExifTool if it’s in your PATH.
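To check ahead of time whether ExifTool is on your PATH, you can use the standard library's `shutil.which`. This sketch only mirrors the PATH condition described above and is not MarkItDown's own detection code. `find_exiftool` is a hypothetical helper.

```python
import shutil


def find_exiftool(which=shutil.which):
    """Return the full path to exiftool if it is on PATH, otherwise None."""
    return which("exiftool")


print(find_exiftool() or "exiftool not found on PATH")
```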

LLM integration

To use LLM-powered image descriptions, install an OpenAI-compatible client:

pip install openai

Then use it in your code:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("image.jpg")
print(result.text_content)

Troubleshooting

Missing dependency errors for a specific format

If converting a file raises MissingDependencyException, install the feature group for that format:

pip install 'markitdown[pdf]'  # For PDF files
pip install 'markitdown[docx]' # For Word documents
# etc.

Permission denied errors

On Unix systems, if you encounter permission errors, avoid using sudo. Instead, ensure you’re in a virtual environment:

python -m venv .venv
source .venv/bin/activate
pip install 'markitdown[all]'

Python version conflicts

If you have multiple Python versions installed, be explicit:

python3.12 -m venv .venv
source .venv/bin/activate
pip install 'markitdown[all]'
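To see which of these interpreters are available, you can search PATH for versioned executable names. This sketch assumes Unix-style names such as python3.12; Windows installations typically use the py launcher instead. `available_interpreters` is a hypothetical helper.

```python
import shutil

# Interpreter names that satisfy the Python 3.10+ requirement (adjust as needed).
CANDIDATES = ("python3.10", "python3.11", "python3.12", "python3.13")


def available_interpreters(candidates=CANDIDATES, which=shutil.which):
    """Map each candidate interpreter name found on PATH to its full path."""
    found = {}
    for name in candidates:
        path = which(name)
        if path:
            found[name] = path
    return found


print(available_interpreters())
```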

SSL certificate errors

If you encounter SSL errors when downloading packages:

pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org 'markitdown[all]'

Only use this as a last resort. SSL verification is important for security.

Upgrading

To upgrade to the latest version:

pip install --upgrade 'markitdown[all]'
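Before you upgrade, you can check whether the installed version predates 0.1.0 and is therefore affected by the breaking changes in 0.1.0. This minimal sketch compares only the numeric release and ignores pre-release suffixes such as a1. If you need the full version-ordering rules, the `packaging` library's `Version` class implements them. `release_tuple` and `predates_0_1_0` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Return the leading numeric release as a tuple, e.g. "0.1" -> (0, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version: {version_string!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad short versions so "0.1" compares equal to "0.1.0".
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def predates_0_1_0(version_string):
    """True if the version is older than 0.1.0, where the breaking changes landed."""
    return release_tuple(version_string) < (0, 1, 0)


try:
    current = version("markitdown")
except PackageNotFoundError:
    current = None  # markitdown is not installed in this environment

if current is not None and predates_0_1_0(current):
    print(f"markitdown {current} predates 0.1.0; review the breaking changes before upgrading.")
```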

Breaking changes in 0.1.0

If you’re upgrading from version 0.0.1, be aware of these breaking changes:

Dependencies are now organized into optional feature groups. Use pip install 'markitdown[all]' for backward-compatible behavior.
convert_stream() now requires a binary file-like object (not text).
The DocumentConverter interface has changed to read from streams rather than file paths.
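If existing code passes a text stream to convert_stream(), one way to adapt it is to encode the text into an in-memory binary stream. This is a minimal sketch that assumes UTF-8 encoding. `to_binary_stream` is a hypothetical helper.

```python
import io
from typing import BinaryIO, TextIO


def to_binary_stream(text_stream: TextIO, encoding: str = "utf-8") -> BinaryIO:
    """Read all text from `text_stream` and return it as an in-memory binary stream."""
    return io.BytesIO(text_stream.read().encode(encoding))


binary = to_binary_stream(io.StringIO("<h1>Hello World</h1>"))
# From 0.1.0 on, convert_stream() expects a binary stream like this one,
# e.g. MarkItDown().convert_stream(binary) when markitdown is installed.
```

When reading from disk, open the file in binary mode (open(path, "rb")). This gives you a suitable stream directly, without the extra copy.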

Next steps

Quickstart guide

Convert your first document with MarkItDown

Python API

Explore the complete API reference

CLI reference

Learn all command-line options

Plugins

Extend MarkItDown with custom converters
