Prerequisites

MarkItDown requires Python 3.10 or higher. Before installing, verify your Python version:
python --version
If you have Python 3.9 or earlier, you’ll need to upgrade to use MarkItDown.
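The same check can be made from inside Python, which helps when the `python` on your PATH may not be the interpreter you expect. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of MarkItDown.

```python
import sys

# Minimum version stated in the prerequisites above.
MINIMUM = (3, 10)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version[:2]) >= minimum


if meets_minimum():
    print("Python version is supported")
else:
    print("Python 3.10 or higher is required; found", sys.version.split()[0])
```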

Virtual environment setup

We strongly recommend using a virtual environment to avoid dependency conflicts with other Python projects.
Create and activate a virtual environment using Python’s built-in venv module (on Windows, activate with .venv\Scripts\activate instead of the source command):
python -m venv .venv
source .venv/bin/activate
When activated, your terminal prompt will show (.venv) at the beginning.
To deactivate the virtual environment later, simply run deactivate.
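If the prompt prefix is hidden by your shell theme, you can confirm activation from Python instead. A small sketch, assuming a standard venv-created environment; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```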

Basic installation

Once your virtual environment is activated, install MarkItDown.

Install all features

For complete format support, install with all optional dependencies:
pip install 'markitdown[all]'
This installs support for:
  • PDF files
  • Word documents (.docx)
  • PowerPoint presentations (.pptx)
  • Excel spreadsheets (.xlsx, .xls)
  • Outlook messages (.msg)
  • Audio transcription
  • YouTube transcripts
  • Azure Document Intelligence
  • And more
The [all] extra provides backward-compatible behavior with MarkItDown versions prior to 0.1.0.

Minimal installation

For a minimal installation with only core dependencies:
pip install markitdown
This includes support for:
  • HTML
  • Plain text formats (CSV, JSON, XML)
  • Jupyter notebooks
  • EPub files
  • ZIP archives
  • Basic image metadata (without OCR)

Selective installation

Starting with version 0.1.0, MarkItDown organizes dependencies into optional feature groups. Install only what you need to keep your environment lean.

Available feature groups

PDF files:
pip install 'markitdown[pdf]'
Dependencies:
  • pdfminer.six - Text extraction
  • pdfplumber - Table and layout detection
Word documents:
pip install 'markitdown[docx]'
PowerPoint presentations:
pip install 'markitdown[pptx]'
Excel spreadsheets (modern):
pip install 'markitdown[xlsx]'
Excel spreadsheets (legacy .xls):
pip install 'markitdown[xls]'
Outlook messages:
pip install 'markitdown[outlook]'
Enables conversion of Outlook .msg files.
Audio transcription:
pip install 'markitdown[audio-transcription]'
Dependencies:
  • pydub - Audio processing
  • SpeechRecognition - Speech-to-text
Supports .wav and .mp3 files.
YouTube transcripts:
pip install 'markitdown[youtube-transcription]'
Fetches transcripts from YouTube URLs.
Azure Document Intelligence:
pip install 'markitdown[az-doc-intel]'
Dependencies:
  • azure-ai-documentintelligence
  • azure-identity
Sends documents to Microsoft’s cloud-based Azure Document Intelligence service for processing instead of converting them locally.
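To see which of the dependencies listed above are already present in an environment, you can probe for their import names. This is a rough sketch: the mapping from feature group to modules is an assumption built from the dependency lists above, not something read from MarkItDown itself, and the helper names are illustrative.

```python
from importlib.util import find_spec

# Import names for the distributions listed above. The grouping is an
# assumption based on that list, not read from MarkItDown's metadata.
GROUP_MODULES = {
    "pdf": ["pdfminer", "pdfplumber"],
    "audio-transcription": ["pydub", "speech_recognition"],
    "az-doc-intel": ["azure.ai.documentintelligence", "azure.identity"],
}


def is_importable(module_name):
    """Return True if the import system can locate `module_name`."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is missing.
        return False


def missing_modules(group):
    """Return the modules for `group` that cannot be located."""
    return [name for name in GROUP_MODULES[group] if not is_importable(name)]


for group in GROUP_MODULES:
    missing = missing_modules(group)
    print(group, "ok" if not missing else f"missing: {', '.join(missing)}")
```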

Combining feature groups

Install multiple feature groups by separating them with commas:
pip install 'markitdown[pdf,docx,pptx]'
This installs only PDF, Word, and PowerPoint support.
Start with a minimal installation and add features as needed to minimize dependencies.
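When installs are scripted, the requirement string can be assembled from a list of feature groups. The sketch below only builds the command and does not run it; `requirement` and `pip_command` are hypothetical helper names, and the set of known groups is copied from the commands on this page.

```python
import sys

# Feature group names taken from the commands above.
KNOWN_GROUPS = {
    "all", "pdf", "docx", "pptx", "xlsx", "xls", "outlook",
    "audio-transcription", "youtube-transcription", "az-doc-intel",
}


def requirement(groups):
    """Build a requirement string such as 'markitdown[pdf,docx]'."""
    unknown = sorted(set(groups) - KNOWN_GROUPS)
    if unknown:
        raise ValueError(f"unknown feature group(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(groups))
    return f"markitdown[{','.join(unique)}]" if unique else "markitdown"


def pip_command(groups):
    """Return an argv list that installs into the running interpreter."""
    return [sys.executable, "-m", "pip", "install", requirement(groups)]


print(" ".join(pip_command(["pdf", "docx", "pptx"])))
```

Passing the list to `subprocess.run(..., check=True)` avoids shell quoting of the square brackets, and `sys.executable -m pip` ensures the packages land in the active environment.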

Install from source

For development or to use the latest unreleased features, install from the GitHub repository:
1. Clone the repository

git clone https://github.com/microsoft/markitdown.git
cd markitdown
2. Install in development mode

pip install -e 'packages/markitdown[all]'
The -e flag installs in editable mode, so changes to the source code take effect without reinstalling.
3. Verify installation

markitdown --version

Docker installation

Run MarkItDown in a Docker container for an isolated environment. Build the image from the root of the cloned repository, where the Dockerfile is located:
1. Build the Docker image

docker build -t markitdown:latest .
2. Run conversions

docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
The --rm flag automatically removes the container after it exits.

Docker with volume mounting

For easier file access, mount a local directory:
docker run --rm -v "$(pwd)":/data markitdown:latest /data/document.pdf > output.md
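For batch jobs, the same volume-mounted invocation can be built as an argument list, which sidesteps shell quoting for paths that contain spaces. This is a sketch under assumptions: it mounts the input file's parent directory rather than the current directory, and `docker_convert_command` is a hypothetical helper name.

```python
from pathlib import Path


def docker_convert_command(input_path, image="markitdown:latest"):
    """Return an argv list that converts `input_path` inside the container."""
    path = Path(input_path).resolve()
    return [
        "docker", "run", "--rm",
        # Mount the file's directory at /data, mirroring the command above.
        "-v", f"{path.parent}:/data",
        image,
        f"/data/{path.name}",
    ]


command = docker_convert_command("document.pdf")
print(command)
```

To capture the Markdown, open the output file in binary mode and pass it as `stdout` to `subprocess.run(command, stdout=out_file, check=True)`.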

Verify installation

Confirm MarkItDown is installed correctly:
markitdown --version
You should see output like:
markitdown 0.1.0
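If the markitdown command is not on your PATH, the installed version can also be read from package metadata with the standard library. A minimal sketch; `installed_version` is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="markitdown"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"markitdown {found}" if found else "markitdown is not installed in this environment")
```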

Test with a sample conversion

Create a simple HTML file and convert it:
echo '<h1>Hello World</h1><p>This is a test.</p>' > test.html
markitdown test.html
Expected output:
# Hello World

This is a test.
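The same smoke test can be run through the Python API. The sketch below writes the sample HTML to a temporary directory and converts it, and it falls back to a message when MarkItDown is not importable; `convert_sample` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

try:
    from markitdown import MarkItDown
except ImportError:
    MarkItDown = None  # MarkItDown is not installed in this environment

HTML = "<h1>Hello World</h1><p>This is a test.</p>"


def convert_sample():
    """Convert the sample HTML and return Markdown, or None without MarkItDown."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "test.html"
        sample.write_text(HTML, encoding="utf-8")
        if MarkItDown is None:
            return None
        # text_content holds the Markdown produced by the conversion.
        return MarkItDown().convert(str(sample)).text_content


markdown = convert_sample()
print(markdown if markdown is not None else "markitdown is not installed")
```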

Optional dependencies

ExifTool for image metadata

For enhanced image metadata extraction, install ExifTool. On macOS with Homebrew:
brew install exiftool
MarkItDown will automatically detect ExifTool if it’s in your PATH.
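To check whether an exiftool executable is visible on your PATH before converting images, a standard-library lookup is enough. This sketch only mirrors a PATH search and makes no claim about how MarkItDown itself performs detection; `find_exiftool` is an illustrative name.

```python
import shutil


def find_exiftool():
    """Return the full path to exiftool if it is on PATH, otherwise None."""
    return shutil.which("exiftool")


path = find_exiftool()
print(f"exiftool found at {path}" if path else "exiftool not found on PATH")
```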

LLM integration

To use LLM-powered image descriptions, install an OpenAI-compatible client:
pip install openai
Then use it in your code:
from markitdown import MarkItDown
from openai import OpenAI

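client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("image.jpg")
print(result.text_content)  # Markdown output for the image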

Troubleshooting

If you get an error like MissingDependencyException when converting a file, install the specific feature group:
pip install 'markitdown[pdf]'  # For PDF files
pip install 'markitdown[docx]' # For Word documents
# etc.
On Unix systems, if you encounter permission errors, avoid using sudo. Instead, ensure you’re in a virtual environment:
python -m venv .venv
source .venv/bin/activate
pip install 'markitdown[all]'
If you have multiple Python versions installed, be explicit:
python3.12 -m venv .venv
source .venv/bin/activate
pip install 'markitdown[all]'
If you encounter SSL errors when downloading packages:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org 'markitdown[all]'
Only use this as a last resort. SSL verification is important for security.

Upgrading

To upgrade to the latest version:
pip install --upgrade 'markitdown[all]'

Breaking changes in 0.1.0

If you’re upgrading from version 0.0.1, be aware of these breaking changes:
  • Dependencies are now organized into optional feature groups. Use pip install 'markitdown[all]' for backward-compatible behavior.
  • convert_stream() now requires a binary file-like object (not text).
  • The DocumentConverter interface has changed to read from streams rather than file paths.
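The binary-stream requirement is the change most likely to affect existing scripts. The sketch below shows how to tell a text stream from a binary one and how to wrap a string as bytes; it deliberately stops short of calling MarkItDown, and both helper names are hypothetical.

```python
import io


def is_text_stream(stream):
    """Return True for text-mode streams such as io.StringIO or open(path, "r")."""
    return isinstance(stream, io.TextIOBase)


def as_binary_stream(text, encoding="utf-8"):
    """Wrap a str in an in-memory binary stream."""
    return io.BytesIO(text.encode(encoding))


text_stream = io.StringIO("<h1>Hello World</h1>")
binary_stream = as_binary_stream(text_stream.getvalue())

print(is_text_stream(text_stream))    # True
print(is_text_stream(binary_stream))  # False
```

A stream produced this way, or a file opened with mode "rb", is the kind of object convert_stream() now expects.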

Next steps

  • Quickstart guide: Convert your first document with MarkItDown
  • Python API: Explore the complete API reference
  • CLI reference: Learn all command-line options
  • Plugins: Extend MarkItDown with custom converters
