Docker Usage

MarkItDown provides a Docker image with all dependencies pre-installed, including optional tools like FFmpeg and ExifTool.

Quick Start

Using Pre-built Image

Convert a file using the Docker image:

docker run -v $(pwd):/data markitdown /data/example.pdf > example.md

Building the Image

Build the Docker image from source:

# Clone the repository
git clone https://github.com/microsoft/markitdown.git
cd markitdown

# Build the image
docker build -t markitdown .

Build Arguments

The Dockerfile supports optional build arguments:

# Include git for development
docker build --build-arg INSTALL_GIT=true -t markitdown .

# Set custom user/group ID
docker build --build-arg USERID=1000 --build-arg GROUPID=1000 -t markitdown .

Dockerfile Details

The MarkItDown Dockerfile is based on python:3.13-slim-bullseye and includes:

Base Image

FROM python:3.13-slim-bullseye

Lightweight Python 3.13 on Debian Bullseye

System Dependencies

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    exiftool

Installs FFmpeg for audio processing and ExifTool for image metadata

Python Packages

RUN pip --no-cache-dir install \
    /app/packages/markitdown[all] \
    /app/packages/markitdown-sample-plugin

Installs MarkItDown with all optional dependencies and the sample plugin

User Configuration

ARG USERID=nobody
ARG GROUPID=nogroup
USER $USERID:$GROUPID

Runs as non-root user for security

Environment Variables

The following environment variables are set:

ENV DEBIAN_FRONTEND=noninteractive
ENV EXIFTOOL_PATH=/usr/bin/exiftool
ENV FFMPEG_PATH=/usr/bin/ffmpeg
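To confirm these values in an image you have built, one option is to override the entrypoint with a shell and print them. The following is a minimal sketch: it assumes the image is tagged markitdown and contains /bin/sh, and the function name show_tool_paths is made up for illustration.

```shell
# Print the tool-path variables as seen inside the container.
# Assumption: an image tagged "markitdown" exists locally.
# show_tool_paths is an illustrative name, not part of MarkItDown.
show_tool_paths() {
    docker run --rm --entrypoint /bin/sh markitdown -c \
        'echo "EXIFTOOL_PATH=$EXIFTOOL_PATH"; echo "FFMPEG_PATH=$FFMPEG_PATH"'
}
```

Calling show_tool_paths should print the two paths listed above if the image sets them as described.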

Usage Examples

Basic File Conversion

Mount your files directory and convert:

docker run -v $(pwd):/data markitdown /data/document.pdf

Save Output to File

docker run -v $(pwd):/data markitdown /data/document.pdf -o /data/output.md

Or using shell redirection:

docker run -v $(pwd):/data markitdown /data/document.pdf > output.md
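If you convert files from different directories, a small wrapper can save retyping the mount. This sketch mounts only the parent directory of the given file and writes Markdown to stdout. The function name md_convert is hypothetical, and the image tag markitdown is assumed to match the one built earlier.

```shell
# Convert one host file by mounting only its parent directory.
# Output goes to stdout, so redirect it wherever you like.
# Assumption: an image tagged "markitdown"; md_convert is an illustrative name.
md_convert() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1   # absolute host directory
    base=$(basename "$1")                            # file name inside /data
    docker run --rm -v "$dir":/data markitdown "/data/$base"
}
```

Example: md_convert ~/docs/report.pdf > report.md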

Using stdin

Pipe content into the container:

cat document.pdf | docker run -i markitdown -x .pdf > output.md
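Because stdin carries no file name, the -x hint tells MarkItDown which format to expect. The sketch below derives that hint from the file name and streams the file in without mounting a volume. It assumes the file name has an extension and that the image is tagged markitdown; md_from_stdin is a made-up name.

```shell
# Stream a host file through the container without a volume mount.
# The extension hint is taken from the file name (must contain a dot).
# Assumption: an image tagged "markitdown"; md_from_stdin is illustrative.
md_from_stdin() {
    ext=".${1##*.}"                      # report.pdf -> .pdf
    docker run --rm -i markitdown -x "$ext" < "$1"
}
```

Example: md_from_stdin document.pdf > output.md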

With Plugins

The sample plugin is pre-installed in the image. Pass -p to enable installed plugins for a conversion:

docker run -v $(pwd):/data markitdown -p /data/file.rtf

List Installed Plugins

docker run markitdown --list-plugins

Batch Conversion

Convert multiple files:

for file in *.pdf; do
    [ -e "$file" ] || continue   # skip the literal pattern when no PDFs match
    docker run --rm -v "$(pwd)":/data markitdown "/data/$file" -o "/data/${file%.pdf}.md"
done
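The same loop extends to more than one input format if the output name is computed by stripping whatever extension the file has. This is a sketch under the assumption that the image is tagged markitdown; md_name and convert_dir are hypothetical helper names.

```shell
# Map an input file name to its Markdown output name: notes.docx -> notes.md
md_name() {
    printf '%s.md\n' "${1%.*}"
}

# Convert every PDF and DOCX file in the current directory.
# Assumption: an image tagged "markitdown"; function names are illustrative.
convert_dir() {
    for file in *.pdf *.docx; do
        [ -e "$file" ] || continue       # unmatched globs stay literal; skip them
        docker run --rm -v "$(pwd)":/data markitdown \
            "/data/$file" -o "/data/$(md_name "$file")"
    done
}
```

Note that two inputs sharing a base name (for example a.pdf and a.docx) would both map to a.md, so the later conversion would overwrite the earlier one.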

Interactive Shell

Drop into a shell to run multiple commands:

docker run -it -v $(pwd):/data --entrypoint /bin/bash markitdown

Then inside the container:

markitdown /data/file1.pdf -o /data/file1.md
markitdown /data/file2.docx -o /data/file2.md

Docker Compose

Create a docker-compose.yml for repeated use:

version: '3.8'

services:
  markitdown:
    image: markitdown
    volumes:
      - ./documents:/data
    entrypoint: ["markitdown"]
    command: ["/data/example.pdf"]

Run with:

docker-compose run markitdown /data/document.pdf -o /data/output.md
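Newer Docker installations ship Compose as a plugin invoked as docker compose, while older ones provide the standalone docker-compose binary. A small wrapper can pick whichever is available. The function name compose is an assumption made for this sketch.

```shell
# Prefer the Compose V2 plugin ("docker compose") when it is installed,
# otherwise fall back to the standalone docker-compose binary.
# compose is an illustrative wrapper name.
compose() {
    if docker compose version >/dev/null 2>&1; then
        docker compose "$@"
    else
        docker-compose "$@"
    fi
}
```

Example: compose run --rm markitdown /data/document.pdf -o /data/output.md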

Azure Document Intelligence

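Pass the Azure API key via an environment variable. Note that -e appears twice with different meanings: before the image name it is Docker's environment flag, and after the image name it is MarkItDown's endpoint option, used together with -d to enable Document Intelligence: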

docker run \
  -e AZURE_API_KEY="your-api-key" \
  -v $(pwd):/data \
  markitdown \
  -d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ \
  /data/document.pdf

Or using Azure Identity with service principal credentials (tenant ID, client ID, and client secret):

docker run \
  -e AZURE_TENANT_ID="tenant-id" \
  -e AZURE_CLIENT_ID="client-id" \
  -e AZURE_CLIENT_SECRET="client-secret" \
  -v $(pwd):/data \
  markitdown \
  -d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ \
  /data/document.pdf
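Typing a secret directly on the command line leaves it in shell history. Docker copies a variable from the calling environment when -e is given a name without a value, so the key can be exported once and forwarded. The sketch below assumes the image is tagged markitdown; md_docintel is a hypothetical name, and the endpoint is whatever your Azure resource provides.

```shell
# Forward AZURE_API_KEY from the host environment instead of writing its
# value on the command line. "-e NAME" (no "=value") makes Docker copy NAME
# from the caller's environment.
# Assumption: an image tagged "markitdown"; md_docintel is illustrative.
md_docintel() {
    endpoint=$1    # Document Intelligence endpoint URL
    file=$2        # file name relative to the current directory
    if [ -z "${AZURE_API_KEY:-}" ]; then
        echo "AZURE_API_KEY is not set" >&2
        return 1
    fi
    docker run --rm -e AZURE_API_KEY -v "$(pwd)":/data markitdown \
        -d -e "$endpoint" "/data/$file"
}
```

Example: export AZURE_API_KEY in your shell, then run md_docintel https://YOUR_ENDPOINT.cognitiveservices.azure.com/ document.pdf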

Volume Mounting

Linux/macOS

Mount current directory:

docker run -v $(pwd):/data markitdown /data/file.pdf

Mount specific directory:

docker run -v /path/to/documents:/data markitdown /data/file.pdf

Windows (PowerShell)

docker run -v ${PWD}:/data markitdown /data/file.pdf

Windows (Command Prompt)

docker run -v %cd%:/data markitdown /data/file.pdf

Performance Considerations

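Each docker run invocation starts a fresh container, which adds startup overhead compared to native execution. For batch processing of many files, consider:

Starting one container with a shell entrypoint and converting all files inside it
Mounting one directory that holds every input, rather than a separate mount per file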
Building the image locally to avoid download times
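The first suggestion can be done non-interactively by passing the conversion loop to a shell inside the container, so the container starts only once. This is a sketch, assuming the image is tagged markitdown and contains /bin/sh; inner_loop and convert_all_once are names invented for this example.

```shell
# Loop executed INSIDE the container: one container start for all PDFs in /data.
inner_loop='for f in /data/*.pdf; do
    [ -e "$f" ] || continue
    markitdown "$f" -o "${f%.pdf}.md"
done'

# Run the loop in a single container with the given host directory mounted.
# Assumption: an image tagged "markitdown"; convert_all_once is illustrative.
convert_all_once() {
    docker run --rm -v "$1":/data --entrypoint /bin/sh markitdown -c "$inner_loop"
}
```

Example: convert_all_once "$(pwd)"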

Troubleshooting

Permission Issues

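If you encounter permission errors with output files, the container's default user (nobody) most likely lacks write access to the mounted host directory. Build or run the image with your own user and group ID instead: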

# Build with your user ID
docker build --build-arg USERID=$(id -u) --build-arg GROUPID=$(id -g) -t markitdown .

# Or run with your user
docker run -u $(id -u):$(id -g) -v $(pwd):/data markitdown /data/file.pdf

File Not Found

Ensure the file path is the path inside the container, under the mount point (/data in these examples), not the path on the host:

# Wrong: docker run -v $(pwd)/docs:/data markitdown document.pdf
# Correct:
docker run -v $(pwd)/docs:/data markitdown /data/document.pdf

Missing Dependencies

The Docker image includes all optional dependencies by default. If you built a custom image without the [all] extra, restore it in your Dockerfile:

RUN pip --no-cache-dir install /app/packages/markitdown[all]

Image Size

The image includes:

Python 3.13
FFmpeg (for audio processing)
ExifTool (for image metadata)
All MarkItDown optional dependencies
Sample plugin

Expect the image size to be approximately 500MB-1GB.
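To check the actual size of the image on your machine, you can read the Size field (in bytes) from docker image inspect. The helper below is a sketch: image_size_mb is a hypothetical name, and it defaults to an image tagged markitdown.

```shell
# Report the local image size in decimal megabytes.
# Assumption: the image is tagged "markitdown" unless another tag is given.
# image_size_mb is an illustrative name.
image_size_mb() {
    bytes=$(docker image inspect --format '{{.Size}}' "${1:-markitdown}") || return 1
    echo "$((bytes / 1000000)) MB"
}
```

Example: image_size_mb, or image_size_mb markitdown:latest for a specific tag.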

Security

The default container runs as the nobody user for security. Avoid running as root unless necessary.

Best practices:

Keep the base image updated
Use specific image tags in production
Scan images for vulnerabilities
Limit volume mounts to necessary directories only
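For purely local conversions, the mount can be made read-only and networking disabled, with output captured from stdout so the container never writes to the host. This is a sketch and not suitable for conversions that call remote services such as Azure Document Intelligence. It assumes the image is tagged markitdown; md_convert_locked is a made-up name.

```shell
# Convert with a read-only mount and no network access.
# Output goes to stdout, so no write access to the host directory is needed.
# Assumption: an image tagged "markitdown"; md_convert_locked is illustrative.
md_convert_locked() {
    docker run --rm --network none -v "$(pwd)":/data:ro markitdown "/data/$1"
}
```

Example: md_convert_locked document.pdf > output.md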
