MarkItDown provides a Docker image with all dependencies pre-installed, including optional tools like FFmpeg and ExifTool.

Quick Start

Using Pre-built Image

Convert a file using an image tagged markitdown (if no such image exists locally, build it first as described under Building the Image):
docker run -v $(pwd):/data markitdown /data/example.pdf > example.md

Building the Image

Build the Docker image from source:
# Clone the repository
git clone https://github.com/microsoft/markitdown.git
cd markitdown

# Build the image
docker build -t markitdown .

Build Arguments

The Dockerfile supports optional build arguments:
# Include git for development
docker build --build-arg INSTALL_GIT=true -t markitdown .

# Set custom user/group ID
docker build --build-arg USERID=1000 --build-arg GROUPID=1000 -t markitdown .

Dockerfile Details

The MarkItDown Dockerfile is based on python:3.13-slim-bullseye and includes:
1. Base Image

FROM python:3.13-slim-bullseye
Lightweight Python 3.13 on Debian Bullseye
2. System Dependencies

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    exiftool
Installs FFmpeg for audio processing and ExifTool for image metadata
3. Python Packages

RUN pip --no-cache-dir install \
    /app/packages/markitdown[all] \
    /app/packages/markitdown-sample-plugin
Installs MarkItDown with all optional dependencies and the sample plugin
4. User Configuration

ARG USERID=nobody
ARG GROUPID=nogroup
USER $USERID:$GROUPID
Runs as non-root user for security

Environment Variables

The following environment variables are set:
ENV DEBIAN_FRONTEND=noninteractive
ENV EXIFTOOL_PATH=/usr/bin/exiftool
ENV FFMPEG_PATH=/usr/bin/ffmpeg

Usage Examples

Basic File Conversion

Mount your files directory and convert:
docker run -v $(pwd):/data markitdown /data/document.pdf

Save Output to File

docker run -v $(pwd):/data markitdown /data/document.pdf -o /data/output.md
Or using shell redirection:
docker run -v $(pwd):/data markitdown /data/document.pdf > output.md

Using stdin

Pipe content into the container:
cat document.pdf | docker run -i markitdown -x .pdf > output.md
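Because stdin carries no filename, the -x hint has to be supplied by hand. A minimal sketch of a wrapper that derives the hint from the file's own name is shown here; the function name convert_via_stdin is hypothetical, and the --rm flag (a standard Docker option that removes the container on exit) is an addition not used elsewhere on this page.

```shell
# Hypothetical helper: stream a file into the container over stdin and
# pass its extension as the -x hint. Assumes an image tagged "markitdown".
convert_via_stdin() {
    file=$1
    # Look at the base name only, so dots in directory names are ignored.
    name=$(basename "$file")
    case $name in
        *.*) ext=.${name##*.} ;;
        *)   echo "error: $file has no extension to use as a hint" >&2
             return 1 ;;
    esac
    # -i keeps stdin open; no volume mount is needed in this mode.
    docker run --rm -i markitdown -x "$ext" < "$file"
}
```

A call such as convert_via_stdin report.pdf > report.md would then behave like the piped command above.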

With Plugins

The sample plugin is already installed in the image; pass -p to enable plugins for a conversion:
docker run -v $(pwd):/data markitdown -p /data/file.rtf

List Installed Plugins

docker run markitdown --list-plugins

Batch Conversion

Convert multiple files:
for file in *.pdf; do
    docker run -v $(pwd):/data markitdown "/data/$file" -o "/data/${file%.pdf}.md"
done
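The loop above can be made more defensive. The following is a sketch only, assuming the same image tag and /data mount point; the function name convert_pdfs is hypothetical. It quotes the mount path so directories with spaces work, skips the literal *.pdf pattern when nothing matches, and adds --rm (a standard Docker option) so stopped containers do not pile up.

```shell
# Hypothetical helper: convert every PDF in a directory, one container per file.
# Assumes an image tagged "markitdown" and /data as the mount point.
convert_pdfs() {
    dir=${1:-.}
    # Docker bind mounts need an absolute host path.
    abs_dir=$(cd "$dir" && pwd) || return 1
    for path in "$abs_dir"/*.pdf; do
        # With no matches the pattern stays literal; skip it.
        [ -e "$path" ] || continue
        name=$(basename "$path")
        docker run --rm -v "$abs_dir":/data markitdown \
            "/data/$name" -o "/data/${name%.pdf}.md"
    done
}
```

Running convert_pdfs ./documents would write one .md file next to each PDF in that directory.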

Interactive Shell

Drop into a shell to run multiple commands:
docker run -it -v $(pwd):/data --entrypoint /bin/bash markitdown
Then inside the container:
markitdown /data/file1.pdf -o /data/file1.md
markitdown /data/file2.docx -o /data/file2.md

Docker Compose

Create a docker-compose.yml for repeated use:
version: '3.8'

services:
  markitdown:
    image: markitdown
    volumes:
      - ./documents:/data
    entrypoint: ["markitdown"]
    command: ["/data/example.pdf"]
Run with:
docker-compose run markitdown /data/document.pdf -o /data/output.md

Azure Document Intelligence

Pass Azure credentials via environment variables:
docker run \
  -e AZURE_API_KEY="your-api-key" \
  -v $(pwd):/data \
  markitdown \
  -d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ \
  /data/document.pdf
Or using Azure Identity with service principal credentials (these three variables identify an app registration, not a managed identity):
docker run \
  -e AZURE_TENANT_ID="tenant-id" \
  -e AZURE_CLIENT_ID="client-id" \
  -e AZURE_CLIENT_SECRET="client-secret" \
  -v $(pwd):/data \
  markitdown \
  -d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ \
  /data/document.pdf
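Writing the key on the command line leaves it in shell history. Docker's -e option also accepts a bare variable name, in which case the value is copied from the calling environment. A minimal sketch built on that behavior follows; the function name convert_with_docintel is hypothetical, and the key must be exported in the calling shell for Docker to see it.

```shell
# Hypothetical helper: run a Document Intelligence conversion without
# putting the key on the command line. Assumes an image tagged "markitdown".
convert_with_docintel() {
    endpoint=$1
    file=$2
    # Fail early if the key is missing from the calling environment.
    if [ -z "${AZURE_API_KEY:-}" ]; then
        echo "error: AZURE_API_KEY is not set" >&2
        return 1
    fi
    # "-e AZURE_API_KEY" (name only) forwards the exported host value.
    docker run --rm -e AZURE_API_KEY -v "$(pwd)":/data markitdown \
        -d -e "$endpoint" "/data/$file"
}
```

After export AZURE_API_KEY=... in the shell, a call would look like convert_with_docintel https://YOUR_ENDPOINT.cognitiveservices.azure.com/ document.pdf.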

Volume Mounting

Linux/macOS

Mount current directory:
docker run -v $(pwd):/data markitdown /data/file.pdf
Mount specific directory:
docker run -v /path/to/documents:/data markitdown /data/file.pdf

Windows (PowerShell)

docker run -v ${PWD}:/data markitdown /data/file.pdf

Windows (Command Prompt)

docker run -v %cd%:/data markitdown /data/file.pdf

Performance Considerations

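Docker adds overhead compared to native execution, mainly the cost of starting a container for every invocation. For batch processing of many files, consider:
  • Starting one container and converting multiple files inside it, rather than one container per file
  • Mounting only the directory that holds the files to convert
  • Building the image once locally and reusing it, to avoid download times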
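The first suggestion can be sketched as a single docker run that overrides the entrypoint with a shell and loops inside the container. This is an illustration under assumptions, not a tested recipe: the function name convert_pdfs_one_container is hypothetical, and it relies on /bin/sh and the markitdown command being available inside the image, as the Interactive Shell section suggests.

```shell
# Hypothetical helper: convert all PDFs in a directory with ONE container start.
# Assumes an image tagged "markitdown" that contains /bin/sh and the
# markitdown command on PATH.
convert_pdfs_one_container() {
    dir=${1:-.}
    abs_dir=$(cd "$dir" && pwd) || return 1
    # The single-quoted script runs inside the container, not on the host.
    docker run --rm -v "$abs_dir":/data --entrypoint /bin/sh markitdown -c '
        for f in /data/*.pdf; do
            [ -e "$f" ] || continue
            markitdown "$f" -o "${f%.pdf}.md"
        done'
}
```

Compared with a host-side loop, this pays the container startup cost once regardless of how many files are converted.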

Troubleshooting

Permission Issues

If you encounter permission errors with output files:
# Build with your user ID
docker build --build-arg USERID=$(id -u) --build-arg GROUPID=$(id -g) -t markitdown .

# Or run with your user
docker run -u $(id -u):$(id -g) -v $(pwd):/data markitdown /data/file.pdf

File Not Found

Ensure the file path is the path inside the container (under the mount point, here /data), not the path on the host:
# Wrong: docker run -v $(pwd)/docs:/data markitdown document.pdf
# Correct:
docker run -v $(pwd)/docs:/data markitdown /data/document.pdf
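To avoid this mistake in scripts, the host path can be translated mechanically. The sketch below assumes the host directory is mounted at /data; the function name to_container_path is hypothetical. It resolves both paths to absolute form and refuses files that lie outside the mounted directory.

```shell
# Hypothetical helper: map a host file to its path under the /data mount.
# Usage: to_container_path <mounted-host-dir> <host-file>
to_container_path() {
    host_dir=$(cd "$1" && pwd) || return 1
    file_dir=$(cd "$(dirname "$2")" && pwd) || return 1
    file_abs=$file_dir/$(basename "$2")
    case $file_abs in
        "$host_dir"/*)
            # Strip the host prefix and re-root the remainder under /data.
            printf '/data/%s\n' "${file_abs#"$host_dir"/}" ;;
        *)
            echo "error: $2 is not inside $1" >&2
            return 1 ;;
    esac
}
```

The result can be passed straight to the container, for example docker run -v "$(pwd)/docs":/data markitdown "$(to_container_path docs docs/document.pdf)".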

Missing Dependencies

The Docker image includes all optional dependencies by default. If you built a custom image without the [all] extra, restore it in the Dockerfile's install step and rebuild:
RUN pip --no-cache-dir install /app/packages/markitdown[all]

Image Size

The image includes:
  • Python 3.13
  • FFmpeg (for audio processing)
  • ExifTool (for image metadata)
  • All MarkItDown optional dependencies
  • Sample plugin
Expect the image size to be approximately 500 MB to 1 GB.

Security

The default container runs as the nobody user for security. Avoid running as root unless necessary.
Best practices:
  • Keep the base image updated
  • Use specific image tags in production
  • Scan images for vulnerabilities
  • Limit volume mounts to necessary directories only
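One way to apply the last point is to mount only the input file's directory, read-only, and capture the Markdown on the host through stdout. The following is a sketch under assumptions: the function name convert_readonly is hypothetical, and the :ro suffix is Docker's standard read-only mount option rather than anything specific to MarkItDown.

```shell
# Hypothetical helper: convert a file with its directory mounted read-only.
# Output is written on the host via redirection, so the container never
# needs write access to the mount. Assumes an image tagged "markitdown".
convert_readonly() {
    src=$1
    out=$2
    src_dir=$(cd "$(dirname "$src")" && pwd) || return 1
    docker run --rm -v "$src_dir":/data:ro markitdown \
        "/data/$(basename "$src")" > "$out"
}
```

For example, convert_readonly ./documents/report.pdf report.md exposes only ./documents to the container and leaves it unmodified.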
