MarkItDown provides a Docker image with all dependencies pre-installed, including optional tools like FFmpeg and ExifTool.
Quick Start
Using a Locally Built Image
Once the image has been built and tagged markitdown (see Building the Image), convert a file:
docker run -v $(pwd):/data markitdown /data/example.pdf > example.md
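For repeated use, the Quick Start command can be wrapped in a shell function. This is a minimal sketch: the function name markitdown_docker is hypothetical, and it adds two standard docker run flags, --rm (remove the container when it exits) and -i (keep stdin open so piped input also works).

```shell
# Hypothetical wrapper for the Quick Start command.
# Mounts the current directory at /data and forwards all arguments to markitdown.
markitdown_docker() {
    # --rm: delete the container when it exits
    # -i:   keep stdin open so input can also be piped in
    docker run --rm -i -v "$(pwd)":/data markitdown "$@"
}
```

Usage: markitdown_docker /data/example.pdf > example.md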
Building the Image
Build the Docker image from source:
# Clone the repository
git clone https://github.com/microsoft/markitdown.git
cd markitdown
# Build the image
docker build -t markitdown .
Build Arguments
The Dockerfile supports optional build arguments:
# Include git for development
docker build --build-arg INSTALL_GIT=true -t markitdown .
# Set custom user/group ID
docker build --build-arg USERID=1000 --build-arg GROUPID=1000 -t markitdown .
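The two build arguments can be combined in a small helper that always builds with the calling host user's IDs and adds git only on request. A sketch, assuming it is run from the repository root; the function name build_markitdown_image is hypothetical.

```shell
# Hypothetical helper: build the image so the container user matches the
# calling host user; pass "true" as the first argument to also install git.
build_markitdown_image() {
    # Only pass INSTALL_GIT when the caller explicitly asks for git.
    if [ "${1:-}" = "true" ]; then
        set -- --build-arg INSTALL_GIT=true
    else
        set --
    fi
    docker build "$@" \
        --build-arg USERID="$(id -u)" \
        --build-arg GROUPID="$(id -g)" \
        -t markitdown .
}
```

Usage: build_markitdown_image true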
Dockerfile Details
The MarkItDown Dockerfile is based on python:3.13-slim-bullseye and includes:
Base Image
FROM python:3.13-slim-bullseye
Lightweight Python 3.13 on Debian Bullseye.
System Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
exiftool
Installs FFmpeg for audio processing and ExifTool for image metadata.
Python Packages
RUN pip --no-cache-dir install \
/app/packages/markitdown[all] \
/app/packages/markitdown-sample-plugin
Installs MarkItDown with all optional dependencies and the sample plugin.
User Configuration
ARG USERID=nobody
ARG GROUPID=nogroup
USER $USERID:$GROUPID
Runs as a non-root user (nobody:nogroup by default) for security.
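To confirm which user a built image will run as, docker image inspect can read the USER setting from the image configuration. A sketch; the function name check_image_user is hypothetical. An empty User field means the image runs as root.

```shell
# Hypothetical check: print the configured user and fail if it is root.
check_image_user() {
    user=$(docker image inspect --format '{{.Config.User}}' markitdown) || return 1
    echo "configured user: ${user:-root}"
    # Compare only the user part of "user:group"; empty, "root", or 0 mean root.
    case ${user%%:*} in
        ""|root|0) return 1 ;;
    esac
}
```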
Environment Variables
The following environment variables are set:
ENV DEBIAN_FRONTEND=noninteractive
ENV EXIFTOOL_PATH=/usr/bin/exiftool
ENV FFMPEG_PATH=/usr/bin/ffmpeg
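A quick way to check that both tools exist at the advertised paths is to override the entrypoint with sh. A sketch; the function name check_markitdown_tools is hypothetical. The script is single-quoted so the variables are expanded inside the container, where EXIFTOOL_PATH and FFMPEG_PATH are set.

```shell
# Hypothetical check: confirm FFmpeg and ExifTool are executable at the
# paths given by the image's environment variables.
check_markitdown_tools() {
    docker run --rm --entrypoint sh markitdown -c \
        'test -x "$EXIFTOOL_PATH" && test -x "$FFMPEG_PATH" && echo "tools OK"'
}
```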
Usage Examples
Basic File Conversion
Mount your files directory and convert:
docker run -v $(pwd):/data markitdown /data/document.pdf
Save Output to File
docker run -v $(pwd):/data markitdown /data/document.pdf -o /data/output.md
Or using shell redirection:
docker run -v $(pwd):/data markitdown /data/document.pdf > output.md
Using stdin
Pipe content into the container:
cat document.pdf | docker run -i markitdown -x .pdf > output.md
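Because piped input carries no file name, the -x hint must match the real file type. The helper below derives the hint from the local file name. A sketch, assuming the file name has an extension; the function name markitdown_stdin is hypothetical. No volume mount is needed because the content arrives on stdin.

```shell
# Hypothetical helper: stream a local file into the container and derive
# the -x extension hint from its name (assumes the name has an extension).
markitdown_stdin() {
    src=$1
    ext=${src##*.}   # "report.pdf" -> "pdf"
    docker run --rm -i markitdown -x ".$ext" < "$src"
}
```

Usage: markitdown_stdin document.pdf > output.md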
With Plugins
The sample plugin (an RTF converter) is pre-installed in the image, but plugins are disabled by default. Enable them with -p (--use-plugins):
docker run -v $(pwd):/data markitdown -p /data/file.rtf
List Installed Plugins
docker run markitdown --list-plugins
Batch Conversion
Convert multiple files:
for file in *.pdf; do
    [ -e "$file" ] || continue  # skip the literal "*.pdf" when nothing matches
    docker run --rm -v "$(pwd)":/data markitdown "/data/$file" -o "/data/${file%.pdf}.md"
done
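When re-running a batch, it can help to skip files that were already converted. A sketch of an incremental variant that works for any extension; the function name convert_new_files is hypothetical, and it treats an existing .md file with the same base name as "already converted".

```shell
# Hypothetical helper: convert every *.<ext> file in the current directory,
# skipping any file that already has a matching .md output.
convert_new_files() {
    ext=${1:-pdf}
    for file in *."$ext"; do
        [ -e "$file" ] || continue   # no match: the pattern stays literal
        out="${file%.*}.md"          # report.pdf -> report.md
        if [ -e "$out" ]; then
            echo "skip: $out already exists" >&2
            continue
        fi
        docker run --rm -v "$(pwd)":/data markitdown "/data/$file" -o "/data/$out"
    done
}
```

Usage: convert_new_files docx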
Interactive Shell
Drop into a shell to run multiple commands:
docker run -it -v $(pwd):/data --entrypoint /bin/bash markitdown
Then inside the container:
markitdown /data/file1.pdf -o /data/file1.md
markitdown /data/file2.docx -o /data/file2.md
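Inside the container, a loop can convert a whole directory in one pass, so only one container is started. A sketch to paste into the container's shell; the function name convert_dir is hypothetical and defaults to the /data mount point.

```shell
# Hypothetical helper for the container's shell: convert every PDF in a
# directory (default /data) without starting a new container per file.
convert_dir() {
    dir=${1:-/data}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue   # glob matched nothing
        markitdown "$f" -o "${f%.pdf}.md"
    done
}
```

Usage (inside the container): convert_dir /data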
Docker Compose
Create a docker-compose.yml for repeated use:
version: '3.8'
services:
  markitdown:
    image: markitdown
    volumes:
      - ./documents:/data
    entrypoint: ["markitdown"]
    command: ["/data/example.pdf"]
Run with:
docker-compose run markitdown /data/document.pdf -o /data/output.md
Azure Document Intelligence
Pass Azure credentials via environment variables:
docker run \
-e AZURE_API_KEY="your-api-key" \
-v $(pwd):/data \
markitdown \
-d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ \
/data/document.pdf
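Or authenticate through Azure Identity with a service principal (AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET are the variables Azure Identity's DefaultAzureCredential reads for this):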
docker run \
-e AZURE_TENANT_ID="tenant-id" \
-e AZURE_CLIENT_ID="client-id" \
-e AZURE_CLIENT_SECRET="client-secret" \
-v $(pwd):/data \
markitdown \
-d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ \
/data/document.pdf
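Writing the key directly on the command line leaves it in shell history. Docker's -e flag also accepts a bare variable name, in which case the value is copied from the host environment. A sketch; the function name markitdown_docintel is hypothetical, and AZURE_API_KEY must be exported in the calling shell.

```shell
# Hypothetical helper: Document Intelligence conversion without putting the
# API key on the command line.
markitdown_docintel() {
    # $1: Document Intelligence endpoint URL
    # $2: input path inside the container, e.g. /data/document.pdf
    if ! printenv AZURE_API_KEY >/dev/null; then
        echo "export AZURE_API_KEY first" >&2
        return 1
    fi
    # "-e AZURE_API_KEY" without "=value" copies the exported host variable.
    docker run --rm -e AZURE_API_KEY -v "$(pwd)":/data markitdown \
        -d -e "$1" "$2"
}
```

Usage: markitdown_docintel https://YOUR_ENDPOINT.cognitiveservices.azure.com/ /data/document.pdf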
Volume Mounting
Linux/macOS
Mount current directory:
docker run -v $(pwd):/data markitdown /data/file.pdf
Mount specific directory:
docker run -v /path/to/documents:/data markitdown /data/file.pdf
Windows (PowerShell)
docker run -v ${PWD}:/data markitdown /data/file.pdf
Windows (Command Prompt)
docker run -v %cd%:/data markitdown /data/file.pdf
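Performance
Each docker run starts a new container, which adds startup overhead compared to running MarkItDown natively. When converting many files, consider:
- Starting one container and converting every file inside it (see Interactive Shell) instead of one container per file
- Mounting a single directory that holds all the input files, so one container can reach them all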
- Building the image locally to avoid download times
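The first two suggestions can be combined without an interactive session by overriding the entrypoint with sh and looping inside a single container. A sketch; the function name batch_in_one_container is hypothetical, and the loop is single-quoted so it runs in the container's shell rather than on the host.

```shell
# Hypothetical helper: convert every PDF in a host directory (default: the
# current one) using a single container instead of one docker run per file.
batch_in_one_container() {
    host_dir=$(cd "${1:-.}" && pwd) || return 1   # absolute host path
    docker run --rm -v "$host_dir":/data --entrypoint sh markitdown -c '
        for f in /data/*.pdf; do
            [ -e "$f" ] || continue
            markitdown "$f" -o "${f%.pdf}.md"
        done'
}
```

Usage: batch_in_one_container ./documents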
Troubleshooting
Permission Issues
If you encounter permission errors with output files:
# Build with your user ID
docker build --build-arg USERID=$(id -u) --build-arg GROUPID=$(id -g) -t markitdown .
# Or run with your user
docker run -u $(id -u):$(id -g) -v $(pwd):/data markitdown /data/file.pdf
File Not Found
Pass the file's path as it appears inside the container, under the mount point (/data here), not its host path:
# Wrong: docker run -v $(pwd)/docs:/data markitdown document.pdf
# Correct:
docker run -v $(pwd)/docs:/data markitdown /data/document.pdf
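To avoid translating paths by hand, a helper can mount the file's parent directory and pass the matching in-container path. A sketch; the function name markitdown_file is hypothetical. The mount is read-only (:ro), which also limits what the container can modify, so output goes to stdout rather than -o.

```shell
# Hypothetical helper: convert a file anywhere on the host by mounting its
# parent directory read-only at /data and passing the in-container path.
markitdown_file() {
    src_dir=$(cd "$(dirname "$1")" && pwd) || return 1   # absolute host dir
    name=$(basename "$1")
    docker run --rm -v "$src_dir":/data:ro markitdown "/data/$name"
}
```

Usage: markitdown_file ~/docs/document.pdf > document.md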
Missing Dependencies
The Docker image includes all optional dependencies by default. If you built a custom image without the [all] extra, restore it in the pip install step:
RUN pip --no-cache-dir install /app/packages/markitdown[all]
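To check whether a given optional dependency is present in an image, try importing it with the image's Python interpreter. A sketch; the function name check_python_module is hypothetical, and pdfminer (the import name of pdfminer.six, assumed here to back PDF conversion) is only an example module.

```shell
# Hypothetical check: try importing a Python module inside the image.
# On failure, Python's ImportError traceback is shown on stderr.
check_python_module() {
    docker run --rm --entrypoint python markitdown -c "import $1" \
        && echo "$1: installed"
}
```

Usage: check_python_module pdfminer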
Image Size
The image includes:
- Python 3.13
- FFmpeg (for audio processing)
- ExifTool (for image metadata)
- All MarkItDown optional dependencies
- Sample plugin
Expect the image size to be approximately 500MB-1GB.
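To see how a particular build compares with that estimate, docker image inspect reports the image size in bytes through the {{.Size}} field. A sketch that converts it to decimal megabytes; the function name markitdown_image_mb is hypothetical.

```shell
# Hypothetical helper: report the local markitdown image size in decimal MB.
markitdown_image_mb() {
    bytes=$(docker image inspect --format '{{.Size}}' markitdown) || return 1
    echo "$((bytes / 1000000)) MB"   # integer division, 1 MB = 1,000,000 bytes
}
```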
Security
The default container runs as the nobody user for security. Avoid running as root unless necessary.
Best practices:
- Keep the base image updated
- Use specific image tags in production
- Scan images for vulnerabilities
- Limit volume mounts to necessary directories only