MarkItDown provides a command-line interface for converting various file formats to Markdown.
Basic Usage
Convert a file to Markdown and write the result to stdout:
markitdown example.pdf
MarkItDown accepts input as a file path argument, through a pipe, or via stdin redirection.
When reading from stdin (pipe or redirection), you may need to provide hints about the file type using --extension or --mime-type flags.
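The input methods described above can be sketched as follows. This is a minimal example that assumes markitdown is on the PATH; sample.html is a throwaway file created only for illustration, and the conversions are skipped if the command is not installed.

```shell
# Create a small throwaway input so the example is self-contained.
printf '<h1>Hello</h1>\n<p>World</p>\n' > sample.html

if command -v markitdown >/dev/null 2>&1; then
  # 1. File path argument: the file name is available for type detection.
  markitdown sample.html

  # 2. Pipe: stdin carries no file name, so pass an extension hint.
  cat sample.html | markitdown -x .html

  # 3. Redirection: same situation as the pipe, so the hint is passed again.
  markitdown -x .html < sample.html
else
  echo "markitdown is not installed; skipping conversions" >&2
fi
```

All three invocations should print Markdown for the same document to stdout.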
Output Options
Control where the Markdown output is written:
markitdown example.pdf -o example.md
Command-Line Flags
markitdown --version
markitdown -v
Displays the version number and exits.
File Type Hints
When reading from stdin, or when the file type cannot be detected automatically, supply one or more of the following hints:
markitdown -x .pdf < document
markitdown --extension pdf < document
Provide a hint about the file extension. The leading dot is optional.
markitdown -m application/pdf < document
markitdown --mime-type application/pdf < document
Provide a hint about the MIME type.
markitdown -c UTF-8 < document
markitdown --charset UTF-8 < document
Provide a hint about the character encoding.
Azure Document Intelligence
Use Azure Document Intelligence for cloud-based conversion:
markitdown -d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ example.pdf
markitdown --use-docintel --endpoint https://YOUR_ENDPOINT.cognitiveservices.azure.com/ example.pdf
Document Intelligence requires:
- A valid Azure endpoint URL (required)
- Authentication via the AZURE_API_KEY environment variable or Azure credentials
- A file path (stdin is not supported with Document Intelligence)
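The requirements above can be checked before making the call. The sketch below is one possible wrapper, not part of MarkItDown: docintel_convert is a hypothetical helper name, and returning 1 on a failed check is this sketch's own convention.

```shell
# docintel_convert is a hypothetical helper, not a MarkItDown command.
# Usage: docintel_convert ENDPOINT_URL INPUT_FILE
docintel_convert() {
  endpoint=$1
  input=$2

  # An endpoint URL is required.
  if [ -z "$endpoint" ]; then
    echo "error: an endpoint URL is required" >&2
    return 1
  fi

  # Document Intelligence needs a real file path; stdin is not supported.
  if [ ! -f "$input" ]; then
    echo "error: not a file: $input" >&2
    return 1
  fi

  # Without the key, authentication falls back to Azure credentials.
  if [ -z "${AZURE_API_KEY:-}" ]; then
    echo "note: AZURE_API_KEY is unset; relying on Azure credentials" >&2
  fi

  markitdown --use-docintel --endpoint "$endpoint" "$input"
}
```

A call would then look like docintel_convert "https://YOUR_ENDPOINT.cognitiveservices.azure.com/" example.pdf > example.md.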
Plugin Support
Enable third-party plugins:
markitdown -p example.rtf
markitdown --use-plugins example.rtf
List installed plugins:
markitdown --list-plugins
Example output:
Installed MarkItDown 3rd-party Plugins:
* sample_plugin (package: markitdown_sample_plugin)
Use the -p (or --use-plugins) option to enable 3rd-party plugins.
Find plugins by searching for the hashtag #markitdown-plugin on GitHub.
Data URI Handling
By default, data URIs (such as base64-encoded images) are truncated in the output. To keep them in full:
markitdown --keep-data-uris example.html
Keeps full data URIs in the output, which can significantly increase file size.
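To see how much the flag affects output size for a given file, both variants can be measured. This is a rough sketch; data_uri_size_report is a hypothetical helper name, and it simply counts the bytes each invocation writes to stdout.

```shell
# data_uri_size_report is a hypothetical helper, not a MarkItDown command.
# Usage: data_uri_size_report INPUT_FILE
data_uri_size_report() {
  # tr strips the padding some wc implementations add around the count.
  truncated=$(markitdown "$1" | wc -c | tr -d ' ')
  full=$(markitdown --keep-data-uris "$1" | wc -c | tr -d ' ')
  echo "truncated: $truncated bytes, full: $full bytes"
}
```

For example, data_uri_size_report example.html prints one line with both byte counts.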
Common Patterns
Batch Conversion
Convert multiple files:
for file in *.pdf; do
markitdown "$file" -o "${file%.pdf}.md"
done
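The loop above assumes at least one PDF exists and ignores failures. A slightly more defensive variant might look like the sketch below; batch_convert is a hypothetical helper name, and writing results to a separate output directory is an assumption made for illustration.

```shell
# batch_convert is a hypothetical helper, not a MarkItDown command.
# Usage: batch_convert SRC_DIR OUT_DIR
batch_convert() {
  src_dir=$1
  out_dir=$2
  failed=0

  mkdir -p "$out_dir" || return 1

  for file in "$src_dir"/*.pdf; do
    # With no matches the pattern stays literal, so skip non-existent paths.
    [ -e "$file" ] || continue

    name=$(basename "$file" .pdf)
    if ! markitdown "$file" -o "$out_dir/$name.md"; then
      echo "failed: $file" >&2
      failed=1
    fi
  done

  # Non-zero if any single conversion failed.
  return "$failed"
}
```

Running batch_convert ./reports ./markdown converts every PDF in ./reports and keeps going after an individual failure, reporting it on stderr.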
Piping with Processing
Combine with other tools:
# Download and convert
curl https://example.com/document.pdf | markitdown -x .pdf > output.md
# Convert and count words
markitdown document.docx | wc -w
# Convert and search
markitdown report.pdf | grep "quarterly results"
Using Hints with stdin
When the file type cannot be inferred from context:
# Provide extension hint
cat mystery_file | markitdown -x .xlsx -o output.md
# Provide MIME type hint
echo "data" | markitdown -m text/plain
# Provide charset hint for non-UTF-8 files
markitdown -c ISO-8859-1 -x .txt < legacy_file
Exit Codes
- 0: Successful conversion
- 1: Error occurred (file not found, conversion failed, invalid arguments, etc.)
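In a script, the exit code can be used to decide whether to continue. The following is a minimal sketch; convert_or_report is a hypothetical helper name, and the messages are illustrative only.

```shell
# convert_or_report is a hypothetical helper, not a MarkItDown command.
# Usage: convert_or_report INPUT_FILE OUTPUT_FILE
convert_or_report() {
  if markitdown "$1" -o "$2"; then
    echo "ok: $2"
  else
    # In the else branch, $? still holds markitdown's exit code.
    status=$?
    echo "markitdown failed with exit code $status for $1" >&2
    return "$status"
  fi
}
```

For example, convert_or_report report.pdf report.md || exit 1 stops a surrounding script at the first failed conversion.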
Examples by File Type
PDF Documents
markitdown document.pdf -o document.md
Word Documents
markitdown report.docx -o report.md
Excel Spreadsheets
markitdown data.xlsx -o data.md
PowerPoint Presentations
markitdown slides.pptx -o slides.md
Images with Metadata
markitdown photo.jpg -o photo.md
Troubleshooting
Encoding Issues
If you see garbled characters, specify the character encoding the file actually uses:
markitdown -c UTF-8 file.txt
File Type Not Detected
Provide explicit hints:
markitdown -x .pdf -m application/pdf < file
Missing Dependencies
If conversion fails due to missing dependencies, install the appropriate optional dependencies:
pip install 'markitdown[pdf]' # For PDF support
pip install 'markitdown[all]' # For all formats
See the Optional Dependencies guide for details.