MarkItDown provides a command-line interface for converting files such as PDF, Word, Excel, PowerPoint, and image files to Markdown.

Basic Usage

Convert a file to Markdown and output to stdout:
markitdown example.pdf

Input Methods

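MarkItDown accepts input as a file path, through a pipe, or through stdin redirection:
markitdown example.pdf
cat example.pdf | markitdown
markitdown < example.pdf
When reading from stdin (pipe or redirection), there is no file name to infer the type from, so you may need to provide a hint with the --extension or --mime-type flag.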

Output Options

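By default, the Markdown is written to stdout. Use -o (or --output) to write it to a file, or redirect stdout:
markitdown example.pdf -o example.md
markitdown example.pdf > example.md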

Command-Line Flags

Version Information

markitdown --version
markitdown -v
Displays the version number and exits.

File Type Hints

When reading from stdin or when the file type cannot be automatically detected:
markitdown -x .pdf < document
markitdown --extension pdf < document
Provide a hint about the file extension. The leading dot is optional.
markitdown -m application/pdf < document
markitdown --mime-type application/pdf < document
Provide a hint about the MIME type.
markitdown -c UTF-8 < document
markitdown --charset UTF-8 < document
Provide a hint about the character encoding.
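
When a script already knows the original file name, the extension hint can be derived from that name instead of being typed by hand. The following is a minimal sketch under that assumption: ext_hint is a hypothetical helper, not part of MarkItDown, and it only looks at the text after the last dot in the name.

```shell
# Hypothetical helper (not part of MarkItDown): print an extension hint
# such as ".pdf" for a file name, suitable for passing to -x.
# Note: sets the global variable hint_name.
ext_hint() {
    hint_name=${1##*/}                 # strip any directory part
    case "$hint_name" in
        ?*.?*) printf '.%s\n' "${hint_name##*.}" ;;  # text after the last dot
        *)     return 1 ;;                            # no usable extension
    esac
}

# Usage (requires markitdown on PATH):
#   src=downloads/report.pdf
#   markitdown -x "$(ext_hint "$src")" < "$src"
```

The helper returns a non-zero status for names without an extension, so a calling script can fall back to a MIME type hint in that case.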

Azure Document Intelligence

Use Azure Document Intelligence for cloud-based conversion:
markitdown -d -e https://YOUR_ENDPOINT.cognitiveservices.azure.com/ example.pdf
markitdown --use-docintel --endpoint https://YOUR_ENDPOINT.cognitiveservices.azure.com/ example.pdf
Document Intelligence requires:
  • A valid Azure Document Intelligence endpoint URL, passed with -e (or --endpoint)
  • Authentication via AZURE_API_KEY environment variable or Azure credentials
  • A file path (stdin is not supported with Document Intelligence)
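
The requirements above can be checked before the service is called. This is a sketch only: docintel_convert is a hypothetical wrapper, not part of MarkItDown, and the exit status 2 it uses for failed checks is an arbitrary choice for this example.

```shell
# Hypothetical wrapper (not part of MarkItDown): verify the documented
# requirements, then run the conversion with -d and -e.
# Note: sets the global variables di_endpoint and di_file.
docintel_convert() {
    di_endpoint=$1
    di_file=$2
    if [ -z "$di_endpoint" ]; then
        echo "error: an Azure endpoint URL is required" >&2
        return 2
    fi
    if [ ! -f "$di_file" ]; then
        echo "error: '$di_file' is not a file (stdin is not supported)" >&2
        return 2
    fi
    if [ -z "${AZURE_API_KEY:-}" ]; then
        # Not an error: other Azure credentials may still be available.
        echo "note: AZURE_API_KEY is unset; relying on other Azure credentials" >&2
    fi
    markitdown -d -e "$di_endpoint" "$di_file"
}

# Usage (requires markitdown on PATH):
#   docintel_convert "https://YOUR_ENDPOINT.cognitiveservices.azure.com/" example.pdf > example.md
```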

Plugin Support

Enable third-party plugins:
markitdown -p example.rtf
markitdown --use-plugins example.rtf
List installed plugins:
markitdown --list-plugins
Output shows:
Installed MarkItDown 3rd-party Plugins:

  * sample_plugin    	(package: markitdown_sample_plugin)

Use the -p (or --use-plugins) option to enable 3rd-party plugins.
Find plugins by searching for the hashtag #markitdown-plugin on GitHub.

Data URI Handling

By default, data URIs (such as base64-encoded images) are truncated in the output. To keep them in full, pass --keep-data-uris:
markitdown --keep-data-uris example.html
This can significantly increase the size of the output.

Common Patterns

Batch Conversion

Convert multiple files:
for file in *.pdf; do
    markitdown "$file" -o "${file%.pdf}.md"
done
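
A slightly more defensive variant of the loop is sketched below. It assumes you want the batch to continue when one file fails and to skip cleanly when no PDF matches. convert_pdfs is a hypothetical function name, not something MarkItDown provides.

```shell
# Hypothetical helper: convert every PDF in a directory, keep going after a
# failure, and return non-zero if any file could not be converted.
# Note: sets the global variables batch_dir, batch_file and batch_failed.
convert_pdfs() {
    batch_dir=${1:-.}
    batch_failed=0
    for batch_file in "$batch_dir"/*.pdf; do
        [ -e "$batch_file" ] || continue      # the glob matched nothing
        if ! markitdown "$batch_file" -o "${batch_file%.pdf}.md"; then
            echo "failed: $batch_file" >&2
            batch_failed=$((batch_failed + 1))
        fi
    done
    [ "$batch_failed" -eq 0 ]
}

# Usage (requires markitdown on PATH):
#   convert_pdfs ./reports || echo "some files were not converted" >&2
```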

Piping with Processing

Combine with other tools:
# Download and convert
curl https://example.com/document.pdf | markitdown -x .pdf > output.md

# Convert and count words
markitdown document.docx | wc -w

# Convert and search
markitdown report.pdf | grep "quarterly results"

Using with stdin Hints

When the file type cannot be inferred from context:
# Provide extension hint
cat mystery_file | markitdown -x .xlsx -o output.md

# Provide MIME type hint
echo "data" | markitdown -m text/plain

# Provide charset hint for non-UTF-8 files
markitdown -c ISO-8859-1 -x .txt < legacy_file

Exit Codes

  • 0: Successful conversion
  • 1: Error occurred (file not found, conversion failed, invalid arguments, etc.)
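
In a script, these exit codes can be used to decide whether to continue. The sketch below is one way to do that; convert_or_report is a hypothetical helper and the message wording is illustrative.

```shell
# Hypothetical helper: run one conversion and report the outcome based on
# markitdown's exit status (0 = success, non-zero = error).
# Note: sets the global variables cv_src, cv_out and cv_code.
convert_or_report() {
    cv_src=$1
    cv_out=$2
    if markitdown "$cv_src" -o "$cv_out"; then
        echo "ok: $cv_src -> $cv_out"
    else
        cv_code=$?                            # exit status of markitdown
        echo "error: markitdown exited with status $cv_code for $cv_src" >&2
        return "$cv_code"
    fi
}

# Usage (requires markitdown on PATH):
#   convert_or_report report.docx report.md || exit 1
```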

Examples by File Type

PDF Documents

markitdown document.pdf -o document.md

Word Documents

markitdown report.docx -o report.md

Excel Spreadsheets

markitdown data.xlsx -o data.md

PowerPoint Presentations

markitdown slides.pptx -o slides.md

Images with Metadata

markitdown photo.jpg -o photo.md

Troubleshooting

Encoding Issues

If the output contains garbled characters, specify the input file's character encoding:
markitdown -c UTF-8 file.txt

File Type Not Detected

Provide explicit hints:
markitdown -x .pdf -m application/pdf < file

Missing Dependencies

If conversion fails because a dependency is missing, install the matching optional dependencies. Quote the package specifier so that shells such as zsh do not treat the square brackets as a glob pattern:
pip install 'markitdown[pdf]'  # For PDF support
pip install 'markitdown[all]'  # For all formats
See the Optional Dependencies guide for details.
