MarkItDown supports a wide range of file formats, converting them to clean, structured Markdown. Each converter is designed to preserve important content while producing readable output.

Format Categories

Office Documents

Word, PowerPoint, Excel, and Outlook files

PDF Documents

PDF files with table extraction and text processing

Images

JPEG and PNG with EXIF metadata and OCR

Audio Files

Audio files with metadata and speech transcription

Web Content

HTML, RSS, Wikipedia, YouTube, and Bing SERP

Other Formats

CSV, JSON, XML, ZIP, EPUB, and Jupyter notebooks

All Supported Formats

Office Documents

| Format | Extension | Dependencies |
| --- | --- | --- |
| Word | .docx | mammoth |
| PowerPoint | .pptx | python-pptx |
| Excel (modern) | .xlsx | pandas, openpyxl |
| Excel (legacy) | .xls | pandas, xlrd |
| Outlook | .msg | olefile |

Documents

| Format | Extension | Dependencies |
| --- | --- | --- |
| PDF | .pdf | pdfminer.six, pdfplumber |
| EPUB | .epub | Built-in |
| Jupyter Notebook | .ipynb | Built-in |

Media

| Format | Extension | Dependencies |
| --- | --- | --- |
| JPEG Images | .jpg, .jpeg | exiftool (optional) |
| PNG Images | .png | exiftool (optional) |
| Audio (WAV) | .wav | speech_recognition, pydub |
| Audio (MP3) | .mp3 | speech_recognition, pydub |
| Audio (M4A) | .m4a | speech_recognition, pydub |
| Video (MP4) | .mp4 | speech_recognition, pydub |

Web & Data

| Format | Extension | Dependencies |
| --- | --- | --- |
| HTML | .html, .htm | beautifulsoup4 |
| RSS/Atom | .rss, .atom, .xml | beautifulsoup4, defusedxml |
| CSV | .csv | Built-in |
| JSON | .json, .jsonl | Built-in |
| Plain Text | .txt, .md | Built-in |
| ZIP Archives | .zip | Built-in |
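
The dependency columns above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of MarkItDown: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the mapping simply restates the tables using each package's import name (for example, python-pptx is imported as `pptx` and beautifulsoup4 as `bs4`). exiftool is an external program rather than a Python module, so image formats are listed with no required modules.

```python
from importlib.util import find_spec
from pathlib import Path

# Extension -> importable module names, restating the dependency tables.
# pip names and import names differ for some packages:
# python-pptx -> pptx, pdfminer.six -> pdfminer, beautifulsoup4 -> bs4.
_AUDIO = ["speech_recognition", "pydub"]
_FEED = ["bs4", "defusedxml"]
REQUIRED_MODULES = {
    ".docx": ["mammoth"],
    ".pptx": ["pptx"],
    ".xlsx": ["pandas", "openpyxl"],
    ".xls": ["pandas", "xlrd"],
    ".msg": ["olefile"],
    ".pdf": ["pdfminer", "pdfplumber"],
    ".epub": [], ".ipynb": [],
    ".jpg": [], ".jpeg": [], ".png": [],  # exiftool is optional and not a Python module
    ".wav": _AUDIO, ".mp3": _AUDIO, ".m4a": _AUDIO, ".mp4": _AUDIO,
    ".html": ["bs4"], ".htm": ["bs4"],
    ".rss": _FEED, ".atom": _FEED, ".xml": _FEED,
    ".csv": [], ".json": [], ".jsonl": [],
    ".txt": [], ".md": [], ".zip": [],
}


def _installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


def missing_modules(filename, is_installed=_installed):
    """List the modules from the tables that are not available for this file."""
    ext = Path(filename).suffix.lower()
    if ext not in REQUIRED_MODULES:
        raise ValueError(f"extension {ext!r} is not listed in the tables")
    return [name for name in REQUIRED_MODULES[ext] if not is_installed(name)]
```

Calling `missing_modules("report.xlsx")` returns an empty list when both pandas and openpyxl are importable; the `is_installed` parameter exists only so the check can be exercised without touching the real environment.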

Web Services

| Service | URL Pattern | Dependencies |
| --- | --- | --- |
| Wikipedia | *.wikipedia.org | beautifulsoup4 |
| YouTube | youtube.com/watch?v=* | beautifulsoup4, youtube-transcript-api |
| Bing Search | bing.com/search?q=* | beautifulsoup4 |
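
To illustrate how the URL patterns above could be matched, here is a small sketch using only the standard library. It is an assumption about how such routing might look, not MarkItDown's actual dispatch logic; `classify_url` is a hypothetical helper, and accepting the `www.` host variants is a choice made for this example.

```python
from urllib.parse import parse_qs, urlparse


def classify_url(url):
    """Return 'wikipedia', 'youtube', 'bing', or None for a URL."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    query = parse_qs(parts.query)

    # *.wikipedia.org: any subdomain such as en.wikipedia.org
    if host.endswith(".wikipedia.org"):
        return "wikipedia"
    # youtube.com/watch?v=*: a watch page with a non-empty video id
    if host in ("youtube.com", "www.youtube.com") and parts.path == "/watch" and "v" in query:
        return "youtube"
    # bing.com/search?q=*: a results page with a non-empty query
    if host in ("bing.com", "www.bing.com") and parts.path == "/search" and "q" in query:
        return "bing"
    return None
```

A URL that matches none of the patterns yields `None`, which a caller could treat as a signal to fall back to plain HTML handling.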

Feature Matrix

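| Format Category | Text Extraction | Table Support | Metadata | Images | Advanced Features |
| --- | --- | --- | --- | --- | --- |
| Office Documents | | | | | Charts, slide notes |
| PDF | | | | | Form detection |
| Images | | | | | EXIF, LLM captioning, OCR |
| Audio | | | | | Speech transcription |
| Web Content | | | | | Feed parsing |
| Data Formats | | | | | Structure preservation |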

Installation by Format

Install dependencies for specific format categories:
pip install 'markitdown[office]'
# Includes: mammoth, python-pptx, pandas, openpyxl, xlrd, olefile
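
Once the dependencies for a format are installed, a conversion can be sketched as below. This assumes the library's `MarkItDown` class with its `convert` method and the `text_content` attribute on the result; `to_markdown` is a hypothetical wrapper, and its `converter` parameter is added here only so the function can be exercised without the package installed.

```python
def to_markdown(path, converter=None):
    """Convert one file to Markdown text.

    `converter` is any object with a `convert(path)` method whose result
    exposes `text_content`; by default a MarkItDown instance is created.
    """
    if converter is None:
        # Imported lazily so importing this module does not require markitdown.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content
```

For example, `print(to_markdown("report.docx"))` would print the Markdown for a hypothetical `report.docx`, provided the Word dependencies listed above are present.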

Next Steps

Quick Start

Get started with basic conversion

Python API

Learn the programmatic interface
