MarkItDown supports various additional file formats for data, archives, and structured documents.
CSV: Comma-separated values
XML: XML documents via RSS parser
CSV Files
Dependencies
No external dependencies for parsing - uses Python’s built-in csv module.
Features
Converts CSV to Markdown tables
First row treated as header
Automatic encoding detection with charset-normalizer
Handles irregular row lengths
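The features above can be illustrated with a minimal standard-library sketch. It is not the CsvConverter source: the function name csv_to_markdown is hypothetical, and the policy for irregular rows (pad short rows with empty cells, drop cells beyond the header width) is an assumption about one reasonable behavior.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Render CSV text as a Markdown table, using the first row as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def fit(row):
        # Assumed policy: pad short rows with empty cells and
        # drop any cells beyond the header width.
        cells = [cell.strip() for cell in row]
        return (cells + [""] * width)[:width]

    lines = ["| " + " | ".join(fit(header)) + " |"]
    lines.append("| " + " | ".join(["---"] * width) + " |")
    lines.extend("| " + " | ".join(fit(row)) + " |" for row in body)
    return "\n".join(lines)


# One short row and one long row to exercise the irregular-length handling.
sample = "Name,Age,City\nJohn Doe,30\nJane Smith,25,Los Angeles,extra\n"
print(csv_to_markdown(sample))
```

In this sketch the short row gains an empty City cell and the fourth cell of the long row is discarded, so every output row has exactly three columns.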
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("data.csv")
print(result.markdown)
Example
Input CSV :
Name,Age,City
John Doe,30,New York
Jane Smith,25,Los Angeles
Bob Johnson,35,Chicago
Output Markdown :
| Name | Age | City |
| --- | --- | --- |
| John Doe | 30 | New York |
| Jane Smith | 25 | Los Angeles |
| Bob Johnson | 35 | Chicago |
Implementation
Converter Class : CsvConverter (_csv_converter.py)
Accepted Extensions : .csv
MIME Types : text/csv, application/csv
Encoding : Uses charset-normalizer if charset not specified
JSON Files
Dependencies
No external dependencies - uses Python’s built-in json module.
Features
Plain text output (not converted to structured Markdown)
Preserves JSON formatting
Supports both .json and .jsonl files
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("data.json")
print(result.markdown)  # Raw JSON text
Implementation
Converter Class : PlainTextConverter (_plain_text_converter.py)
Accepted Extensions : .json, .jsonl
MIME Types : application/json
Processing : Treated as plain text, no special JSON parsing
JSON files are processed as plain text. For structured JSON-to-Markdown conversion, consider pre-processing with jq or a custom script.
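As one possible shape for such a custom script, the sketch below turns a JSON array of flat objects into a Markdown table using only the standard library. The function name json_records_to_markdown is hypothetical and not part of MarkItDown. Nested values are simply stringified, and cell values containing a pipe character are not escaped.

```python
import json


def json_records_to_markdown(text: str) -> str:
    """Render a JSON array of flat objects as a Markdown table."""
    records = json.loads(text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")

    # Collect column names in first-seen order so the output is stable.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    if not columns:
        return ""

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for record in records:
        # Missing keys become empty cells.
        cells = [str(record.get(col, "")) for col in columns]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


sample = '[{"name": "Test1", "value": 42}, {"name": "Test2", "status": "Pass"}]'
print(json_records_to_markdown(sample))
```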
XML Files
Dependencies
pip install beautifulsoup4 defusedxml
Features
XML files are checked by the RSS converter first
Only RSS/Atom feeds receive structured formatting
Any other XML falls through to plain-text handling
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("feed.xml")
print(result.markdown)  # Formatted feed content
Implementation
RSS/Atom XML : RssConverter (_rss_converter.py) - see Web Content
Other XML : PlainTextConverter (_plain_text_converter.py)
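The split between feed XML and other XML can be pictured with a small root-element check. This is only a sketch of the idea, not RssConverter's actual detection logic. The function name classify_xml is hypothetical, and it uses the standard library's xml.etree.ElementTree so that it runs without extra packages. For untrusted input, the equivalent defusedxml parser is the safer choice.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree prepends to Atom element tags.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def classify_xml(text: str) -> str:
    """Return 'rss', 'atom', or 'other' based on the document's root element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "other"  # not well-formed XML: treat as plain text
    if root.tag == "rss" and root.find("channel") is not None:
        return "rss"
    if root.tag == ATOM_NS + "feed":
        return "atom"
    return "other"


print(classify_xml('<rss version="2.0"><channel><title>News</title></channel></rss>'))
print(classify_xml('<feed xmlns="http://www.w3.org/2005/Atom"><title>News</title></feed>'))
print(classify_xml("<config><item/></config>"))
```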
ZIP Archives
Dependencies
No external dependencies - uses Python’s built-in zipfile module.
Features
Extracts and converts each file in the archive
Recursively processes nested files
Skips unsupported formats silently
Preserves file paths in output
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("archive.zip")
print(result.markdown)
Output Markdown :
Content from the zip file `archive.zip`:
## File: README.md
# Project Title
Project description here...
## File: docs/guide.docx
# User Guide
Welcome to the user guide...
## File: data/results.csv
| Name | Value | Status |
|------|-------|--------|
| Test1 | 42 | Pass |
| Test2 | 37 | Pass |
## File: images/logo.png
ImageSize: 512x512
DateTimeOriginal: 2024:02:15 10:00:00
Implementation
Converter Class : ZipConverter (_zip_converter.py)
Accepted Extensions : .zip
MIME Types : application/zip
Processing : Each file converted independently using appropriate converter
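The per-file processing described above might look roughly like the following sketch. It is not the ZipConverter source: zip_to_markdown and the converters mapping (file suffix to a bytes-to-Markdown callable) are hypothetical stand-ins for MarkItDown's internal converter dispatch, and the sketch does not recurse into nested archives.

```python
import io
import zipfile
from pathlib import PurePosixPath


def zip_to_markdown(data: bytes, archive_name: str, converters: dict) -> str:
    """Concatenate per-file Markdown, skipping entries that have no converter."""
    parts = [f"Content from the zip file `{archive_name}`:"]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            convert = converters.get(suffix)
            if convert is None:
                continue  # unsupported format: skipped silently
            body = convert(archive.read(info))
            # Keep the archive-relative path so sections can be told apart.
            parts.append(f"## File: {info.filename}\n\n{body}")
    return "\n\n".join(parts)


# Build a small archive in memory: one supported file, one unsupported.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("README.md", "# Project Title")
    archive.writestr("bin/tool.exe", b"\x00\x01")

converters = {".md": lambda raw: raw.decode("utf-8")}
output = zip_to_markdown(buffer.getvalue(), "archive.zip", converters)
print(output)
```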
Advanced Example
from markitdown import MarkItDown
from openai import OpenAI
# ZIP with images - use LLM for image descriptions
client = OpenAI()
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    exiftool_path="/usr/local/bin/exiftool",
)
result = md.convert("photos.zip")
print(result.markdown)  # Includes image descriptions
EPUB Books
Dependencies
pip install beautifulsoup4 defusedxml
Features
Extracts book metadata (title, authors, publisher, etc.)
Converts chapters in reading order
Preserves chapter structure
Handles XHTML content
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("book.epub")
print(result.title)  # Book title
print(result.markdown)  # Full book content
Output Markdown :
**Title:** The Great Novel
**Authors:** Jane Author, John Writer
**Publisher:** Example Press
**Date:** 2024-01-15
**Language:** en
**Description:** An exciting tale of adventure and discovery.
**Identifier:** ISBN:978-0-123456-78-9
# Chapter 1: The Beginning
It was a dark and stormy night...
# Chapter 2: The Journey
The next morning, our hero set out...
Implementation
Converter Class : EpubConverter (_epub_converter.py)
Accepted Extensions : .epub
MIME Types : application/epub, application/epub+zip, application/x-epub+zip
Processing :
Extract META-INF/container.xml to find content.opf
Parse metadata from content.opf
Read spine order from content.opf
Convert each chapter (XHTML) to Markdown
Combine with metadata
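The first three steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the EpubConverter source: spine_paths and build_sample_epub are hypothetical names, the sample archive contains only the two XML files needed for the demo, and the standard xml.etree.ElementTree parser stands in for defusedxml so the example runs without extra packages.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_paths(epub_bytes: bytes) -> list:
    """Return chapter file paths in reading (spine) order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as book:
        # Step 1: container.xml points at the package (OPF) file.
        container = ET.fromstring(book.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(book.read(opf_path))
    # The manifest maps item ids to hrefs; the spine lists ids in reading order.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
    return [
        posixpath.join(base, manifest[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]


def build_sample_epub() -> bytes:
    """Build a tiny in-memory archive with just enough structure for the demo."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf = (
        '<package version="3.0" xmlns="http://www.idpf.org/2007/opf">'
        "<manifest>"
        '<item id="c2" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="c1" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
        "</manifest>"
        '<spine><itemref idref="c1"/><itemref idref="c2"/></spine>'
        "</package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as book:
        book.writestr("META-INF/container.xml", container)
        book.writestr("OEBPS/content.opf", opf)
    return buffer.getvalue()


print(spine_paths(build_sample_epub()))
```

The manifest in the sample deliberately lists chapter 2 first, which shows that reading order comes from the spine rather than from manifest or archive order.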
Metadata Fields
| Field | Description | Example |
| --- | --- | --- |
| title | Book title | The Great Gatsby |
| authors | Author(s) | F. Scott Fitzgerald |
| language | Language code | en, es, fr |
| publisher | Publisher name | Scribner |
| date | Publication date | 1925-04-10 |
| description | Book description | A novel set in... |
| identifier | ISBN or other ID | ISBN:978-0-... |
Jupyter Notebooks
Dependencies
No external dependencies - uses Python’s built-in json module.
Features
Converts .ipynb files to Markdown
Preserves code cells in code blocks
Includes markdown cells directly
Extracts first H1 heading as title
Handles raw cells
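The cell handling listed above can be sketched in a few lines of standard-library Python. This is an illustration rather than the IpynbConverter source: notebook_to_markdown is a hypothetical name, and the title rule (first line starting with "# " in any markdown cell) is an assumption about how the first H1 is located.

```python
import json

FENCE = "`" * 3  # built programmatically to keep this example easy to embed


def notebook_to_markdown(text: str):
    """Return (title, markdown) for notebook JSON text."""
    notebook = json.loads(text)
    parts = []
    title = None
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows "source" to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        kind = cell.get("cell_type")
        if kind == "markdown":
            parts.append(source)
            if title is None:
                for line in source.splitlines():
                    if line.startswith("# "):
                        title = line[2:].strip()
                        break
        elif kind == "code":
            parts.append(f"{FENCE}python\n{source}\n{FENCE}")
        elif kind == "raw":
            parts.append(f"{FENCE}\n{source}\n{FENCE}")
    return title, "\n\n".join(parts)


sample = json.dumps({
    "cells": [
        {"cell_type": "markdown",
         "source": ["# Data Analysis\n", "This notebook analyzes..."]},
        {"cell_type": "code",
         "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]},
    ]
})
title, markdown = notebook_to_markdown(sample)
print(title)
print(markdown)
```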
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("analysis.ipynb")
print(result.title)  # First H1 heading
print(result.markdown)
Input Notebook :
{
  "cells": [
    {
      "cell_type": "markdown",
      "source": ["# Data Analysis\n", "This notebook analyzes..."]
    },
    {
      "cell_type": "code",
      "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]
    },
    {
      "cell_type": "markdown",
      "source": ["## Results\n", "The analysis shows..."]
    }
  ]
}
Output Markdown :
# Data Analysis
This notebook analyzes...
```python
import pandas as pd
df = pd.read_csv('data.csv')
```
## Results
The analysis shows...
Implementation
Converter Class : IpynbConverter (_ipynb_converter.py)
Accepted Extensions : .ipynb
MIME Types : application/json (with notebook content detection)
Cell Types :
markdown : Included directly
code : Wrapped in ```python code blocks
raw : Wrapped in ``` code blocks
Advanced Example
```python
from markitdown import MarkItDown
import json

# Convert and extract title
md = MarkItDown()
result = md.convert("analysis.ipynb")
if result.title:
    print(f"Notebook: {result.title}")

# Count code cells
with open("analysis.ipynb", encoding="utf-8") as f:
    notebook = json.load(f)
code_cells = sum(1 for cell in notebook["cells"] if cell["cell_type"] == "code")
print(f"Code cells: {code_cells}")

# Save markdown
with open("analysis.md", "w", encoding="utf-8") as f:
    f.write(result.markdown)
```
Plain Text Files
Dependencies
No external dependencies.
Features
Supports .txt, .text, .md, .markdown extensions
Automatic encoding detection
Preserves content as-is
Usage
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("document.txt")
print(result.markdown)  # Original text content
Implementation
Converter Class : PlainTextConverter (_plain_text_converter.py)
Accepted Extensions : .txt, .text, .md, .markdown, .json, .jsonl
MIME Types : text/*, application/json, application/markdown
Encoding : Uses charset-normalizer if charset not specified
Comparison Table
| Format | Dependencies | Structured Output | Metadata Support | Nested Content |
| --- | --- | --- | --- | --- |
| CSV | None | ✓ (Tables) | ✗ | ✗ |
| JSON | None | ✗ (Plain text) | ✗ | ✗ |
| XML | BeautifulSoup, defusedxml | ✓ (RSS only) | ✓ (RSS only) | ✗ |
| ZIP | None | ✓ (Per file) | ✗ | ✓ |
| EPUB | BeautifulSoup, defusedxml | ✓ (Chapters) | ✓ | ✓ |
| Jupyter | None | ✓ (Cells) | ✓ | ✗ |
| Plain Text | None | ✗ | ✗ | ✗ |
Common Patterns
Processing Archive Contents
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("project.zip")

# Extract individual file sections
sections = result.markdown.split("## File: ")
for section in sections[1:]:  # Skip header
    lines = section.split("\n")
    filename = lines[0]
    content = "\n".join(lines[1:])
    print(f"File: {filename}")
    print(f"Content length: {len(content)} characters")
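A plain split on "## File: " also matches that text when it appears in the middle of a line. A slightly stricter variant, sketched below, splits only on headings that start a line and returns a path-to-content mapping. The helper name split_zip_sections is hypothetical, and the sample string merely imitates the output layout shown in the ZIP section. A converted file whose own content contains a line beginning with "## File: " would still be split incorrectly.

```python
import re


def split_zip_sections(markdown: str) -> dict:
    """Map each archived file path to the Markdown that follows its heading."""
    # With one capture group, re.split returns
    # [preamble, path1, body1, path2, body2, ...].
    pieces = re.split(r"^## File: (.+)$", markdown, flags=re.MULTILINE)
    return {
        path.strip(): body.strip()
        for path, body in zip(pieces[1::2], pieces[2::2])
    }


sample = (
    "Content from the zip file `archive.zip`:\n\n"
    "## File: README.md\n\n# Project Title\n\n"
    "## File: data/results.csv\n\n"
    "| Name | Value |\n| --- | --- |\n| Test1 | 42 |\n"
)
for path, body in split_zip_sections(sample).items():
    print(f"{path}: {len(body)} characters")
```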
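Building a Report from CSV Data
from datetime import datetime
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("sales_data.csv")

# Count data rows: table lines minus the header and separator rows
table_lines = [line for line in result.markdown.splitlines() if line.startswith("|")]
entry_count = max(len(table_lines) - 2, 0)

# Wrap in document
report = f"""# Sales Report
Generated: {datetime.now().strftime('%Y-%m-%d')}

## Data
{result.markdown}

## Summary
Total entries: {entry_count}
"""
with open("report.md", "w", encoding="utf-8") as f:
    f.write(report)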
Splitting an EPUB into Chapters
from markitdown import MarkItDown
import re

md = MarkItDown()
result = md.convert("book.epub")

# Split by chapter headings; the first piece holds the metadata block
chapters = re.split(r"\n# ", result.markdown)
for i, chapter in enumerate(chapters[1:], 1):  # Skip metadata
    title, _, content = chapter.partition("\n")
    # Replace characters that are unsafe in file names (spaces, ':', '/')
    safe_title = re.sub(r"[^\w-]+", "_", title).strip("_")
    # Save each chapter separately
    with open(f"chapter_{i:02d}_{safe_title}.md", "w", encoding="utf-8") as f:
        f.write(f"# {title}\n\n{content}")
Error Handling
from markitdown import MarkItDown
from markitdown._exceptions import (
    MissingDependencyException,
    UnsupportedFormatException,
    FileConversionException,
)

md = MarkItDown()
try:
    result = md.convert("file.ext")
    print(result.markdown)
except MissingDependencyException as e:
    print(f"Missing dependency: {e}")
    print("Install with: pip install markitdown[all]")
except UnsupportedFormatException as e:
    print(f"Unsupported format: {e}")
except FileConversionException as e:
    print(f"Conversion failed: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Source Code Reference
packages/markitdown/src/markitdown/converters/
├── _csv_converter.py # CSV tables
├── _plain_text_converter.py # JSON, TXT, MD
├── _rss_converter.py # XML/RSS/Atom
├── _zip_converter.py # ZIP archives
├── _epub_converter.py # EPUB books
└── _ipynb_converter.py # Jupyter notebooks
Next Steps
Format Overview: See all supported formats
Python API: Learn the programmatic interface