MarkItDown’s plugin system allows you to extend its capabilities with custom document converters for file formats not supported by default.
Using Plugins
Enabling Plugins
Plugins are disabled by default and must be explicitly enabled.
markitdown --use-plugins file.rtf
markitdown -p file.rtf
Listing Installed Plugins
Check which plugins are installed:
markitdown --list-plugins
Output:
Installed MarkItDown 3rd-party Plugins:
* sample_plugin (package: markitdown_sample_plugin)
Use the -p (or --use-plugins) option to enable 3rd-party plugins.
If no plugins are installed:
Installed MarkItDown 3rd-party Plugins:
* No 3rd-party plugins installed.
Find plugins by searching for the hashtag #markitdown-plugin on GitHub.
Finding Plugins
Discover available plugins:
Check PyPI
Search PyPI for packages whose names start with markitdown-.
Installing Plugins
Plugins are installed as Python packages:
# From PyPI
pip install markitdown-sample-plugin
# From GitHub
pip install git+https://github.com/user/markitdown-plugin-name.git
# From local directory
pip install -e /path/to/plugin
Verify installation:
markitdown --list-plugins
Creating Plugins
Plugin Structure
A MarkItDown plugin is a Python package that implements a specific interface:
Create a DocumentConverter
from typing import BinaryIO, Any

from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo


class RtfConverter(DocumentConverter):
    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        """Check if this converter can handle the file."""
        extension = (stream_info.extension or "").lower()
        mimetype = (stream_info.mimetype or "").lower()
        if extension == ".rtf":
            return True
        if mimetype == "text/rtf":
            return True
        return False

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        """Convert the file to Markdown."""
        # Read the RTF content
        content = file_stream.read()
        # Convert to Markdown (simplified example)
        markdown = self._rtf_to_markdown(content)
        return DocumentConverterResult(
            markdown=markdown,
            title="RTF Document"
        )

    def _rtf_to_markdown(self, content: bytes) -> str:
        # Implement RTF parsing logic
        from striprtf.striprtf import rtf_to_text
        text = rtf_to_text(content.decode('utf-8'))
        return text
Create Plugin Interface
from .converter import RtfConverter
from markitdown import MarkItDown

# Plugin interface version
__plugin_interface_version__ = 1


def register_converters(markitdown: MarkItDown, **kwargs):
    """Register converters with MarkItDown instance."""
    markitdown.register_converter(RtfConverter())
Configure Entry Point
[project]
name = "markitdown-rtf-plugin"
version = "0.1.0"
dependencies = [
"markitdown>=0.1.0",
"striprtf",
]
[project.entry-points."markitdown.plugin"]
rtf_plugin = "markitdown_rtf_plugin"
Entry Point Configuration
The entry point is critical for plugin discovery:
[project.entry-points."markitdown.plugin"]
plugin_name = "package_name"
- Entry point group: Must be "markitdown.plugin"
- Plugin name: Any unique identifier (e.g., rtf_plugin)
- Package name: The fully qualified package name (e.g., markitdown_rtf_plugin)
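To see how such an entry is interpreted, the sketch below builds an EntryPoint object by hand with importlib.metadata, reusing the names from the RTF example. It only parses the declaration; nothing is imported or installed.

```python
from importlib.metadata import EntryPoint

# Mirrors the pyproject.toml line: rtf_plugin = "markitdown_rtf_plugin"
ep = EntryPoint(
    name="rtf_plugin",               # plugin name (left-hand side)
    value="markitdown_rtf_plugin",   # package name (right-hand side)
    group="markitdown.plugin",       # entry point group
)

print(ep.group)   # markitdown.plugin
print(ep.module)  # markitdown_rtf_plugin
print(ep.attr)    # None: the value names a module, not an object inside it
```

Because the value contains no colon, calling ep.load() on an installed plugin imports and returns the module itself, which is where the interface version and registration function are looked up.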
Plugin Interface Version
Your plugin must export the interface version:
__plugin_interface_version__ = 1
Currently, only version 1 is supported.
Registration Function
Implement the register_converters function:
def register_converters(markitdown: MarkItDown, **kwargs):
    """
    Called when MarkItDown instances are created with plugins enabled.

    Args:
        markitdown: The MarkItDown instance to register converters with
        **kwargs: Additional arguments passed to MarkItDown constructor
    """
    # Register one or more converters
    markitdown.register_converter(MyConverter())
    markitdown.register_converter(AnotherConverter())
Advanced Plugin Development
Converter Priority
Control when your converter is tried:
from markitdown import (
    MarkItDown,
    PRIORITY_GENERIC_FILE_FORMAT,
    PRIORITY_SPECIFIC_FILE_FORMAT,
)


def register_converters(markitdown: MarkItDown, **kwargs):
    # High priority (tried first) - for specific file types
    markitdown.register_converter(
        RtfConverter(),
        priority=PRIORITY_SPECIFIC_FILE_FORMAT  # 0.0
    )
    # Lower priority (tried later) - for generic file types
    markitdown.register_converter(
        GenericTextConverter(),
        priority=PRIORITY_GENERIC_FILE_FORMAT  # 10.0
    )
Lower priority values are tried first. Built-in converters use 0.0 for specific formats and 10.0 for generic formats.
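The ordering rule can be illustrated with plain Python. This sketch sorts hypothetical (priority, name) registrations in ascending order. It mimics the rule stated above and is not MarkItDown's internal code. The two constants are redefined locally with the documented values so the snippet runs without MarkItDown installed, and OverrideConverter is an invented name.

```python
PRIORITY_SPECIFIC_FILE_FORMAT = 0.0   # value documented above
PRIORITY_GENERIC_FILE_FORMAT = 10.0   # value documented above

# Hypothetical registrations: (priority, converter name).
registrations = [
    (PRIORITY_GENERIC_FILE_FORMAT, "GenericTextConverter"),
    (PRIORITY_SPECIFIC_FILE_FORMAT, "RtfConverter"),
    (-1.0, "OverrideConverter"),  # below 0.0, so it sorts ahead of the others
]

# Lower values are tried first.
try_order = [name for priority, name in sorted(registrations, key=lambda r: r[0])]
print(try_order)  # ['OverrideConverter', 'RtfConverter', 'GenericTextConverter']
```

Under this rule, a value below 0.0 would place a converter ahead of the built-in specific-format converters. That is an inference from the stated ordering, so it is worth confirming against the installed MarkItDown version.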
Accessing File Content
The file_stream is seekable:
def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs) -> bool:
    # Save position
    cur_pos = file_stream.tell()
    # Read header to check file type
    header = file_stream.read(100)
    # IMPORTANT: Reset position
    file_stream.seek(cur_pos)
    return header.startswith(b'{\\rtf')
Always reset the file stream position after reading in accepts(). The convert() method expects the stream to be at the original position.
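One way to make the reset hard to forget is to wrap the read in a small helper. peek below is a hypothetical utility, not part of MarkItDown. It uses try/finally so the position is restored even if the read raises.

```python
import io
from typing import BinaryIO


def peek(file_stream: BinaryIO, size: int = 100) -> bytes:
    """Read up to `size` bytes, then restore the original stream position."""
    cur_pos = file_stream.tell()
    try:
        return file_stream.read(size)
    finally:
        file_stream.seek(cur_pos)


stream = io.BytesIO(b"{\\rtf1 Hello World}")
header = peek(stream, 5)
print(header)         # b'{\\rtf'
print(stream.tell())  # 0
```

Inside accepts(), the check then reduces to a single expression such as peek(file_stream).startswith(b'{\\rtf').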
Using Configuration Options
Access configuration passed to MarkItDown:
def register_converters(markitdown: MarkItDown, **kwargs):
    # Access custom configuration
    custom_setting = kwargs.get('custom_setting', 'default')
    markitdown.register_converter(
        MyConverter(setting=custom_setting)
    )
Pass configuration when creating MarkItDown:
md = MarkItDown(
    enable_plugins=True,
    custom_setting='value'
)
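The path of a keyword argument from the call site to the converter can be traced with stand-ins. RecordingHost and MyConverter below are hypothetical. The sketch assumes only that register_converters receives the extra keyword arguments, as described above.

```python
from typing import Any


class MyConverter:
    """Hypothetical converter that stores one setting."""

    def __init__(self, setting: str) -> None:
        self.setting = setting


class RecordingHost:
    """Hypothetical stand-in for MarkItDown; it only records converters."""

    def __init__(self) -> None:
        self.converters: list[MyConverter] = []

    def register_converter(self, converter: MyConverter) -> None:
        self.converters.append(converter)


def register_converters(markitdown: RecordingHost, **kwargs: Any) -> None:
    # Fall back to 'default' when the caller did not pass custom_setting.
    custom_setting = kwargs.get("custom_setting", "default")
    markitdown.register_converter(MyConverter(setting=custom_setting))


host = RecordingHost()
register_converters(host, custom_setting="value")
register_converters(host)  # no custom_setting given
print([c.setting for c in host.converters])  # ['value', 'default']
```

Reading options with kwargs.get and a default keeps the plugin usable when the caller passes no plugin-specific configuration.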
Error Handling
Handle missing dependencies gracefully:
import sys

from markitdown import DocumentConverter, MissingDependencyException

# Record the import failure instead of raising at import time
_dependency_exc_info = None
try:
    import striprtf
except ImportError:
    _dependency_exc_info = sys.exc_info()


class RtfConverter(DocumentConverter):
    def __init__(self):
        if _dependency_exc_info is not None:
            raise MissingDependencyException(
                "RtfConverter requires 'striprtf' to be installed. "
                "Install with: pip install striprtf"
            ) from _dependency_exc_info[1].with_traceback(_dependency_exc_info[2])
Example: Sample Plugin
The official sample plugin demonstrates best practices:
# From markitdown-sample-plugin
import sys
from typing import BinaryIO, Any

from markitdown import (
    DocumentConverter,
    DocumentConverterResult,
    MissingDependencyException,
    StreamInfo,
)

# Check for dependencies
_dependency_exc_info = None
try:
    from striprtf.striprtf import rtf_to_text
except ImportError:
    _dependency_exc_info = sys.exc_info()


class RtfConverter(DocumentConverter):
    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> bool:
        extension = (stream_info.extension or "").lower()
        if extension == ".rtf":
            return True
        # Check file magic: RTF files start with the bytes {\rtf
        cur_pos = file_stream.tell()
        header = file_stream.read(100)
        file_stream.seek(cur_pos)
        return header.startswith(b'{\\rtf')

    def convert(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> DocumentConverterResult:
        if _dependency_exc_info is not None:
            raise MissingDependencyException(
                "RTF conversion requires 'striprtf'. Install with: pip install striprtf"
            )
        content = file_stream.read().decode('utf-8', errors='ignore')
        text = rtf_to_text(content)
        return DocumentConverterResult(markdown=text)


# Plugin interface
__plugin_interface_version__ = 1


def register_converters(markitdown, **kwargs):
    markitdown.register_converter(RtfConverter())
Install and use:
pip install markitdown-sample-plugin
markitdown --use-plugins document.rtf
Testing Plugins
Test your plugin:
import io

from markitdown import MarkItDown, StreamInfo


def test_rtf_conversion():
    md = MarkItDown(enable_plugins=True)
    # Create test RTF content
    rtf_content = b"{\\rtf1 Hello World}"
    stream = io.BytesIO(rtf_content)
    result = md.convert_stream(stream, stream_info=StreamInfo(extension=".rtf"))
    assert "Hello World" in result.markdown
    print("✓ Plugin test passed")


if __name__ == "__main__":
    test_rtf_conversion()
Publishing Plugins
Test Locally
pip install dist/markitdown_rtf_plugin-0.1.0-py3-none-any.whl
markitdown --list-plugins
Publish to PyPI
python -m twine upload dist/*
Tag Repository
Add the #markitdown-plugin topic to your GitHub repository so that others can find it.
Security Considerations
Plugins execute arbitrary code, both when they are loaded and during conversion. Only install plugins from trusted sources.
Best practices:
- Review plugin source code before installation
- Use virtual environments for testing new plugins
- Keep plugins updated
- Report security issues to plugin authors
Troubleshooting
Plugin Not Found
If --list-plugins doesn’t show your plugin:
# Check if package is installed
pip list | grep markitdown
# Verify entry points
python -c "from importlib.metadata import entry_points; print(list(entry_points(group='markitdown.plugin')))"
# Reinstall the plugin
pip uninstall markitdown-rtf-plugin
pip install markitdown-rtf-plugin
Plugin Fails to Load
Check for errors:
import traceback
from importlib.metadata import entry_points

for ep in entry_points(group='markitdown.plugin'):
    try:
        ep.load()
        print(f"✓ Loaded: {ep.name}")
    except Exception:
        print(f"✗ Failed: {ep.name}")
        traceback.print_exc()
Converter Not Called
Ensure accepts() returns True:
def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs) -> bool:
    print(f"Checking: {stream_info.extension} / {stream_info.mimetype}")
    return stream_info.extension == ".rtf"
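A strict comparison like the one above returns False for .RTF and for streams that carry no extension at all, which is a common reason a converter is skipped. The sketch below contrasts it with a normalized check. SimpleNamespace stands in for StreamInfo (an assumption, so the snippet runs without MarkItDown), and matches_rtf is a hypothetical helper.

```python
from types import SimpleNamespace


def matches_rtf(stream_info) -> bool:
    """Case-insensitive check on extension and MIME type; tolerates None."""
    extension = (stream_info.extension or "").lower()
    mimetype = (stream_info.mimetype or "").lower()
    return extension == ".rtf" or mimetype == "text/rtf"


# SimpleNamespace stands in for StreamInfo in this sketch.
cases = [
    SimpleNamespace(extension=".rtf", mimetype=None),
    SimpleNamespace(extension=".RTF", mimetype=None),
    SimpleNamespace(extension=None, mimetype="text/rtf"),
    SimpleNamespace(extension=None, mimetype=None),
]

for info in cases:
    strict = info.extension == ".rtf"
    print(f"{info.extension!r:8} {info.mimetype!r:12} strict={strict} normalized={matches_rtf(info)}")
```

If the debug print shows an unexpected extension or MIME type, normalizing both values in accepts(), as the RtfConverter example earlier does, is usually the fix.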