Custom converters allow you to extend MarkItDown to support additional file formats. By implementing the DocumentConverter base class, you can add conversion logic for any file type.

DocumentConverter Base Class

All converters inherit from the DocumentConverter abstract base class located in _base_converter.py:42.
from typing import Any, BinaryIO

from markitdown import DocumentConverterResult, StreamInfo

class DocumentConverter:
    """Abstract superclass of all DocumentConverters."""
    
    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> bool:
        """Determine if this converter can handle the document."""
        raise NotImplementedError()
    
    def convert(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> DocumentConverterResult:
        """Convert the document to Markdown."""
        raise NotImplementedError()

Creating a Custom Converter

1. Inherit from DocumentConverter

Create a new class that inherits from DocumentConverter:
from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo
from typing import BinaryIO, Any

class MyCustomConverter(DocumentConverter):
    """Converts custom file format to Markdown."""
    pass

2. Implement the accepts() Method

The accepts() method determines whether your converter should handle a given file. It receives:
  • file_stream: The file-like object (must support seek(), tell(), and read())
  • stream_info: Metadata about the file (mimetype, extension, charset, url, etc.)
  • **kwargs: Additional keyword arguments
    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        mimetype = (stream_info.mimetype or "").lower()
        extension = (stream_info.extension or "").lower()
        
        # Check by extension
        if extension in [".custom", ".cst"]:
            return True
        
        # Check by mimetype
        if mimetype == "application/x-custom":
            return True
        
        return False
    
    Important: If you need to read from file_stream to make a determination, you must reset the stream position before returning:
    cur_pos = file_stream.tell()  # Save current position
    data = file_stream.read(100)   # Peek at first 100 bytes
    file_stream.seek(cur_pos)      # Reset to original position
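
The save-and-restore pattern above can be wrapped in a small helper so that the position is restored even if the read raises. This is a minimal sketch using only the standard library; `peek_matches` and the `b"CUST"` signature are hypothetical names chosen for illustration and are not part of MarkItDown.

```python
import io
from typing import BinaryIO


def peek_matches(file_stream: BinaryIO, magic: bytes) -> bool:
    """Return True if the stream's next bytes equal `magic`, without consuming them."""
    cur_pos = file_stream.tell()  # Remember where the caller left the stream
    try:
        return file_stream.read(len(magic)) == magic
    finally:
        file_stream.seek(cur_pos)  # Always restore, even if read() raises


# Hypothetical signature for an imaginary ".custom" format
stream = io.BytesIO(b"CUST\x01rest of file")
print(peek_matches(stream, b"CUST"))
print(stream.tell())
```

A helper like this could be called from accepts() when the extension and MIME type are inconclusive, leaving the stream untouched for convert().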
    
3. Implement the convert() Method

The convert() method performs the actual conversion. It receives the same parameters as accepts() and must return a DocumentConverterResult:
    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        # Read and decode the file
        encoding = stream_info.charset or "utf-8"
        content = file_stream.read().decode(encoding)
        
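        # Convert to Markdown (_convert_to_markdown is a helper you implement for your format)
        markdown = self._convert_to_markdown(content)
        
        # Extract the title, if available (_extract_title is likewise your own helper)
        title = self._extract_title(content)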
        
        return DocumentConverterResult(
            markdown=markdown,
            title=title
        )
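
The two helpers called in convert() are left to the implementer. As one possible sketch, assume an invented ".custom" text format whose first line may read `TITLE: ...` and whose remaining lines are body text. The format and both function names are hypothetical; they are written here as plain functions, but could equally be private methods on the converter.

```python
from typing import Optional


def extract_title(content: str) -> Optional[str]:
    """Return the text after a leading 'TITLE:' line, or None if absent."""
    first_line = content.split("\n", 1)[0].strip()
    if first_line.upper().startswith("TITLE:"):
        return first_line[len("TITLE:"):].strip() or None
    return None


def convert_to_markdown(content: str) -> str:
    """Render the title as a level-1 heading and keep the remaining lines as body text."""
    title = extract_title(content)
    if title is not None:
        body = "\n".join(content.splitlines()[1:]).strip()
        return f"# {title}\n\n{body}".strip()
    return content.strip()


sample = "TITLE: Quarterly Report\nRevenue grew.\nCosts fell."
print(extract_title(sample))
print(convert_to_markdown(sample))
```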
    
4. Register the Converter

Register your converter with a MarkItDown instance:

from markitdown import MarkItDown

md = MarkItDown()
md.register_converter(MyCustomConverter())

# Use it
result = md.convert("document.custom")
print(result.markdown)
    

    Complete Example

    Here’s a complete example based on the plain text converter (converters/_plain_text_converter.py:33):
    from typing import BinaryIO, Any
    from charset_normalizer import from_bytes
    from markitdown import (
        DocumentConverter,
        DocumentConverterResult,
        StreamInfo
    )
    
    ACCEPTED_MIME_TYPE_PREFIXES = [
        "text/",
        "application/json",
        "application/markdown",
    ]
    
    ACCEPTED_FILE_EXTENSIONS = [
        ".txt",
        ".text",
        ".md",
        ".markdown",
        ".json",
        ".jsonl",
    ]
    
    class PlainTextConverter(DocumentConverter):
        """Converts plain text files to Markdown."""
        
        def accepts(
            self,
            file_stream: BinaryIO,
            stream_info: StreamInfo,
            **kwargs: Any,
        ) -> bool:
            mimetype = (stream_info.mimetype or "").lower()
            extension = (stream_info.extension or "").lower()
            
            # If we have a charset, safely assume it's text
            if stream_info.charset is not None:
                return True
            
            # Check extension
            if extension in ACCEPTED_FILE_EXTENSIONS:
                return True
            
            # Check mimetype prefix
            for prefix in ACCEPTED_MIME_TYPE_PREFIXES:
                if mimetype.startswith(prefix):
                    return True
            
            return False
        
        def convert(
            self,
            file_stream: BinaryIO,
            stream_info: StreamInfo,
            **kwargs: Any,
        ) -> DocumentConverterResult:
            if stream_info.charset:
                text_content = file_stream.read().decode(stream_info.charset)
            else:
                text_content = str(from_bytes(file_stream.read()).best())
            
            return DocumentConverterResult(markdown=text_content)
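
The decoding branch in convert() can be tried in isolation. The sketch below deliberately swaps charset_normalizer's detection for a fixed standard-library fallback chain (declared charset, then UTF-8, then Latin-1), so it is a simplification rather than the converter's actual behavior. `decode_text` is a hypothetical name.

```python
from typing import Optional


def decode_text(data: bytes, charset: Optional[str] = None) -> str:
    """Decode bytes using the declared charset, else try UTF-8, else Latin-1."""
    if charset:
        return data.decode(charset)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never raises
        return data.decode("latin-1")


print(decode_text("café".encode("utf-8")))
print(decode_text(b"caf\xe9"))
```

A fixed chain is predictable but can mis-decode other single-byte encodings; a detection library is the better choice when input encodings vary widely.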
    

    StreamInfo Object

    The StreamInfo dataclass (_stream_info.py:6) provides metadata about the file being converted:
    from dataclasses import dataclass
    from typing import Optional

    @dataclass(kw_only=True, frozen=True)
    class StreamInfo:
        mimetype: Optional[str] = None      # MIME type (e.g., "application/pdf")
        extension: Optional[str] = None     # File extension (e.g., ".pdf")
        charset: Optional[str] = None       # Character encoding (e.g., "utf-8")
        filename: Optional[str] = None      # Filename from path or URL
        local_path: Optional[str] = None    # Full local file path
        url: Optional[str] = None           # URL if fetched from web
    

    DocumentConverterResult

    The DocumentConverterResult class (_base_converter.py:5) wraps the conversion output:
    result = DocumentConverterResult(
        markdown="# Converted Content\n\nThis is the converted text.",
        title="Optional Document Title"
    )
    
    # Access the markdown
    print(result.markdown)
    print(str(result))  # Same as markdown
    
    # Access the title
    print(result.title)
    

    Converter Priority

    When registering a converter, you can pass a priority to control the order in which converters are tried:
    from markitdown import MarkItDown
    
    md = MarkItDown()
    
    # SpecificConverter and GenericConverter are placeholders for your own converter classes.
    # Lower value: tried earlier
    md.register_converter(SpecificConverter(), priority=0.0)
    
    # Higher value: tried later
    md.register_converter(GenericConverter(), priority=10.0)
    
    Priority values (_markitdown.py:54):
    • 0.0 - PRIORITY_SPECIFIC_FILE_FORMAT (specific converters like PDF, DOCX)
    • 10.0 - PRIORITY_GENERIC_FILE_FORMAT (generic converters like HTML, plain text)
    Lower values are tried first. Among converters with the same priority, the most recently registered one is tried first.
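
The ordering rule can be modeled with a list and a stable sort. The sketch below is only an illustration of the rule as stated here, not MarkItDown's internal code; `register`, `resolution_order`, and the converter names are hypothetical.

```python
from typing import List, Tuple

Registration = Tuple[float, str]  # (priority, converter name)


def register(registry: List[Registration], name: str, priority: float) -> None:
    """Put the newest registration at the front of the list."""
    registry.insert(0, (priority, name))


def resolution_order(registry: List[Registration]) -> List[str]:
    """Sort by priority; sorted() is stable, so ties keep front-of-list (newest) order."""
    return [name for _, name in sorted(registry, key=lambda reg: reg[0])]


registry: List[Registration] = []
register(registry, "PlainText", priority=10.0)
register(registry, "Html", priority=10.0)
register(registry, "Pdf", priority=0.0)
register(registry, "MyCustom", priority=0.0)

print(resolution_order(registry))
```

Under this model, a custom converter registered last at priority 0.0 is consulted before earlier converters at the same priority, and all of them come before the generic ones at 10.0.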

    Best Practices

    Stream Position Management: Always reset file_stream position after reading in accepts(). The convert() method expects the stream to be at the original position.
    Charset Detection: Use stream_info.charset when available, or employ libraries like charset_normalizer to detect encoding automatically.
    Dependency Handling: For optional dependencies, catch import errors gracefully and raise MissingDependencyException during conversion if needed.
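
The dependency-handling advice can be sketched as follows: record the import failure at module load, then raise only when a conversion is attempted. `MissingDependencyError` is a local stand-in for MarkItDown's MissingDependencyException, and `hypothetical_custom_parser` with its `parse()` function is an invented optional package, so treat this as a pattern rather than the library's exact mechanism.

```python
from typing import Optional


class MissingDependencyError(Exception):
    """Stand-in for illustration; MarkItDown provides its own exception type."""


_import_error: Optional[BaseException] = None
try:
    import hypothetical_custom_parser  # hypothetical optional dependency
except ImportError as exc:
    _import_error = exc  # Defer the failure until conversion time


def parse_custom(data: bytes) -> str:
    """Fail at conversion time, not import time, when the dependency is missing."""
    if _import_error is not None:
        raise MissingDependencyError(
            "parse_custom requires the optional package 'hypothetical_custom_parser'"
        ) from _import_error
    return hypothetical_custom_parser.parse(data)
```

Deferring the error keeps the converter module importable, so other converters remain usable when this one's dependency is absent.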

    Next Steps

    • Plugin Development: Package your converter as a reusable plugin
    • Configuration: Learn about MarkItDown configuration options
