Skip to main content
MarkItDown provides extensive configuration options to customize behavior, integrate with LLM services, and control converter registration.

Basic Configuration

Create a MarkItDown instance with optional configuration:
from markitdown import MarkItDown

md = MarkItDown(
    enable_builtins=True,   # Enable built-in converters (default: True)
    enable_plugins=False,   # Enable plugin converters (default: False)
    # Additional configuration options...
)

Constructor Parameters

The MarkItDown class (_markitdown.py:93) accepts the following parameters:

Core Parameters

enable_builtins
bool
default:"True"
Enable built-in converters for PDF, DOCX, XLSX, HTML, and other formats. Set to False to disable all built-ins.
enable_plugins
bool
default:"False"
Enable third-party plugin converters. Plugins must be installed and will be discovered via entry points.
requests_session
requests.Session
default:"None"
Custom requests.Session for HTTP operations. If not provided, MarkItDown creates a session with markdown preference headers.

LLM Integration

Configure LLM services for enhanced conversion (e.g., image captioning):
llm_client
Any
default:"None"
LLM client instance for generating descriptions. Compatible with OpenAI client or similar interfaces.
llm_model
str
default:"None"
Model name/identifier to use with the LLM client (e.g., "gpt-4-vision-preview").
llm_prompt
str
default:"None"
Custom prompt template for LLM requests. Used by converters that support LLM-based processing.

Converter Options

exiftool_path
str
default:"None"
Path to the exiftool binary for extracting image metadata. If not specified, MarkItDown searches common system paths and the EXIFTOOL_PATH environment variable.
style_map
str
default:"None"
Custom style map for DOCX conversion. Defines how Word styles map to Markdown formatting.

Document Intelligence

Configure Azure Document Intelligence for advanced document processing:
docintel_endpoint
str
default:"None"
Azure Document Intelligence endpoint URL. When provided, enables the Document Intelligence converter.
docintel_credential
Any
default:"None"
Azure credential for Document Intelligence authentication.
docintel_file_types
list
default:"None"
List of file types to process with Document Intelligence.
docintel_api_version
str
default:"None"
API version for Document Intelligence service.

Configuration Examples

Basic Usage with Defaults

from markitdown import MarkItDown

# Uses all defaults: built-ins enabled, plugins disabled
md = MarkItDown()
result = md.convert("document.pdf")

With LLM Integration

Enable AI-powered image captioning:
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4-vision-preview"
)

result = md.convert("image.jpg")
print(result.markdown)  # Includes AI-generated description

With Custom Requests Session

Use a custom session for proxy, authentication, or custom headers:
import requests
from markitdown import MarkItDown

session = requests.Session()
session.headers.update({
    "User-Agent": "MyApp/1.0",
    "Authorization": "Bearer token"
})
session.proxies.update({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080"
})

md = MarkItDown(requests_session=session)
result = md.convert("https://example.com/document.pdf")

With Plugins Enabled

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=True)
result = md.convert("document.rtf")  # Uses plugin converter

Custom Converters Only

Disable built-ins and register only your custom converters:
from markitdown import MarkItDown
from my_converters import CustomConverter

md = MarkItDown(enable_builtins=False)
md.register_converter(CustomConverter())
result = md.convert("document.custom")

With Azure Document Intelligence

from markitdown import MarkItDown
from azure.identity import DefaultAzureCredential

md = MarkItDown(
    docintel_endpoint="https://your-resource.cognitiveservices.azure.com/",
    docintel_credential=DefaultAzureCredential(),
    docintel_file_types=[".pdf", ".docx"],
    docintel_api_version="2023-07-31"
)

result = md.convert("complex_document.pdf")

With Custom DOCX Styling

Control how Word styles map to Markdown:
from markitdown import MarkItDown

style_map = """
p[style-name='Section Heading'] => h1
p[style-name='Subsection Heading'] => h2
"""

md = MarkItDown(style_map=style_map)
result = md.convert("document.docx")

With Custom ExifTool Path

from markitdown import MarkItDown

md = MarkItDown(exiftool_path="/opt/homebrew/bin/exiftool")
result = md.convert("photo.jpg")

Dynamic Configuration

Enabling Built-ins After Creation

If you initially disable built-ins, you can enable them later:
md = MarkItDown(enable_builtins=False)
# ... do something ...
md.enable_builtins(llm_client=client, llm_model="gpt-4")

Enabling Plugins After Creation

md = MarkItDown(enable_plugins=False)
# ... do something ...
md.enable_plugins()
Both enable_builtins() and enable_plugins() can only be called once. Calling them again will raise a warning.

Per-Conversion Options

You can pass additional options to specific conversions:
md = MarkItDown()

# Override LLM settings for this conversion
result = md.convert(
    "image.jpg",
    llm_client=special_client,
    llm_model="gpt-4-turbo"
)

# Provide custom stream info
from markitdown import StreamInfo

result = md.convert(
    stream,
    stream_info=StreamInfo(
        mimetype="application/pdf",
        extension=".pdf",
        charset="utf-8"
    )
)

Environment Variables

MarkItDown checks the following environment variables:

EXIFTOOL_PATH

Specify the path to the exiftool binary:
export EXIFTOOL_PATH=/usr/local/bin/exiftool
Searched paths (in order):
  1. exiftool_path constructor parameter
  2. EXIFTOOL_PATH environment variable
  3. Common system paths:
    • /usr/bin
    • /usr/local/bin
    • /opt/bin
    • /opt/local/bin
    • /opt/homebrew/bin
    • C:\Windows\System32
    • C:\Program Files
    • C:\Program Files (x86)

Converter Registration

Manually register converters with custom priority:
from markitdown import MarkItDown
from my_converters import MyConverter

md = MarkItDown()

# Register with specific priority
md.register_converter(
    MyConverter(),
    priority=0.0  # Lower = higher priority (tried first)
)

# Built-in priorities:
# 0.0 = PRIORITY_SPECIFIC_FILE_FORMAT (PDF, DOCX, etc.)
# 10.0 = PRIORITY_GENERIC_FILE_FORMAT (HTML, plain text)

Default Requests Session

If no requests_session is provided, MarkItDown creates one with these headers (_markitdown.py:110):
{
    "Accept": "text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1"
}
This signals servers that markdown is preferred over other formats.

Configuration Object Pattern

For complex setups, use a configuration object:
from markitdown import MarkItDown
from openai import OpenAI
import os

class MarkItDownConfig:
    def __init__(self):
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.llm_model = "gpt-4-vision-preview"
        self.exiftool_path = "/opt/homebrew/bin/exiftool"
        self.enable_plugins = True
    
    def create_markitdown(self) -> MarkItDown:
        return MarkItDown(
            llm_client=self.openai_client,
            llm_model=self.llm_model,
            exiftool_path=self.exiftool_path,
            enable_plugins=self.enable_plugins
        )

config = MarkItDownConfig()
md = config.create_markitdown()

Best Practices

Reuse Instances: Create one MarkItDown instance and reuse it for multiple conversions. Configuration overhead is minimal after initialization.
Environment-Based Config: Use environment variables for sensitive data like API keys and service endpoints.
Plugin Security: Only enable plugins from trusted sources. Plugins execute arbitrary code during conversion.
LLM Rate Limits: When using LLM integration, be mindful of API rate limits and costs, especially for batch processing.

Next Steps

Custom Converters

Create your own document converters

Plugin Development

Build and distribute MarkItDown plugins

Build docs developers (and LLMs) love