Configuration

MarkItDown provides extensive configuration options to customize behavior, integrate with LLM services, and control converter registration.

Basic Configuration

Create a MarkItDown instance with optional configuration:

from markitdown import MarkItDown

md = MarkItDown(
    enable_builtins=True,   # Enable built-in converters (default: True)
    enable_plugins=False,   # Enable plugin converters (default: False)
    # Additional configuration options...
)

Constructor Parameters

The MarkItDown class (_markitdown.py:93) accepts the following parameters:

Core Parameters

enable_builtins (bool, default: True)

Enable built-in converters for PDF, DOCX, XLSX, HTML, and other formats. Set to False to disable all built-ins.

enable_plugins (bool, default: False)

Enable third-party plugin converters. Plugins must be installed and will be discovered via entry points.

requests_session (requests.Session, default: None)

Custom requests.Session for HTTP operations. If not provided, MarkItDown creates a session whose Accept header prefers Markdown.

LLM Integration

Configure LLM services for enhanced conversion (e.g., image captioning):

llm_client (Any, default: None)

LLM client instance for generating descriptions. Compatible with the OpenAI client or clients exposing a similar interface.

llm_model (str, default: None)

Model name/identifier to use with the LLM client (e.g., "gpt-4-vision-preview").

llm_prompt (str, default: None)

Custom prompt template for LLM requests. Used by converters that support LLM-based processing.

Converter Options

exiftool_path (str, default: None)

Path to the exiftool binary for extracting image metadata. If not specified, MarkItDown checks the EXIFTOOL_PATH environment variable and then searches common system paths.

style_map (str, default: None)

Custom style map for DOCX conversion. Defines how Word styles map to Markdown formatting.

Document Intelligence

Configure Azure Document Intelligence for advanced document processing:

docintel_endpoint (str, default: None)

Azure Document Intelligence endpoint URL. When provided, enables the Document Intelligence converter.

docintel_credential (Any, default: None)

Azure credential for Document Intelligence authentication.

docintel_file_types (list, default: None)

List of file types to process with Document Intelligence.

docintel_api_version (str, default: None)

API version for Document Intelligence service.

Configuration Examples

Basic Usage with Defaults

from markitdown import MarkItDown

# Uses all defaults: built-ins enabled, plugins disabled
md = MarkItDown()
result = md.convert("document.pdf")

With LLM Integration

Enable AI-powered image captioning:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4-vision-preview"
)

result = md.convert("image.jpg")
print(result.markdown)  # Includes AI-generated description

With Custom Requests Session

Use a custom session for proxy, authentication, or custom headers:

import requests
from markitdown import MarkItDown

session = requests.Session()
session.headers.update({
    "User-Agent": "MyApp/1.0",
    "Authorization": "Bearer token"
})
session.proxies.update({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080"
})

md = MarkItDown(requests_session=session)
result = md.convert("https://example.com/document.pdf")

With Plugins Enabled

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=True)
result = md.convert("document.rtf")  # Handled by a plugin converter, if one for RTF is installed

Custom Converters Only

Disable built-ins and register only your custom converters:

from markitdown import MarkItDown
from my_converters import CustomConverter

md = MarkItDown(enable_builtins=False)
md.register_converter(CustomConverter())
result = md.convert("document.custom")

With Azure Document Intelligence

from markitdown import MarkItDown
from azure.identity import DefaultAzureCredential

md = MarkItDown(
    docintel_endpoint="https://your-resource.cognitiveservices.azure.com/",
    docintel_credential=DefaultAzureCredential(),
    docintel_file_types=[".pdf", ".docx"],
    docintel_api_version="2023-07-31"
)

result = md.convert("complex_document.pdf")

With Custom DOCX Styling

Control how Word styles map to Markdown:

from markitdown import MarkItDown

style_map = """
p[style-name='Section Heading'] => h1
p[style-name='Subsection Heading'] => h2
"""

md = MarkItDown(style_map=style_map)
result = md.convert("document.docx")
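
When many Word styles need mapping, the style map string can be generated rather than typed by hand. This is a minimal sketch that reproduces the line format shown above; build_style_map is a hypothetical helper, not part of MarkItDown.

```python
def build_style_map(paragraph_styles):
    """Build a style map string from {Word paragraph style name: target tag}."""
    lines = [
        f"p[style-name='{style}'] => {tag}"
        for style, tag in paragraph_styles.items()
    ]
    # One mapping per line, in insertion order.
    return "\n".join(lines)


style_map = build_style_map({
    "Section Heading": "h1",
    "Subsection Heading": "h2",
})
print(style_map)
```

The resulting string can be passed as the style_map argument in the same way as the literal above.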

With Custom ExifTool Path

from markitdown import MarkItDown

md = MarkItDown(exiftool_path="/opt/homebrew/bin/exiftool")
result = md.convert("photo.jpg")

Dynamic Configuration

Enabling Built-ins After Creation

If you initially disable built-ins, you can enable them later:

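from markitdown import MarkItDown

md = MarkItDown(enable_builtins=False)
# ... register custom converters or do other setup ...
# `client` is an LLM client created earlier (for example, an OpenAI instance)
md.enable_builtins(llm_client=client, llm_model="gpt-4")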

Enabling Plugins After Creation

md = MarkItDown(enable_plugins=False)
# ... do something ...
md.enable_plugins()

Both enable_builtins() and enable_plugins() are meant to be called only once. Calling either one again issues a warning instead of registering the converters a second time.
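
The once-only behavior can be illustrated with a stand-in class. This sketch does not use MarkItDown itself: OneShotFeature is hypothetical, and the RuntimeWarning category is an assumption of the example. It shows how the standard warnings module lets calling code detect a repeated call.

```python
import warnings


class OneShotFeature:
    """Hypothetical stand-in for an enable-once method such as enable_plugins()."""

    def __init__(self):
        self.enabled = False

    def enable(self):
        if self.enabled:
            # Repeated calls warn and leave the state untouched.
            warnings.warn("Feature is already enabled.", RuntimeWarning)
            return
        self.enabled = True


feature = OneShotFeature()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not suppress repeated warnings
    feature.enable()  # first call: enables, no warning
    feature.enable()  # second call: warns, nothing else happens

print(len(caught), caught[0].category.__name__)
```

In a test suite, warnings.simplefilter("error") would turn such a warning into an exception, which makes accidental double initialization easy to catch.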

Per-Conversion Options

You can pass additional options to specific conversions:

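from markitdown import MarkItDown, StreamInfo

md = MarkItDown()

# Override LLM settings for this conversion only.
# `special_client` is an LLM client you created elsewhere.
result = md.convert(
    "image.jpg",
    llm_client=special_client,
    llm_model="gpt-4-turbo"
)

# Provide stream info when converting a binary file-like object
# ("document.pdf" is an illustrative file name)
with open("document.pdf", "rb") as stream:
    result = md.convert(
        stream,
        stream_info=StreamInfo(
            mimetype="application/pdf",
            extension=".pdf",
            charset="utf-8"
        )
    )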
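
A raw stream carries no file name, so the hints have to come from somewhere else. If the original name is known, the standard library can supply a reasonable guess. This is a sketch only: stream_hints is a hypothetical helper, and it assumes StreamInfo accepts the keyword fields used in the example above.

```python
import mimetypes
import os


def stream_hints(filename, charset=None):
    """Guess StreamInfo-style hints from a file name (hypothetical helper)."""
    mimetype, _ = mimetypes.guess_type(filename)
    _, extension = os.path.splitext(filename)
    hints = {
        "mimetype": mimetype,
        "extension": extension.lower() or None,
        "charset": charset,
    }
    # Drop unknown values so only real hints are passed along.
    return {key: value for key, value in hints.items() if value is not None}


hints = stream_hints("report.pdf")
print(hints)
```

The dictionary could then be unpacked as StreamInfo(**hints) when calling convert() on the stream.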

Environment Variables

MarkItDown checks the following environment variables:

EXIFTOOL_PATH

Specify the path to the exiftool binary:

export EXIFTOOL_PATH=/usr/local/bin/exiftool

Resolution order:

1. exiftool_path constructor parameter
2. EXIFTOOL_PATH environment variable
3. Common system paths:
- /usr/bin
- /usr/local/bin
- /opt/bin
- /opt/local/bin
- /opt/homebrew/bin
- C:\Windows\System32
- C:\Program Files
- C:\Program Files (x86)
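
The lookup order above can be sketched as a small function. This is not MarkItDown's actual implementation. In particular, the third step here assumes the binary is located on PATH and accepted only when it sits in one of the listed directories, and resolve_exiftool is a hypothetical name.

```python
import os
import shutil

# Directories from the list above.
COMMON_DIRS = [
    "/usr/bin", "/usr/local/bin", "/opt/bin", "/opt/local/bin",
    "/opt/homebrew/bin", r"C:\Windows\System32", r"C:\Program Files",
    r"C:\Program Files (x86)",
]


def resolve_exiftool(explicit_path=None, env=None, which=shutil.which):
    """Return the first exiftool path found, or None."""
    env = os.environ if env is None else env
    # 1. An explicit, constructor-style argument wins.
    if explicit_path:
        return explicit_path
    # 2. Then the EXIFTOOL_PATH environment variable.
    from_env = env.get("EXIFTOOL_PATH")
    if from_env:
        return from_env
    # 3. Finally a binary on PATH, accepted only from a common directory
    #    (assumption of this sketch).
    candidate = which("exiftool")
    if candidate:
        candidate = os.path.abspath(candidate)
        if os.path.dirname(candidate) in COMMON_DIRS:
            return candidate
    return None


print(resolve_exiftool(env={"EXIFTOOL_PATH": "/usr/local/bin/exiftool"}))
```

Passing env and which as parameters keeps the function easy to test without touching the real environment.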

Converter Registration

Manually register converters with custom priority:

from markitdown import MarkItDown
from my_converters import MyConverter

md = MarkItDown()

# Register with specific priority
md.register_converter(
    MyConverter(),
    priority=0.0  # Lower = higher priority (tried first)
)

# Built-in priorities:
# 0.0 = PRIORITY_SPECIFIC_FILE_FORMAT (PDF, DOCX, etc.)
# 10.0 = PRIORITY_GENERIC_FILE_FORMAT (HTML, plain text)
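
To see how priority values translate into lookup order, consider the toy registry below. It is a sketch, not the library's code: the Registration class is hypothetical, and the tie-breaking rule (most recently registered first among equal priorities) is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    name: str
    priority: float


def lookup_order(registrations):
    """Return converter names in the order they would be tried."""
    # sorted() is stable, so equal priorities keep their list order.
    return [r.name for r in sorted(registrations, key=lambda r: r.priority)]


registry = []
for name, priority in [
    ("plain_text", 10.0),
    ("html", 10.0),
    ("pdf", 0.0),
    ("my_converter", 0.0),
]:
    # New registrations go to the front of the list, so later ones win ties.
    registry.insert(0, Registration(name, priority))

print(lookup_order(registry))
# ['my_converter', 'pdf', 'html', 'plain_text']
```

Under these assumptions, a custom converter registered at priority 0.0 is consulted before the format-specific entries registered earlier at the same value, and well before the generic ones at 10.0.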

Default Requests Session

If no requests_session is provided, MarkItDown creates one with these headers (_markitdown.py:110):

{
    "Accept": "text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1"
}

This tells servers that Markdown is preferred, with HTML, plain text, and any other type as progressively less preferred fallbacks.
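
The q-values in that header express a ranking, and a type without an explicit q counts as 1.0. As an illustration of how a server might read it, the sketch below parses the header and sorts the media types by weight. It is a simplified parser written for this example (rank_accept is a hypothetical name) and ignores details of full HTTP content negotiation such as specificity rules.

```python
def rank_accept(header):
    """Return media types from an Accept header, most preferred first."""
    ranked = []
    for part in header.split(","):
        media_type, *params = [piece.strip() for piece in part.split(";")]
        q = 1.0  # q defaults to 1.0 when omitted
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranked.append((q, media_type))
    # The sort is stable, so header order is kept among equal q-values.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [media_type for _, media_type in ranked]


header = "text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1"
print(rank_accept(header))
# ['text/markdown', 'text/html', 'text/plain', '*/*']
```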

Configuration Object Pattern

For complex setups, use a configuration object:

from markitdown import MarkItDown
from openai import OpenAI
import os

class MarkItDownConfig:
    def __init__(self):
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.llm_model = "gpt-4-vision-preview"
        self.exiftool_path = "/opt/homebrew/bin/exiftool"
        self.enable_plugins = True
    
    def create_markitdown(self) -> MarkItDown:
        return MarkItDown(
            llm_client=self.openai_client,
            llm_model=self.llm_model,
            exiftool_path=self.exiftool_path,
            enable_plugins=self.enable_plugins
        )

config = MarkItDownConfig()
md = config.create_markitdown()
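
A variation on this pattern keeps the settings in a dataclass and reads them from the environment, so nothing is hard-coded. The sketch below uses only the standard library. The MARKITDOWN_* variable names and the Settings class are hypothetical, while EXIFTOOL_PATH is the variable described earlier on this page.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    llm_model: Optional[str] = None
    exiftool_path: Optional[str] = None
    enable_plugins: bool = False

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        truthy = {"1", "true", "yes"}
        return cls(
            # MARKITDOWN_* names are hypothetical, chosen for this sketch.
            llm_model=env.get("MARKITDOWN_LLM_MODEL"),
            exiftool_path=env.get("EXIFTOOL_PATH"),
            enable_plugins=env.get("MARKITDOWN_ENABLE_PLUGINS", "").lower() in truthy,
        )

    def to_kwargs(self):
        """Constructor keyword arguments, omitting unset (None) values."""
        return {key: value for key, value in vars(self).items() if value is not None}


settings = Settings.from_env({"MARKITDOWN_ENABLE_PLUGINS": "true"})
print(settings.to_kwargs())
# {'enable_plugins': True}
```

The result could be passed on as MarkItDown(**settings.to_kwargs()), with the LLM client object created separately and added as llm_client.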

Best Practices

Reuse Instances: Create one MarkItDown instance and reuse it for multiple conversions. Configuration overhead is minimal after initialization.

Environment-Based Config: Use environment variables for sensitive data like API keys and service endpoints.

Plugin Security: Only enable plugins from trusted sources. Plugins execute arbitrary code during conversion.

LLM Rate Limits: When using LLM integration, be mindful of API rate limits and costs, especially for batch processing.
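
For batch jobs, instance reuse and rate limiting can be combined in one loop. The helper below is a minimal sketch, not a MarkItDown feature: convert_batch is a hypothetical name, and it simply spaces calls at least min_interval seconds apart. The clock and sleep parameters exist only to make the function testable.

```python
import time


def convert_batch(paths, convert, min_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call convert(path) for each path, at least min_interval seconds apart."""
    results = {}
    last_call = None
    for path in paths:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[path] = convert(path)
    return results


# Demonstration with a stand-in converter and no delay.
demo = convert_batch(["a.txt", "b.txt"], str.upper, min_interval=0.0)
print(demo)
```

With a real instance, the call would look like convert_batch(paths, md.convert, min_interval=2.0), reusing a single md for every file.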


Next Steps

Custom Converters

Plugin Development
