MarkItDown can integrate with Large Language Models (LLMs) to generate detailed descriptions for images during conversion, making the output more accessible and useful for text analysis.

Overview

When an LLM client is configured, MarkItDown:
  1. Extracts image metadata using ExifTool (if available)
  2. Encodes the image as a base64 data URI
  3. Sends the image to the LLM with a prompt
  4. Includes the AI-generated description in the Markdown output
LLM integration is currently supported for JPEG and PNG images only.

Prerequisites

1. Install OpenAI Package

pip install openai
The openai package is not included in MarkItDown’s dependencies and must be installed separately.
2. Get API Key

Obtain an API key from OpenAI or from your LLM provider.
3. Optional: Install ExifTool

For enhanced image metadata extraction:
# macOS
brew install exiftool

# Ubuntu/Debian
sudo apt-get install libimage-exiftool-perl
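
After installing, you can confirm ExifTool is on your PATH with a quick stdlib check (this helper is not part of MarkItDown; MarkItDown simply skips ExifTool metadata when the tool is absent):

```python
import shutil

# Look up the exiftool binary on PATH; returns None if it is not installed.
exiftool_path = shutil.which("exiftool")
if exiftool_path:
    print(f"ExifTool found at {exiftool_path}")
else:
    print("ExifTool not found; image metadata extraction will be limited")
```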

Basic Usage

Python API

from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key="your-api-key")

# Create MarkItDown with LLM integration
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o"
)

# Convert an image
result = md.convert("photo.jpg")
print(result.markdown)
Output:
ImageSize: 1920x1080
DateTimeOriginal: 2024:02:28 10:30:00
GPSPosition: 37.7749° N, 122.4194° W

# Description:
A scenic view of the Golden Gate Bridge during sunset, with vibrant orange and pink hues reflecting off the water. The iconic suspension bridge spans across the bay, with the city of San Francisco visible in the background.

Configuration Options

LLM Client

Any OpenAI-compatible client that supports the chat completions API:
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o"
)

LLM Model

Specify the model to use for image descriptions:
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",              # GPT-4 with vision
    # llm_model="gpt-4o-mini",       # Faster, cheaper
    # llm_model="gpt-4-turbo",       # Previous generation
)
Supported models (vision-capable):
  • gpt-4o - Recommended, high quality
  • gpt-4o-mini - Faster, more cost-effective
  • gpt-4-turbo - Previous generation
  • gpt-4-vision-preview - Legacy

Custom Prompt

Customize the prompt sent to the LLM:
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this image in detail, focusing on key objects, colors, and composition."
)
Default prompt:
Write a detailed caption for this image.

Per-Conversion Override

Override settings for individual conversions:
md = MarkItDown()

# Convert with LLM
result = md.convert(
    "photo.jpg",
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe the main subject of this image."
)

Implementation Details

Image Processing

The description is generated by _get_llm_description() in _image_converter.py:
def _get_llm_description(
    self,
    file_stream: BinaryIO,
    stream_info: StreamInfo,
    *,
    client,
    model,
    prompt=None,
) -> Union[None, str]:
    if prompt is None or prompt.strip() == "":
        prompt = "Write a detailed caption for this image."

    # Get MIME type
    content_type = stream_info.mimetype
    if not content_type:
        content_type, _ = mimetypes.guess_type("_dummy" + (stream_info.extension or ""))
    if not content_type:
        content_type = "application/octet-stream"

    # Encode image as base64
    cur_pos = file_stream.tell()
    try:
        base64_image = base64.b64encode(file_stream.read()).decode("utf-8")
    finally:
        file_stream.seek(cur_pos)

    # Create data URI
    data_uri = f"data:{content_type};base64,{base64_image}"

    # Call OpenAI API
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": data_uri},
                },
            ],
        }
    ]

    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

Supported Image Types

LLM descriptions are generated for:
  • .jpg, .jpeg (JPEG images)
  • .png (PNG images)
Other image formats are not currently supported.
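
Since only JPEG and PNG receive descriptions, it can be useful to filter files up front. A small helper sketch (SUPPORTED and gets_llm_description are defined here for illustration, not exported by MarkItDown):

```python
from pathlib import Path

# Extensions that currently receive LLM-generated descriptions.
SUPPORTED = {".jpg", ".jpeg", ".png"}

def gets_llm_description(path: str) -> bool:
    return Path(path).suffix.lower() in SUPPORTED

print(gets_llm_description("photo.JPG"))  # True
print(gets_llm_description("icon.gif"))   # False
```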

Advanced Examples

Multiple Images with Different Prompts

from markitdown import MarkItDown
from openai import OpenAI
from pathlib import Path

client = OpenAI(api_key="your-api-key")
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

image_types = {
    "product": "Describe this product image for an e-commerce catalog.",
    "portrait": "Describe this portrait photo, including mood and setting.",
    "landscape": "Describe this landscape, highlighting natural features.",
}

for image_file in Path("images").glob("*.jpg"):
    # Determine image type from filename or metadata
    image_type = "landscape"  # Example
    prompt = image_types.get(image_type, "Describe this image in detail.")
    
    result = md.convert(
        str(image_file),
        llm_prompt=prompt
    )
    
    output_file = Path("output") / f"{image_file.stem}.md"
    output_file.write_text(result.markdown)

Accessibility Captions

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

accessibility_prompt = """
Create an accessible description of this image for screen reader users.
Include:
- Main subject and action
- Important details and context
- Colors if relevant
- Spatial relationships
Keep it concise but informative.
"""

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt=accessibility_prompt
)

result = md.convert("diagram.png")
print(result.markdown)

Content Moderation

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

moderation_prompt = """
Describe this image and indicate if it contains:
- Inappropriate content
- Sensitive information
- Brand logos or trademarks
Provide a neutral, factual description.
"""

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt=moderation_prompt
)

result = md.convert("user_upload.jpg")
if "inappropriate" in result.markdown.lower():
    print("Warning: Image may contain inappropriate content")

Custom OpenAI Client Configuration

from openai import OpenAI
from markitdown import MarkItDown

# Custom timeout and retry settings
client = OpenAI(
    api_key="your-api-key",
    timeout=30.0,
    max_retries=3,
)

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o"
)

Azure OpenAI

from openai import AzureOpenAI
from markitdown import MarkItDown

# Azure OpenAI client
client = AzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-02-01",
    azure_endpoint="https://your-resource.openai.azure.com/"
)

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o"  # Your Azure deployment name
)

result = md.convert("image.jpg")

Alternative LLM Providers

Any OpenAI-compatible API:
from openai import OpenAI
from markitdown import MarkItDown

# Example: Using a compatible API endpoint
client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.alternative-provider.com/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="vision-model-name"
)

Cost Considerations

LLM API calls incur costs. Each image conversion with LLM integration makes one API request.
Cost factors:
  • Model choice: gpt-4o-mini is more cost-effective than gpt-4o
  • Image size: Larger images consume more tokens
  • Prompt length: Longer prompts increase costs
  • Frequency: Each image conversion = one API call
Estimated costs (as of 2024):
  • gpt-4o: ~$0.01-0.02 per image
  • gpt-4o-mini: ~$0.001-0.002 per image
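
Using the rough per-image midpoints above (assumed figures; check current pricing before budgeting), a batch cost can be estimated before running a job:

```python
# Assumed per-image cost midpoints from the estimates above, in USD.
RATES = {"gpt-4o": 0.015, "gpt-4o-mini": 0.0015}

def estimate_cost(num_images: int, model: str) -> float:
    return num_images * RATES[model]

print(f"${estimate_cost(1000, 'gpt-4o-mini'):.2f}")  # $1.50 for 1,000 images
```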

Minimize Costs

from markitdown import MarkItDown
from openai import OpenAI
from pathlib import Path

client = OpenAI(api_key="your-api-key")

# Only use LLM for specific images
md_with_llm = MarkItDown(llm_client=client, llm_model="gpt-4o-mini")
md_without_llm = MarkItDown()

for image_file in Path("images").glob("*.jpg"):
    # Use LLM only for important images
    if "important" in image_file.stem:
        result = md_with_llm.convert(str(image_file))
    else:
        result = md_without_llm.convert(str(image_file))

Error Handling

from markitdown import MarkItDown
from openai import OpenAI, OpenAIError

client = OpenAI(api_key="your-api-key")
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

try:
    result = md.convert("image.jpg")
except OpenAIError as e:
    print(f"OpenAI API error: {e}")
    # Fall back to conversion without LLM
    md_fallback = MarkItDown()
    result = md_fallback.convert("image.jpg")
except Exception as e:
    print(f"Conversion error: {e}")

Without LLM Integration

If no LLM client is configured, image conversion includes only metadata:
from markitdown import MarkItDown

md = MarkItDown()  # No LLM client
result = md.convert("photo.jpg")
print(result.markdown)
Output:
ImageSize: 1920x1080
DateTimeOriginal: 2024:02:28 10:30:00
GPSPosition: 37.7749° N, 122.4194° W

Troubleshooting

OpenAI Package Not Installed

ImportError: No module named 'openai'
Solution:
pip install openai

Invalid API Key

AuthenticationError: Invalid API key
Check:
  • API key is correct
  • API key has not expired
  • Account has available credits

Model Not Found

NotFoundError: Model 'gpt-4o' not found
Ensure:
  • Model name is correct
  • Your account has access to the model
  • Using correct API endpoint (OpenAI vs Azure)

No Description Generated

If the description is missing:
# Check if LLM client and model are set
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o"  # Must be specified
)

# Verify image format is supported
result = md.convert("image.jpg")  # .jpg or .png only

Best Practices

  • Use gpt-4o-mini for cost-effective descriptions
  • Customize prompts for your specific use case
  • Handle API errors gracefully with fallbacks
  • Consider caching results for repeated conversions
  • Monitor API usage and costs
  • Use appropriate image sizes (resize large images if needed)
