MarkItDown extracts metadata from image files and can optionally generate detailed descriptions using multimodal LLMs.

Supported Formats

  • JPEG: .jpg, .jpeg
  • PNG: .png

Dependencies

Core (No Dependencies)

Basic image conversion needs no extra Python packages. On its own, however, it produces empty output: metadata extraction requires the exiftool binary, and descriptions require an LLM client.

Optional: EXIF Metadata

# macOS
brew install exiftool

# Ubuntu/Debian  
sudo apt-get install libimage-exiftool-perl

# Windows: download from https://exiftool.org/
Security: MarkItDown requires ExifTool 12.24 or later to avoid CVE-2021-22204. The converter checks the installed version before running ExifTool and raises an error if it is older.
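A version check like this has to compare numbers, not strings: as strings, "9.99" sorts after "12.24". A minimal sketch that compares integer tuples, assuming `exiftool -ver` prints a `major.minor` string such as `12.76`; the helper names are hypothetical, not part of MarkItDown:

```python
MIN_SAFE_VERSION = (12, 24)  # first ExifTool release with the CVE-2021-22204 fix


def parse_exiftool_version(text: str) -> tuple:
    """Turn output such as '12.76\n' into a comparable tuple like (12, 76)."""
    major, minor = text.strip().split(".")[:2]
    return (int(major), int(minor))


def is_safe_exiftool_version(text: str) -> bool:
    """True if the reported version is 12.24 or later."""
    return parse_exiftool_version(text) >= MIN_SAFE_VERSION


for reported in ["12.23", "12.24", "9.99", "13.10"]:
    print(reported, is_safe_exiftool_version(reported))
```

Tuple comparison works element by element, so `(13, 10)` correctly ranks above `(12, 24)` even though `10 < 24`.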

Optional: LLM Captioning

pip install openai  # or another client exposing an OpenAI-compatible chat.completions API

Basic Usage

from markitdown import MarkItDown

md = MarkItDown(exiftool_path="/usr/local/bin/exiftool")
result = md.convert("photo.jpg")
print(result.markdown)
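The exiftool path in the example above is specific to one machine. A small sketch for resolving it at runtime instead: it prefers an explicit path if that file exists and is executable, and otherwise searches `PATH` with `shutil.which`. `find_exiftool` is a hypothetical helper, not part of MarkItDown:

```python
import os
import shutil
from typing import Optional


def find_exiftool(preferred: Optional[str] = None) -> Optional[str]:
    """Return a path to an executable exiftool, or None if none is found."""
    # Use the preferred path only if it is an existing, executable file
    if preferred and os.path.isfile(preferred) and os.access(preferred, os.X_OK):
        return preferred
    # Otherwise look for "exiftool" in the directories listed in PATH
    return shutil.which("exiftool")


exiftool_path = find_exiftool("/usr/local/bin/exiftool")
print(exiftool_path or "exiftool not found; metadata extraction is unavailable")
```

The resolved value can then be passed as `MarkItDown(exiftool_path=exiftool_path)`.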

Features

EXIF Metadata

Extract camera settings, dates, GPS coordinates

LLM Descriptions

Generate detailed image captions with multimodal LLMs

Embedded Metadata

Extract title, caption, description, keywords, artist

GPS Data

Extract geolocation information

Output Examples

Metadata Only

ImageSize: 4032x3024
DateTimeOriginal: 2024:02:15 14:30:25
Artist: John Doe
Description: Sunset at the beach
GPSPosition: 34.0522 N, 118.2437 W

With LLM Description

ImageSize: 1920x1080
DateTimeOriginal: 2024:02:15 10:15:00
Keywords: landscape, mountain, nature

# Description:
A breathtaking mountain landscape at golden hour. Snow-capped peaks rise majestically against a vibrant orange and pink sky. In the foreground, a winding river reflects the colorful sunset, with evergreen trees lining both banks. The composition captures the serenity and grandeur of alpine wilderness.
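Because the output is plain `Key: value` lines, optionally followed by a `# Description:` heading, it can be split back into its parts. A minimal sketch, assuming exactly the layout shown in these examples; `split_image_markdown` is a hypothetical helper:

```python
def split_image_markdown(markdown: str):
    """Split converter output into (metadata dict, description or None)."""
    metadata = {}
    description_lines = None  # becomes a list once the heading is seen
    for line in markdown.splitlines():
        if description_lines is not None:
            description_lines.append(line)
        elif line.strip() == "# Description:":
            description_lines = []
        elif ":" in line:
            # Split on the first colon only; values such as dates contain colons
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    description = None
    if description_lines is not None:
        description = "\n".join(description_lines).strip()
    return metadata, description


sample = """ImageSize: 1920x1080
DateTimeOriginal: 2024:02:15 10:15:00
Keywords: landscape, mountain, nature

# Description:
A breathtaking mountain landscape at golden hour."""

meta, desc = split_image_markdown(sample)
print(meta["DateTimeOriginal"])  # 2024:02:15 10:15:00
print(desc)
```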

EXIF Metadata Fields

The converter reports the following metadata fields from ExifTool's output (when available):

| Field | Description | Example |
|---|---|---|
| ImageSize | Dimensions in pixels | 4032x3024 |
| Title | Image title | Vacation Photo |
| Caption | Short caption | Family at beach |
| Description | Long description | Summer vacation... |
| Keywords | Comma-separated tags | beach, sunset, ocean |
| Artist | Photographer name | Jane Smith |
| Author | Author/creator | John Doe |
| DateTimeOriginal | When the photo was taken | 2024:02:15 14:30:25 |
| CreateDate | When the image was created (digitized) | 2024:02:15 14:30:25 |
| GPSPosition | GPS coordinates | 34.0522 N, 118.2437 W |
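`GPSPosition` arrives as text, while mapping tools usually expect signed decimal degrees (north and east positive, south and west negative). A sketch that converts the decimal form shown in the example above; it assumes exactly a `<number> <N|S>, <number> <E|W>` layout and rejects other forms, such as degrees/minutes/seconds output (`34 deg 3' 7.92" N, ...`). `gps_to_decimal` is a hypothetical helper:

```python
import re

_GPS_PATTERN = re.compile(r"^\s*([\d.]+)\s*([NS])\s*,\s*([\d.]+)\s*([EW])\s*$")


def gps_to_decimal(position: str):
    """Convert '34.0522 N, 118.2437 W' to (34.0522, -118.2437)."""
    match = _GPS_PATTERN.match(position)
    if match is None:
        raise ValueError(f"Unrecognized GPSPosition format: {position!r}")
    lat, lat_ref, lon, lon_ref = match.groups()
    # South latitudes and west longitudes are negative in decimal degrees
    latitude = float(lat) * (-1 if lat_ref == "S" else 1)
    longitude = float(lon) * (-1 if lon_ref == "W" else 1)
    return latitude, longitude


print(gps_to_decimal("34.0522 N, 118.2437 W"))  # (34.0522, -118.2437)
```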

LLM Integration

Using OpenAI

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="your-api-key")
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Write a detailed caption for this image."
)

result = md.convert("photo.jpg")

Custom Prompts

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o"
)

result = md.convert(
    "photo.jpg",
    llm_prompt="Describe this image focusing on colors, composition, and mood. Be detailed."
)
Default prompt: "Write a detailed caption for this image."

Using Other LLM Providers

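from markitdown import MarkItDown
from openai import OpenAI

# The converter calls client.chat.completions.create(...), so the client must expose
# an OpenAI-compatible chat.completions API. The native Anthropic SDK does not
# (it uses client.messages.create), so point the OpenAI SDK at a compatible endpoint.
client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-provider.example.com/v1",  # placeholder endpoint URL
)
md = MarkItDown(llm_client=client, llm_model="your-vision-model")  # placeholder model name

result = md.convert("photo.jpg")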
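The converter only calls `client.chat.completions.create(model=..., messages=...)` and reads `response.choices[0].message.content` (see `_get_llm_description` under Implementation Details), so any object with that shape can serve as `llm_client`. A minimal offline stub, assuming that duck-typed interface is all that is needed; `FakeVisionClient` and its canned reply are hypothetical and useful mainly for tests:

```python
from types import SimpleNamespace


class _FakeCompletions:
    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []  # record requests so tests can inspect them

    def create(self, *, model, messages):
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class FakeVisionClient:
    """Mimics the chat.completions.create(...) surface of the OpenAI SDK."""

    def __init__(self, reply: str = "A placeholder caption."):
        self.chat = SimpleNamespace(completions=_FakeCompletions(reply))


client = FakeVisionClient("A red bicycle leaning against a brick wall.")
response = client.chat.completions.create(
    model="any-model",
    messages=[{"role": "user", "content": [{"type": "text", "text": "Caption this."}]}],
)
print(response.choices[0].message.content)
```

Passing `MarkItDown(llm_client=FakeVisionClient(), llm_model="any-model")` should then exercise the captioning path without network access.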

Implementation Details

Source Location

packages/markitdown/src/markitdown/converters/
├── _image_converter.py  # Main image converter
└── _exiftool.py         # ExifTool metadata extraction

Converter Class

  • Class Name: ImageConverter
  • Accepted Extensions: .jpg, .jpeg, .png
  • MIME Types: image/jpeg, image/png
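A converter like this decides whether to handle a stream from its file extension or MIME type. A simplified sketch of that decision using the values listed above; it is an illustration, not MarkItDown's actual `accepts` logic, and `accepts_image` is hypothetical:

```python
from typing import Optional

ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
ACCEPTED_MIME_PREFIXES = ("image/jpeg", "image/png")


def accepts_image(extension: Optional[str] = None, mimetype: Optional[str] = None) -> bool:
    """Return True if either the extension or the MIME type matches."""
    if extension and extension.lower() in ACCEPTED_EXTENSIONS:
        return True
    # Prefix match, so any parameters after the MIME type are ignored
    if mimetype and mimetype.lower().startswith(ACCEPTED_MIME_PREFIXES):
        return True
    return False


print(accepts_image(extension=".JPG"))       # True
print(accepts_image(mimetype="image/webp"))  # False
```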

ExifTool Integration

The exiftool_metadata() function in _exiftool.py:
from typing import BinaryIO, Union

def exiftool_metadata(
    file_stream: BinaryIO,
    *,
    exiftool_path: Union[str, None],
) -> dict:
    # 1. Verify ExifTool version >= 12.24 (CVE-2021-22204)
    # 2. Run: exiftool -json -  (the image bytes are piped on stdin)
    # 3. Return: dict of metadata fields
    ...
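Security Check:
import subprocess

out = subprocess.run([exiftool_path, "-ver"], capture_output=True, text=True, check=True)
version = tuple(int(p) for p in out.stdout.strip().split(".")[:2])  # e.g. "12.76" -> (12, 76)
if version < (12, 24):
    raise RuntimeError("ExifTool version is vulnerable to CVE-2021-22204")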
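The `exiftool -json -` step can be sketched with the standard library: the image bytes are piped on stdin (`-` tells ExifTool to read from standard input), and ExifTool prints a JSON array with one object per input. This is a simplified stand-in, not the library's code; `read_metadata` is hypothetical and omits the version check for brevity:

```python
import json
import subprocess
from typing import BinaryIO, Optional


def read_metadata(file_stream: BinaryIO, exiftool_path: Optional[str]) -> dict:
    """Run `exiftool -json -` on a stream and return the first result object."""
    if not exiftool_path:
        return {}  # no binary configured: nothing to extract
    start = file_stream.tell()
    try:
        completed = subprocess.run(
            [exiftool_path, "-json", "-"],  # "-" makes exiftool read from stdin
            input=file_stream.read(),
            capture_output=True,
            check=True,
        )
    finally:
        file_stream.seek(start)  # leave the stream where we found it
    # ExifTool emits a JSON array with one object per processed input
    return json.loads(completed.stdout)[0]
```

Restoring the stream position matters when the same stream is later re-read, for example to base64-encode it for an LLM description.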

LLM Description Process

  1. Convert image to base64
  2. Create a data URI from the image's MIME type (e.g. data:image/jpeg;base64,...)
  3. Send to LLM with prompt
  4. Return generated description
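import base64
import mimetypes

def _get_llm_description(self, file_stream, stream_info, *, client, model, prompt=None):
    # Fall back to the default prompt when none is supplied
    if not prompt or not prompt.strip():
        prompt = "Write a detailed caption for this image."

    # Determine the MIME type for the data URI (e.g. image/jpeg)
    content_type = stream_info.mimetype or mimetypes.guess_type(
        "_dummy" + (stream_info.extension or "")
    )[0] or "application/octet-stream"

    base64_image = base64.b64encode(file_stream.read()).decode("utf-8")
    data_uri = f"data:{content_type};base64,{base64_image}"

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }]

    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content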

Advanced Examples

Batch Processing Images

from markitdown import MarkItDown
from openai import OpenAI
import os

client = OpenAI()
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    exiftool_path="/usr/local/bin/exiftool"
)

image_dir = "photos"
for filename in os.listdir(image_dir):
    if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
        filepath = os.path.join(image_dir, filename)
        result = md.convert(filepath)
        
        # Save markdown next to the image, e.g. photos/a.jpg -> photos/a.md
        output_path = os.path.splitext(filepath)[0] + '.md'
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(result.markdown)
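The file-selection part of the loop above can also be written with `pathlib`, which makes it easy to match extensions case-insensitively (so `IMG_0001.JPG` is included), optionally recurse into subfolders, and derive the output name with `with_suffix`. A sketch with the conversion call left to the caller; `iter_images` is a hypothetical helper:

```python
from pathlib import Path
from typing import Iterator

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_images(root: str, recursive: bool = False) -> Iterator[Path]:
    """Yield supported image files under root, sorted for stable output."""
    pattern = "**/*" if recursive else "*"
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


for image_path in iter_images("photos"):
    output_path = image_path.with_suffix(".md")  # photos/a.jpg -> photos/a.md
    print(image_path, "->", output_path)
```

Inside the loop, `md.convert(str(image_path))` followed by `output_path.write_text(result.markdown, encoding="utf-8")` would complete the batch job.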

Extract Only Metadata

from markitdown import MarkItDown

# No LLM client = metadata only
md = MarkItDown(exiftool_path="/usr/local/bin/exiftool")
result = md.convert("photo.jpg")

# Parse metadata from markdown
for line in result.markdown.split('\n'):
    if ':' in line:
        key, value = line.split(':', 1)
        print(f"{key.strip()}: {value.strip()}")

Custom Metadata Processing

from markitdown.converters import exiftool_metadata

with open('photo.jpg', 'rb') as f:
    metadata = exiftool_metadata(
        f,
        exiftool_path="/usr/local/bin/exiftool"
    )
    
    # Direct access to metadata dict
    print(f"Camera: {metadata.get('Make')} {metadata.get('Model')}")
    print(f"ISO: {metadata.get('ISO')}")
    print(f"Aperture: {metadata.get('Aperture')}")
    print(f"Shutter Speed: {metadata.get('ShutterSpeed')}")
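The `print` calls above show `None` for any tag the image lacks. A small sketch that formats only the fields that are present, using the same tag names as above; `summarize_exposure` is a hypothetical helper:

```python
def summarize_exposure(metadata: dict) -> str:
    """Build a one-line camera summary, skipping tags that are missing."""
    camera = " ".join(str(metadata[k]) for k in ("Make", "Model") if metadata.get(k))
    parts = [camera] if camera else []
    templates = {"ISO": "ISO {}", "Aperture": "f/{}", "ShutterSpeed": "{} s"}
    for key, template in templates.items():
        if metadata.get(key) is not None:
            parts.append(template.format(metadata[key]))
    return ", ".join(parts) or "No exposure metadata"


print(summarize_exposure({"Make": "Canon", "Model": "EOS R5", "ISO": 100, "Aperture": 2.8}))
# Canon EOS R5, ISO 100, f/2.8
```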

Error Handling

from markitdown import MarkItDown, FileConversionException

md = MarkItDown(exiftool_path="/usr/local/bin/exiftool")

try:
    result = md.convert("photo.jpg")
    if not result.markdown.strip():
        print("No metadata or description generated")
except FileNotFoundError:
    # Raised when the input file itself does not exist
    print("photo.jpg not found")
except FileConversionException as e:
    # Errors raised inside the converter (a missing exiftool binary, a vulnerable
    # exiftool version, a failed LLM call) are wrapped in this exception
    if "CVE-2021-22204" in str(e):
        print("Please upgrade exiftool to version 12.24 or later")
    else:
        print(f"Error processing image: {e}")

Use Cases

Generate markdown catalogs of photo collections with metadata and AI-generated descriptions for searchability.
Extract and index metadata from image libraries for better organization and retrieval.
Generate alt text and detailed descriptions for web images using LLM captioning.
Integrate into data processing workflows to extract technical and descriptive information from images.
Extract EXIF data including GPS coordinates and timestamps for forensic or legal purposes.

Limitations

  • No OCR: Text within images is not extracted (consider using Document Intelligence for OCR)
  • LLM Accuracy: AI-generated descriptions may contain hallucinations or inaccuracies
  • Format Support: Only JPEG and PNG; no support for TIFF, GIF, WebP, etc.
  • EXIF Dependency: Metadata extraction requires external exiftool binary

Next Steps

Document Intelligence

Use Azure Document Intelligence for OCR and document analysis

PowerPoint Images

Images in PPTX files also support LLM captioning
