MarkItDown extracts metadata from image files and can optionally generate detailed descriptions using multimodal LLMs.
JPEG: .jpg, .jpeg
PNG: .png
Dependencies
Core (No Dependencies)
Basic image conversion works without any dependencies, though metadata extraction requires exiftool.
# macOS
brew install exiftool
# Ubuntu/Debian
sudo apt-get install libimage-exiftool-perl
# Windows: download from https://exiftool.org/
Security: MarkItDown requires ExifTool version 12.24 or later to avoid CVE-2021-22204. The converter verifies the version before use.
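To confirm locally that an installed ExifTool meets the 12.24 minimum before pointing MarkItDown at it, a check along these lines may help. It is a minimal sketch, not MarkItDown's own code: `parse_exiftool_version`, `is_patched`, and `installed_exiftool_version` are hypothetical helper names, and the parser assumes `exiftool -ver` prints a dotted number such as `12.76`.

```python
import shutil
import subprocess

MIN_SAFE_VERSION = (12, 24)  # minimum version named in the security note above


def parse_exiftool_version(output: str) -> tuple:
    """Turn the text printed by `exiftool -ver` (e.g. "12.76") into a tuple of ints."""
    return tuple(int(part) for part in output.strip().split("."))


def is_patched(version: tuple) -> bool:
    """True when the version is at or above the minimum safe release."""
    return version >= MIN_SAFE_VERSION


def installed_exiftool_version(exiftool_path=None):
    """Return the installed version tuple, or None if exiftool cannot be found."""
    path = exiftool_path or shutil.which("exiftool")
    if path is None:
        return None
    result = subprocess.run([path, "-ver"], capture_output=True, text=True, check=True)
    return parse_exiftool_version(result.stdout)


if __name__ == "__main__":
    version = installed_exiftool_version()
    if version is None:
        print("exiftool not found on PATH")
    else:
        print(version, "OK" if is_patched(version) else "upgrade required")
```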
Optional: LLM Captioning
pip install openai # Or other LLM client
Basic Usage
Python (Metadata Only)
from markitdown import MarkItDown
md = MarkItDown(exiftool_path="/usr/local/bin/exiftool")
result = md.convert("photo.jpg")
print(result.markdown)
Features
EXIF Metadata: Extract camera settings, dates, GPS coordinates
LLM Descriptions: Generate detailed image captions with multimodal LLMs
Embedded Metadata: Extract title, caption, description, keywords, artist
GPS Data: Extract geolocation information
Output Examples
ImageSize: 4032x3024
DateTimeOriginal: 2024:02:15 14:30:25
Artist: John Doe
Description: Sunset at the beach
GPSPosition: 34.0522 N, 118.2437 W
With LLM Description
ImageSize: 1920x1080
DateTimeOriginal: 2024:02:15 10:15:00
Keywords: landscape, mountain, nature
# Description:
A breathtaking mountain landscape at golden hour. Snow-capped peaks rise majestically against a vibrant orange and pink sky. In the foreground, a winding river reflects the colorful sunset, with evergreen trees lining both banks. The composition captures the serenity and grandeur of alpine wilderness.
The converter extracts the following EXIF fields (when available):
Field             Description            Example
ImageSize         Dimensions in pixels   4032x3024
Title             Image title            Vacation Photo
Caption           Short caption          Family at beach
Description       Long description       Summer vacation...
Keywords          Comma-separated tags   beach, sunset, ocean
Artist            Photographer name      Jane Smith
Author            Author/creator         John Doe
DateTimeOriginal  When photo was taken   2024:02:15 14:30:25
CreateDate        File creation date     2024:02:15 14:30:25
GPSPosition       GPS coordinates        34.0522 N, 118.2437 W
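ExifTool reports timestamps with colons in the date part (`2024:02:15 14:30:25`), which most date parsers will not accept unmodified. The following is a small sketch of converting the example values above into Python types. The helper names are hypothetical, and the GPS parser assumes the `34.0522 N, 118.2437 W` layout shown in the example; ExifTool may emit other coordinate formats depending on how it is invoked.

```python
from datetime import datetime


def parse_exif_datetime(value: str) -> datetime:
    """Parse an EXIF-style timestamp such as '2024:02:15 14:30:25'."""
    return datetime.strptime(value.strip(), "%Y:%m:%d %H:%M:%S")


def parse_gps_position(value: str) -> tuple:
    """Parse '34.0522 N, 118.2437 W' into signed (latitude, longitude) floats."""
    coords = []
    for part in value.split(","):
        number, hemisphere = part.split()
        # Southern and western hemispheres become negative values.
        sign = -1.0 if hemisphere.upper() in ("S", "W") else 1.0
        coords.append(sign * float(number))
    latitude, longitude = coords
    return latitude, longitude


print(parse_exif_datetime("2024:02:15 14:30:25").isoformat())
print(parse_gps_position("34.0522 N, 118.2437 W"))
```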
LLM Integration
Using OpenAI
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI(api_key="your-api-key")
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Write a detailed caption for this image.",
)
result = md.convert("photo.jpg")
Custom Prompts
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
)
result = md.convert(
    "photo.jpg",
    llm_prompt="Describe this image focusing on colors, composition, and mood. Be detailed.",
)
Default prompt: "Write a detailed caption for this image."
Using Other LLM Providers
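from openai import OpenAI
from markitdown import MarkItDown
# The client must expose an OpenAI-style chat.completions.create() method.
# Anthropic's native SDK client does not (it uses client.messages.create()),
# so this example uses an OpenAI client pointed at an OpenAI-compatible endpoint.
# The base_url below is a placeholder; substitute your provider's URL.
client = OpenAI(api_key="your-api-key", base_url="https://your-provider.example/v1")
md = MarkItDown(llm_client=client, llm_model="claude-3-opus")
result = md.convert("photo.jpg")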
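The implementation shown under Implementation Details calls `client.chat.completions.create(model=..., messages=...)` and reads `response.choices[0].message.content`, so any object with that shape should be usable as `llm_client`. Below is a minimal stand-in for offline testing, assuming nothing beyond that call pattern. `FakeLLMClient` is a hypothetical name and not part of MarkItDown or any SDK.

```python
from types import SimpleNamespace


class FakeLLMClient:
    """Duck-typed stand-in exposing only chat.completions.create()."""

    def __init__(self, canned_reply: str):
        self._reply = canned_reply
        self.calls = []  # records every request for later inspection
        # Build the client.chat.completions.create attribute chain.
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, *, model, messages, **kwargs):
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


client = FakeLLMClient("A red bicycle leaning against a brick wall.")
response = client.chat.completions.create(
    model="test-model",
    messages=[{"role": "user", "content": "Describe the image."}],
)
print(response.choices[0].message.content)
```

Passing such an object as `llm_client` would let the captioning path be exercised without network access, provided the converter relies only on the calls shown above.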
Implementation Details
Source Location
packages/markitdown/src/markitdown/converters/
├── _image_converter.py # Main image converter
└── _exiftool.py # ExifTool metadata extraction
Converter Class
Class Name: ImageConverter
Accepted Extensions: .jpg, .jpeg, .png
MIME Types: image/jpeg, image/png
The exiftool_metadata() function in _exiftool.py:
def exiftool_metadata(
    file_stream: BinaryIO,
    *,
    exiftool_path: Union[str, None],
) -> dict:
    # Verify ExifTool version >= 12.24 (CVE-2021-22204)
    # Run: exiftool -json -
    # Returns: dict of metadata fields
    ...  # body omitted; outline only
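Security Check:
# Simplified outline of the version check
output = subprocess.run([exiftool_path, "-ver"], capture_output=True, text=True, check=True).stdout.strip()
version = tuple(int(part) for part in output.split("."))
if version < (12, 24):
    raise RuntimeError("ExifTool version is vulnerable to CVE-2021-22204")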
LLM Description Process
1. Convert image to base64
2. Create data URI: data:image/jpeg;base64,...
3. Send to LLM with prompt
4. Return generated description
def _get_llm_description(self, file_stream, stream_info, *, client, model, prompt):
    # Simplified: fall back to a generic type when the MIME type is unknown
    content_type = stream_info.mimetype or "application/octet-stream"
    base64_image = base64.b64encode(file_stream.read()).decode("utf-8")
    data_uri = f"data:{content_type};base64,{base64_image}"
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
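The first two steps of the process can be tried in isolation. This is a minimal sketch rather than the converter's code: `build_data_uri` is a hypothetical helper, guessing the content type from the filename is a simplification, and falling back to `application/octet-stream` when the type cannot be guessed is an assumption.

```python
import base64
import io
import mimetypes


def build_data_uri(file_stream, filename: str) -> str:
    """Base64-encode a binary stream and wrap it in a data: URI."""
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        content_type = "application/octet-stream"  # assumed fallback
    encoded = base64.b64encode(file_stream.read()).decode("utf-8")
    return f"data:{content_type};base64,{encoded}"


# Any bytes work for demonstration; a real call would pass an opened image file.
uri = build_data_uri(io.BytesIO(b"abc"), "photo.png")
print(uri)
```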
Advanced Examples
Batch Processing Images
from markitdown import MarkItDown
from openai import OpenAI
import os
client = OpenAI()  # reads OPENAI_API_KEY from the environment
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    exiftool_path="/usr/local/bin/exiftool",
)
image_dir = "photos"
for filename in os.listdir(image_dir):
    if filename.lower().endswith((".jpg", ".jpeg", ".png")):
        filepath = os.path.join(image_dir, filename)
        result = md.convert(filepath)
        # Save markdown next to the image, swapping only the final extension
        output_path = os.path.splitext(filepath)[0] + ".md"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result.markdown)
from markitdown import MarkItDown
# No LLM client = metadata only
md = MarkItDown(exiftool_path="/usr/local/bin/exiftool")
result = md.convert("photo.jpg")
# Parse metadata from markdown
for line in result.markdown.split("\n"):
    if ":" in line:
        # Split on the first colon only so timestamp values stay intact
        key, value = line.split(":", 1)
        print(f"{key.strip()}: {value.strip()}")
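When a description is present, the output has two parts: `Key: value` lines followed by a `# Description:` heading and free text (see Output Examples). Below is a sketch of separating the two, assuming exactly that layout. `split_image_markdown` is a hypothetical helper.

```python
def split_image_markdown(markdown: str):
    """Return (metadata dict, description text) from image converter output."""
    metadata = {}
    description_lines = []
    in_description = False
    for line in markdown.splitlines():
        if line.strip() == "# Description:":
            in_description = True
        elif in_description:
            description_lines.append(line)
        elif ":" in line:
            # Split on the first colon only, so timestamps keep their colons.
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(description_lines).strip()


sample = (
    "ImageSize: 1920x1080\n"
    "DateTimeOriginal: 2024:02:15 10:15:00\n"
    "Keywords: landscape, mountain, nature\n"
    "\n"
    "# Description:\n"
    "A breathtaking mountain landscape at golden hour.\n"
)
metadata, description = split_image_markdown(sample)
print(metadata["DateTimeOriginal"])
print(description)
```

Tracking the description section separately keeps colons inside the caption text from being misread as metadata keys.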
# exiftool_metadata lives in the private _exiftool module (see Source Location),
# so this import path may change between releases
from markitdown.converters._exiftool import exiftool_metadata
with open("photo.jpg", "rb") as f:
    metadata = exiftool_metadata(
        f,
        exiftool_path="/usr/local/bin/exiftool",
    )
# Direct access to metadata dict
print(f"Camera: {metadata.get('Make')} {metadata.get('Model')}")
print(f"ISO: {metadata.get('ISO')}")
print(f"Aperture: {metadata.get('Aperture')}")
print(f"Shutter Speed: {metadata.get('ShutterSpeed')}")
Error Handling
from markitdown import MarkItDown
md = MarkItDown(exiftool_path="/usr/local/bin/exiftool")
try:
    result = md.convert("photo.jpg")
    if not result.markdown.strip():
        print("No metadata or description generated")
except FileNotFoundError:
    print("exiftool not found at specified path")
except RuntimeError as e:
    if "CVE-2021-22204" in str(e):
        print("Please upgrade exiftool to version 12.24 or later")
    else:
        raise
except Exception as e:
    print(f"Error processing image: {e}")
Use Cases
Photo Library Documentation
Generate markdown catalogs of photo collections with metadata and AI-generated descriptions for searchability.
Extract and index metadata from image libraries for better organization and retrieval.
Generate alt text and detailed descriptions for web images using LLM captioning.
Integrate into data processing workflows to extract technical and descriptive information from images.
Extract EXIF data including GPS coordinates and timestamps for forensic or legal purposes.
Limitations
No OCR: Text within images is not extracted (consider using Document Intelligence for OCR)
LLM Accuracy: AI-generated descriptions may contain hallucinations or inaccuracies
Format Support: Only JPEG and PNG; no support for TIFF, GIF, WebP, etc.
EXIF Dependency: Metadata extraction requires the external exiftool binary
Next Steps
Document Intelligence: Use Azure Document Intelligence for OCR and document analysis
PowerPoint Images: Images in PPTX files also support LLM captioning