Media Types

BAML provides first-class support for multimodal types, allowing you to work with images, audio, video, and PDF files in your AI applications.

Supported Media Types

Name	Type	Description
image	media	Image files (PNG, JPEG, GIF, WebP, etc.)
audio	media	Audio files (MP3, WAV, OGG, etc.)
video	media	Video files (MP4, WebM, etc.)
pdf	media	PDF documents

Input Formats

Media can be provided in three ways. PDF is the exception: it currently accepts Base64 only (see PDF Type).

URL - A web URL pointing to the media
Base64 - Base64-encoded string with media type
File Path - Local file path
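When the client needs Base64 but the media lives on disk, the file can be encoded by hand. The following is a minimal sketch using only the Python standard library; `file_to_base64` is a hypothetical helper name, not part of BAML, and the media type is guessed from the file extension only.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_base64(path):
    """Read a local file and return (media_type, base64_string).

    Hypothetical helper, not part of BAML. The media type is guessed
    from the file extension, so a misnamed file gets the wrong type.
    """
    media_type, _ = mimetypes.guess_type(str(path))
    if media_type is None:
        raise ValueError(f"cannot guess media type for {path!r}")
    data = Path(path).read_bytes()
    return media_type, base64.b64encode(data).decode("ascii")
```

The returned pair is in the same order as the arguments of `Image.from_base64("image/png", image_b64)` shown in the client examples on this page.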

Security Consideration: Constructing media from untrusted URLs may expose you to SSRF attacks. BAML may download and transcode files depending on the model. Only use URLs from trusted sources or validate them using allowlists/denylists.
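One way to apply the allowlist advice above is to check the scheme and host before constructing any media object. This is a sketch under stated assumptions: `ALLOWED_HOSTS` and `is_allowed_media_url` are hypothetical names, and the host list is only an illustration built from the example URLs on this page.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts you actually trust.
ALLOWED_HOSTS = {"upload.wikimedia.org", "actions.google.com"}


def is_allowed_media_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for https URLs whose host is explicitly allowlisted."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    return host in allowed_hosts
```

A host check like this does not cover redirects or DNS tricks on its own, so it is best treated as a first filter rather than a complete SSRF defense.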

Image Type

BAML Definition

function DescribeImage(image: image) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Describe the image in four words:
        {{ image }}
    "#
}

Usage in Tests

test ImageDescriptionFromURL {
    functions [DescribeImage]
    args {
        image {
            url "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
        }
    }
}

test ImageDescriptionFromBase64 {
    functions [DescribeImage]
    args { 
        image {
            media_type "image/png"
            base64 "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+ip1sAAAAASUVORK5CYII="
        }
    }
}

test ImageDescriptionFromFile {
    functions [DescribeImage]
    args {
        image {
            file "./shrek.png"
        }
    }
}

Client Code Usage

from baml_py import Image
from baml_client import b

async def test_image_input():
    # From URL (the keyword matches the BAML parameter name, `image`)
    res = await b.DescribeImage(
        image=Image.from_url("https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png")
    )

    # From Base64 (media type first, then the encoded data)
    image_b64 = "iVBORw0K..."
    res = await b.DescribeImage(
        image=Image.from_base64("image/png", image_b64)
    )

Audio Type

BAML Definition

function DescribeSound(myAudio: audio) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Describe the audio in four words:
        {{ myAudio }}
    "#
}

Client Code Usage

from baml_py import Audio
from baml_client import b

async def run():
    # From URL (the keyword matches the BAML parameter name, `myAudio`)
    res = await b.DescribeSound(
        myAudio=Audio.from_url(
            "https://actions.google.com/sounds/v1/emergency/beeper_emergency_call.ogg"
        )
    )

    # From Base64 (truncated placeholder; Ogg data starts with "T2dnUw")
    audio_b64 = "T2dnUw..."
    res = await b.DescribeSound(
        myAudio=Audio.from_base64("audio/ogg", audio_b64)
    )

PDF Type

BAML Definition

function AnalyzePdf(myPdf: pdf) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Summarize the main points of this PDF:
        {{ myPdf }}
    "#
}

PDF inputs must be provided as Base64 data using Pdf.from_base64. URL-based inputs are not currently supported.

Client Code Usage

from baml_py import Pdf
from baml_client import b

async def run():
    # Base64 only (the keyword matches the BAML parameter name, `myPdf`)
    pdf_b64 = "JVBERi0K..."
    res = await b.AnalyzePdf(
        myPdf=Pdf.from_base64("application/pdf", pdf_b64)
    )
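Because PDFs must arrive as Base64, it can help to confirm that the bytes look like a PDF before encoding them. The sketch below is illustrative only: `pdf_bytes_to_base64` is a hypothetical helper, and the check is deliberately strict (it requires the `%PDF-` header at byte 0).

```python
import base64

PDF_MAGIC = b"%PDF-"  # every PDF header begins with these bytes


def pdf_bytes_to_base64(data):
    """Encode PDF bytes as Base64 after a strict header check.

    Hypothetical helper, not part of BAML.
    """
    if not data.startswith(PDF_MAGIC):
        raise ValueError("data does not start with the %PDF- header")
    return base64.b64encode(data).decode("ascii")
```

The result can be passed as the second argument of `Pdf.from_base64("application/pdf", ...)`. Encoded PDFs always begin with `JVBERi0`, which is why the placeholder above starts that way.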

Video Type

BAML Definition

function DescribeVideo(myVideo: video) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Describe what happens in this video:
        {{ myVideo }}
    "#
}

When providing a video via URL, the URL is passed directly to the model. Some models cannot download external media; in that case, convert the video to Base64 first.

Client Code Usage

from baml_py import Video
from baml_client import b

async def run():
    # From URL (the keyword matches the BAML parameter name, `myVideo`)
    res = await b.DescribeVideo(
        myVideo=Video.from_url("https://example.com/sample.mp4")
    )

    # From Base64 (media type first, then the encoded data)
    video_b64 = "AAAAGGZ0eXBpc29t..."
    res = await b.DescribeVideo(
        myVideo=Video.from_base64("video/mp4", video_b64)
    )
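For models that cannot download external media, the conversion to Base64 mentioned above could look roughly like this. It is a sketch, not BAML functionality: `fetch_as_base64` is a hypothetical helper, the 20 MiB default cap is an arbitrary assumption, and the `opener` parameter exists only so the download step can be swapped out.

```python
import base64
import urllib.request


def fetch_as_base64(url, max_bytes=20 * 1024 * 1024, opener=urllib.request.urlopen):
    """Download `url` and return its body as a Base64 string.

    Hypothetical helper. Bodies larger than `max_bytes` are rejected
    instead of being encoded.
    """
    with opener(url) as response:
        # Read one byte past the cap so an oversized body can be detected.
        data = response.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"media is larger than {max_bytes} bytes")
    return base64.b64encode(data).decode("ascii")
```

The returned string can then be passed to `Video.from_base64("video/mp4", ...)`. Only fetch URLs that have already passed your trust checks, since this helper performs the download itself.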

Controlling URL Processing

You can control how BAML processes media URLs before sending them to providers:

client<llm> MyClient {
    provider anthropic
    options {
        media_url_handler {
            image "send_base64"     // Convert image URLs to base64
            pdf "send_url"          // Keep PDF URLs as-is
            audio "send_base64"     // Convert audio URLs to base64
            video "send_url"        // Keep video URLs as-is
        }
    }
}

Name	Type	Description
media_url_handler	object	Configuration for how each media type’s URLs are processed
image	string	"send_url" or "send_base64" - How to handle image URLs
audio	string	"send_url" or "send_base64" - How to handle audio URLs
video	string	"send_url" or "send_base64" - How to handle video URLs
pdf	string	"send_url" or "send_base64" - How to handle PDF URLs

This allows you to override default behavior for each provider and media type combination.

Pydantic Compatibility

For Python users using Pydantic, media types can be constructed from JSON:

// URL format
{
  "url": "https://example.com/image.png"
}

// URL with media type
{
  "url": "https://example.com/image.png",
  "media_type": "image/png"
}

// Base64 format
{
  "base64": "iVBORw0K...."
}

// Base64 with media type
{
  "base64": "iVBORw0K....",
  "media_type": "image/png"
}
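The four JSON shapes above differ only in which source key is present and whether `media_type` is included. A small builder makes that explicit; `media_json` is a hypothetical helper written with the standard library only, and it assumes the key names shown above (`url`, `base64`, `media_type`).

```python
import json


def media_json(*, url=None, base64_data=None, media_type=None):
    """Build one of the JSON shapes shown above as a dict.

    Hypothetical helper. Exactly one of `url` or `base64_data`
    must be given; `media_type` is optional.
    """
    if (url is None) == (base64_data is None):
        raise ValueError("provide exactly one of url or base64_data")
    payload = {"url": url} if url is not None else {"base64": base64_data}
    if media_type is not None:
        payload["media_type"] = media_type
    return payload


# Example: serialize the "URL with media type" shape.
example = json.dumps(media_json(url="https://example.com/image.png", media_type="image/png"))
```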

Media Type Support by Provider

Different LLM providers support different media types:

Provider	Image	Audio	Video	PDF
OpenAI	✓	✓	Limited	Via vision
Anthropic	✓	Limited	Limited	✓
Google AI	✓	✓	✓	Via vision
AWS Bedrock	✓	Model-dependent	Model-dependent	Via vision

Check your specific model’s documentation for exact multimodal capabilities.

Best Practices

Use Base64 for sensitive data - Avoid exposing internal URLs
Validate URLs - Implement allowlists/denylists for URL inputs
Compress large media - Reduce file sizes before encoding to Base64
Check provider support - Verify your LLM provider supports the media type
Handle errors gracefully - Media processing can fail for various reasons
Use appropriate media types - Specify correct MIME types for better parsing
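On the point about compressing large media: standard padded Base64 turns every 3 input bytes into 4 output characters, so the encoded payload is about a third larger than the file. The sketch below computes the exact encoded length; `base64_size` and `fits_after_encoding` are hypothetical helpers, and any size limit you pass in is your own assumption about the provider.

```python
def base64_size(n_bytes):
    """Length in characters of the padded Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def fits_after_encoding(n_bytes, limit_chars):
    """True if the Base64 form of a file of n_bytes stays within limit_chars."""
    return base64_size(n_bytes) <= limit_chars
```

Checking the size before encoding avoids building a large string only to have the request rejected.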

Common Media Types

Images

image/png
image/jpeg
image/gif
image/webp

Audio

audio/mp3 (non-standard alias; audio/mpeg is the registered type for MP3)
audio/mpeg
audio/wav
audio/ogg

Video

video/mp4
video/webm
video/quicktime

PDF

application/pdf
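The MIME types listed above map onto BAML's four media types by their top-level family, with `application/pdf` as the one special case. A sketch of that mapping follows; `media_kind` is a hypothetical helper, not part of BAML.

```python
def media_kind(media_type):
    """Map a MIME type string to "image", "audio", "video", or "pdf".

    Hypothetical helper. Raises ValueError for anything else.
    """
    normalized = media_type.strip().lower()
    if normalized == "application/pdf":
        return "pdf"
    family, _, subtype = normalized.partition("/")
    if family in {"image", "audio", "video"} and subtype:
        return family
    raise ValueError(f"unsupported media type: {media_type!r}")
```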
