BAML provides first-class support for multimodal types, allowing you to work with images, audio, video, and PDF files in your AI applications.

Supported Media Types

image (media): Image files (PNG, JPEG, GIF, WebP, etc.)
audio (media): Audio files (MP3, WAV, OGG, etc.)
video (media): Video files (MP4, WebM, etc.)
pdf (media): PDF documents

Input Formats

All media types can be provided in three ways:
  1. URL - A web URL pointing to the media
  2. Base64 - Base64-encoded string with media type
  3. File Path - Local file path
Security Consideration: Constructing media from untrusted URLs may expose you to SSRF attacks. BAML may download and transcode files depending on the model. Only use URLs from trusted sources or validate them using allowlists/denylists.
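As a rough illustration of the allowlist approach, the sketch below checks a URL before it is handed to a media constructor. The `ALLOWED_HOSTS` set and the `is_allowed_media_url` helper are hypothetical names introduced here; they are not part of BAML.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your application trusts.
ALLOWED_HOSTS = {"upload.wikimedia.org", "actions.google.com"}


def is_allowed_media_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs are rejected outright.
        return False
    if parts.scheme != "https":
        return False
    # hostname is lowercased and excludes the port; it is None when absent.
    return parts.hostname in ALLOWED_HOSTS
```

A host check like this is only a starting point. It does not by itself guard against redirects or DNS tricks, so treat it as one layer of validation rather than a complete SSRF defense.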

Image Type

BAML Definition

function DescribeImage(image: image) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Describe the image in four words:
        {{ image }}
    "#
}

Usage in Tests

test ImageDescriptionFromURL {
    functions [DescribeImage]
    args {
        image {
            url "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
        }
    }
}

test ImageDescriptionFromBase64 {
    functions [DescribeImage]
    args { 
        image {
            media_type "image/png"
            base64 "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+ip1sAAAAASUVORK5CYII="
        }
    }
}

test ImageDescriptionFromFile {
    functions [DescribeImage]
    args {
        image {
            file "./shrek.png"
        }
    }
}

Client Code Usage

from baml_py import Image
from baml_client import b

async def test_image_input():
    # From URL (the keyword must match the BAML parameter name, `image`)
    res = await b.DescribeImage(
        image=Image.from_url("https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png")
    )

    # From Base64: media type first, then the encoded data (truncated here)
    image_b64 = "iVBORw0K..."
    res = await b.DescribeImage(
        image=Image.from_base64("image/png", image_b64)
    )
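The Python example above covers URL and Base64 inputs. For a local file, one option is to read and encode it yourself and then pass the result to `Image.from_base64`. The sketch below assumes the MIME type can be inferred from the file extension; `encode_media_file` is a hypothetical helper, not a BAML API.

```python
import base64
import mimetypes
from pathlib import Path


def encode_media_file(path: str) -> tuple[str, str]:
    """Return (media_type, base64_data) for a local media file."""
    # Infer the MIME type from the file extension, e.g. ".png" -> "image/png".
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"cannot infer a MIME type for {path!r}")
    data = Path(path).read_bytes()
    return media_type, base64.b64encode(data).decode("ascii")


# Usage with the client shown above (needs baml_py and a generated baml_client):
#   media_type, image_b64 = encode_media_file("./shrek.png")
#   res = await b.DescribeImage(image=Image.from_base64(media_type, image_b64))
```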

Audio Type

BAML Definition

function DescribeSound(myAudio: audio) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Describe the audio in four words:
        {{ myAudio }}
    "#
}

Client Code Usage

from baml_py import Audio
from baml_client import b

async def run():
    # From URL (the keyword must match the BAML parameter name, `myAudio`)
    res = await b.DescribeSound(
        myAudio=Audio.from_url(
            "https://actions.google.com/sounds/v1/emergency/beeper_emergency_call.ogg"
        )
    )

    # From Base64 (truncated placeholder; Ogg data encodes to a "T2dnUw" prefix)
    audio_b64 = "T2dnUw..."
    res = await b.DescribeSound(
        myAudio=Audio.from_base64("audio/ogg", audio_b64)
    )

PDF Type

BAML Definition

function AnalyzePdf(myPdf: pdf) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Summarize the main points of this PDF:
        {{ myPdf }}
    "#
}
PDF inputs must be provided as Base64 data using Pdf.from_base64. URL-based inputs are not currently supported.

Client Code Usage

from baml_py import Pdf
from baml_client import b

async def run():
    # Base64 only (the keyword must match the BAML parameter name, `myPdf`)
    pdf_b64 = "JVBERi0K..."
    res = await b.AnalyzePdf(
        myPdf=Pdf.from_base64("application/pdf", pdf_b64)
    )
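Because PDFs are passed as Base64, it can help to sanity-check the bytes before encoding them. The sketch below applies a strict check for the `%PDF-` signature at the start of the data; `encode_pdf_bytes` is a hypothetical helper, and the strictness of the check is an assumption rather than something BAML requires.

```python
import base64


def encode_pdf_bytes(data: bytes) -> str:
    """Base64-encode PDF bytes, rejecting data that lacks the %PDF- header."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("data does not start with the %PDF- signature")
    return base64.b64encode(data).decode("ascii")


# Usage with the client shown above (needs baml_py and a generated baml_client):
#   with open("report.pdf", "rb") as f:   # hypothetical file name
#       pdf_b64 = encode_pdf_bytes(f.read())
#   res = await b.AnalyzePdf(myPdf=Pdf.from_base64("application/pdf", pdf_b64))
```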

Video Type

BAML Definition

function DescribeVideo(myVideo: video) -> string {
    client "openai/gpt-4o-mini"
    prompt #"
        Describe what happens in this video:
        {{ myVideo }}
    "#
}
When providing a video via URL, the URL is passed directly to the model. Some models cannot download external media; in that case, convert the video to Base64 first.
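One way to do that conversion is to download the file yourself and encode it. The sketch below uses only the standard library and enforces a size cap; `url_to_base64`, `MAX_BYTES`, and the `opener` parameter are hypothetical names chosen for this example, and the 20 MiB limit is an arbitrary assumption.

```python
import base64
import urllib.request
from typing import Callable

MAX_BYTES = 20 * 1024 * 1024  # hypothetical cap; tune to your provider's limits


def url_to_base64(
    url: str,
    max_bytes: int = MAX_BYTES,
    opener: Callable = urllib.request.urlopen,
) -> str:
    """Download a media URL and return its contents as a Base64 string."""
    chunks = []
    total = 0
    with opener(url) as response:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            total += len(chunk)
            if total > max_bytes:
                raise ValueError(f"media at {url!r} exceeds {max_bytes} bytes")
            chunks.append(chunk)
    return base64.b64encode(b"".join(chunks)).decode("ascii")


# Usage with the Video type (needs baml_py):
#   video_b64 = url_to_base64("https://example.com/sample.mp4")
#   video = Video.from_base64("video/mp4", video_b64)
```

The `opener` argument exists so the download step can be swapped out, for example in tests. Only fetch URLs you trust, since this code performs the download on your own machine.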

Client Code Usage

from baml_py import Video
from baml_client import b

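async def run():
    # From URL (the keyword must match the BAML parameter name, `myVideo`)
    res = await b.DescribeVideo(
        myVideo=Video.from_url("https://example.com/sample.mp4")
    )

    # From Base64: media type first, then the encoded data (truncated here)
    video_b64 = "AAAAGGZ0eXBpc29t..."
    res = await b.DescribeVideo(
        myVideo=Video.from_base64("video/mp4", video_b64)
    )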

Controlling URL Processing

You can control how BAML processes media URLs before sending them to providers:
client<llm> MyClient {
    provider anthropic
    options {
        media_url_handler {
            image "send_base64"     // Convert image URLs to base64
            pdf "send_url"          // Keep PDF URLs as-is
            audio "send_base64"     // Convert audio URLs to base64
            video "send_url"        // Keep video URLs as-is
        }
    }
}
media_url_handler (object): Configuration for how each media type’s URLs are processed
  image (string): "send_url" or "send_base64". How to handle image URLs
  audio (string): "send_url" or "send_base64". How to handle audio URLs
  video (string): "send_url" or "send_base64". How to handle video URLs
  pdf (string): "send_url" or "send_base64". How to handle PDF URLs
This allows you to override default behavior for each provider and media type combination.

Pydantic Compatibility

For Python users using Pydantic, media types can be constructed from JSON:
// URL format
{
  "url": "https://example.com/image.png"
}

// URL with media type
{
  "url": "https://example.com/image.png",
  "media_type": "image/png"
}

// Base64 format
{
  "base64": "iVBORw0K...."
}

// Base64 with media type
{
  "base64": "iVBORw0K....",
  "media_type": "image/png"
}
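To make the four shapes concrete, the sketch below builds them as plain Python dictionaries that can be serialized with `json.dumps`. The `media_payload` helper and its argument names are hypothetical; only the keys `url`, `base64`, and `media_type` come from the formats listed above.

```python
import json
from typing import Optional


def media_payload(
    *,
    url: Optional[str] = None,
    base64_data: Optional[str] = None,
    media_type: Optional[str] = None,
) -> dict:
    """Build one of the four JSON shapes: URL or Base64, with optional media type."""
    # Exactly one source must be given: either a URL or Base64 data.
    if (url is None) == (base64_data is None):
        raise ValueError("provide exactly one of url or base64_data")
    payload = {"url": url} if url is not None else {"base64": base64_data}
    if media_type is not None:
        payload["media_type"] = media_type
    return payload


# Serialize the "URL with media type" shape.
print(json.dumps(media_payload(url="https://example.com/image.png", media_type="image/png")))
```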

Media Type Support by Provider

Different LLM providers support different media types:
Columns: Provider, Image, Audio, Video, PDF
OpenAI: Limited, Via vision
Anthropic: Limited, Limited
Google AI: Via vision
AWS Bedrock: Model-dependent, Model-dependent, Via vision
Check your specific model’s documentation for exact multimodal capabilities.

Best Practices

  1. Use Base64 for sensitive data - Avoid exposing internal URLs
  2. Validate URLs - Implement allowlists/denylists for URL inputs
  3. Compress large media - Reduce file sizes before encoding to Base64
  4. Check provider support - Verify your LLM provider supports the media type
  5. Handle errors gracefully - Media processing can fail for various reasons
  6. Use appropriate media types - Specify correct MIME types for better parsing

Common Media Types

Images

  • image/png
  • image/jpeg
  • image/gif
  • image/webp

Audio

  • audio/mp3
  • audio/mpeg
  • audio/wav
  • audio/ogg

Video

  • video/mp4
  • video/webm
  • video/quicktime

PDF

  • application/pdf
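As a small sketch of how these MIME types map back to the four BAML media types, the lookup below classifies a MIME string before a media object is constructed. The `COMMON_MEDIA_TYPES` table simply mirrors the lists above, and `baml_type_for` is a hypothetical helper, not part of BAML.

```python
# Mirrors the lists above; extend it if your provider accepts other types.
COMMON_MEDIA_TYPES = {
    "image": {"image/png", "image/jpeg", "image/gif", "image/webp"},
    "audio": {"audio/mp3", "audio/mpeg", "audio/wav", "audio/ogg"},
    "video": {"video/mp4", "video/webm", "video/quicktime"},
    "pdf": {"application/pdf"},
}


def baml_type_for(media_type: str) -> str:
    """Return "image", "audio", "video", or "pdf" for a listed MIME type."""
    normalized = media_type.strip().lower()
    for kind, known_types in COMMON_MEDIA_TYPES.items():
        if normalized in known_types:
            return kind
    raise ValueError(f"unrecognized media type: {media_type!r}")
```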
