BAML provides first-class support for multimodal types, allowing you to work with images, audio, video, and PDF files in your AI applications:
- Image files (PNG, JPEG, GIF, WebP, etc.)
- Audio files (MP3, WAV, OGG, etc.)
- Video files (MP4, WebM, etc.)
- PDF documents
Media can be provided in three ways (PDFs currently support Base64 only; see below):
- URL - A web URL pointing to the media
- Base64 - Base64-encoded string with media type
- File Path - Local file path
Security consideration: constructing media from untrusted URLs may expose you to SSRF attacks, since BAML may download and transcode files depending on the model. Only use URLs from trusted sources, or validate them with allowlists/denylists.
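One minimal sketch of the allowlist approach, using only the standard library. The hosts in `ALLOWED_HOSTS` are placeholders; substitute the domains you actually trust:

```python
from urllib.parse import urlparse

# Placeholder allowlist -- replace with hosts you actually trust.
ALLOWED_HOSTS = {"upload.wikimedia.org", "cdn.example.com"}

def is_trusted_media_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Call this before constructing any media object from user-supplied input, and reject the request when it returns False.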
Image Type
BAML Definition
```baml
function DescribeImage(image: image) -> string {
  client "openai/gpt-4o-mini"
  prompt #"
    Describe the image in four words:
    {{ image }}
  "#
}
```
Usage in Tests
```baml
test ImageDescriptionFromURL {
  functions [DescribeImage]
  args {
    image {
      url "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
    }
  }
}

test ImageDescriptionFromBase64 {
  functions [DescribeImage]
  args {
    image {
      media_type "image/png"
      base64 "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+ip1sAAAAASUVORK5CYII="
    }
  }
}

test ImageDescriptionFromFile {
  functions [DescribeImage]
  args {
    image {
      file "./shrek.png"
    }
  }
}
```
Client Code Usage
```python
from baml_py import Image
from baml_client import b

async def test_image_input():
    # From URL
    res = await b.DescribeImage(
        image=Image.from_url("https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png")
    )

    # From Base64
    image_b64 = "iVBORw0K..."
    res = await b.DescribeImage(
        image=Image.from_base64("image/png", image_b64)
    )
```
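When the image lives on disk, you can Base64-encode it yourself before calling `Image.from_base64`. A small stdlib-only helper (the file path in the comment is illustrative):

```python
import base64
from pathlib import Path

def to_base64(data: bytes) -> str:
    """Base64-encode raw bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")

def image_file_to_base64(path: str) -> str:
    """Read a local image and return its Base64 payload for Image.from_base64."""
    return to_base64(Path(path).read_bytes())

# e.g. Image.from_base64("image/png", image_file_to_base64("./shrek.png"))
```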
Audio Type
BAML Definition
```baml
function DescribeSound(myAudio: audio) -> string {
  client "openai/gpt-4o-mini"
  prompt #"
    Describe the audio in four words:
    {{ myAudio }}
  "#
}
```
Client Code Usage
```python
from baml_py import Audio
from baml_client import b

async def run():
    # From URL
    res = await b.DescribeSound(
        myAudio=Audio.from_url(
            "https://actions.google.com/sounds/v1/emergency/beeper_emergency_call.ogg"
        )
    )

    # From Base64
    audio_b64 = "iVBORw0K..."
    res = await b.DescribeSound(
        myAudio=Audio.from_base64("audio/ogg", audio_b64)
    )
```
PDF Type
BAML Definition
```baml
function AnalyzePdf(myPdf: pdf) -> string {
  client "openai/gpt-4o-mini"
  prompt #"
    Summarize the main points of this PDF:
    {{ myPdf }}
  "#
}
```
PDF inputs must be provided as Base64 data using Pdf.from_base64. URL-based inputs are not currently supported.
Client Code Usage
```python
from baml_py import Pdf
from baml_client import b

async def run():
    # Base64 only
    pdf_b64 = "JVBERi0K..."
    res = await b.AnalyzePdf(
        myPdf=Pdf.from_base64("application/pdf", pdf_b64)
    )
```
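Since PDFs must be supplied as Base64, a helper that also sanity-checks the `%PDF` magic bytes before encoding can catch a wrong file early. This is a sketch, not part of the BAML API:

```python
import base64

def pdf_bytes_to_base64(data: bytes) -> str:
    """Sanity-check the %PDF magic bytes, then Base64-encode the document."""
    if not data.startswith(b"%PDF"):
        raise ValueError("input does not look like a PDF file")
    return base64.b64encode(data).decode("ascii")

# e.g. Pdf.from_base64("application/pdf", pdf_bytes_to_base64(Path("./report.pdf").read_bytes()))
```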
Video Type
BAML Definition
```baml
function DescribeVideo(myVideo: video) -> string {
  client "openai/gpt-4o-mini"
  prompt #"
    Describe what happens in this video:
    {{ myVideo }}
  "#
}
```
When providing a video via URL, the URL is passed directly to the model. Some models cannot download external media; in that case, convert the video to Base64 first.
Client Code Usage
```python
from baml_py import Video
from baml_client import b

async def run():
    # From URL
    res = await b.DescribeVideo(
        myVideo=Video.from_url("https://example.com/sample.mp4")
    )

    # From Base64
    video_b64 = "AAAAGGZ0eXBpc29t..."
    res = await b.DescribeVideo(
        myVideo=Video.from_base64("video/mp4", video_b64)
    )
```
Controlling URL Processing
You can control how BAML processes media URLs before sending them to providers:
```baml
client<llm> MyClient {
  provider anthropic
  options {
    media_url_handler {
      image "send_base64" // Convert image URLs to base64
      pdf "send_url"      // Keep PDF URLs as-is
      audio "send_base64" // Convert audio URLs to base64
      video "send_url"    // Keep video URLs as-is
    }
  }
}
```
Each field in `media_url_handler` accepts "send_url" or "send_base64":
- image - how to handle image URLs
- audio - how to handle audio URLs
- video - how to handle video URLs
- pdf - how to handle PDF URLs

This allows you to override the default behavior for each provider and media type combination.
Pydantic Compatibility
For Python users using Pydantic, media types can be constructed from JSON:
```json
// URL format
{
  "url": "https://example.com/image.png"
}

// URL with media type
{
  "url": "https://example.com/image.png",
  "media_type": "image/png"
}

// Base64 format
{
  "base64": "iVBORw0K...."
}

// Base64 with media type
{
  "base64": "iVBORw0K....",
  "media_type": "image/png"
}
```
Different LLM providers support different media types:
| Provider | Image | Audio | Video | PDF |
|---|---|---|---|---|
| OpenAI | ✓ | ✓ | Limited | Via vision |
| Anthropic | ✓ | Limited | Limited | ✓ |
| Google AI | ✓ | ✓ | ✓ | Via vision |
| AWS Bedrock | ✓ | Model-dependent | Model-dependent | Via vision |
Check your specific model’s documentation for exact multimodal capabilities.
Best Practices
- Use Base64 for sensitive data - Avoid exposing internal URLs
- Validate URLs - Implement allowlists/denylists for URL inputs
- Compress large media - Reduce file sizes before encoding to Base64
- Check provider support - Verify your LLM provider supports the media type
- Handle errors gracefully - Media processing can fail for various reasons
- Use appropriate media types - Specify correct MIME types for better parsing
Commonly used media types:
- Images: image/png, image/jpeg, image/gif, image/webp
- Audio: audio/mp3, audio/mpeg, audio/wav, audio/ogg
- Video: video/mp4, video/webm, video/quicktime
- PDF: application/pdf