fast-agent provides comprehensive multimodal support, allowing agents to process images, PDFs, videos, and other media types. Resources can be added to prompts through the built-in prompt-server or using MCP types directly.

Quick Start

Using with_resource()

The simplest way to add resources to a prompt:
import asyncio
from fast_agent import FastAgent

fast = FastAgent("Multimodal Example")

@fast.agent(instruction="You are a helpful assistant")
async def main():
    async with fast.run() as agent:
        summary = await agent.with_resource(
            "Summarise this PDF please",
            "mcp_server",
            "resource://fast-agent/sample.pdf",
        )
        print(summary)

if __name__ == "__main__":
    asyncio.run(main())

Using Prompt Helper

For more control over message structure:
from pathlib import Path
from fast_agent.core.prompt import Prompt

@fast.agent(instruction="You are a helpful AI Agent", servers=["filesystem"])
async def main():
    async with fast.run() as agent:
        await agent.default.generate([
            Prompt.user(
                Path("cat.png"),
                "Write a report on the content of the image to 'report.md'"
            )
        ])

Supported Media Types

Images

Images can be processed by vision-enabled models:
from pathlib import Path
from fast_agent.core.prompt import Prompt

# Local image file
await agent.send(
    Prompt.user(Path("image.png"), "Describe this image")
)

# Remote image URL
from fast_agent import image_link, text_content
from fast_agent.types import PromptMessageExtended

message = PromptMessageExtended(
    role="user",
    content=[
        text_content("What's in this image?"),
        image_link("https://example.com/image.jpg"),
    ],
)
Supported formats: PNG, JPEG, GIF, WebP

Videos

Process video content with compatible models:
from fast_agent import FastAgent, text_content, video_link
from fast_agent.types import PromptMessageExtended

fast = FastAgent("Video Resource Test")

@fast.agent()
async def main():
    async with fast.run() as agent:
        message = PromptMessageExtended(
            role="user",
            content=[
                text_content("What happens in this video?"),
                video_link(
                    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
                    name="Mystery Video"
                ),
            ],
        )
        await agent.default.generate([message])
Google’s models support video processing, including YouTube URLs and direct video files.
PDFs

PDFs can be processed by supported models:
# Using with_resource for PDF processing
response = await agent.with_resource(
    "Summarize the key points from this document",
    "prompt-server",
    "file://document.pdf"
)
Provider support for PDFs varies:
  • Anthropic: Supports text and images from PDFs
  • Google: Native PDF support with full content extraction
  • OpenAI: PDF content converted to text

MCP Resource Conversion

LLM APIs have different restrictions on content types that can be returned as Tool Call results:

OpenAI

  • Text

Anthropic

  • Text
  • Images

Google

  • Text
  • Images
  • PDFs
  • Video (up to 20MB inline)
For MCP Tool Results, ImageResources and EmbeddedResources are automatically converted to User Messages and added to the conversation.
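The conversion step can be pictured with a simplified sketch. Note the `ImageResource` and `Message` classes below are illustrative stand-ins, not fast-agent's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ImageResource:
    """Stand-in for an image returned by a tool call."""
    data: bytes
    mime_type: str

@dataclass
class Message:
    role: str
    content: list = field(default_factory=list)

def convert_tool_result(result_parts, history):
    """Keep text parts in the tool result; lift image parts into a
    follow-up user message appended to the conversation history."""
    texts, images = [], []
    for part in result_parts:
        (images if isinstance(part, ImageResource) else texts).append(part)
    if images:
        history.append(Message(role="user", content=images))
    return texts

history = []
kept = convert_tool_result(["ok", ImageResource(b"...", "image/png")], history)
```

The key design point is that providers which only accept text in tool results still see the images, because they arrive in a regular user message on the next turn.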

Advanced Usage

Multi-File Processing

Process multiple files in a single prompt:
from pathlib import Path
from fast_agent.core.prompt import Prompt

@fast.parallel(
    fan_out=["proofreader", "fact_checker", "style_enforcer"],
    fan_in="grader",
    name="parallel",
)
async def main():
    async with fast.run() as agent:
        await agent.parallel.send(
            Prompt.user("Student short story submission", Path("short_story.txt"))
        )

Interactive Vision Demo

Combine multiple MCP servers for complex workflows:
@fast.agent(
    instruction="You are a helpful AI Agent",
    servers=["webcam", "hfspace"]
)
async def main():
    async with fast.run() as agent:
        await agent.interactive(
            default_prompt="take an image with the webcam, describe it to flux to "
            "reproduce it and then judge the quality of the result"
        )

Resource URIs

Resources can be referenced using various URI schemes:
# File system
"file://path/to/file.pdf"

# Resource protocol (MCP server)
"resource://server-name/resource-id"

# HTTP/HTTPS URLs
"https://example.com/image.jpg"

# YouTube videos
"https://www.youtube.com/watch?v=VIDEO_ID"
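If you need to route these URIs yourself, the scheme can be pulled out with the standard library. This is a generic sketch, not part of fast-agent's API:

```python
from urllib.parse import urlparse

def resource_scheme(uri: str) -> str:
    """Return the URI scheme so a caller can dispatch to the right handler."""
    return urlparse(uri).scheme

resource_scheme("file://path/to/file.pdf")         # -> "file"
resource_scheme("resource://server-name/doc.pdf")  # -> "resource"
resource_scheme("https://example.com/image.jpg")   # -> "https"
```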

Best Practices

1. Choose the Right Model
   Not all models support all media types. Use vision-capable models (e.g., GPT-4 Vision, Claude 3, Gemini) for image/video tasks.

2. Optimize File Sizes
   • Images: Keep under 20MB for best performance
   • Videos: Use YouTube URLs for large files
   • PDFs: Consider extracting text for very large documents

3. Use Appropriate Servers
   Ensure your agent has access to the necessary MCP servers:

@fast.agent(
    servers=["filesystem", "webcam", "prompt-server"]
)
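The size guidance above can be enforced with a small pre-flight check before attaching a file. This is a hypothetical helper, not part of fast-agent:

```python
import os

MAX_INLINE_BYTES = 20 * 1024 * 1024  # 20MB inline guideline

def within_inline_limit(path: str, limit: int = MAX_INLINE_BYTES) -> bool:
    """Return True if the local file is small enough to attach inline."""
    return os.path.getsize(path) <= limit
```

Files that fail the check can fall back to external hosting or a provider file API instead of inline attachment.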

Troubleshooting

If a model rejects a media type, check provider-specific documentation:
  • OpenAI: Images only, in vision models
  • Anthropic: Images and PDFs in Claude 3+
  • Google: Full multimodal support in Gemini models

If a resource fails to load, ensure:
  • File paths are correct and accessible
  • The MCP server is properly configured
  • The filesystem server has appropriate permissions

For files exceeding size limits:
  • Use external hosting (S3, CDN) with URLs
  • Split large documents into chunks
  • Use Google's File API for videos over 20MB
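For the chunking option, a minimal character-based splitter might look like the sketch below; real pipelines often split on tokens or paragraph boundaries instead:

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character chunks.

    The overlap preserves context across chunk boundaries so a summary
    of one chunk doesn't lose sentences cut off by the split point.
    """
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]
```

Each chunk can then be sent to the agent in its own prompt, with the results combined afterwards.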

Prompts

Learn about MCP Prompts and prompt templates

MCP Servers

Configure servers for resource access
