fast-agent provides comprehensive multimodal support, allowing agents to process images, PDFs, videos, and other media types. Resources can be added to prompts through the built-in prompt-server or using MCP types directly.

Quick Start

Using with_resource()

The simplest way to add resources to a prompt:
import asyncio
from fast_agent import FastAgent

fast = FastAgent("Multimodal Example")

@fast.agent(instruction="You are a helpful assistant")
async def main():
    async with fast.run() as agent:
        summary = await agent.with_resource(
            "Summarise this PDF please",
            "mcp_server",
            "resource://fast-agent/sample.pdf",
        )
        print(summary)

if __name__ == "__main__":
    asyncio.run(main())

Using Prompt Helper

For more control over message structure:
from pathlib import Path
from fast_agent.core.prompt import Prompt

@fast.agent(instruction="You are a helpful AI Agent", servers=["filesystem"])
async def main():
    async with fast.run() as agent:
        await agent.default.generate([
            Prompt.user(
                Path("cat.png"),
                "Write a report on the content of the image to 'report.md'"
            )
        ])

Supported Media Types

Images

Images can be processed by vision-enabled models:
from pathlib import Path
from fast_agent.core.prompt import Prompt

# Local image file
await agent.send(
    Prompt.user(Path("image.png"), "Describe this image")
)

# Remote image URL
from fast_agent import image_link, text_content
from fast_agent.types import PromptMessageExtended

message = PromptMessageExtended(
    role="user",
    content=[
        text_content("What's in this image?"),
        image_link("https://example.com/image.jpg"),
    ],
)
Supported formats: PNG, JPEG, GIF, WebP

Videos

Process video content with compatible models:
from fast_agent import FastAgent, text_content, video_link
from fast_agent.types import PromptMessageExtended

fast = FastAgent("Video Resource Test")

@fast.agent()
async def main():
    async with fast.run() as agent:
        message = PromptMessageExtended(
            role="user",
            content=[
                text_content("What happens in this video?"),
                video_link(
                    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
                    name="Mystery Video"
                ),
            ],
        )
        await agent.default.generate([message])
Google’s models support video processing, including YouTube URLs and direct video files.
PDFs

PDFs can be processed by supported models:
# Using with_resource for PDF processing
response = await agent.with_resource(
    "Summarize the key points from this document",
    "prompt-server",
    "file://document.pdf"
)
Provider support for PDFs varies:
  • Anthropic: Supports text and images from PDFs
  • Google: Native PDF support with full content extraction
  • OpenAI: PDF content converted to text

MCP Resource Conversion

LLM APIs have different restrictions on content types that can be returned as Tool Call results:

OpenAI

  • Text

Anthropic

  • Text
  • Images

Google

  • Text
  • Images
  • PDFs
  • Video (up to 20MB inline)
For MCP Tool Results, ImageResources and EmbeddedResources are automatically converted to User Messages and added to the conversation.
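The conversion step can be pictured with a simplified sketch. Note the `ImageResource` and `Message` classes below are illustrative stand-ins, not fast-agent's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ImageResource:
    """Stand-in for an image returned by a tool call."""
    data: bytes
    mime_type: str

@dataclass
class Message:
    role: str
    content: list = field(default_factory=list)

def convert_tool_result(result_parts, history):
    """Keep text parts in the tool result; lift image parts into a
    follow-up user message appended to the conversation history."""
    texts, images = [], []
    for part in result_parts:
        (images if isinstance(part, ImageResource) else texts).append(part)
    if images:
        history.append(Message(role="user", content=images))
    return texts

history = []
kept = convert_tool_result(["ok", ImageResource(b"...", "image/png")], history)
```

The key design point is that providers which only accept text in tool results still see the images, because they arrive in a regular user message on the next turn.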

Advanced Usage

Multi-File Processing

Process multiple files in a single prompt:
from pathlib import Path
from fast_agent.core.prompt import Prompt

@fast.parallel(
    fan_out=["proofreader", "fact_checker", "style_enforcer"],
    fan_in="grader",
    name="parallel",
)
async def main():
    async with fast.run() as agent:
        await agent.parallel.send(
            Prompt.user("Student short story submission", Path("short_story.txt"))
        )

Interactive Vision Demo

Combine multiple MCP servers for complex workflows:
@fast.agent(
    instruction="You are a helpful AI Agent",
    servers=["webcam", "hfspace"]
)
async def main():
    async with fast.run() as agent:
        await agent.interactive(
            default_prompt="take an image with the webcam, describe it to flux to "
            "reproduce it and then judge the quality of the result"
        )

Resource URIs

Resources can be referenced using various URI schemes:
# File system
"file://path/to/file.pdf"

# Resource protocol (MCP server)
"resource://server-name/resource-id"

# HTTP/HTTPS URLs
"https://example.com/image.jpg"

# YouTube videos
"https://www.youtube.com/watch?v=VIDEO_ID"
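If you need to route these URIs yourself, the scheme can be pulled out with the standard library. This is a generic sketch, not part of fast-agent's API:

```python
from urllib.parse import urlparse

def resource_scheme(uri: str) -> str:
    """Return the URI scheme so a caller can dispatch to the right handler."""
    return urlparse(uri).scheme

resource_scheme("file://path/to/file.pdf")         # -> "file"
resource_scheme("resource://server-name/doc.pdf")  # -> "resource"
resource_scheme("https://example.com/image.jpg")   # -> "https"
```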

Best Practices

1. Choose the Right Model
   Not all models support all media types. Use vision-capable models (e.g., GPT-4 Vision, Claude 3, Gemini) for image/video tasks.

2. Optimize File Sizes
   • Images: Keep under 20MB for best performance
   • Videos: Use YouTube URLs for large files
   • PDFs: Consider extracting text for very large documents

3. Use Appropriate Servers
   Ensure your agent has access to the necessary MCP servers:

@fast.agent(
    servers=["filesystem", "webcam", "prompt-server"]
)
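The size guidance above can be enforced with a small pre-flight check before attaching a file. This is a hypothetical helper, not part of fast-agent:

```python
import os

MAX_INLINE_BYTES = 20 * 1024 * 1024  # 20MB inline guideline

def within_inline_limit(path: str, limit: int = MAX_INLINE_BYTES) -> bool:
    """Return True if the local file is small enough to attach inline."""
    return os.path.getsize(path) <= limit
```

Files that fail the check can fall back to external hosting or a provider file API instead of inline attachment.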

Troubleshooting

If a model rejects a media type, check provider-specific documentation:
  • OpenAI: Images only, in vision models
  • Anthropic: Images and PDFs in Claude 3+
  • Google: Full multimodal support in Gemini models

If a resource fails to load, ensure:
  • File paths are correct and accessible
  • The MCP server is properly configured
  • The filesystem server has appropriate permissions

For files exceeding size limits:
  • Use external hosting (S3, CDN) with URLs
  • Split large documents into chunks
  • Use Google's File API for videos over 20MB
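For the chunking option, a minimal character-based splitter might look like the sketch below; real pipelines often split on tokens or paragraph boundaries instead:

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character chunks.

    The overlap preserves context across chunk boundaries so a summary
    of one chunk doesn't lose sentences cut off by the split point.
    """
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]
```

Each chunk can then be sent to the agent in its own prompt, with the results combined afterwards.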

Prompts

Learn about MCP Prompts and prompt templates

MCP Servers

Configure servers for resource access
