Overview
LLM Magic supports multimodal conversations, allowing you to chat with both text and images. This is powered by the Step message class and its content types, such as Step\Image and Step\Text.
Basic Image Chat
Loading Images
The Step\Image class provides multiple static methods for loading images from different sources:
- From URL
- From File Path
- From Raw Contents
- From Laravel Storage Disk
- From Base64
All image loading methods automatically handle base64 encoding internally. The image is sent to the LLM as base64 data.
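To make the encoding step concrete, here is a minimal sketch in plain PHP of turning raw image bytes into base64 data plus a MIME type. It is not LLM Magic's actual implementation: the function name `encodeImageBytes` and the array keys are assumptions chosen for illustration, and the code relies only on PHP's built-in `finfo` and `base64_encode`.

```php
<?php

/**
 * Hypothetical helper (NOT part of LLM Magic): converts raw image bytes
 * into a base64 string and a detected MIME type.
 */
function encodeImageBytes(string $bytes): array
{
    // Detect the MIME type from the file's magic bytes.
    $mime = (new finfo(FILEINFO_MIME_TYPE))->buffer($bytes);

    return [
        'imageBase64' => base64_encode($bytes),
        // Fall back to a generic type if detection fails.
        'mime' => $mime === false ? 'application/octet-stream' : $mime,
    ];
}

// Usage: a 1x1 PNG, embedded here so the sketch runs without external files.
$png = base64_decode(
    'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg=='
);
$encoded = encodeImageBytes($png);
echo $encoded['mime'], PHP_EOL;
```

In practice you would not need a helper like this, since the loading methods described above handle the encoding for you.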
Multi-Turn Conversations
You can build complex conversations mixing text and images.
You can mix text-only and multimodal messages in the same conversation. The Step::user() and Step::assistant() methods accept either a string or an array of content parts.
Step Message Structure
From src/Magic/Chat/Messages/Step.php, the Step class represents a structured message with:
- role - User, Assistant, or System
- content - Array of content parts (Text, Image, etc.)
Text-Only Messages
For convenience, you can pass a string directly.
Multimodal Messages
Pass an array of content parts.
Content Types
From src/Magic/Chat/Messages/Step/, the following content types are available:
Step\Text
Represents text content.
Step\Image
Represents image content with these properties from src/Magic/Chat/Messages/Step/Image.php:
- imageBase64 - Base64-encoded image data
- mime - MIME type (e.g., 'image/jpeg', 'image/png')
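As an illustration of how these two properties fit together, the sketch below pairs base64 data with a MIME type and combines them into a data URI (the `data:<mime>;base64,<data>` form from RFC 2397, which some provider APIs accept for inline images). The `ImagePayload` class and its `toDataUri()` method are hypothetical stand-ins, not the library's Step\Image class, and the code assumes PHP 8.1 or later for readonly properties.

```php
<?php

// Hypothetical value object, for illustration only. It mirrors the two
// documented properties but is NOT the library's Step\Image class.
final class ImagePayload
{
    public function __construct(
        public readonly string $imageBase64,
        public readonly string $mime,
    ) {
    }

    // Combine the MIME type and base64 data into an RFC 2397 data URI.
    public function toDataUri(): string
    {
        return sprintf('data:%s;base64,%s', $this->mime, $this->imageBase64);
    }
}

// Usage: the bytes here are a placeholder, not a real image.
$payload = new ImagePayload(base64_encode('raw image bytes'), 'image/png');
echo $payload->toDataUri(), PHP_EOL;
```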
Combining with Tools
You can use images in conversations that also use tools.
Supported Models
Not all LLMs support vision. Multimodal-capable models include:
- Google Gemini 2.0 Flash
- Anthropic Claude 3.5 Sonnet
- OpenAI GPT-4 Vision
- OpenAI GPT-4o
Streaming with Images
You can stream responses to multimodal conversations.
Best Practices
Image Size and Format
- Use JPEG or PNG formats for best compatibility
- Optimize large images before sending
- Most providers enforce per-image file size limits (20 MB is a typical figure); check your provider's documentation for the exact value
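The checklist above can be sketched as a small pre-send check. The function name `imageProblems`, the allowed-format list, and the 20 MB default are assumptions for illustration; none of them come from LLM Magic, and real limits vary by provider.

```php
<?php

/**
 * Hypothetical pre-send check (NOT part of LLM Magic).
 * Returns a list of human-readable problems; an empty list means
 * the image passed both checks.
 */
function imageProblems(int $sizeInBytes, string $mime, int $maxBytes = 20 * 1024 * 1024): array
{
    $problems = [];

    // Size check against an assumed limit (20 MB by default).
    if ($sizeInBytes > $maxBytes) {
        $problems[] = sprintf('Image is %d bytes; the limit is %d bytes.', $sizeInBytes, $maxBytes);
    }

    // Format check: only JPEG and PNG are treated as broadly compatible here.
    if (!in_array($mime, ['image/jpeg', 'image/png'], true)) {
        $problems[] = sprintf('Format %s may not be supported; prefer JPEG or PNG.', $mime);
    }

    return $problems;
}

// Usage: a 25 MB GIF fails both checks.
foreach (imageProblems(25 * 1024 * 1024, 'image/gif') as $problem) {
    echo $problem, PHP_EOL;
}
```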
Context Management
- Images consume significant context tokens
- Limit the number of images in a single conversation
- Consider the model’s context window
Error Handling
- Always validate that image paths and URLs exist before loading them
- Handle MIME type detection failures
- Check model capabilities before sending
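One way to apply the first two points to a local file is sketched below. The function `detectImageMimeOrFail` is hypothetical and uses only PHP built-ins (`is_file`, `is_readable`, `finfo`); it does not reflect how LLM Magic validates images internally, and it assumes PHP 8 for `str_starts_with`.

```php
<?php

/**
 * Hypothetical pre-flight check (NOT part of LLM Magic).
 * Returns the detected MIME type, or throws if the file is unusable.
 */
function detectImageMimeOrFail(string $path): string
{
    // Fail early if the path does not point to a readable file.
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("Image not found or unreadable: {$path}");
    }

    $mime = (new finfo(FILEINFO_MIME_TYPE))->file($path);

    // Treat detection failure and non-image content the same way.
    if ($mime === false || !str_starts_with($mime, 'image/')) {
        throw new RuntimeException("Could not detect an image MIME type for: {$path}");
    }

    return $mime;
}

// Usage: the path is a placeholder; replace it with a real file.
try {
    $mime = detectImageMimeOrFail('/path/to/photo.jpg');
    // The file looks usable; pass it to your image loader here.
} catch (InvalidArgumentException | RuntimeException $e) {
    error_log($e->getMessage());
}
```

Throwing distinct exception types keeps "file missing" separate from "file is not an image", so callers can react differently to each case.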
Next Steps
Custom Tools
Build custom tools to extend chat capabilities
Document Extraction
Extract data from PDF and image documents