Overview

Multimodal chat lets AI models process content beyond text, including images, PDFs, and other documents. This enables visual question answering, document analysis, and richer interactive conversations.

Supported Content Types

Vision-enabled models can analyze images:
  • Formats: PNG, JPG, JPEG, GIF, WebP
  • Use cases:
    • Describe images
    • Extract text (OCR)
    • Answer questions about visual content
    • Analyze charts and diagrams
    • Compare multiple images
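To check whether a file really is one of the listed image formats before uploading it, one option is to look at its leading bytes rather than trusting the extension. The sketch below is illustrative only: `sniff_image_format` is a hypothetical helper, not part of the application, and it covers just the four formats named above (JPG and JPEG are the same format).

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'png', 'jpeg', 'gif', or 'webp' based on the file signature, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


if __name__ == "__main__":
    # Only the first 12 bytes are needed for any of the checks above.
    with open("screenshot.png", "rb") as fh:
        print(sniff_image_format(fh.read(12)))
```

A result of `None` suggests the file is either an unsupported format or corrupted at the start, which is worth ruling out before retrying an upload.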

Vision-Enabled Models

Not all models support multimodal input. Vision-capable models include:
  • GPT-4o
  • GPT-4o-mini
  • GPT-4 Turbo with Vision
  • GPT-4V
  • Claude Sonnet 4
  • Claude Opus 4
  • Claude 3.7 Sonnet
  • Claude 3.5 Sonnet
  • Claude 3 Opus
  • Gemini 3.1 Pro
  • Gemini 2.5 Pro
  • Gemini 2.5 Flash
  • Gemini 2.0 Flash
  • Other Gemini models (all Gemini models support vision)

Uploading Files

1. Click Attachment Button
   Look for the paperclip or attachment icon in the message input area.

2. Select Files
   Choose one or more files from your device. Most models support multiple file uploads in a single message.

3. Wait for Upload
   Files are uploaded and processed before sending. You'll see:
  • Upload progress indicator
  • Thumbnail previews for images
  • File name and size for documents

4. Add Context (Optional)
   Type a message to provide context or ask specific questions about the uploaded files.

5. Send Message
   Click send to submit your message with the attached files.

Example Use Cases

[Upload: screenshot.png]

What's wrong with this error message? How can I fix it?
The AI analyzes the screenshot and provides troubleshooting steps.

File Configuration

Configure file upload limits and restrictions:
# librechat.yaml
fileConfig:
  # Global server file size limit (MB)
  serverFileSizeLimit: 100
  
  # Endpoint-specific settings
  endpoints:
    openAI:
      fileLimit: 10  # Max number of files
      fileSizeLimit: 20  # MB per file
      totalSizeLimit: 100  # Total MB per request
      supportedMimeTypes:
        - "image/.*"
        - "application/pdf"
    
    assistants:
      fileLimit: 5
      fileSizeLimit: 10
      totalSizeLimit: 50
      supportedMimeTypes:
        - "image/.*"
        - "application/pdf"
    
    # Set disabled: true to turn off file uploads for a specific endpoint
    anthropic:
      disabled: false
    
    default:
      totalSizeLimit: 20
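
To see how the three limits and the MIME type patterns interact, here is a minimal Python sketch that checks a batch of files against the `openAI` values from the example above. It is not the server's actual validation code. `Upload`, `OPENAI_LIMITS`, and `check_uploads` are hypothetical names, and two details are assumptions: that 1 MB means 1024 × 1024 bytes, and that each `supportedMimeTypes` entry is a regular expression that must match the whole MIME type.

```python
import re
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: limits are measured in mebibytes


@dataclass
class Upload:
    name: str
    mime_type: str
    size_bytes: int


# Values copied from the openAI endpoint in the example config.
OPENAI_LIMITS = {
    "fileLimit": 10,
    "fileSizeLimit": 20,
    "totalSizeLimit": 100,
    "supportedMimeTypes": ["image/.*", "application/pdf"],
}


def check_uploads(files, limits):
    """Return a list of problems; an empty list means the batch passes."""
    problems = []
    if len(files) > limits["fileLimit"]:
        problems.append(f"too many files: {len(files)} > {limits['fileLimit']}")
    patterns = [re.compile(p) for p in limits["supportedMimeTypes"]]
    for f in files:
        # Assumption: a pattern must match the entire MIME type string.
        if not any(p.fullmatch(f.mime_type) for p in patterns):
            problems.append(f"{f.name}: unsupported type {f.mime_type}")
        if f.size_bytes > limits["fileSizeLimit"] * MB:
            problems.append(f"{f.name}: larger than {limits['fileSizeLimit']} MB")
    total = sum(f.size_bytes for f in files)
    if total > limits["totalSizeLimit"] * MB:
        problems.append(f"total size exceeds {limits['totalSizeLimit']} MB")
    return problems


if __name__ == "__main__":
    batch = [
        Upload("chart.png", "image/png", 2 * MB),
        Upload("notes.txt", "text/plain", 1024),
    ]
    for problem in check_uploads(batch, OPENAI_LIMITS):
        print(problem)
```

Note that a file can pass the per-file limit and still push the request over `totalSizeLimit`, so the two checks are reported separately.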

Client-Side Image Resizing

Automatically resize large images before upload:
# librechat.yaml
fileConfig:
  clientImageResize:
    enabled: true
    maxWidth: 1900   # pixels
    maxHeight: 1900  # pixels
    quality: 0.92    # JPEG quality (0.0-1.0)
Enable client-side resizing to:
  • Reduce upload times
  • Save bandwidth
  • Prevent upload errors from oversized images
  • Stay within API size limits
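
As a rough illustration of what `maxWidth` and `maxHeight` imply for a given image, the sketch below computes target dimensions for a bounding box. It assumes the resize preserves aspect ratio and never enlarges small images. Neither behavior is stated above, and `fit_within` is a hypothetical helper rather than the client's real code.

```python
def fit_within(width, height, max_width=1900, max_height=1900):
    """Scale (width, height) down to fit the box, keeping aspect ratio; never upscale."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 3000))  # (1900, 1425)
    print(fit_within(800, 600))    # (800, 600), already small enough
```

Under these assumptions, a 4000 × 3000 photo would be uploaded at 1900 × 1425, roughly a quarter of the original pixel count.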

Rate Limiting

Control file upload frequency:
# librechat.yaml
rateLimits:
  fileUploads:
    ipMax: 100                # Max uploads per IP
    ipWindowInMinutes: 60    # Time window for IP limit
    userMax: 50               # Max uploads per user
    userWindowInMinutes: 60  # Time window for user limit
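
The `Max` and `WindowInMinutes` pairs describe "at most N uploads per window". The sketch below shows one common way to enforce such a limit, a sliding window over recent timestamps. Whether the server uses a sliding or a fixed window is not stated here, so treat this as an illustration of the concept only. `WindowLimiter` is a hypothetical class.

```python
import time
from collections import defaultdict, deque


class WindowLimiter:
    """Allow at most max_events per key within the last window_minutes."""

    def __init__(self, max_events, window_minutes, clock=time.monotonic):
        self.max_events = max_events
        self.window_seconds = window_minutes * 60
        self.clock = clock
        self._events = defaultdict(deque)  # key -> timestamps of recent events

    def allow(self, key):
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


if __name__ == "__main__":
    # Mirrors userMax: 50 and userWindowInMinutes: 60 from the example config.
    per_user = WindowLimiter(max_events=50, window_minutes=60)
    print(per_user.allow("user-123"))  # True for the first 50 calls in an hour
```

A second instance keyed by IP address, built from `ipMax` and `ipWindowInMinutes`, would cover the per-IP limit in the same way.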

Image Vision in Agents

Enable vision capabilities for agents:
# librechat.yaml
endpoints:
  agents:
    capabilities:
      - image_vision
In the agent builder, the Image Vision toggle allows the agent to process uploaded images.

File Storage

Configure where uploaded files are stored:
# librechat.yaml
fileStrategy: "local"  # Default
With the local strategy, files are stored on the server filesystem.

Best Practices

  • High-quality images: Better quality input produces better analysis
  • Specific questions: Ask clear questions about visual content
  • Multiple perspectives: Upload multiple images for comparison
  • Text extraction: Works best with clear, well-lit text
  • File size: Optimize large files before upload
  • Context: Provide context about what you want to know

Limitations

  • Model-dependent: Not all models support all file types
  • Size limits: Files must be under configured size limits
  • Processing time: Large files take longer to process
  • Quality matters: Low-quality images may produce poor results
  • API costs: Vision requests typically cost more tokens

Troubleshooting

If a file fails to upload:
  • Check file size against limits
  • Verify file type is supported
  • Ensure sufficient storage space
  • Check network connectivity

If the model does not recognize an image:
  • Verify model supports vision (GPT-4o, Claude Sonnet, Gemini)
  • Check image format is supported
  • Try re-uploading the image
  • Ensure image isn't corrupted

If analysis quality is poor:
  • Use higher quality images
  • Ensure images are well-lit and clear
  • Crop to relevant areas
  • Try different prompting

If files are not stored or cannot be retrieved:
  • Check storage configuration in librechat.yaml
  • Verify S3/Firebase credentials if using cloud storage
  • Ensure server has disk space for local storage
  • Check file permissions
