LibreChat’s multimodal capabilities enable AI models to understand and analyze images, documents, and other file types alongside text conversations.

## Overview

Multimodal AI can:

- **Analyze Images**: Understand photos, screenshots, diagrams, and charts
- **Process Documents**: Read and analyze PDFs, text files, and other documents
- **Extract Information**: Pull data from tables, forms, and structured content
- **Visual Understanding**: Describe scenes, identify objects, and read text in images

## Vision-Capable Models

These models support image understanding:

### OpenAI

`gpt-4o` and `gpt-4o-mini`

- Native vision support
- High-quality image analysis
- Document understanding
- Fast processing

### Anthropic

`claude-4.5-sonnet`

- Advanced vision understanding
- Long context with images
- Document analysis
- Chart and diagram interpretation

### Google

`gemini-2.0-flash-exp`

- Native multimodal processing
- Image, video, and audio support
- Large context window (1M tokens)

### AWS Bedrock

Claude models through Bedrock:

- `anthropic.claude-3-7-sonnet`
- `anthropic.claude-3-5-sonnet`
- Vision support via Anthropic models

## Uploading Files

### Supported File Types

Image formats:

- JPEG/JPG
- PNG
- GIF
- WebP
- BMP
- SVG (converted to raster)

Restrict accepted types per endpoint in `librechat.yaml`:

```yaml
fileConfig:
  endpoints:
    openAI:
      supportedMimeTypes:
        - "image/.*"
```
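The `supportedMimeTypes` entries are matched as patterns, which is why `"image/.*"` accepts any image subtype. A minimal sketch of that matching logic (the helper below is an illustration, not LibreChat's actual implementation):

```python
import re

# Hypothetical patterns taken from the fileConfig example above
supported_mime_types = ["image/.*", "application/pdf"]

def is_supported(mime_type: str) -> bool:
    """Return True if mime_type matches any configured pattern."""
    return any(re.fullmatch(p, mime_type) for p in supported_mime_types)

print(is_supported("image/png"))        # True
print(is_supported("application/zip"))  # False
```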

### File Upload Configuration

```yaml
# librechat.yaml
fileConfig:
  endpoints:
    openAI:
      fileLimit: 10  # Max files per request
      fileSizeLimit: 20  # MB per file
      totalSizeLimit: 100  # MB total per request
      supportedMimeTypes:
        - "image/.*"
        - "application/pdf"

    anthropic:
      fileLimit: 5
      fileSizeLimit: 10
      totalSizeLimit: 50
      supportedMimeTypes:
        - "image/.*"
        - "application/pdf"

  # Global server limit
  serverFileSizeLimit: 100  # MB

  # Avatar images
  avatarSizeLimit: 2  # MB
```
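The limits above act as three independent checks on each request: file count, per-file size, and aggregate size. A hypothetical validator showing how they compose (defaults mirror the `openAI` endpoint values above; this is not LibreChat's actual validation code):

```python
def validate_upload(file_sizes_mb, file_limit=10, file_size_limit=20, total_size_limit=100):
    """Check a batch of uploads against per-file and aggregate limits.

    file_sizes_mb: list of file sizes in megabytes.
    """
    if len(file_sizes_mb) > file_limit:
        return False, "too many files"
    if any(size > file_size_limit for size in file_sizes_mb):
        return False, "a file exceeds the per-file limit"
    if sum(file_sizes_mb) > total_size_limit:
        return False, "batch exceeds the total size limit"
    return True, "ok"

print(validate_upload([5, 12, 19]))  # (True, 'ok')
print(validate_upload([25]))         # (False, 'a file exceeds the per-file limit')
```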

### Client-Side Image Resizing

Prevent upload errors by resizing large images:

```yaml
# librechat.yaml
fileConfig:
  clientImageResize:
    enabled: true
    maxWidth: 1900
    maxHeight: 1900
    quality: 0.92  # JPEG quality (0.0-1.0)
```
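Conceptually, client-side resizing scales an image down to fit within the configured bounding box while preserving aspect ratio; images already within bounds are left untouched. A sketch of the dimension math (an illustration, not LibreChat's client code):

```python
def fit_within(width, height, max_width=1900, max_height=1900):
    """Scale (width, height) to fit inside the box, preserving aspect ratio."""
    # Never scale up: cap the factor at 1.0 for images already in bounds
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)

print(fit_within(4000, 3000))  # (1900, 1425)
print(fit_within(800, 600))    # (800, 600) - unchanged
```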

## Using Multimodal Features

### Image Analysis

Upload images and ask questions:

User: [Uploads screenshot of error message]
What's causing this error and how can I fix it?

Assistant: Looking at your screenshot, the error is a
"TypeError: Cannot read property 'map' of undefined". This
occurs on line 42 where you're trying to map over `data.items`.

The issue is that `data.items` is undefined, likely because:
1. The API response doesn't include an `items` property
2. The data hasn't loaded yet when the component renders

Fix:
```javascript
{data?.items?.map(item => ...)}
// or
{(data.items || []).map(item => ...)}
```

### Document Analysis

Analyze PDFs and documents:

User: [Uploads contract.pdf]
Summarize the key terms of this contract.

Assistant: Based on the contract, here are the key terms:

1. Parties: Agreement between Company A and Company B
2. Duration: 2-year term starting January 1, 2024
3. Payment: $50,000 annual fee, paid quarterly
4. Termination: 60-day notice required
5. Confidentiality: 5-year non-disclosure period
6. Intellectual Property: Company A retains all IP rights

[Detailed analysis continues...]

### Chart and Graph Reading

Extract data from visualizations:

User: [Uploads sales chart]
What trends do you see in this sales data?

Assistant: Analyzing the chart, I can see:

Trends:
- Overall upward trend from Q1 to Q4
- Strongest growth in Q3 (+35%)
- Slight dip in Q2 (-5%)
- Revenue peaked in December at $450K

Key Observations:
- Summer months (June-August) show consistent growth
- Holiday season (Nov-Dec) accounts for 30% of annual revenue
- Average monthly growth rate: 8.5%

### OCR and Text Extraction

Read text from images:

User: [Uploads photo of handwritten notes]
Transcribe these notes.

Assistant: Here's the transcription:

Meeting Notes - January 15, 2024

1. Project deadline moved to March 1st
2. Budget increased by 15%
3. New team members:
   - Sarah (Designer)
   - Mike (Backend Developer)
4. Next milestone: Prototype review Feb 5th

Action Items:
- Update project timeline
- Schedule design review
- Prepare demo for stakeholders

## Agent Integration

Use multimodal capabilities with agents:

```typescript
{
  name: 'Visual Assistant',
  provider: 'openAI',
  model: 'gpt-4o',
  instructions: `
    You are a visual analysis assistant. When users upload images:
    1. Provide detailed descriptions
    2. Extract text and data
    3. Answer questions about the image
    4. Identify objects, people, and scenes
  `,
  tools: ['execute_code'],  // Can use code to process images
  tool_resources: {
    execute_code: {
      file_ids: []  // Images are available to code interpreter
    }
  }
}
```

### Image Processing with Code

Combine vision models with code execution:

```python
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Load the uploaded image
img = Image.open('photo.jpg')
img_array = np.array(img)

# Analyze dimensions
height, width, channels = img_array.shape
print(f"Image size: {width}x{height}, Channels: {channels}")

# Calculate average color
avg_color = img_array.mean(axis=(0, 1))
print(f"Average RGB: {avg_color}")

# Plot per-channel histograms
plt.figure(figsize=(10, 4))
for i, color in enumerate(['red', 'green', 'blue']):
    plt.subplot(1, 3, i + 1)
    plt.hist(img_array[:, :, i].ravel(), bins=256, color=color, alpha=0.7)
    plt.title(f'{color.capitalize()} Channel')
plt.tight_layout()
plt.savefig('color_histogram.png')
```

## File Context and Management

LibreChat tracks files with different contexts:

### File Contexts

Files attached to specific messages:

```typescript
{
  context: FileContext.message_attachment,
  file_id: 'file-123',
  messageId: 'msg-456',
  conversationId: 'conv-789'
}
```

### File Storage Strategies

Configure where files are stored:

```yaml
# librechat.yaml

# Single strategy for all files
fileStrategy: "s3"

# Or granular control
fileStrategy:
  avatar: "s3"      # User/agent avatars
  image: "firebase" # Chat images
  document: "local" # Documents
```

Available strategies:

- `local`: Local file system
- `s3`: AWS S3 or compatible
- `firebase`: Firebase Storage
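The granular form can be read as a per-type lookup, with the single-string form acting as a catch-all. A hypothetical resolver sketching that behavior (the `"local"` fallback here is an assumption for illustration, not documented behavior):

```python
def resolve_strategy(file_type, file_strategy):
    """Pick a storage backend for a file.

    file_strategy is either a single string ("s3") applied to all
    files, or a dict keyed by file type ({"avatar": "s3", ...}).
    """
    if isinstance(file_strategy, str):
        return file_strategy
    return file_strategy.get(file_type, "local")  # assumed fallback

print(resolve_strategy("image", {"avatar": "s3", "image": "firebase"}))  # firebase
print(resolve_strategy("document", "s3"))                                # s3
```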

## Provider-Specific Features

### OpenAI Vision

OpenAI models with vision support:

```typescript
{
  endpoint: 'openAI',
  model: 'gpt-4o',
  // Images are automatically processed
  // No special configuration needed
}
```

Features:

- High-quality image understanding
- Multiple images per message
- Image + text combinations
- Fast processing
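Under the hood, vision requests to OpenAI-style chat endpoints carry images as `image_url` content parts alongside text, with uploaded files encoded as base64 data URLs. LibreChat builds this payload for you; a sketch of its shape:

```python
import base64

# Placeholder bytes standing in for a real uploaded file
image_bytes = b"\x89PNG..."
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()

# One user message combining text and an image part
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": data_url}},
    ],
}
```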

### Anthropic Claude

Claude with document understanding:

```typescript
{
  endpoint: 'anthropic',
  model: 'claude-4.5-sonnet',
  // Supports PDFs natively via the Messages API
}
```

Features:

- PDF document analysis
- Long documents (up to 200K tokens)
- Detailed visual understanding
- Citation-based responses
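Anthropic's Messages API accepts PDFs as base64 `document` content blocks within a user message; roughly:

```python
import base64

# Placeholder bytes standing in for a real uploaded PDF
pdf_bytes = b"%PDF-1.4 ..."

message = {
    "role": "user",
    "content": [
        {
            "type": "document",
            "source": {
                "type": "base64",
                "media_type": "application/pdf",
                "data": base64.b64encode(pdf_bytes).decode(),
            },
        },
        {"type": "text", "text": "Summarize the key terms of this contract."},
    ],
}
```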

### Google Gemini

Gemini's native multimodal processing:

```typescript
{
  endpoint: 'google',
  model: 'gemini-2.0-flash-exp',
  // Multimodal by default
}
```

Features:

- Unified multimodal model
- 1M token context window
- Video and audio support (coming soon)
- Fast inference

## Document Processing

Direct PDF support for select providers:

```typescript
// Providers with document support
const documentSupportedProviders = new Set([
  'anthropic',
  'openAI',
  'bedrock',
  'google',
  'custom',  // Depends on endpoint
]);
```

### PDF Analysis Example

User: [Uploads research_paper.pdf - 50 pages]
Summarize the methodology section.

Assistant: The methodology section (pages 8-15) describes:

**Study Design:**
- Randomized controlled trial
- 500 participants across 3 groups
- 12-month duration

**Data Collection:**
- Baseline assessments
- Monthly follow-ups
- Final evaluation at 12 months

**Analysis Methods:**
- Mixed-effects models for longitudinal data
- Intent-to-treat analysis
- Multiple imputation for missing data

[Detailed breakdown continues...]

## Best Practices

- **Image quality:** Higher-resolution images produce better analysis but take longer to process.
- **Multiple images:** Upload multiple angles or views for a more comprehensive analysis.
- **Clear questions:** Be specific about what you want to know from the image or document.
- **File sizes:** Large files may hit upload limits or time out; compress when possible.
- **Privacy:** Uploaded files are processed by AI providers; don't upload sensitive data without appropriate safeguards.

## Example Use Cases

```typescript
// Medical image assistant
{
  name: 'Medical Image Assistant',
  model: 'gpt-4o',
  instructions: `
    Analyze medical images and provide observations.
    Note: For informational purposes only, not medical advice.

    When analyzing:
    1. Describe visible structures
    2. Note any abnormalities
    3. Suggest areas for professional review
    4. Always recommend consulting healthcare professionals
  `
}
```

```typescript
// Architecture reviewer
{
  name: 'Architecture Reviewer',
  model: 'claude-4.5-sonnet',
  instructions: `
    Review architectural diagrams and provide feedback.

    For each diagram:
    1. Identify all components and their relationships
    2. Assess scalability and redundancy
    3. Flag potential bottlenecks
    4. Suggest improvements
  `
}
```

```typescript
// Receipt processor
{
  name: 'Receipt Processor',
  model: 'gpt-4o',
  instructions: `
    Extract data from receipt images.

    For each receipt, extract:
    - Vendor name
    - Date and time
    - Individual items and prices
    - Subtotal, tax, total
    - Payment method

    Format as JSON for easy processing.
  `,
  tools: ['execute_code']  // Can save to CSV
}
```

## Troubleshooting

**Images not being analyzed?** Check:

- The model supports vision (gpt-4o, claude-4.5, etc.)
- File size is within limits
- The MIME type is correct (image/*)
- The image format is supported

**Poor analysis quality?** Improve it by:

- Uploading higher-resolution images
- Ensuring good lighting and clarity
- Cropping to the relevant content
- Using specific prompts

**PDFs not processing?** Solutions:

- Use a document-supported provider
- Check file size limits
- Verify the PDF is not corrupted
- Try splitting large PDFs

## Related

- **Code Interpreter**: Process images programmatically
- **Image Generation**: Generate images with AI
- **Agents**: Use multimodal with agents
- **File Search**: Search through documents