Images
There are three main ways to provide image input:
- Cloud Storage (GCS)
- Local Files (Bytes)
- File API
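The first two options can be sketched as below, assuming the google-genai Python SDK (`pip install google-genai`), which is where `Part.from_uri` and `Part.from_bytes` are taken to come from. The helper names are hypothetical and not part of the original page; the third option uploads the image with the File API instead.

```python
from pathlib import Path

# Image formats this page lists as supported, keyed by file extension.
SUPPORTED_IMAGE_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_mime_type(path: str) -> str:
    """Return the MIME type for a supported image, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image format: {suffix or path}")
    return SUPPORTED_IMAGE_TYPES[suffix]


def image_part(source: str):
    """Build a Part from a gs:// URI or a local file (hypothetical helper)."""
    from google.genai import types  # assumed SDK; imported lazily

    mime = image_mime_type(source)
    if source.startswith("gs://"):
        # Cloud Storage: pass a reference, the service reads the object.
        return types.Part.from_uri(file_uri=source, mime_type=mime)
    # Local file: send the raw bytes inline with the request.
    return types.Part.from_bytes(data=Path(source).read_bytes(), mime_type=mime)
```

With a configured `genai.Client`, the resulting part goes into `contents` next to a text prompt, for example `client.models.generate_content(model="gemini-2.5-flash", contents=[image_part("photo.png"), "Describe this image."])`. The model name and file name here are placeholders.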
Use `Part.from_uri` for images stored in Google Cloud Storage.
Supported image formats: JPEG, PNG, WebP, GIF
Audio
Process audio files for transcription, analysis, or understanding:
- Local Audio (Bytes)
- Cloud Storage Audio
- File API (Long Audio)
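For long recordings, the File API route might look like the sketch below. It assumes the google-genai Python SDK, where the object returned by `client.files.upload` can be passed directly in `contents`. The function name, prompt, and model name are illustrative placeholders.

```python
def transcribe_audio(client, path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload an audio file through the File API and ask for a transcript.

    `client` is expected to be a `genai.Client`; the model name is a placeholder.
    """
    # Upload once; the returned file handle is referenced by the request.
    audio_file = client.files.upload(file=path)
    response = client.models.generate_content(
        model=model,
        contents=[audio_file, "Transcribe this recording."],
    )
    return response.text
```

Short clips can instead be sent inline with `types.Part.from_bytes(data=..., mime_type="audio/mp3")`, and audio that already lives in Cloud Storage with `types.Part.from_uri(...)`.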
Video
Analyze video content for descriptions, summaries, or specific questions:
- Cloud Storage Video
- File API (Recommended)
- Video Frames
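Uploaded videos need processing before they can be used, so a request should wait until the file is ready. A minimal polling sketch, assuming the google-genai SDK's `client.files.upload` and `client.files.get`, and assuming `file.state` is an enum whose name is `PROCESSING`, `ACTIVE`, or `FAILED`. The helper names, timings, and model name are hypothetical.

```python
import time


def wait_until_active(client, file, poll_seconds: float = 2.0, timeout: float = 300.0):
    """Poll the File API until `file` leaves PROCESSING, then return it."""
    deadline = time.monotonic() + timeout
    while file.state.name == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{file.name} still processing after {timeout}s")
        time.sleep(poll_seconds)
        file = client.files.get(name=file.name)  # refresh the state
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file


def describe_video(client, path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload a video, wait for processing, then ask for a summary."""
    video = wait_until_active(client, client.files.upload(file=path))
    response = client.models.generate_content(
        model=model, contents=[video, "Summarize this video."]
    )
    return response.text
```

Raising on any final state other than `ACTIVE` keeps a failed upload from being sent to the model by accident.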
PDFs
Extract information from PDF documents:
- File API (Gemini Developer API)
- Cloud Storage (Vertex AI)
- Multiple PDFs
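Several PDFs can go into one request by placing one entry per document ahead of the question. The sketch below assumes the google-genai SDK: `gs://` sources are passed with `Part.from_uri` (the Vertex AI route) and local paths are uploaded through the File API (the Gemini Developer API route). The function name and model name are placeholders.

```python
def ask_about_pdfs(client, sources, question, model="gemini-2.5-flash"):
    """Send one or more PDFs plus a question in a single request."""
    contents = []
    for source in sources:
        if source.startswith("gs://"):
            from google.genai import types  # assumed SDK; only needed for URIs

            contents.append(
                types.Part.from_uri(file_uri=source, mime_type="application/pdf")
            )
        else:
            # Local file: upload it and reference the returned handle.
            contents.append(client.files.upload(file=source))
    contents.append(question)  # the text prompt goes last
    response = client.models.generate_content(model=model, contents=contents)
    return response.text
```

A call such as `ask_about_pdfs(client, ["q1.pdf", "q2.pdf"], "What changed between these reports?")` would compare two local documents; the file names are hypothetical.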
Combining Multiple Modalities
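A request that combines modalities can be built as a single `contents` list with one part per media item followed by the text prompt. A minimal sketch for small local files, assuming the google-genai SDK's `types.Part.from_bytes`; both helper names are hypothetical.

```python
from pathlib import Path


def inline_items(paths_with_types):
    """Read local files into (bytes, mime_type) pairs, preserving order."""
    return [(Path(path).read_bytes(), mime) for path, mime in paths_with_types]


def mixed_contents(items, prompt):
    """Turn (bytes, mime_type) pairs plus a prompt into a `contents` list."""
    from google.genai import types  # assumed SDK; imported lazily

    parts = [types.Part.from_bytes(data=data, mime_type=mime) for data, mime in items]
    return parts + [prompt]
```

For example, `mixed_contents(inline_items([("chart.png", "image/png"), ("call.wav", "audio/wav")]), "Does the chart match what is said in the call?")` yields a list that can be passed as `contents` to `client.models.generate_content`. The file names are placeholders.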
You can mix different media types in a single request.
MIME Types Reference
Common MIME types for different media:

| Media Type | MIME Type Examples |
|---|---|
| Images | image/jpeg, image/png, image/webp, image/gif |
| Audio | audio/mp3, audio/wav, audio/flac, audio/aac |
| Video | video/mp4, video/mov, video/avi, video/webm |
| PDFs | application/pdf |
File API Management
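Listing and deleting uploads might look like the sketch below, assuming the google-genai SDK's `client.files.list` and `client.files.delete`, and assuming each listed file exposes `name` and an enum `state`. The helper names are hypothetical.

```python
def list_file_states(client) -> dict:
    """Map each uploaded file's name to its state name (e.g. ACTIVE)."""
    return {f.name: f.state.name for f in client.files.list()}


def delete_files(client, keep=()) -> list:
    """Delete all uploaded files except those named in `keep`."""
    # Collect names first so deletion does not interfere with listing.
    names = [f.name for f in client.files.list() if f.name not in keep]
    for name in names:
        client.files.delete(name=name)
    return names
```

Calling `delete_files(client)` after a batch of requests is one way to follow the advice to delete files after use.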
Manage uploaded files with the File API.
Streaming with Multimodal Input
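Streaming takes the same `contents` list as a regular request and yields the answer in chunks. A minimal sketch, assuming the google-genai SDK's `client.models.generate_content_stream`; the function name and model name are placeholders.

```python
def stream_answer(client, contents, model: str = "gemini-2.5-flash") -> str:
    """Print a streamed response as it arrives and return the full text."""
    pieces = []
    for chunk in client.models.generate_content_stream(model=model, contents=contents):
        if chunk.text:  # skip chunks that carry no text
            print(chunk.text, end="", flush=True)
            pieces.append(chunk.text)
    print()  # finish the line once the stream ends
    return "".join(pieces)
```

The `contents` argument can hold any mix of media parts, uploaded files, and text.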
You can stream responses for multimodal inputs.
Use Cases
Document Analysis
Extract insights from PDFs, images of documents, and scanned files
Video Understanding
Analyze video content, generate descriptions, and answer questions
Audio Transcription
Transcribe and analyze audio content, podcasts, and meetings
Visual Q&A
Answer questions about images, charts, and diagrams
Best Practices
- Use `Part.from_uri` for large files or files already in cloud storage
- Use `Part.from_bytes` for small files (< 20MB) from the local filesystem
- Use the File API for files that need preprocessing (video, long audio, PDFs)
- Always specify the correct MIME type for your media
- Check file state (`PROCESSING`, `ACTIVE`) before using uploaded files
- Delete files after use to manage storage costs
- Combine multiple modalities when relevant to your use case
- For the Gemini Developer API, use the File API for all large files
- For Vertex AI, you can use GCS URIs directly with `Part.from_uri`
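The selection rules above can be condensed into a small decision helper. This is only one reading of the list: the function name and return labels are hypothetical, the 20MB limit is interpreted as 20 MiB, and "long audio" is approximated by file size.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # "< 20MB" from the list, read as MiB
NEEDS_PROCESSING = ("video/", "application/pdf")


def choose_input_method(source: str, mime_type: str, size_bytes: int = 0) -> str:
    """Return 'from_uri', 'from_bytes', or 'file_api' for a media source."""
    if source.startswith("gs://"):
        return "from_uri"  # already in Cloud Storage (Vertex AI)
    if mime_type.startswith(NEEDS_PROCESSING):
        return "file_api"  # video and PDFs go through the File API
    if size_bytes < INLINE_LIMIT_BYTES:
        return "from_bytes"  # small local file, send inline
    return "file_api"  # large local file, such as long audio
```

For example, `choose_input_method("photo.png", "image/png", 1_000_000)` selects inline bytes, while a 50 MiB local podcast falls through to the File API.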