Introduction to Vision AI
Google Cloud offers powerful generative AI capabilities for creating and editing images through two primary model families:
Imagen
Google’s state-of-the-art text-to-image model for high-quality image generation
Gemini Image Generation
Multimodal conversational image generation and editing with Gemini models
Imagen on Vertex AI
Imagen brings Google’s state-of-the-art generative AI capabilities to application developers. The Imagen family includes:
- Imagen 4: Google’s highest quality text-to-image model with exceptional detail, improved prompt adherence, and advanced text rendering
- Imagen 4 Fast: Optimized for lower latency with brighter images and higher contrast
- Imagen 4 Ultra: Exceptional quality with enhanced photorealism
- Imagen 3: Previous generation with robust editing and customization capabilities
Key Capabilities
Text-to-Image Generation
Generate high-quality images from natural language descriptions
Image Editing
Modify existing images with inpainting, outpainting, and mask-based editing
Text Rendering
Accurately render text within images for posters, comics, and logos
Multilingual Support
Process prompts in 10+ languages including English, Spanish, Japanese, and more
Gemini Image Generation
Gemini 2.5 Flash Image (also known as “Nano Banana”) is a powerful multimodal model that combines conversational AI with image generation and editing capabilities.
Unique Features
- Conversational Image Editing: Iteratively refine images through natural conversation
- Interleaved Content: Generate sequences mixing text and images (e.g., step-by-step tutorials)
- Multi-Reference Editing: Combine multiple input images to create new compositions
- Subject Customization: Apply specific subjects or styles from reference images
Gemini 2.5 Flash Image is ideal for applications requiring iterative design workflows, interactive image creation, or tightly integrated text and visual content.
Getting Started
Model Selection Guide
| Model | Best For | Latency | Quality |
|---|---|---|---|
| Imagen 4 | High-quality images with text rendering | Medium | Highest |
| Imagen 4 Fast | Quick iterations, prototyping | Low | High |
| Imagen 4 Ultra | Professional-grade photorealism | High | Exceptional |
| Gemini 2.5 Flash Image | Conversational editing, tutorials | Medium | High |
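The table above can be turned into a small lookup for shortlisting models. The following is a minimal sketch, not an official API: the `ModelProfile` class, the `candidate_models` helper, and the ordering of latency labels (Low < Medium < High) are assumptions introduced here for illustration. The display names are copied from the table and are not API model identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelProfile:
    """One row of the model selection table (hypothetical helper type)."""
    name: str
    best_for: str
    latency: str  # "Low", "Medium", or "High", as labeled in the table
    quality: str  # quality label, copied verbatim from the table


# Rows copied from the Model Selection Guide, in table order.
MODEL_TABLE = [
    ModelProfile("Imagen 4", "High-quality images with text rendering", "Medium", "Highest"),
    ModelProfile("Imagen 4 Fast", "Quick iterations, prototyping", "Low", "High"),
    ModelProfile("Imagen 4 Ultra", "Professional-grade photorealism", "High", "Exceptional"),
    ModelProfile("Gemini 2.5 Flash Image", "Conversational editing, tutorials", "Medium", "High"),
]

# Assumed ordering of the latency labels, used only for filtering.
_LATENCY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def candidate_models(max_latency: str, conversational_editing: bool = False) -> List[str]:
    """Return the table's model names that fit a latency budget.

    If conversational_editing is True, only the model the table lists
    for conversational editing (Gemini 2.5 Flash Image) is considered.
    """
    budget = _LATENCY_ORDER[max_latency]
    names = []
    for model in MODEL_TABLE:
        if _LATENCY_ORDER[model.latency] > budget:
            continue  # slower than the caller is willing to accept
        if conversational_editing and model.name != "Gemini 2.5 Flash Image":
            continue  # table lists only Gemini for conversational editing
        names.append(model.name)
    return names


if __name__ == "__main__":
    print(candidate_models("Medium"))
    print(candidate_models("Medium", conversational_editing=True))
```

The helper filters on latency only and leaves the quality trade-off to the caller, since the table's quality labels are descriptive rather than strictly ordered.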
Safety and Watermarking
Image generation with these models includes the following safeguards:
- SynthID Watermark: Digital watermark embedded in generated images for identifying AI-generated content
- Safety Filters: Configurable content filtering to block inappropriate content
- Person Generation Controls: Fine-grained control over generating images with people
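One way to keep safety settings explicit in application code is to validate them before building a request. The sketch below is illustrative only: the `SafetyOptions` class, its field names, and the allowed values are assumptions modeled on commonly seen parameter names, not a confirmed API surface. Check the current Imagen API reference for the exact names and values your model version supports.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed value sets for illustration; confirm against the API reference.
SAFETY_FILTER_LEVELS = (
    "block_low_and_above",
    "block_medium_and_above",
    "block_only_high",
)
PERSON_GENERATION_MODES = ("dont_allow", "allow_adult", "allow_all")


@dataclass(frozen=True)
class SafetyOptions:
    """Hypothetical container for safety-related request settings."""
    safety_filter_level: str = "block_medium_and_above"
    person_generation: str = "allow_adult"

    def __post_init__(self) -> None:
        # Reject unknown values early instead of sending a bad request.
        if self.safety_filter_level not in SAFETY_FILTER_LEVELS:
            raise ValueError(
                f"unknown safety_filter_level: {self.safety_filter_level!r}"
            )
        if self.person_generation not in PERSON_GENERATION_MODES:
            raise ValueError(
                f"unknown person_generation: {self.person_generation!r}"
            )

    def to_parameters(self) -> Dict[str, str]:
        """Return the settings as a plain dict to merge into a request config."""
        return {
            "safety_filter_level": self.safety_filter_level,
            "person_generation": self.person_generation,
        }


if __name__ == "__main__":
    strict = SafetyOptions(
        safety_filter_level="block_low_and_above",
        person_generation="dont_allow",
    )
    print(strict.to_parameters())
```

Validating locally keeps a typo in a setting from silently falling back to a default; the resulting dict would then be passed to whichever client library the application uses.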
Learn More
Image Generation
Text-to-image generation techniques and prompt engineering
Image Editing
Inpainting, outpainting, and advanced editing workflows
Visual Q&A
Image understanding and visual question answering