Introduction to Vision AI

Google Cloud offers powerful generative AI capabilities for creating and editing images through two primary model families:

Imagen

Google’s state-of-the-art text-to-image model for high-quality image generation

Gemini Image Generation

Multimodal conversational image generation and editing with Gemini models

Imagen on Vertex AI

Imagen brings Google’s state-of-the-art generative AI capabilities to application developers. The Imagen family includes:
  • Imagen 4: Google’s highest quality text-to-image model with exceptional detail, improved prompt adherence, and advanced text rendering
  • Imagen 4 Fast: Optimized for lower latency with brighter images and higher contrast
  • Imagen 4 Ultra: Exceptional quality with enhanced photorealism
  • Imagen 3: Previous generation with robust editing and customization capabilities

Key Capabilities

Text-to-Image Generation

Generate high-quality images from natural language descriptions

Image Editing

Modify existing images with inpainting, outpainting, and mask-based editing

Text Rendering

Accurately render text within images for posters, comics, and logos

Multilingual Support

Process prompts in more than 10 languages, including English, Spanish, and Japanese

Gemini Image Generation

Gemini 2.5 Flash Image (also known as “Nano Banana”) is a powerful multimodal model that combines conversational AI with image generation and editing capabilities.

Unique Features

  • Conversational Image Editing: Iteratively refine images through natural conversation
  • Interleaved Content: Generate sequences mixing text and images (e.g., step-by-step tutorials)
  • Multi-Reference Editing: Combine multiple input images to create new compositions
  • Subject Customization: Apply specific subjects or styles from reference images
Gemini 2.5 Flash Image is ideal for applications requiring iterative design workflows, interactive image creation, or tightly integrated text and visual content.
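
Because interleaved output mixes text and images in a single response, application code usually needs to separate the two. The following is a minimal sketch of that step, not the SDK's own helper. It assumes the response follows the Gen AI SDK shape `candidates[0].content.parts`, where each part carries either `text` or `inline_data.data` (raw image bytes). The function name `split_parts` and the stand-in response are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def split_parts(response):
    """Separate an interleaved response into text chunks and raw image bytes.

    Assumes the response exposes candidates[0].content.parts, and that each
    part has either a non-empty `text` or an `inline_data` with `.data` bytes.
    """
    texts, images = [], []
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            images.append(inline.data)
        elif getattr(part, "text", None):
            texts.append(part.text)
    return texts, images


# Stand-in response for illustration only; a real one would come from a
# generate_content call against an image-capable Gemini model.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(text="Step 1: sketch the outline.", inline_data=None),
                    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG...")),
                ]
            )
        )
    ]
)

texts, images = split_parts(fake_response)
print(len(texts), "text part(s),", len(images), "image part(s)")
```

With a real client, the response would typically come from `client.models.generate_content(...)` with a config that requests both text and image output. Check the exact model ID and config fields against the current SDK reference before relying on them.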

Getting Started

1. Set up your environment

Install the Google Gen AI SDK for Python:
pip install --upgrade google-genai

2. Configure authentication

With your Google Cloud project set up and the Vertex AI API enabled, create a client that targets Vertex AI:
from google import genai
from google.genai import types

PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
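
Hard-coding the project and location works for a first test, but many deployments read them from the environment instead. The sketch below shows one way to do that. It assumes the variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`; the helper `resolve_vertex_settings` and its `us-central1` fallback are hypothetical choices for this example.

```python
import os


def resolve_vertex_settings(env=None, default_location="us-central1"):
    """Return (project, location) from environment-style settings.

    GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are the variable names
    assumed here; fail early if no project is configured.
    """
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your Google Cloud project ID")
    location = env.get("GOOGLE_CLOUD_LOCATION", default_location)
    return project, location


# Pass a plain dict here for illustration; omit the argument to read os.environ.
project, location = resolve_vertex_settings({"GOOGLE_CLOUD_PROJECT": "your-project-id"})
print(project, location)
```

The resulting values can be passed to `genai.Client(vertexai=True, project=project, location=location)`. Credentials are a separate concern: on a local machine, Application Default Credentials are commonly set up with `gcloud auth application-default login`.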

3. Generate your first image

Start creating images with a simple text prompt:
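# Using Imagen 4
# The call returns a response object, not a single image;
# the generated images are listed in response.generated_images.
response = client.models.generate_images(
    model="imagen-4.0-generate-001",
    prompt="a serene mountain landscape at sunset",
    config=types.GenerateImagesConfig(
        aspect_ratio="16:9",
        number_of_images=1,
    ),
)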
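
The object returned by `generate_images` holds the results rather than writing them anywhere, so a typical next step is saving them to disk. This is a minimal sketch under stated assumptions: each entry in `generated_images` has an `.image` whose raw bytes are in `.image_bytes`, and the output is PNG. The helper `save_generated_images` and the stub response are hypothetical.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_generated_images(response, out_dir, stem="image"):
    """Write every generated image in the response to out_dir.

    Assumes each entry in response.generated_images has an `.image` with raw
    bytes in `.image_bytes`, and that the bytes are PNG-encoded.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, generated in enumerate(response.generated_images):
        target = out_path / f"{stem}_{index}.png"
        target.write_bytes(generated.image.image_bytes)
        written.append(target)
    return written


# Stub standing in for a real generate_images response (illustration only).
stub_response = SimpleNamespace(
    generated_images=[SimpleNamespace(image=SimpleNamespace(image_bytes=b"fake-bytes"))]
)

with tempfile.TemporaryDirectory() as tmp:
    paths = save_generated_images(stub_response, tmp, stem="sunset")
    saved_names = [p.name for p in paths]
    saved_bytes = paths[0].read_bytes()

print(saved_names)
```

In real use, pass the response from the call above and a persistent output directory instead of the stub and temporary directory.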

Model Selection Guide

| Model | Best For | Latency | Quality |
|---|---|---|---|
| Imagen 4 | High-quality images with text rendering | Medium | Highest |
| Imagen 4 Fast | Quick iterations, prototyping | Low | High |
| Imagen 4 Ultra | Professional-grade photorealism | High | Exceptional |
| Gemini 2.5 Flash Image | Conversational editing, tutorials | Medium | High |
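
The guide above can be encoded as a small lookup so that model choice lives in one place. This is only a sketch of one possible policy. The function `pick_model` and its `priority` values are hypothetical, and every model ID other than `imagen-4.0-generate-001` is an assumption that should be verified against the current model list.

```python
# Model IDs keyed by the names used in the selection guide.
MODEL_IDS = {
    "Imagen 4": "imagen-4.0-generate-001",
    "Imagen 4 Fast": "imagen-4.0-fast-generate-001",    # assumed ID, verify
    "Imagen 4 Ultra": "imagen-4.0-ultra-generate-001",  # assumed ID, verify
    "Gemini 2.5 Flash Image": "gemini-2.5-flash-image",  # assumed ID, verify
}


def pick_model(conversational=False, priority="balanced"):
    """Return the guide's model name for a workload.

    conversational: True for iterative or chat-style editing workflows.
    priority: "latency", "photorealism", or "balanced" (hypothetical labels).
    """
    if conversational:
        return "Gemini 2.5 Flash Image"
    if priority == "latency":
        return "Imagen 4 Fast"
    if priority == "photorealism":
        return "Imagen 4 Ultra"
    return "Imagen 4"


choice = pick_model(priority="latency")
print(choice, "->", MODEL_IDS[choice])
```

Keeping the names and IDs in a single mapping makes it easier to update the IDs when new model versions are released.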

Safety and Watermarking

Image generation with these models comes with the following safeguards:
  • SynthID Watermark: Digital watermark for identifying AI-generated content
  • Safety Filters: Configurable content filtering to block inappropriate content
  • Person Generation Controls: Fine-grained control over generating images with people
Always configure appropriate safety filters for your application’s use case. Available levels include BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH, and BLOCK_NONE.
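
Since the filter level is a plain string, a typo can go unnoticed until a request fails. The sketch below validates a level against the four values listed above before it is used. The helper `safety_settings`, its default level, and the `safety_filter_level` key are assumptions made for this example.

```python
ALLOWED_SAFETY_LEVELS = (
    "BLOCK_LOW_AND_ABOVE",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_NONE",
)


def safety_settings(level="BLOCK_MEDIUM_AND_ABOVE"):
    """Validate a safety filter level and return it as config keyword arguments.

    The default level here is an illustrative choice, not a recommendation.
    """
    normalized = level.strip().upper()
    if normalized not in ALLOWED_SAFETY_LEVELS:
        allowed = ", ".join(ALLOWED_SAFETY_LEVELS)
        raise ValueError(f"Unknown safety level {level!r}; expected one of: {allowed}")
    return {"safety_filter_level": normalized}


settings = safety_settings("block_only_high")
print(settings)
```

Assuming the image generation config accepts a `safety_filter_level` field, the returned dictionary could be unpacked into it alongside options such as `aspect_ratio`.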

Learn More

Image Generation

Text-to-image generation techniques and prompt engineering

Image Editing

Inpainting, outpainting, and advanced editing workflows

Visual Q&A

Image understanding and visual question answering

Supported Locations

Imagen and Gemini Image models are available in multiple regions. For the most up-to-date list of supported locations, see the Vertex AI locations documentation.
