
What is Model Garden?

Vertex AI Model Garden is a curated catalog of open-source and proprietary models that you can discover, evaluate, and deploy on Google Cloud. It provides:
  • Pre-configured Models: Optimized deployment configurations for popular open models
  • One-Click Deployment: Simplified deployment process via UI, CLI, or SDK
  • Hugging Face Integration: Access to over 1 million models from the Hugging Face Hub
  • Version Management: Track and manage different model versions

Model Garden SDK

The Vertex AI Model Garden SDK provides a model-centric interface for deploying open-source models, removing the need to manage container details and infrastructure complexity.

Installation

# Quote the spec so the shell does not treat ">=" as a redirect
pip install --upgrade "google-cloud-aiplatform>=1.93.1"

Basic Usage

1. Initialize Vertex AI

import vertexai
from vertexai import model_garden

PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

vertexai.init(project=PROJECT_ID, location=LOCATION)
2. Discover Models

# List Model Garden models
mg_models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=False
)

# Include Hugging Face models
all_models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=True
)

for model_id in all_models:
    print(f"Available: {model_id}")
3. Deploy a Model

# Create model instance
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")

# Deploy to endpoint
endpoint = model.deploy(accept_eula=True)

print(f"Endpoint: {endpoint.resource_name}")

Browsing Models

Model Identifiers

Model Garden uses a hierarchical naming scheme:
publisher/model@version

Example:
google/gemma3@gemma-3-1b-it
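For scripting, an identifier in this scheme can be split into its parts. A minimal sketch (the helper name is illustrative, not part of the SDK):

```python
def parse_model_id(model_id: str):
    """Split a 'publisher/model@version' identifier into its parts.

    The '@version' suffix is optional (plain Hugging Face IDs omit it).
    """
    path, _, version = model_id.partition("@")
    publisher, _, model = path.partition("/")
    return publisher, model, version or None

# parse_model_id("google/gemma3@gemma-3-1b-it")
# -> ("google", "gemma3", "gemma-3-1b-it")
```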
Find models matching specific criteria:
# Search by name
gemma_models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=True
)

# Search for vision models
vision_models = model_garden.list_deployable_models(
    model_filter="stable-diffusion",
    list_hf_models=True
)

# List all available models
all_models = model_garden.list_deployable_models(list_hf_models=True)
print(f"Total models available: {len(all_models)}")

Deploying Models

Check Deployment Options

Before deploying, review available configurations:
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")

# List deployment configurations
deploy_options = model.list_deploy_options(concise=True)
print(deploy_options)
Deployment options show the verified machine types, accelerators, and configurations that work best for each model.

Basic Deployment

Deploy with default settings:
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")
endpoint = model.deploy(accept_eula=True)

Advanced Deployment Configuration

Customize deployment parameters:
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=2,
    min_replica_count=1,
    max_replica_count=5,
    endpoint_display_name="my-gemma-endpoint",
    model_display_name="gemma-production"
)

Hugging Face Integration

Accessing Public Models

Deploy any public model from Hugging Face Hub:
# Deploy Stable Diffusion
sd_model = model_garden.OpenModel("stabilityai/stable-diffusion-xl-base-1.0")
sd_endpoint = sd_model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1
)

Gated Models

Some models require additional authentication:
1. Accept Model License

Visit the model page on Hugging Face and accept the license terms.
2. Create Access Token

Generate a Hugging Face access token:
  1. Go to https://huggingface.co/settings/tokens
  2. Click “New token”
  3. Select “Read” access
  4. Copy the token
3. Deploy with Token

from huggingface_hub import interpreter_login

# Login to Hugging Face
interpreter_login()

# Or provide token directly
model = model_garden.OpenModel("black-forest-labs/FLUX.1-dev")
endpoint = model.deploy(
    hugging_face_access_token="hf_your_token_here"
)
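To keep tokens out of source control, one option is to read the token from an environment variable before passing it to `deploy()`. A small sketch (the helper and the `HF_TOKEN` variable name are conventions assumed here, not SDK requirements):

```python
import os

def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to a Hugging Face access token")
    return token

# endpoint = model.deploy(hugging_face_access_token=get_hf_token())
```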

Model Garden CLI

For automation and scripting, you can use the gcloud CLI:
# List models in your project whose display name matches "gemma"
gcloud ai models list \
  --region=us-central1 \
  --filter="displayName:gemma"

Making Predictions

Using Vertex AI SDK

# Text generation
prediction = endpoint.predict(
    instances=[{
        "prompt": "Explain quantum computing in simple terms",
        "temperature": 0.7,
        "max_tokens": 200,
        "top_p": 0.95
    }]
)

print(prediction.predictions[0])

Using OpenAI SDK

import openai
import google.auth
import google.auth.transport.requests

# Get credentials
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)

# Configure OpenAI client
endpoint_url = f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/{endpoint.resource_name}"
client = openai.OpenAI(base_url=endpoint_url, api_key=creds.token)

# Generate response
response = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    temperature=0.7,
    max_tokens=50
)

print(response.choices[0].message.content)

Image Generation

For image generation models:
import base64
import io
from PIL import Image

# Generate image
prediction = sd_endpoint.predict(instances=["A serene mountain landscape at sunset"])

# Decode and display
image_bytes = base64.b64decode(prediction.predictions[0])
image = Image.open(io.BytesIO(image_bytes))
image.show()

Error Handling

Common Deployment Errors

from google.api_core import exceptions

try:
    model = model_garden.OpenModel("google/some-model@some-version")
    endpoint = model.deploy(accept_eula=True)
except exceptions.NotFound:
    print("Model not found -- check the model name spelling and version")
except exceptions.GoogleAPICallError as e:
    print(f"Deployment failed: {e}")

Custom Model Import

Import your own models to Model Garden:
from google.cloud import aiplatform

# Upload custom model
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    artifact_uri="gs://my-bucket/model-artifacts/",
    serving_container_image_uri=f"{LOCATION}-docker.pkg.dev/{PROJECT_ID}/repo/custom-image:latest",
    serving_container_environment_variables={
        "MODEL_ID": "custom-model-v1"
    }
)

# Deploy custom model
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3
)

Best Practices

Check Configurations

Always review list_deploy_options() before deploying to verify resource requirements

Start Small

Begin with smaller models and scale up as needed

Use Autoscaling

Configure min/max replicas to handle traffic spikes efficiently

Monitor Costs

Set up billing alerts and use spot VMs for non-critical workloads

Version Control

Track model versions in deployment names for easier management

Test Thoroughly

Validate model outputs before promoting to production

Next Steps

Fine-Tune Models

Customize models for your specific use cases

Optimize Serving

Learn about inference optimization with vLLM and TGI

Example Notebooks

Explore deployment examples on GitHub

Model Garden Console

Browse models in the Cloud Console
