
What is Model Garden?

Vertex AI Model Garden is a curated catalog of open-source and proprietary models that you can discover, evaluate, and deploy on Google Cloud. It provides:
  • Pre-configured Models: Optimized deployment configurations for popular open models
  • One-Click Deployment: Simplified deployment process via UI, CLI, or SDK
  • Hugging Face Integration: Access to over 1 million models from the Hugging Face Hub
  • Version Management: Track and manage different model versions

Model Garden SDK

The Vertex AI Model Garden SDK provides a model-centric interface for deploying open-source models, removing the need to manage container details and infrastructure complexity.

Installation

# Quote the spec so the shell does not treat ">=" as a redirect
pip install --upgrade "google-cloud-aiplatform>=1.93.1"

Basic Usage

1. Initialize Vertex AI

import vertexai
from vertexai import model_garden

PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

vertexai.init(project=PROJECT_ID, location=LOCATION)
2. Discover Models

# List Model Garden models
mg_models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=False
)

# Include Hugging Face models
all_models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=True
)

for model_id in all_models:
    print(f"Available: {model_id}")
3. Deploy a Model

# Create model instance
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")

# Deploy to endpoint
endpoint = model.deploy(accept_eula=True)

print(f"Endpoint: {endpoint.resource_name}")

Browsing Models

Model Identifiers

Model Garden uses a hierarchical naming scheme:
publisher/model@version

Example:
google/gemma3@gemma-3-1b-it
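For scripting, an identifier in this scheme can be split into its parts. A minimal sketch (the helper name is illustrative, not part of the SDK):

```python
def parse_model_id(model_id: str):
    """Split a 'publisher/model@version' identifier into its parts.

    The '@version' suffix is optional (plain Hugging Face IDs omit it).
    """
    path, _, version = model_id.partition("@")
    publisher, _, model = path.partition("/")
    return publisher, model, version or None

# parse_model_id("google/gemma3@gemma-3-1b-it")
# -> ("google", "gemma3", "gemma-3-1b-it")
```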
Find models matching specific criteria:
# Search by name
gemma_models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=True
)

# Search for vision models
vision_models = model_garden.list_deployable_models(
    model_filter="stable-diffusion",
    list_hf_models=True
)

# List all available models
all_models = model_garden.list_deployable_models(list_hf_models=True)
print(f"Total models available: {len(all_models)}")

Deploying Models

Check Deployment Options

Before deploying, review available configurations:
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")

# List deployment configurations
deploy_options = model.list_deploy_options(concise=True)
print(deploy_options)
Deployment options show the verified machine types, accelerators, and configurations that work best for each model.

Basic Deployment

Deploy with default settings:
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")
endpoint = model.deploy(accept_eula=True)

Advanced Deployment Configuration

Customize deployment parameters:
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=2,
    min_replica_count=1,
    max_replica_count=5,
    endpoint_display_name="my-gemma-endpoint",
    model_display_name="gemma-production"
)

Hugging Face Integration

Accessing Public Models

Deploy any public model from Hugging Face Hub:
# Deploy Stable Diffusion
sd_model = model_garden.OpenModel("stabilityai/stable-diffusion-xl-base-1.0")
sd_endpoint = sd_model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1
)

Gated Models

Some models require additional authentication:
1. Accept Model License

Visit the model page on Hugging Face and accept the license terms.
2. Create Access Token

Generate a Hugging Face access token:
  1. Go to https://huggingface.co/settings/tokens
  2. Click “New token”
  3. Select “Read” access
  4. Copy the token
3. Deploy with Token

from huggingface_hub import interpreter_login

# Login to Hugging Face
interpreter_login()

# Or provide token directly
model = model_garden.OpenModel("black-forest-labs/FLUX.1-dev")
endpoint = model.deploy(
    hugging_face_access_token="hf_your_token_here"
)
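To keep tokens out of source control, one option is to read the token from an environment variable before passing it to `deploy()`. A small sketch (the helper and the `HF_TOKEN` variable name are conventions assumed here, not SDK requirements):

```python
import os

def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to a Hugging Face access token")
    return token

# endpoint = model.deploy(hugging_face_access_token=get_hf_token())
```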

Model Garden CLI

For automation and scripting, you can use the gcloud CLI:
# List models in your project whose display name matches "gemma"
gcloud ai models list \
  --region=us-central1 \
  --filter="displayName:gemma"

Making Predictions

Using Vertex AI SDK

# Text generation
prediction = endpoint.predict(
    instances=[{
        "prompt": "Explain quantum computing in simple terms",
        "temperature": 0.7,
        "max_tokens": 200,
        "top_p": 0.95
    }]
)

print(prediction.predictions[0])

Using OpenAI SDK

import openai
import google.auth
import google.auth.transport.requests

# Get credentials
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)

# Configure OpenAI client
endpoint_url = f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/{endpoint.resource_name}"
client = openai.OpenAI(base_url=endpoint_url, api_key=creds.token)

# Generate response
response = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    temperature=0.7,
    max_tokens=50
)

print(response.choices[0].message.content)

Image Generation

For image generation models:
import base64
import io
from PIL import Image

# Generate image
prediction = sd_endpoint.predict(instances=["A serene mountain landscape at sunset"])

# Decode and display
image_bytes = base64.b64decode(prediction.predictions[0])
image = Image.open(io.BytesIO(image_bytes))
image.show()

Error Handling

Common Deployment Errors

from google.api_core import exceptions

try:
    model = model_garden.OpenModel("google/some-model@some-version")
    endpoint = model.deploy(accept_eula=True)
except exceptions.NotFound:
    print("Model not found -- check the model name spelling and version")
except exceptions.GoogleAPICallError as e:
    print(f"Deployment failed: {e}")

Custom Model Import

Import your own models to Model Garden:
from google.cloud import aiplatform

# Upload custom model
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    artifact_uri="gs://my-bucket/model-artifacts/",
    serving_container_image_uri=f"{LOCATION}-docker.pkg.dev/{PROJECT_ID}/repo/custom-image:latest",
    serving_container_environment_variables={
        "MODEL_ID": "custom-model-v1"
    }
)

# Deploy custom model
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3
)

Best Practices

Check Configurations

Always review list_deploy_options() before deploying to verify resource requirements

Start Small

Begin with smaller models and scale up as needed

Use Autoscaling

Configure min/max replicas to handle traffic spikes efficiently

Monitor Costs

Set up billing alerts and use spot VMs for non-critical workloads

Version Control

Track model versions in deployment names for easier management

Test Thoroughly

Validate model outputs before promoting to production

Next Steps

Fine-Tune Models

Customize models for your specific use cases

Optimize Serving

Learn about inference optimization with vLLM and TGI

Example Notebooks

Explore deployment examples on GitHub

Model Garden Console

Browse models in the Cloud Console
