Introduction

Vertex AI provides comprehensive support for deploying and managing open-source models at scale. Whether you’re working with language models, image generation models, or custom architectures, Google Cloud offers the infrastructure and tools to deploy, fine-tune, and serve these models efficiently.

Open Source Model Ecosystem

Vertex AI Model Garden serves as your gateway to a vast ecosystem of open-source models:

Model Garden

Browse and deploy pre-configured open models from Vertex AI Model Garden

Hugging Face Hub

Access over 1 million models from the Hugging Face Hub

Fine-Tuning

Customize models for your specific use cases

Optimized Serving

Deploy models with vLLM, TGI, and other inference engines

Key Capabilities

Model Discovery and Deployment

Vertex AI Model Garden SDK simplifies discovering and deploying open models:
from vertexai import model_garden
import vertexai

# Initialize Vertex AI with your project and region
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)

# List deployable models matching a name filter
models = model_garden.list_deployable_models(
    model_filter="gemma",
    list_hf_models=True  # Include Hugging Face models
)

# Deploy a model (accepting its license terms)
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")
endpoint = model.deploy(accept_eula=True)
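Model IDs in the example above follow a `publisher/model@version` pattern. A minimal sketch of splitting such an ID into its parts (a hypothetical helper for illustration, not part of the SDK):

```python
def parse_model_id(model_id: str) -> dict:
    """Split a Model Garden ID like 'google/gemma3@gemma-3-1b-it'
    into publisher, model family, and version components."""
    path, _, version = model_id.partition("@")
    publisher, _, family = path.partition("/")
    return {"publisher": publisher, "family": family, "version": version}

print(parse_model_id("google/gemma3@gemma-3-1b-it"))
# {'publisher': 'google', 'family': 'gemma3', 'version': 'gemma-3-1b-it'}
```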

Supported Model Types

Vertex AI Model Garden supports various model architectures:
  • Gemma: Google’s lightweight, state-of-the-art open models
  • Llama: Meta’s family of large language models
  • DeepSeek: Advanced reasoning and instruction models
  • Qwen: Multilingual language models
  • Mistral: Efficient and powerful language models

Deployment Options

Vertex AI Endpoints

Deploy models to managed endpoints with automatic scaling:
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=5
)
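The `min_replica_count`/`max_replica_count` bounds are typically sized from expected traffic. A back-of-the-envelope capacity sketch (illustrative numbers, not Vertex AI benchmarks):

```python
import math

def replicas_needed(peak_qps: float, qps_per_replica: float,
                    min_replicas: int = 1, max_replicas: int = 5) -> int:
    """Estimate how many replicas are needed to serve peak traffic,
    clamped to the endpoint's autoscaling bounds."""
    needed = math.ceil(peak_qps / qps_per_replica)
    return max(min_replicas, min(needed, max_replicas))

# e.g. 12 QPS peak, each replica handling ~4 QPS -> 3 replicas
print(replicas_needed(peak_qps=12, qps_per_replica=4))  # 3
```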

Cloud Run

Deploy lightweight models on Cloud Run for serverless inference:
  • Pay only for actual usage
  • Automatic scaling to zero
  • Integrated with Cloud Load Balancing

GKE (Google Kubernetes Engine)

Deploy models on GKE for advanced orchestration:
  • Full control over infrastructure
  • Custom autoscaling policies
  • Multi-region deployments

Integration with Google Cloud Services

1. BigQuery ML

Use open models directly in BigQuery for SQL-based inference:
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `project.dataset.llama_model`,
  (SELECT "Explain quantum computing" AS prompt),
  STRUCT(TRUE AS flatten_json_output)
)

2. Vertex AI Pipelines

Orchestrate model training, evaluation, and deployment workflows

3. Cloud Storage

Store model artifacts, training data, and inference results

4. Vertex AI Experiments

Track fine-tuning experiments and compare model performance

Model Access and Authentication

Gated Models

Some models require accepting terms or providing authentication:
# Accept End User License Agreement
endpoint = model.deploy(accept_eula=True)

Organization Policies

Control which models can be deployed in your organization:
# Set allowed models policy
gcloud org-policies set-policy policy.yaml
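A policy file for this command might look like the following sketch. The constraint name and value format here are assumptions; confirm the exact Model Garden constraint with your organization administrator before applying it.

```yaml
# Hypothetical policy.yaml -- verify the constraint name and
# allowed-value format for your organization before use.
name: organizations/ORG_ID/policies/vertexai.allowedModels
spec:
  rules:
    - values:
        allowedValues:
          - "publishers/google/models/gemma3"
```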
Contact your organization administrator if you encounter policy constraint violations when deploying models.

Inference APIs

Vertex AI provides multiple APIs for model inference. For example, call a deployed endpoint with endpoint.predict:
prediction = endpoint.predict(
    instances=[{
        "prompt": "Tell me a joke",
        "temperature": 0.7,
        "max_tokens": 50
    }]
)
print(prediction.predictions[0])
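Under the hood, the SDK sends a JSON body to the endpoint's `:predict` route. A sketch of assembling that payload by hand (the exact field names vary by serving container, so treat this schema as an example):

```python
import json

def build_predict_body(prompt: str, temperature: float = 0.7,
                       max_tokens: int = 50) -> str:
    """Serialize a predict request body matching the instances
    shape used in the SDK example above."""
    body = {
        "instances": [{
            "prompt": prompt,
            "temperature": temperature,
            "max_tokens": max_tokens,
        }]
    }
    return json.dumps(body)

print(build_predict_body("Tell me a joke"))
```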

Cost Optimization

Compute Options

Standard VMs

Predictable pricing for steady workloads

Spot VMs

Up to 80% cost savings for fault-tolerant workloads

Reserved Resources

Committed use discounts for long-running deployments

Autoscaling

Scale replicas based on traffic patterns
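The trade-offs above are easy to quantify. A rough cost sketch with illustrative prices (not actual Google Cloud rates; check the pricing page for your machine type and region):

```python
def monthly_cost(hourly_rate: float, hours: float = 730,
                 discount: float = 0.0) -> float:
    """Monthly cost of one replica, optionally discounted
    (e.g. discount=0.8 models Spot VMs' up-to-80% savings)."""
    return hourly_rate * hours * (1 - discount)

# Illustrative $1.00/hour machine, ~730 hours per month
on_demand = monthly_cost(hourly_rate=1.00)
spot = monthly_cost(hourly_rate=1.00, discount=0.8)
print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}")
```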

GPU Selection

Choose the right GPU for your workload:
| GPU Type    | Best For                 | Memory   |
| ----------- | ------------------------ | -------- |
| NVIDIA L4   | Cost-effective inference | 24 GB    |
| NVIDIA T4   | Balanced workloads       | 16 GB    |
| NVIDIA A100 | Training & large models  | 40-80 GB |
| NVIDIA H100 | Highest performance      | 80 GB    |
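A quick way to match a model to a GPU from this table is to estimate its weight footprint: parameters times bytes per parameter, plus headroom for the KV cache and activations. A rule-of-thumb sketch (an approximation, not a guarantee):

```python
def fits_on_gpu(params_billion: float, gpu_memory_gb: float,
                bytes_per_param: int = 2, overhead: float = 1.2) -> bool:
    """Check whether a model's weights (fp16/bf16 = 2 bytes/param),
    padded ~20% for KV cache and activations, fit in GPU memory."""
    needed_gb = params_billion * bytes_per_param * overhead
    return needed_gb <= gpu_memory_gb

# A 7B model in bf16 needs ~16.8 GB: fits a 24 GB L4, not a 16 GB T4
print(fits_on_gpu(7, 24))  # True
print(fits_on_gpu(7, 16))  # False
```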

Next Steps

Explore Model Garden

Browse and deploy models from the catalog

Fine-Tune Models

Customize models for your use cases

Optimize Serving

Learn about inference optimization techniques

View Examples

Explore example notebooks on GitHub
