The GCP integration provides comprehensive support for running ZenML pipelines on Google Cloud Platform, including Vertex AI orchestration, GCS artifact storage, and container image management.

Installation

pip install "zenml[gcp]"
This installs the following key packages:
  • kfp>=2.6.0 - Kubeflow Pipelines SDK (used by Vertex AI)
  • google-cloud-aiplatform>=1.34.0 - Vertex AI SDK
  • google-cloud-storage>=2.9.0 - Google Cloud Storage
  • google-cloud-secret-manager - Secret management
  • gcsfs - GCS filesystem interface
  • kubernetes - Kubernetes Python client

Available Components

The GCP integration provides these stack components:

Vertex AI Orchestrator

Execute pipelines using Google Cloud Vertex AI Pipelines

Vertex AI Step Operator

Run individual steps on Vertex AI custom jobs

GCS Artifact Store

Store artifacts in Google Cloud Storage buckets

Vertex Experiment Tracker

Track experiments in Vertex AI Experiments

GCP Image Builder

Build container images using Google Cloud Build

Authentication

There are three ways to authenticate with GCP:

1. Service Connector (recommended)
from zenml.client import Client

Client().create_service_connector(
    name="gcp-connector",
    type="gcp",
    auth_method="service-account",
    configuration={
        "service_account_json": '{"type": "service_account", ...}',
        "project_id": "my-gcp-project",
    },
)

2. Explicit Service Account

zenml orchestrator register vertex-orch \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1 \
    --service_account_path=/path/to/service-account.json

3. Application Default Credentials

If no credentials are provided, ZenML uses Application Default Credentials:
  • Environment variable GOOGLE_APPLICATION_CREDENTIALS
  • gcloud CLI credentials (gcloud auth application-default login)
  • GCE/GKE metadata server (when running on Google Cloud)
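The resolution order above can be sketched as a small helper (illustrative only; the real lookup is performed by google.auth.default() inside the Google client libraries, and adc_source is a hypothetical function):

```python
import os

def adc_source(env: dict, on_gcp: bool = False) -> str:
    """Sketch of the Application Default Credentials search order."""
    # 1. Explicit key file via environment variable
    if "GOOGLE_APPLICATION_CREDENTIALS" in env:
        return "service account key file from GOOGLE_APPLICATION_CREDENTIALS"
    # 2. Credentials written by `gcloud auth application-default login`
    gcloud_adc = os.path.join(
        env.get("HOME", ""),
        ".config", "gcloud", "application_default_credentials.json",
    )
    if os.path.exists(gcloud_adc):
        return "gcloud CLI application-default credentials"
    # 3. Metadata server, available only on GCE/GKE
    if on_gcp:
        return "GCE/GKE metadata server"
    return "no credentials found"
```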

Vertex AI Orchestrator

The Vertex AI orchestrator runs your complete pipeline as a Vertex AI Pipeline.

Configuration

zenml orchestrator register vertex-orch \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1 \
    --pipeline_root=gs://my-vertex-bucket/pipelines
Required Parameters:
  • project - GCP project ID
  • location - GCP region (e.g., us-central1, europe-west1)
Optional Parameters:
  • pipeline_root - GCS URI for pipeline artifacts (defaults to artifact store path if using GCS)
  • workload_service_account - Service account for pipeline execution
  • network - VPC network for private connectivity
  • encryption_spec_key_name - Cloud KMS key for encryption

Service Account Permissions

The service account needs these IAM roles:
# Grant required roles
gcloud projects add-iam-policy-binding my-gcp-project \
    --member=serviceAccount:[email protected] \
    --role=roles/aiplatform.user

gcloud projects add-iam-policy-binding my-gcp-project \
    --member=serviceAccount:[email protected] \
    --role=roles/storage.objectAdmin

gcloud projects add-iam-policy-binding my-gcp-project \
    --member=serviceAccount:[email protected] \
    --role=roles/artifactregistry.reader
Required permissions:
  • aiplatform.customJobs.create
  • aiplatform.pipelineJobs.create
  • storage.objects.get/create/delete
  • artifactregistry.repositories.downloadArtifacts

Step-Level Settings

Customize individual steps with Vertex AI-specific settings:
import pandas as pd

from zenml import step, pipeline
from zenml.integrations.gcp.flavors.vertex_orchestrator_flavor import (
    VertexOrchestratorSettings,
)
from zenml.integrations.kubernetes.pod_settings import KubernetesPodSettings

@step(
    settings={
        "orchestrator": VertexOrchestratorSettings(
            pod_settings=KubernetesPodSettings(
                node_selectors={
                    "cloud.google.com/gke-accelerator": "NVIDIA_TESLA_T4"
                },
                resources={
                    "requests": {"memory": "16Gi", "cpu": "4"},
                    "limits": {"memory": "16Gi", "cpu": "4", "nvidia.com/gpu": "1"},
                },
            ),
            labels={"team": "ml-ops", "project": "recommendation"},
        )
    }
)
def train_on_gpu(data: pd.DataFrame) -> Model:  # "Model" stands in for your model class
    # Training code runs on the GPU node pool selected above
    ...

@step(
    settings={
        "orchestrator": VertexOrchestratorSettings(
            pod_settings=KubernetesPodSettings(
                resources={
                    "requests": {"memory": "4Gi", "cpu": "2"},
                },
            ),
        )
    }
)
def preprocess_data() -> pd.DataFrame:
    ...

@pipeline
def training_pipeline():
    data = preprocess_data()
    train_on_gpu(data)
Available Settings:
  • pod_settings - Kubernetes Pod configuration (resources, node selectors, tolerations)
  • labels - GCP labels for the pipeline job
  • synchronous - Wait for pipeline completion (default: True)
  • node_selector_constraint - Tuple of (key, value) for node selection (deprecated, use pod_settings)

Machine Types and GPUs

Vertex AI supports various machine types and accelerators:
Machine Family   vCPUs   Memory    Use Case
n1-standard-4    4       15 GB     Standard workloads
n1-standard-8    8       30 GB     Medium workloads
n1-highmem-8     8       52 GB     Memory-intensive
n1-highcpu-16    16      14.4 GB   CPU-intensive
GPU Accelerators:
  • NVIDIA_TESLA_K80 - Legacy, cheap
  • NVIDIA_TESLA_T4 - Good price/performance
  • NVIDIA_TESLA_V100 - High performance
  • NVIDIA_TESLA_P4 - Inference optimized
  • NVIDIA_TESLA_A100 - Most powerful; requires a2 machine types
Check GPU availability by region.
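The pairing of machine types and accelerators can be sanity-checked up front. A simplified sketch (assumption: T4/V100/P4/K80 attach to n1-* machines, while A100s require a2-* machines; confirm against the Vertex AI docs for your region):

```python
# Accelerators that attach to n1-* machine types
N1_ACCELERATORS = {
    "NVIDIA_TESLA_K80",
    "NVIDIA_TESLA_P4",
    "NVIDIA_TESLA_T4",
    "NVIDIA_TESLA_V100",
}

def compatible(machine_type: str, accelerator: str) -> bool:
    """Rough machine-type/accelerator compatibility check."""
    if accelerator == "NVIDIA_TESLA_A100":
        return machine_type.startswith("a2-")
    if accelerator in N1_ACCELERATORS:
        return machine_type.startswith("n1-")
    return False
```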

Vertex AI Step Operator

The step operator runs individual steps as Vertex AI custom jobs.

Configuration

zenml step-operator register vertex-step-op \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1 \
    --service_account=vertex-sa@my-gcp-project.iam.gserviceaccount.com

Usage

import pandas as pd

from zenml import step, pipeline

@step(step_operator="vertex-step-op")
def train_on_vertex(data: pd.DataFrame) -> Model:
    # This step runs on Vertex AI
    ...

@step
def preprocess_locally(raw_data: pd.DataFrame) -> pd.DataFrame:
    # This step runs locally or on local orchestrator
    ...

@pipeline
def hybrid_pipeline():
    data = preprocess_locally(...)  # Runs locally
    model = train_on_vertex(data)  # Runs on Vertex AI

GCS Artifact Store

Store artifacts in Google Cloud Storage buckets.

Configuration

zenml artifact-store register gcs-store \
    --flavor=gcp \
    --path=gs://my-zenml-artifacts
The path must be a valid GCS URI starting with gs://.
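A minimal sketch of that check (validate_gcs_path is a hypothetical helper, not ZenML's actual validation code):

```python
def validate_gcs_path(path: str) -> str:
    """Reject paths that are not gs:// URIs with a bucket name."""
    if not path.startswith("gs://"):
        raise ValueError(
            f"'{path}' is not a valid GCS URI; it must start with 'gs://'."
        )
    # Everything between 'gs://' and the first '/' is the bucket
    bucket = path[len("gs://"):].split("/", 1)[0]
    if not bucket:
        raise ValueError("GCS URI is missing a bucket name.")
    return path
```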

Bucket Permissions

Ensure the service account has access:
# Grant storage permissions
gsutil iam ch \
    serviceAccount:[email protected]:roles/storage.objectAdmin \
    gs://my-zenml-artifacts

Vertex Experiment Tracker

Track experiments using Vertex AI Experiments.

Configuration

zenml experiment-tracker register vertex-experiments \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1

Usage

import pandas as pd
from google.cloud import aiplatform

from zenml import step

@step(experiment_tracker="vertex-experiments")
def train_model(data: pd.DataFrame) -> Model:
    # ZenML starts a Vertex AI experiment run for this step; log to it
    # directly via the Vertex AI SDK.
    aiplatform.log_params({"learning_rate": 0.001, "epochs": 10})

    # Training loop (train_epoch and model are your own code)
    for epoch in range(10):
        loss = train_epoch(...)
        aiplatform.log_time_series_metrics({"loss": loss}, step=epoch)

    return model

Complete Stack Example

Here’s a complete production-ready GCP stack:
# Create service connector
zenml service-connector register gcp-prod \
    --type=gcp \
    --auth_method=service-account \
    --service_account_json=@/path/to/service-account.json \
    --project_id=my-gcp-project

# Register components
zenml orchestrator register vertex-prod \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1 \
    --pipeline_root=gs://my-vertex-pipelines \
    --workload_service_account=vertex-sa@my-gcp-project.iam.gserviceaccount.com

zenml artifact-store register gcs-prod \
    --flavor=gcp \
    --path=gs://my-zenml-artifacts

zenml container-registry register gcr-prod \
    --flavor=gcp \
    --uri=gcr.io/my-gcp-project

zenml experiment-tracker register vertex-exp-prod \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1

# Create stack
zenml stack register gcp-prod \
    -o vertex-prod \
    -a gcs-prod \
    -c gcr-prod \
    -e vertex-exp-prod

# Activate stack
zenml stack set gcp-prod

Best Practices

When running ZenML from GKE, use Workload Identity instead of service account keys:
# Link Kubernetes service account to GCP service account
gcloud iam service-accounts add-iam-policy-binding \
    [email protected] \
    --role=roles/iam.workloadIdentityUser \
    --member="serviceAccount:my-gcp-project.svc.id.goog[default/zenml]"
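The --member value follows the fixed pattern serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA]. A small helper (hypothetical) to compose it for any Kubernetes service account:

```python
def workload_identity_member(project_id: str, namespace: str, ksa: str) -> str:
    """Build the IAM member string that binds a Kubernetes service
    account (namespace/ksa) to a GCP service account via Workload Identity."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"
```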
For better security, use private GKE clusters with Private Service Connect:
zenml orchestrator register vertex-orch \
    --flavor=vertex \
    --project=my-gcp-project \
    --location=us-central1 \
    --network=projects/my-gcp-project/global/networks/my-vpc \
    --private_service_connect=projects/my-gcp-project/regions/us-central1/networkAttachments/my-psc
Use customer-managed encryption keys (CMEK) for data at rest:
zenml orchestrator register vertex-orch \
    --flavor=vertex \
    --encryption_spec_key_name=projects/my-gcp-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key
Use labels for cost tracking and organization:
@step(
    settings={
        "orchestrator": VertexOrchestratorSettings(
            labels={
                "environment": "production",
                "team": "ml-ops",
                "cost-center": "engineering",
            }
        )
    }
)
def train_model():
    ...
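GCP rejects jobs with malformed labels, so it can pay to validate them before submission. A sketch based on GCP's documented label constraints (keys and values are at most 63 characters of lowercase letters, digits, hyphens, and underscores; keys must start with a letter); validate_labels is a hypothetical helper:

```python
import re

# Label key: starts with a lowercase letter, <=63 chars total
_KEY_RE = re.compile(r"^[a-z][a-z0-9_-]{0,62}$")
# Label value: may be empty, <=63 chars
_VALUE_RE = re.compile(r"^[a-z0-9_-]{0,63}$")

def validate_labels(labels: dict) -> dict:
    """Raise ValueError on any label that GCP would reject."""
    for key, value in labels.items():
        if not _KEY_RE.match(key):
            raise ValueError(f"Invalid GCP label key: {key!r}")
        if not _VALUE_RE.match(value):
            raise ValueError(f"Invalid GCP label value: {value!r}")
    return labels
```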

Common Issues

If you see permission errors, verify:
  1. Service account has required IAM roles
  2. API is enabled (gcloud services enable aiplatform.googleapis.com)
  3. GCS bucket policy allows access
  4. Artifact Registry permissions are correct
If GPU allocation fails:
  1. Check GPU availability in your region
  2. Request quota increase in IAM & Admin > Quotas
  3. Verify node selector matches available GPU types
  4. Try a different region
If pipeline compilation/upload fails:
  1. Check pipeline_root is a valid GCS path
  2. Verify service account can write to GCS bucket
  3. Ensure KFP version compatibility
  4. Check ZenML and integration versions match

Next Steps

Vertex AI Documentation

Detailed Vertex AI orchestrator guide

GCS Artifact Store

Configure GCS for artifact storage

Service Connectors

Advanced authentication options

Remote Execution

Production deployment patterns
