Installation
- kfp>=2.6.0 - Kubeflow Pipelines SDK (used by Vertex AI)
- google-cloud-aiplatform>=1.34.0 - Vertex AI SDK
- google-cloud-storage>=2.9.0 - Google Cloud Storage
- google-cloud-secret-manager - Secret management
- gcsfs - GCS filesystem interface
- kubernetes - Kubernetes Python client
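To confirm that an environment satisfies the minimum versions listed above, a small standard-library check can help. This is a minimal sketch: `MINIMUMS`, `meets_minimum`, and `check_installed` are hypothetical helper names, and the version parsing is deliberately simple (numeric components only).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the package list above.
MINIMUMS = {
    "kfp": "2.6.0",
    "google-cloud-aiplatform": "1.34.0",
    "google-cloud-storage": "2.9.0",
}


def parse_version(text):
    """Turn '2.6.0' into (2, 6, 0); trailing suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Map each package to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(found, minimum) else f"too old ({found})"
    return report
```

Calling `check_installed()` in the environment that runs your pipelines gives a quick report before you register any stack components.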
Available Components
The GCP integration provides these stack components:
- Vertex AI Orchestrator - Execute pipelines using Google Cloud Vertex AI Pipelines
- Vertex AI Step Operator - Run individual steps on Vertex AI custom jobs
- GCS Artifact Store - Store artifacts in Google Cloud Storage buckets
- Vertex Experiment Tracker - Track experiments in Vertex AI Experiments
- GCP Image Builder - Build container images using Google Cloud Build
Authentication
There are three ways to authenticate with GCP:
1. Service Connector (Recommended)
2. Explicit Service Account
3. Application Default Credentials
If no credentials are provided, ZenML uses Application Default Credentials:
- Environment variable GOOGLE_APPLICATION_CREDENTIALS
- gcloud CLI credentials (gcloud auth application-default login)
- GCE/GKE metadata server (when running on Google Cloud)
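The lookup order above can be sketched with the standard library to see which source is likely to be picked up on a given machine. This is only an approximation under stated assumptions: the actual resolution is performed by Google's auth library, `likely_adc_source` is a hypothetical helper, and the default file path shown applies to Linux and macOS.

```python
import os
from pathlib import Path


def default_gcloud_adc_file():
    # Where `gcloud auth application-default login` stores credentials on
    # Linux and macOS. On Windows the file lives under %APPDATA%/gcloud instead.
    return Path.home() / ".config" / "gcloud" / "application_default_credentials.json"


def likely_adc_source(env=None, gcloud_file=None):
    """Return a label for the first credential source that appears to be present."""
    env = os.environ if env is None else env
    gcloud_file = default_gcloud_adc_file() if gcloud_file is None else Path(gcloud_file)
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "GOOGLE_APPLICATION_CREDENTIALS"
    if gcloud_file.is_file():
        return "gcloud CLI credentials"
    # Nothing local was found; only the metadata server remains, and it is
    # reachable only when the code runs on Google Cloud.
    return "metadata server"
```

The function does not validate the credentials it finds; it only reports which of the three sources would be consulted first.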
Vertex AI Orchestrator
The Vertex AI orchestrator runs your complete pipeline as a Vertex AI Pipeline.
Configuration
- project - GCP project ID
- location - GCP region (e.g., us-central1, europe-west1)
- pipeline_root - GCS URI for pipeline artifacts (defaults to artifact store path if using GCS)
- workload_service_account - Service account for pipeline execution
- network - VPC network for private connectivity
- encryption_spec_key_name - Cloud KMS key for encryption
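As an illustration of how these options fit together, the sketch below assembles a `zenml orchestrator register` command from them. It assumes the ZenML CLI's usual `--key=value` form for component options and the `vertex` flavor name; `build_register_command` is a hypothetical helper, and the exact flags should be checked against the ZenML version you have installed.

```python
import shlex

# Optional settings from the configuration list above.
OPTIONAL_KEYS = (
    "pipeline_root",
    "workload_service_account",
    "network",
    "encryption_spec_key_name",
)


def build_register_command(name, project, location, **optional):
    """Return the argv list for registering a Vertex AI orchestrator."""
    unknown = sorted(set(optional) - set(OPTIONAL_KEYS))
    if unknown:
        raise ValueError(f"unknown options: {unknown}")
    args = [
        "zenml", "orchestrator", "register", name,
        "--flavor=vertex",
        f"--project={project}",
        f"--location={location}",
    ]
    # Append optional settings in a stable order, skipping unset ones.
    for key in OPTIONAL_KEYS:
        value = optional.get(key)
        if value:
            args.append(f"--{key}={value}")
    return args


if __name__ == "__main__":
    # Placeholder names; substitute your own project, region, and bucket.
    command = build_register_command(
        "vertex_orchestrator",
        project="my-project",
        location="us-central1",
        pipeline_root="gs://my-bucket/pipeline-root",
    )
    print(shlex.join(command))
```

Building the command as a list keeps quoting out of the picture until the final `shlex.join`, which is convenient if you later pass it to `subprocess.run`.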
Service Account Permissions
The service account needs these IAM permissions:
- aiplatform.customJobs.create
- aiplatform.pipelineJobs.create
- storage.objects.get, storage.objects.create, storage.objects.delete
- artifactregistry.repositories.downloadArtifacts
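If you already have a list of what a service account has been granted, a set difference shows what is still missing. This is a minimal sketch: `missing_permissions` is a hypothetical helper, and how you obtain the granted list (console export, IAM policy inspection, and so on) is left to you.

```python
# Permissions named in the list above, with the storage shorthand expanded.
REQUIRED_PERMISSIONS = frozenset({
    "aiplatform.customJobs.create",
    "aiplatform.pipelineJobs.create",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "artifactregistry.repositories.downloadArtifacts",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))
```

An empty result means every permission in the list is covered; anything else names exactly what to add.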
Step-Level Settings
Customize individual steps with Vertex AI-specific settings:
- pod_settings - Kubernetes Pod configuration (resources, node selectors, tolerations)
- labels - GCP labels for the pipeline job
- synchronous - Wait for pipeline completion (default: True)
- node_selector_constraint - Tuple of (key, value) for node selection (deprecated, use pod_settings)
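Because node_selector_constraint is deprecated in favor of pod_settings, existing settings may need to be migrated. The sketch below works on plain dictionaries and is not ZenML's own API: `migrate_node_selector` is a hypothetical helper, and the `node_selectors` key inside pod_settings is an assumption that should be checked against your ZenML version.

```python
def migrate_node_selector(settings):
    """Return a copy of `settings` with node_selector_constraint folded into pod_settings."""
    migrated = {k: v for k, v in settings.items() if k != "node_selector_constraint"}
    constraint = settings.get("node_selector_constraint")
    if constraint is None:
        return migrated
    key, value = constraint
    # Copy nested dicts so the caller's input is left untouched.
    pod_settings = dict(migrated.get("pod_settings") or {})
    selectors = dict(pod_settings.get("node_selectors") or {})
    # An entry already present in pod_settings takes precedence.
    selectors.setdefault(key, value)
    pod_settings["node_selectors"] = selectors
    migrated["pod_settings"] = pod_settings
    return migrated


if __name__ == "__main__":
    old = {
        "labels": {"team": "ml"},
        "node_selector_constraint": ("cloud.google.com/gke-accelerator", "NVIDIA_TESLA_T4"),
    }
    print(migrate_node_selector(old))
```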
Machine Types and GPUs
Vertex AI supports various machine types and accelerators:

| Machine Type | vCPUs | Memory | Use Case |
|---|---|---|---|
| n1-standard-4 | 4 | 15 GB | Standard workloads |
| n1-standard-8 | 8 | 30 GB | Medium workloads |
| n1-highmem-8 | 8 | 52 GB | Memory-intensive |
| n1-highcpu-16 | 16 | 14.4 GB | CPU-intensive |
Accelerator types:
- NVIDIA_TESLA_K80 - Legacy, cheap
- NVIDIA_TESLA_T4 - Good price/performance
- NVIDIA_TESLA_V100 - High performance
- NVIDIA_TESLA_P4 - Inference optimized
- NVIDIA_TESLA_A100 - Latest, most powerful
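The machine type table above can be turned into a small lookup that picks the smallest listed type satisfying a step's requirements. This is a sketch limited to the four types in the table; `pick_machine_type` is a hypothetical helper, and "smallest" here means fewest vCPUs, then least memory.

```python
# (vCPUs, memory in GB) for each machine type in the table above.
MACHINE_TYPES = {
    "n1-standard-4": (4, 15.0),
    "n1-standard-8": (8, 30.0),
    "n1-highmem-8": (8, 52.0),
    "n1-highcpu-16": (16, 14.4),
}


def pick_machine_type(min_vcpus, min_memory_gb):
    """Return the smallest listed machine type meeting both minimums, or None."""
    candidates = [
        (vcpus, memory, name)
        for name, (vcpus, memory) in MACHINE_TYPES.items()
        if vcpus >= min_vcpus and memory >= min_memory_gb
    ]
    if not candidates:
        return None
    # Tuples sort by vCPUs first, then memory, so min() yields the smallest fit.
    return min(candidates)[2]
```

For example, a step that needs 8 vCPUs and 40 GB of memory resolves to n1-highmem-8, while a request for 16 vCPUs and 32 GB returns None because no listed type satisfies both.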
Vertex AI Step Operator
The step operator runs individual steps as Vertex AI custom jobs.
Configuration
Usage
GCS Artifact Store
Store artifacts in Google Cloud Storage buckets.
Configuration
The artifact store path must start with gs://.
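A quick way to catch malformed paths before registering the artifact store is to parse the URI and check its scheme and bucket. This is a minimal standard-library sketch; `parse_gcs_uri` and `is_gcs_uri` are hypothetical helpers, and they check only the shape of the URI, not whether the bucket exists.

```python
from urllib.parse import urlparse


def parse_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def is_gcs_uri(uri):
    """Return True if `uri` has the gs:// scheme and a bucket name."""
    try:
        parse_gcs_uri(uri)
    except ValueError:
        return False
    return True
```

The same check applies to any other setting that expects a GCS URI, such as the orchestrator's pipeline_root.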
Bucket Permissions
Ensure the service account has access:
Vertex Experiment Tracker
Track experiments using Vertex AI Experiments.
Configuration
Usage
Complete Stack Example
Here’s a complete production-ready GCP stack:
Best Practices
Use Workload Identity on GKE
When running ZenML from GKE, use Workload Identity instead of service account keys:
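Workload Identity links a Kubernetes service account to a Google service account through one IAM binding and one annotation. The sketch below only builds the strings involved, using the formats documented for GKE; `workload_identity_binding` is a hypothetical helper and all names passed to it are placeholders.

```python
def workload_identity_binding(project_id, namespace, ksa_name, gsa_name):
    """Describe the IAM binding and annotation that link a KSA to a GSA."""
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {
        # Role granted on the Google service account.
        "role": "roles/iam.workloadIdentityUser",
        # IAM member string that identifies the Kubernetes service account.
        "member": f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]",
        "gsa_email": gsa_email,
        # Annotation to place on the Kubernetes service account.
        "ksa_annotation": {"iam.gke.io/gcp-service-account": gsa_email},
    }


if __name__ == "__main__":
    binding = workload_identity_binding("my-project", "zenml", "zenml-runner", "zenml-sa")
    for field, value in binding.items():
        print(f"{field}: {value}")
```

With this in place, pods running under the annotated Kubernetes service account obtain Google credentials from the metadata server, so no key file has to be mounted.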
Use Private GKE Clusters
For better security, use private GKE clusters with Private Service Connect:
Enable Encryption
Use customer-managed encryption keys (CMEK) for data at rest:
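The encryption_spec_key_name setting expects a full Cloud KMS key resource name. As a sketch, the helpers below build and check that format; `kms_key_name` and `parse_kms_key_name` are hypothetical names, and the check covers only the shape of the string, not whether the key exists or is usable.

```python
import re

# Cloud KMS key resource name format:
# projects/<project>/locations/<location>/keyRings/<ring>/cryptoKeys/<key>
_KEY_NAME = re.compile(
    r"projects/([^/]+)/locations/([^/]+)/keyRings/([^/]+)/cryptoKeys/([^/]+)"
)


def kms_key_name(project, location, key_ring, key):
    """Build the resource name for a Cloud KMS key."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def parse_kms_key_name(name):
    """Return the name's four components as a dict, or None if it is malformed."""
    match = _KEY_NAME.fullmatch(name)
    if match is None:
        return None
    project, location, key_ring, key = match.groups()
    return {"project": project, "location": location, "key_ring": key_ring, "key": key}
```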
Label Resources
Use labels for cost tracking and organization:
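GCP rejects labels that break its format rules, so it can be worth validating them before a pipeline is submitted. The sketch below encodes the documented limits (keys of 1 to 63 characters starting with a lowercase letter, values of up to 63 characters, lowercase letters, digits, underscores and dashes only, at most 64 labels per resource). It is stricter than GCP in one respect, which is an assumption made for simplicity: international characters are not accepted. `label_problems` is a hypothetical helper.

```python
import re

_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_VALUE = re.compile(r"[a-z0-9_-]{0,63}")
MAX_LABELS = 64


def label_problems(labels):
    """Return a list of human-readable problems; an empty list means the labels pass."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not _KEY.fullmatch(key):
            problems.append(f"invalid key: {key!r}")
        if not _VALUE.fullmatch(value):
            problems.append(f"invalid value for {key!r}: {value!r}")
    return problems


if __name__ == "__main__":
    # Example labels for cost tracking; the names are placeholders.
    print(label_problems({"team": "ml-platform", "env": "prod", "cost-center": "R&D"}))
```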
Common Issues
Permission Denied Errors
If you see permission errors, verify:
- Service account has required IAM roles
- API is enabled (gcloud services enable aiplatform.googleapis.com)
- GCS bucket policy allows access
- Artifact Registry permissions are correct
GPU Not Available
If GPU allocation fails:
- Check GPU availability in your region
- Request quota increase in IAM & Admin > Quotas
- Verify node selector matches available GPU types
- Try a different region
Pipeline Upload Fails
If pipeline compilation/upload fails:
- Check pipeline_root is a valid GCS path
- Verify service account can write to GCS bucket
- Ensure KFP version compatibility
- Check ZenML and integration versions match
Next Steps
- Vertex AI Documentation - Detailed Vertex AI orchestrator guide
- GCS Artifact Store - Configure GCS for artifact storage
- Service Connectors - Advanced authentication options
- Remote Execution - Production deployment patterns
