Introduction
Vertex AI provides comprehensive support for deploying and managing open-source models at scale. Whether you’re working with language models, image generation models, or custom architectures, Google Cloud offers the infrastructure and tools to deploy, fine-tune, and serve these models efficiently.
Open Source Model Ecosystem
Vertex AI Model Garden serves as your gateway to a vast ecosystem of open-source models:
Model Garden
Browse and deploy pre-configured open models from Vertex AI Model Garden
Hugging Face Hub
Access over 1 million models from the Hugging Face Hub
Fine-Tuning
Customize models for your specific use cases
Optimized Serving
Deploy models with vLLM, TGI, and other inference engines
Key Capabilities
Model Discovery and Deployment
The Vertex AI Model Garden SDK simplifies discovering and deploying open models.
Supported Model Types
Vertex AI Model Garden supports various model architectures:
- Language Models
- Vision Models
- Specialized Models
- Gemma: Google’s lightweight, state-of-the-art open models
- Llama: Meta’s family of large language models
- DeepSeek: Advanced reasoning and instruction models
- Qwen: Multilingual language models
- Mistral: Efficient and powerful language models
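As a sketch of the Model Garden SDK workflow mentioned above, listing and deploying an open model might look like the following. This assumes the `vertexai.model_garden` module from a recent `google-cloud-aiplatform` release; the project ID, region, and model identifier are placeholders, and exact parameter names may differ by SDK version.

```python
# Sketch: discover and deploy an open model with the Model Garden SDK.
# Project, region, and model ID below are placeholders — replace with
# your own values before running.
import vertexai
from vertexai import model_garden

vertexai.init(project="your-project-id", location="us-central1")

# List deployable open models, optionally filtered by name.
for model_name in model_garden.list_deployable_models(model_filter="gemma"):
    print(model_name)

# Deploy a chosen model to a managed Vertex AI endpoint.
model = model_garden.OpenModel("google/gemma-3-1b-it")  # placeholder ID
endpoint = model.deploy(accept_eula=True)

# Query the deployed endpoint.
response = endpoint.predict(instances=[{"prompt": "Hello"}])
```

Deployment provisions accelerator-backed serving infrastructure, so the `deploy()` call can take several minutes to return.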
Deployment Options
Vertex AI Endpoints
Deploy models to managed endpoints with automatic scaling.
Cloud Run
Deploy lightweight models on Cloud Run for serverless inference:
- Pay only for actual usage
- Automatic scaling to zero
- Integrated with Cloud Load Balancing
GKE (Google Kubernetes Engine)
Deploy models on GKE for advanced orchestration:
- Full control over infrastructure
- Custom autoscaling policies
- Multi-region deployments
Integration with Google Cloud Services
Model Access and Authentication
Gated Models
Some models require accepting terms or providing authentication.
Organization Policies
Control which models can be deployed in your organization.
Contact your organization administrator if you encounter policy constraint violations when deploying models.
Inference APIs
Vertex AI provides multiple APIs for model inference:
- Vertex AI SDK
- OpenAI-Compatible API
- REST API
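To illustrate the OpenAI-compatible surface, the sketch below points the standard `openai` client at a Vertex AI endpoint and authenticates with a short-lived OAuth token. The project, region, and model name are placeholders, and the exact `base_url` shape is an assumption based on the documented OpenAI-compatibility pattern.

```python
# Sketch: call a Vertex AI model through the OpenAI-compatible API.
# Requires the `openai` and `google-auth` packages; project, region,
# and model name below are placeholders.
import google.auth
import google.auth.transport.requests
from openai import OpenAI

creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

PROJECT = "your-project-id"
REGION = "us-central1"

client = OpenAI(
    base_url=(
        f"https://{REGION}-aiplatform.googleapis.com/v1/projects/"
        f"{PROJECT}/locations/{REGION}/endpoints/openapi"
    ),
    api_key=creds.token,  # short-lived OAuth token, not a static API key
)

resp = client.chat.completions.create(
    model="google/gemma-3-1b-it",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```

Because the token expires (typically after an hour), long-running clients need to refresh credentials and recreate the client periodically.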
Cost Optimization
Compute Options
Standard VMs
Predictable pricing for steady workloads
Spot VMs
Up to 80% cost savings for fault-tolerant workloads
Reserved Resources
Committed use discounts for long-running deployments
Autoscaling
Scale replicas based on traffic patterns
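One way these compute options surface in the Vertex AI SDK is through `Model.deploy()` arguments. The sketch below combines Spot VMs with replica autoscaling; parameter availability (the `spot` flag in particular) depends on your `google-cloud-aiplatform` version, and the resource name and machine shape are placeholders.

```python
# Sketch: deploy a registered model with Spot VMs and autoscaling.
# Assumes a model already uploaded to the Vertex AI Model Registry;
# the resource name below is a placeholder.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/1234567890"
)

endpoint = model.deploy(
    machine_type="g2-standard-12",  # G2 VMs pair with NVIDIA L4 GPUs
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    min_replica_count=1,            # autoscaling floor
    max_replica_count=4,            # autoscaling ceiling
    spot=True,                      # Spot VMs: cheaper but preemptible
)
```

Spot-backed replicas can be reclaimed at any time, so this configuration suits fault-tolerant serving where a retry layer sits in front of the endpoint.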
GPU Selection
Choose the right GPU for your workload:
| GPU Type | Best For | Memory |
|---|---|---|
| NVIDIA L4 | Cost-effective inference | 24 GB |
| NVIDIA T4 | Balanced workloads | 16 GB |
| NVIDIA A100 | Training & large models | 40-80 GB |
| NVIDIA H100 | Highest performance | 80 GB |
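The table above can be encoded as data for a first-pass sizing check. The toy helper below (illustrative only, not an official API; it restates the table and picks the smallest listed GPU whose memory fits the requirement) ignores real-world factors such as multi-GPU sharding and quantization.

```python
# Toy helper: pick a GPU from the table above by required memory (GB).
# The catalog simply restates the table; it is illustrative only.
GPU_CATALOG = [
    # (name, memory in GB, typical use)
    ("NVIDIA T4", 16, "Balanced workloads"),
    ("NVIDIA L4", 24, "Cost-effective inference"),
    ("NVIDIA A100", 80, "Training & large models"),
    ("NVIDIA H100", 80, "Highest performance"),
]

def pick_gpu(required_memory_gb: float) -> str:
    """Return the first (smallest-memory) listed GPU that fits."""
    for name, memory_gb, _ in GPU_CATALOG:
        if memory_gb >= required_memory_gb:
            return name
    raise ValueError(f"No single listed GPU has {required_memory_gb} GB")

print(pick_gpu(14))   # a T4 suffices
print(pick_gpu(20))   # needs an L4
print(pick_gpu(60))   # needs an A100-class card
```

A rough rule of thumb: an fp16 model needs about 2 GB of GPU memory per billion parameters, plus headroom for the KV cache.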
Next Steps
Explore Model Garden
Browse and deploy models from the catalog
Fine-Tune Models
Customize models for your use cases
Optimize Serving
Learn about inference optimization techniques
View Examples
Explore example notebooks on GitHub