Google Vertex AI Setup

Google Vertex AI provides access to Google’s Gemini models and third-party models like Claude through Google Cloud Platform. It offers enterprise-grade reliability, security, and integration with GCP services.

Available Models

Gemini 3 Series (Latest)

gemini-3.1-pro-preview - Most capable with advanced reasoning (1M context)
gemini-3-pro-preview - Advanced reasoning and thinking
gemini-3-flash-preview - Fast with thinking support

Gemini 2 Series

gemini-2.5-pro - Most capable Gemini 2.5 (1M context)
gemini-2.5-flash - Fast and efficient (1M context)
gemini-2.0-flash - Fast and versatile (1M context)

Gemini 1.5 Series

gemini-1.5-pro - Capable and reliable (1M context)
gemini-1.5-flash - Fast and efficient (1M context)
gemini-1.5-flash-8b - Compact and efficient (1M context)

All Gemini models listed above support:

1M token context windows
Multimodal input (text + images)
Tool calling and parallel execution

Gemini 3 series models also support thinking/reasoning.

Prerequisites

Before configuring Vertex AI in Forge:

Google Cloud Account: Active GCP account with billing enabled
GCP Project: Project with Vertex AI API enabled
Authentication: Google Cloud CLI installed and configured

Setup Steps

Install Google Cloud CLI

If not already installed:

macOS:

brew install google-cloud-sdk

Linux:

curl https://sdk.cloud.google.com | bash
exec -l $SHELL

Windows: Download the installer from the Google Cloud SDK page.

Authenticate and Configure GCP

Set up your GCP credentials:

# Authenticate with Google
gcloud auth login

# Set your project ID
gcloud config set project YOUR_PROJECT_ID

# Enable Vertex AI API (if not already enabled)
gcloud services enable aiplatform.googleapis.com

Configure Application Default Credentials

For Forge to access Vertex AI, set up ADC:

gcloud auth application-default login

This creates credentials that Forge can use automatically.

Configure Forge

Run the interactive login command:

forge provider login

Select Vertex AI and provide:

Project ID: Your GCP project ID
Location: GCP region (e.g., us-central1 or global)
Auth Method: Choose “Google ADC” (recommended)

Select a Model

Set your default model in forge.yaml:

model: gemini-2.5-pro

Verify Connection

Start Forge and test:

forge

Try a prompt:

> What are the key features of Vertex AI?

Configuration

Required Parameters

PROJECT_ID: Your Google Cloud project ID
LOCATION: GCP region (e.g., us-central1, europe-west1, or global)
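A malformed project ID is an easy mistake to make before running forge provider login. The sketch below checks a value against the format rules for newly created GCP project IDs (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). The function name is_valid_project_id is hypothetical and not part of Forge or gcloud, and older domain-scoped project IDs may not match this pattern.

```shell
# Hypothetical helper: succeed if the argument looks like a GCP project ID.
# Pattern: one leading letter, 4-28 letters/digits/hyphens, one trailing
# letter or digit, for a total length of 6-30 characters.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

if is_valid_project_id "my-project-123"; then
  echo "project ID format looks valid"
else
  echo "project ID format looks wrong" >&2
fi
```

This only checks the format. It does not confirm that the project exists or that you can access it.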

API Endpoints

The endpoint format varies by location.

Global location:

https://aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/global/publishers/google

Regional location:

https://{LOCATION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google
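To make the difference between the two formats concrete, here is a minimal sketch that builds the base URL from a project ID and location. The function name vertex_endpoint is hypothetical, and Forge constructs this URL itself, so the helper is only useful for debugging or for calling the API directly.

```shell
# Hypothetical helper: print the Vertex AI base URL for Google-published models.
# Usage: vertex_endpoint PROJECT_ID LOCATION
vertex_endpoint() {
  local project_id="$1"
  local location="$2"
  local host

  if [ -z "$project_id" ] || [ -z "$location" ]; then
    echo "usage: vertex_endpoint PROJECT_ID LOCATION" >&2
    return 1
  fi

  # The global location uses the bare host; regions prefix the host name.
  if [ "$location" = "global" ]; then
    host="aiplatform.googleapis.com"
  else
    host="${location}-aiplatform.googleapis.com"
  fi

  printf 'https://%s/v1beta1/projects/%s/locations/%s/publishers/google\n' \
    "$host" "$project_id" "$location"
}

vertex_endpoint my-project global
vertex_endpoint my-project us-central1
```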

Authentication Methods

Google Application Default Credentials (Recommended)

Forge automatically uses ADC when configured with “Google ADC” method:

Tokens are refreshed automatically
No manual token management needed
Works seamlessly with GCP services

Manual API Token

You can also provide a token manually:

# Get access token
gcloud auth print-access-token

# Use with forge provider login
forge provider login
# Select Vertex AI
# Choose "API Key" method
# Paste the token

Manual tokens expire after 1 hour. Use Google ADC for automatic token refresh.
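If you stay with manual tokens, it helps to track how old the current one is. The following sketch assumes you record the time at which you ran gcloud auth print-access-token. The helper token_is_fresh and the 5-minute safety margin are illustrative assumptions, not Forge settings.

```shell
# Hypothetical helper: succeed if a token minted at ISSUED_AT (Unix seconds)
# is still comfortably inside its lifetime at time NOW (defaults to now).
TOKEN_LIFETIME_SECONDS=3600   # 1 hour, per the note above
REFRESH_MARGIN_SECONDS=300    # assumed safety margin before expiry

token_is_fresh() {
  local issued_at="$1"
  local now="${2:-$(date +%s)}"
  local age=$((now - issued_at))
  [ "$age" -ge 0 ] && [ "$age" -lt $((TOKEN_LIFETIME_SECONDS - REFRESH_MARGIN_SECONDS)) ]
}

issued_at=$(date +%s)   # record this when you print the token
if token_is_fresh "$issued_at"; then
  echo "token still usable"
else
  echo "token expired or close to expiry; request a new one"
fi
```

With Google ADC this bookkeeping is unnecessary, which is the main reason it is the recommended method.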

Model Selection

For Maximum Context

All Gemini models support 1M context:

gemini-3.1-pro-preview - Best overall
gemini-2.5-pro - Excellent capability
gemini-1.5-pro - Reliable choice

For Speed

gemini-3-flash-preview - Fast with thinking
gemini-2.5-flash - Fast and capable
gemini-1.5-flash - Quick responses
gemini-1.5-flash-8b - Ultra-fast

For Reasoning

Gemini 3 models support extended thinking:

gemini-3.1-pro-preview - Advanced reasoning
gemini-3-pro-preview - Strong reasoning
gemini-3-flash-preview - Fast reasoning

Switching Models

Change models during a session:

/model gemini-3.1-pro-preview

Regions and Availability

Recommended Regions

us-central1 - US Central (Iowa)
us-east4 - US East (Virginia)
europe-west1 - Europe (Belgium)
asia-northeast1 - Asia (Tokyo)
global - Global endpoint (auto-routing)

Choosing a Region

Use global if:

You want automatic routing
Latency is not critical
You don’t need regional data residency

Use a specific region if:

You need low latency
Compliance requires data residency
You’re using other regional GCP services
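The two checklists above reduce to a simple rule, sketched here as a shell function. The name choose_location, its yes/no arguments, and the us-central1 default are all hypothetical choices for illustration.

```shell
# Hypothetical helper mirroring the checklists above.
# Usage: choose_location NEEDS_RESIDENCY NEEDS_LOW_LATENCY [REGION]
# The first two arguments are "yes" or "no"; REGION defaults to us-central1.
choose_location() {
  local needs_residency="$1"
  local needs_low_latency="$2"
  local region="${3:-us-central1}"

  # Any hard requirement points to a specific region; otherwise use global.
  if [ "$needs_residency" = "yes" ] || [ "$needs_low_latency" = "yes" ]; then
    echo "$region"
  else
    echo "global"
  fi
}

choose_location no no                  # prints: global
choose_location yes no europe-west1    # prints: europe-west1
```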

Features

Massive Context Windows

Gemini models support 1M tokens:

Process entire codebases
Analyze large documents
Long conversation history
Complex multi-file operations

Multimodal Capabilities

Image understanding
Diagram analysis
Screenshot interpretation
Combined text and visual reasoning

Thinking Mode

Gemini 3 models show reasoning:

Explicit thought process
Step-by-step logic
Problem decomposition
Self-verification

Enterprise Features

Audit Logging: Full request/response logging
VPC Service Controls: Network security
Customer-Managed Keys: Data encryption
SLA: 99.9% uptime guarantee

Best Practices

Authentication

Never commit GCP credentials to version control. Always use ADC or service accounts.

For Development:

Use gcloud auth application-default login
Let Forge automatically refresh tokens

For Production:

Use service accounts with minimal permissions
Rotate credentials regularly
Enable audit logging

Cost Management

Model Selection:

Use Flash models for simple tasks (lower cost)
Use Pro models for complex reasoning (higher cost)
Monitor usage in GCP Console

Token Optimization:

Use smaller context when possible
Cache common prompts
Batch similar requests

Rate Limits

Vertex AI enforces quotas:

Requests per minute: Varies by model and region
Tokens per minute: Varies by model

Check quotas in GCP Console.
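When a script calls Vertex AI directly and hits a quota, a common response is to retry with exponential backoff. The sketch below is a generic wrapper, not a Forge feature: retry_with_backoff is a hypothetical name, and it retries on any non-zero exit status rather than on quota errors specifically.

```shell
# Hypothetical helper: retry a command, doubling the delay after each failure.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1

  while true; do
    # Run the wrapped command; stop as soon as it succeeds.
    "$@" && return 0

    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi

    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: up to 3 attempts, waiting 1s and then 2s between them.
retry_with_backoff 3 1 true
```

In a real script, wrap only the command that makes the API request, and cap the number of attempts so a persistent quota problem fails quickly.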

Troubleshooting

Authentication Errors

If authentication fails:

# Re-authenticate
gcloud auth application-default login

# Verify credentials
gcloud auth application-default print-access-token

# Check project
gcloud config get-value project

API Not Enabled

If you see “API not enabled”:

# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com

# Verify it's enabled
gcloud services list --enabled | grep aiplatform

Permission Denied

If you lack permissions:

Check IAM roles in GCP Console
Ensure you have the Vertex AI User role (roles/aiplatform.user)
Contact your GCP admin for access

Region Not Available

If a model isn’t available in your region:

Try the global location
Check model availability
Switch to an available region

Token Expiration

If using manual tokens and they expire:

# Get new token
gcloud auth print-access-token

# Or switch to ADC
forge provider login
# Select Vertex AI
# Choose "Google ADC"

Deprecated: Environment Variable Setup

Using environment variables is deprecated. Please use forge provider login instead.

For backward compatibility:

# .env
PROJECT_ID=your-project-id
LOCATION=us-central1
VERTEX_AI_AUTH_TOKEN=your-token

# forge.yaml
model: gemini-2.5-pro
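When migrating away from this setup, it can help to see which of the legacy variables are still set in your shell. This is a minimal sketch: legacy_vertex_vars is a hypothetical helper, and it inspects only the current shell environment, not the contents of a .env file.

```shell
# Hypothetical helper: print the name of each legacy variable that is set
# to a non-empty value, one per line.
legacy_vertex_vars() {
  local name value
  for name in PROJECT_ID LOCATION VERTEX_AI_AUTH_TOKEN; do
    # Indirect lookup of the variable named in $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name"
    fi
  done
}

found=$(legacy_vertex_vars)
if [ -n "$found" ]; then
  # $found is left unquoted on purpose so the names print on one line.
  echo "Legacy Vertex AI variables still set:" $found
  echo "Consider running: forge provider login"
fi
```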

Next Steps

Learn about Gemini capabilities
Explore prompt design
Set up billing alerts to monitor costs
Configure VPC Service Controls for security
