Prerequisites
Before using Vertex AI with Crush, you need:
- A Google Cloud account with Vertex AI enabled
- The gcloud CLI installed and configured
- A Google Cloud project with billing enabled
- Vertex AI API enabled in your project
Required Environment Variables
Vertex AI requires two environment variables to be set:
VERTEXAI_PROJECT
Your Google Cloud project ID.
VERTEXAI_LOCATION
The Google Cloud region where you want to run models, for example:
- us-central1 (United States - Iowa)
- us-east4 (United States - N. Virginia)
- europe-west1 (Belgium)
- europe-west4 (Netherlands)
- asia-southeast1 (Singapore)
Model availability varies by region. Check the Vertex AI documentation for model availability in your preferred location.
Setting Both Variables
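Both variables can be exported in the shell session that will launch Crush. A minimal sketch; the project ID and region below are placeholder values, not defaults:

```shell
# Placeholder values: substitute your own project ID and preferred region.
export VERTEXAI_PROJECT="my-project-id"
export VERTEXAI_LOCATION="us-central1"
```

To keep the values across sessions, the same two lines can be added to your shell profile (for example ~/.bashrc or ~/.zshrc, depending on your shell).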
Authentication
Crush uses Google Cloud’s Application Default Credentials (ADC) for authentication. Authenticate with the gcloud CLI by running gcloud auth application-default login. This command will:
- Open your web browser
- Ask you to sign in to your Google account
- Request permission to access Google Cloud resources
- Store credentials locally for use by Crush
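One way to confirm that the stored credentials are usable is to ask gcloud for an ADC access token. The sketch below assumes the gcloud CLI is on your PATH; adc_ready is a hypothetical helper name, not part of gcloud or Crush:

```shell
# adc_ready (hypothetical helper) succeeds only if gcloud can mint an
# access token from Application Default Credentials.
adc_ready() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud CLI not found on PATH" >&2
    return 2
  fi
  gcloud auth application-default print-access-token >/dev/null 2>&1
}

if adc_ready; then
  echo "ADC credentials found"
else
  echo "ADC not ready: run 'gcloud auth application-default login'"
fi
```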
Enabling Vertex AI in Crush
Once you have set the required environment variables and authenticated, Crush will automatically detect your Vertex AI configuration and show it as an available provider. To verify:
- Set VERTEXAI_PROJECT and VERTEXAI_LOCATION
- Run gcloud auth application-default login
- Start Crush: crush
- Check the list of available models for Vertex AI
Model Configuration
While Crush will automatically detect Vertex AI, you can customize the available models in your crush.json configuration.
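As a rough sketch, the snippet below writes a crush.json with one custom model entry. The nesting under providers and the vertexai key are assumptions about the config layout, and every number is a placeholder rather than an actual Vertex AI price or limit. Only the model-level field names come from this page.

```shell
# Writes ./crush.json in the current directory (overwrites an existing file).
# Assumed layout: providers -> vertexai -> models. All numbers are placeholders.
cat > crush.json <<'EOF'
{
  "providers": {
    "vertexai": {
      "models": [
        {
          "id": "claude-sonnet-4@20250514",
          "name": "Vertex AI Sonnet 4",
          "cost_per_1m_in": 3,
          "cost_per_1m_out": 15,
          "cost_per_1m_in_cached": 3.75,
          "cost_per_1m_out_cached": 0.3,
          "context_window": 200000,
          "default_max_tokens": 50000,
          "can_reason": true,
          "supports_attachments": true
        }
      ]
    }
  }
}
EOF
```

Check the current pricing and model limits for your region before relying on any of these values.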
Model Configuration Fields
- id: The Vertex AI model identifier (e.g., claude-sonnet-4@20250514)
- name: Display name for the model in Crush
- cost_per_1m_in: Cost per 1 million input tokens (USD)
- cost_per_1m_out: Cost per 1 million output tokens (USD)
- cost_per_1m_in_cached: Cost per 1 million cached input tokens (USD)
- cost_per_1m_out_cached: Cost per 1 million cached output tokens (USD)
- context_window: Maximum number of tokens (input + output)
- default_max_tokens: Default maximum tokens for responses
- can_reason: Whether the model supports extended thinking
- supports_attachments: Whether the model can process file attachments
Available Models
Through Vertex AI, you can access various models including:
- Anthropic Claude models (Claude 3.5, Claude 3 family)
- Google’s Gemini models
- Other supported foundation models
Pricing and Billing
Vertex AI pricing differs from direct provider APIs.
Pricing Structure
- Per-token pricing: Charged based on input and output tokens
- Caching discount: Lower rates for cached prompt tokens (when supported)
- Region-specific: Prices vary by Google Cloud region
- No minimum charges: Pay only for what you use
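The arithmetic behind per-token pricing can be sketched as follows. This is not necessarily how Crush computes its estimates; estimate_cost is a hypothetical helper, and the prices passed to it are placeholders rather than actual Vertex AI rates:

```shell
# estimate_cost (hypothetical helper): prices are USD per 1 million tokens.
# Usage: estimate_cost INPUT_TOKENS OUTPUT_TOKENS PRICE_IN PRICE_OUT
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v p_in="$3" -v p_out="$4" \
    'BEGIN { printf "%.4f\n", in_tok / 1000000 * p_in + out_tok / 1000000 * p_out }'
}

# 200,000 input tokens at 3 USD/1M plus 50,000 output tokens at 15 USD/1M.
estimate_cost 200000 50000 3 15   # prints 1.3500
```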
Cost Tracking
Crush tracks your token usage and estimated costs. To view your usage, run crush stats.
Pricing Resources
Troubleshooting
Vertex AI Not Appearing
If Vertex AI doesn’t show up as a provider:
- Verify that VERTEXAI_PROJECT and VERTEXAI_LOCATION are set
- Check your authentication status
- Ensure the Vertex AI API is enabled in your project
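The environment-variable check can be scripted. A minimal sketch, where check_vertex_env is a hypothetical helper name; it reports each variable and returns a non-zero status if either is missing:

```shell
# check_vertex_env (hypothetical helper): prints each required variable,
# or a warning if it is unset or empty. Returns 1 if any is missing.
check_vertex_env() {
  _status=0
  for _name in VERTEXAI_PROJECT VERTEXAI_LOCATION; do
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "$_name is not set"
      _status=1
    else
      echo "$_name=$_value"
    fi
  done
  return "$_status"
}
```

Run check_vertex_env in the same shell session from which you start Crush, since variables exported elsewhere are not visible to it.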
Authentication Errors
If you see authentication errors:
- Re-authenticate by running gcloud auth application-default login
- Verify your account has necessary permissions
- Check that your credentials haven’t expired
API Not Enabled
If you see “API not enabled” errors:
- Enable the Vertex AI API for your project
- Wait a few minutes for the API to be fully enabled
- Restart Crush
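The API can be enabled from the command line with gcloud services enable. The sketch below assumes VERTEXAI_PROJECT is already exported and that your account is allowed to enable services; enable_vertex_api is a hypothetical wrapper name:

```shell
# Service name of the Vertex AI API on Google Cloud.
VERTEX_SERVICE="aiplatform.googleapis.com"

# enable_vertex_api (hypothetical wrapper): requires gcloud and VERTEXAI_PROJECT.
enable_vertex_api() {
  gcloud services enable "$VERTEX_SERVICE" --project "$VERTEXAI_PROJECT"
}
```

After calling enable_vertex_api, allow a few minutes for the change to propagate before restarting Crush.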
Permission Denied
If you see permission errors:
- Verify you have the required IAM roles: Vertex AI User or Vertex AI Administrator
- Check project-level permissions in Google Cloud Console
- Ensure billing is enabled on your project
Required IAM Roles
Your Google Cloud account needs one of the following IAM roles:
- roles/aiplatform.user: To use Vertex AI services
- roles/aiplatform.admin: For full Vertex AI access
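A project owner or IAM administrator can grant a role with gcloud projects add-iam-policy-binding. A minimal sketch; PROJECT_ID and USER_EMAIL hold placeholder values, and grant_vertex_user is a hypothetical wrapper name:

```shell
# Placeholder values: substitute your own project ID and account email.
PROJECT_ID="my-project-id"
USER_EMAIL="you@example.com"

# grant_vertex_user (hypothetical wrapper): binds the Vertex AI User role
# to the given account at the project level.
grant_vertex_user() {
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="user:$USER_EMAIL" \
    --role="roles/aiplatform.user"
}
```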
Replace PROJECT_ID with your actual project ID and the placeholder email address with your Google account email.
Best Practices
Region Selection
Choose a region based on:
- Latency: Select a region close to your location
- Model availability: Not all models are available in all regions
- Pricing: Prices may vary slightly by region
- Data residency: Choose regions that comply with your data regulations
Cost Optimization
- Monitor usage: Use crush stats to track token consumption
- Set budget alerts: Configure alerts in Google Cloud Console
- Use appropriate models: Smaller models cost less but may be sufficient for some tasks
- Leverage caching: Use models that support prompt caching to reduce costs
Next Steps
Amazon Bedrock
Learn about using Amazon Bedrock with Crush
Custom Providers
Configure custom API providers