Skip to main content
Google Cloud Vertex AI offers free preview access to select Meta Llama models with generous rate limits.

Overview

Google Cloud Vertex AI provides access to enterprise-grade AI models through Google Cloud Platform. Select Llama models are available for free during their preview period.
Payment Verification Required: Google Cloud requires very stringent payment verification to create an account, even for free tier access.

Rate Limits

All listed models are free during preview period.
Model NameRequests/MinutePricing
Llama 3.2 90B Vision Instruct30Free during preview
Llama 3.1 70B Instruct60Free during preview
Llama 3.1 8B Instruct60Free during preview

Available Models

Llama 3.2 90B Vision

Multimodal model with vision capabilities

Llama 3.1 70B

Powerful 70B parameter model

Llama 3.1 8B

Efficient 8B parameter model

API Usage

from google.cloud import aiplatform
from vertexai.preview.language_models import ChatModel

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

chat_model = ChatModel.from_pretrained("llama-3-1-70b-instruct-maas")

response = chat_model.send_message(
    "Hello, how are you?",
    temperature=0.7,
    max_output_tokens=1024
)

print(response.text)

Getting Started

1

Create Google Cloud Account

Sign up at cloud.google.com
2

Complete Payment Verification

Verify payment method (required even for free tier)
3

Create Project

Create a new project in Google Cloud Console
4

Enable Vertex AI API

Enable the Vertex AI API for your project
5

Access Model Garden

Visit the Model Garden
6

Deploy Model

Deploy a free preview model

Key Features

Enterprise Grade

Production-ready infrastructure

Free Preview

No cost during preview period

High Rate Limits

Up to 60 requests per minute

Vision Support

Multimodal capabilities with Llama 3.2

Global Infrastructure

Google’s worldwide network

Security

Enterprise security and compliance

Model Details

Llama 3.2 90B Vision Instruct

  • Capabilities: Text and image understanding
  • Rate Limit: 30 requests/minute
  • Best For: Multimodal applications, image analysis

Llama 3.1 70B Instruct

  • Capabilities: Advanced text generation and reasoning
  • Rate Limit: 60 requests/minute
  • Best For: Complex tasks, long-form content

Llama 3.1 8B Instruct

  • Capabilities: Efficient text generation
  • Rate Limit: 60 requests/minute
  • Best For: Fast inference, cost-effective applications

Important Considerations

  • Payment verification is mandatory, even for free tier
  • Models are free only during preview period
  • Pricing may change when models reach general availability
  • Rate limits are subject to change

Use Cases

  • Enterprise Applications: Build production-grade AI apps
  • Multimodal Projects: Leverage vision capabilities
  • Prototyping: Test Llama models at scale
  • Migration: Evaluate before committing to paid tier

Additional Resources

Vertex AI Console

Access the platform

Model Garden

Browse available models

Documentation

Official documentation

Llama Models

Meta Llama on Vertex AI

Build docs developers (and LLMs) love