The google-ai provider supports Google’s generateContent and streamGenerateContent endpoints for Gemini models.
BAML uses the v1beta endpoint to access both existing v1 models and additional v1beta-exclusive models, following Google’s SDK conventions.

Quick Start

client<llm> MyClient {
  provider "google-ai"
  options {
    model "gemini-2.5-flash"
  }
}
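Once a client is defined, any BAML function can reference it by name. A minimal sketch, assuming a hypothetical `ExtractSentiment` function (the function name and prompt are illustrative, not part of the provider API):

```baml
// Hypothetical function using the client defined above.
function ExtractSentiment(text: string) -> string {
  client MyClient
  prompt #"
    Classify the sentiment of the following text as positive, negative, or neutral:
    {{ text }}
  "#
}
```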

Authentication

Set your Google API key as an environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Or specify it explicitly in your BAML configuration:
client<llm> MyClient {
  provider "google-ai"
  options {
    api_key env.MY_GOOGLE_KEY
    model "gemini-2.5-flash"
  }
}

Configuration Options

BAML-Specific Options

These options modify the API request sent to Google AI.
api_key
string
default:"env.GOOGLE_API_KEY"
Passed as the x-goog-api-key header.
base_url
string
The base URL for the Google AI API.
headers
object
Additional headers to send with requests.
client<llm> MyClient {
  provider "google-ai"
  options {
    model "gemini-2.5-flash"
    headers {
      "X-My-Header" "my-value"
    }
  }
}

Supported Models

model
string
default:"gemini-2.5-flash"
The Gemini model to use.
Model | Use Case | Context | Key Features
--- | --- | --- | ---
gemini-2.5-pro | Complex tasks, coding, STEM | 1M | Adaptive thinking, multimodal
gemini-2.5-flash | Production apps, balanced performance | 1M | Best price/performance
gemini-2.5-flash-lite | High-volume, cost-sensitive | 1M | Lowest cost, fastest
You can specify any model name; BAML won't validate whether it exists. See the Google Model Documentation for the latest models.

Model Parameters

Parameters like temperature are specified in the generationConfig object. See the Google AI documentation for details.
client<llm> MyClient {
  provider "google-ai"
  options {
    model "gemini-2.5-flash"
    generationConfig {
      temperature 0.5
      maxOutputTokens 1024
      topP 0.8
      topK 40
    }
  }
}
Common generationConfig parameters:
  • temperature - Controls randomness (0-2)
  • maxOutputTokens - Maximum tokens to generate
  • topP - Nucleus sampling parameter
  • topK - Top-k sampling parameter
  • stopSequences - Array of sequences that stop generation
For all options, see the Google Gemini API documentation.

Media Handling

Google AI uses send_base64_unless_google_url by default for images, which preserves Google Cloud Storage URLs (gs://) while converting other URLs to base64.
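Because `gs://` URLs are passed through unchanged, you can reference Cloud Storage objects directly in a multimodal prompt. A minimal sketch using BAML's built-in `image` type (the function name and prompt are illustrative):

```baml
// Hypothetical multimodal function; `image` is BAML's built-in media type.
function DescribeImage(img: image) -> string {
  client MyClient
  prompt #"
    {{ img }}
    Describe this image in one sentence.
  "#
}
```

At call time you can pass either a `gs://` URL or any other image URL; per the default above, only non-Google URLs are converted to base64 before being sent.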

Features

  • Streaming: Automatically uses streamGenerateContent when you call the streaming interface
  • Multimodal: Supports text, image, audio, and video inputs
  • Large Context: Up to 1M tokens context window
  • Function Calling: Native support for tool use

Do Not Set

contents
DO NOT USE
BAML automatically constructs this from your prompt.
