Overview
The Google provider (GoogleLlm) gives access to Gemini models through either the Google AI API (using API keys) or Vertex AI (for production deployments). It supports context windows of up to 2 million tokens with intelligent context caching.
Source: packages/adk/src/models/google-llm.ts:113
Supported Models
The Google provider matches these model patterns:
// From google-llm.ts:129-136
static override supportedModels(): string[] {
  return [
    "gemini-.*",
    "google/.*",
    "projects/.+/locations/.+/endpoints/.+",
    "projects/.+/locations/.+/publishers/google/models/gemini.+",
  ];
}
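The patterns above are regular expressions, so you can check locally whether a model name will be routed to this provider. A minimal sketch (the pattern list is copied from supportedModels(); the isSupported helper is illustrative, not part of the ADK API):

```typescript
// Patterns copied from supportedModels() above.
const patterns = [
  "gemini-.*",
  "google/.*",
  "projects/.+/locations/.+/endpoints/.+",
  "projects/.+/locations/.+/publishers/google/models/gemini.+",
];

// Anchor each pattern so it must match the full model string.
function isSupported(model: string): boolean {
  return patterns.some((p) => new RegExp(`^${p}$`).test(model));
}

console.log(isSupported("gemini-2.5-flash")); // true
console.log(isSupported("gpt-4o")); // false
```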
Model Examples
Gemini 2.5 Flash
Gemini 2.0 Flash
Gemini Pro
import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();
Best for: Fast, multimodal reasoning with long context
Context: 1M tokens (experimental: 2M)
Output: 8K tokens
Multimodal: Yes (text, image, video, audio)
const agent = AgentBuilder
  .withModel('gemini-2.0-flash')
  .build();
Best for: Latest generation, excellent reasoning
Context: 1M tokens
Multimodal: Yes
const agent = AgentBuilder
  .withModel('gemini-pro')
  .build();
Best for: General purpose tasks
Context: 32K tokens
Stable, production-ready
Configuration
Google AI API (Development)
For development and testing, use the Google AI API with an API key:
Get your API key from Google AI Studio.
import { AgentBuilder } from '@iqai/adk';

// GOOGLE_API_KEY must be set in your environment
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();
Vertex AI (Production)
For production deployments, use Vertex AI:
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
Authentication : Vertex AI uses Application Default Credentials (ADC):
# Authenticate locally
gcloud auth application-default login

# Or set a service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
The provider automatically detects the backend:
// From google-llm.ts:284-309
get apiClient(): GoogleGenAI {
  if (!this._apiClient) {
    const useVertexAI = process.env.GOOGLE_GENAI_USE_VERTEXAI === "true";
    const apiKey = process.env.GOOGLE_API_KEY;
    const project = process.env.GOOGLE_CLOUD_PROJECT;
    const location = process.env.GOOGLE_CLOUD_LOCATION;

    if (useVertexAI && project && location) {
      this._apiClient = new GoogleGenAI({
        vertexai: true,
        project,
        location,
      });
    } else if (apiKey) {
      this._apiClient = new GoogleGenAI({ apiKey });
    } else {
      throw new Error(
        "Google API Key or Vertex AI configuration is required. " +
          "Set GOOGLE_API_KEY or GOOGLE_GENAI_USE_VERTEXAI=true with GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION.",
      );
    }
  }
  return this._apiClient;
}
Configuration Options
model (string, default: "gemini-2.5-flash")
The Gemini model to use

maxOutputTokens (number)
Maximum tokens to generate

temperature (number)
Controls randomness (0.0 - 2.0)

topP (number)
Nucleus sampling parameter (0.0 - 1.0)

topK (number)
Top-K sampling parameter (Google-specific)
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withConfig({
    maxOutputTokens: 8192,
    temperature: 0.7,
    topP: 0.9,
    topK: 40,
  })
  .build();
Context Caching
Gemini’s context caching reduces costs by up to 75% for repeated context by caching system instructions, tools, and conversation history.
Enable Caching
import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withInstruction('You are a helpful coding assistant...')
  .withCacheConfig({
    ttlSeconds: 3600, // 1 hour
  })
  .build();
Cache Manager
The provider uses GeminiContextCacheManager to handle caching:
// From google-llm.ts:152-164
if (llmRequest.cacheConfig) {
  this.logger.debug("Handling context caching");
  cacheManager = new GeminiContextCacheManager(this.logger, this.apiClient);
  cacheMetadata = await cacheManager.handleContextCaching(llmRequest);
  if (cacheMetadata) {
    if (cacheMetadata.cacheName) {
      this.logger.debug(`Using cache: ${cacheMetadata.cacheName}`);
    } else {
      this.logger.debug("Cache fingerprint only, no active cache");
    }
  }
}
What Gets Cached?
With cacheConfig enabled, these are cached:
System instructions
Tools (function declarations)
Initial conversation turns
The cache is keyed by a fingerprint of these elements.
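One way such a fingerprint can be derived is by hashing a stable serialization of the cacheable elements. A minimal sketch, assuming a SHA-256 digest over the system instruction and tool declarations (hypothetical; the actual GeminiContextCacheManager may use a different scheme):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: derive a cache key from the cacheable elements.
// Identical inputs always produce the same fingerprint, so a prior
// cache entry can be reused; any change invalidates the key.
function cacheFingerprint(
  systemInstruction: string,
  toolDeclarations: object[],
): string {
  const payload = JSON.stringify({ systemInstruction, toolDeclarations });
  return createHash("sha256").update(payload).digest("hex");
}

const fp = cacheFingerprint("You are a helpful coding assistant.", [
  { name: "search", description: "Search the web" },
]);
console.log(fp.length); // 64 hex characters
```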
Cache TTL
Set cache duration in seconds:
Short (5 minutes)

.withCacheConfig({ ttlSeconds: 300 })

Use for: Quick interactions, testing

Medium (1 hour)

.withCacheConfig({ ttlSeconds: 3600 })

Use for: Extended sessions, common patterns

Long (24 hours)

.withCacheConfig({ ttlSeconds: 86400 })

Use for: Stable prompts, production agents
Streaming
Basic Streaming
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

for await (const chunk of agent.run('Write a story', { stream: true })) {
  process.stdout.write(chunk.text || '');
}
Streaming Architecture
Google uses a response aggregator for intelligent chunk handling:
// From google-llm.ts:175-201
if (stream) {
  const responses = await this.apiClient.models.generateContentStream({
    model,
    contents,
    config,
  });

  const aggregator = new StreamingResponseAggregator();
  for await (const resp of responses) {
    for await (const llmResponse of aggregator.processResponse(resp)) {
      yield llmResponse;
    }
  }

  // Get final aggregated response
  const closeResult = aggregator.close();
  if (closeResult) {
    // Populate cache metadata in the final aggregated response for streaming
    if (cacheMetadata && cacheManager) {
      cacheManager.populateCacheMetadataInResponse(closeResult, cacheMetadata);
    }
    yield closeResult;
  }
}
Response Aggregator
The StreamingResponseAggregator (google-llm.ts:32-108) separates thought and regular text:
class StreamingResponseAggregator {
  private thoughtText = "";
  private text = "";
  private lastUsageMetadata: any = null;

  async *processResponse(
    response: GenerateContentResponse,
  ): AsyncGenerator<LlmResponse, void, unknown> {
    const llmResponse = LlmResponse.create(response);
    this.lastUsageMetadata = llmResponse.usageMetadata;

    if (llmResponse.content?.parts?.[0]?.text) {
      const part0 = llmResponse.content.parts[0];
      if (part0.thought) {
        this.thoughtText += part0.text;
      } else {
        this.text += part0.text;
      }
      llmResponse.partial = true;
    }
    // ... merge and yield logic
  }
}
Function Calling
import { AgentBuilder, BaseTool } from '@iqai/adk';
import { z } from 'zod/v4';

class SearchTool extends BaseTool {
  name = 'search';
  description = 'Search the web';

  inputSchema = z.object({
    query: z.string().describe('Search query'),
    numResults: z.number().default(10),
  });

  async execute(input: { query: string; numResults: number }) {
    // Call search API
    return {
      results: [
        { title: 'Result 1', url: 'https://...' },
        { title: 'Result 2', url: 'https://...' },
      ],
    };
  }
}

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withTools(new SearchTool())
  .withCacheConfig({ ttlSeconds: 3600 }) // Cache tools!
  .build();

const response = await agent.ask('Search for TypeScript tutorials');
Gemini uses Google’s native function calling format (compatible with ADK):
{
  name: "search",
  description: "Search the web",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      numResults: { type: "number", default: 10 }
    },
    required: ["query"]
  }
}
Multimodal Support
Gemini models are natively multimodal: they can process text, images, video, and audio.
import { AgentBuilder } from '@iqai/adk';
import fs from 'fs';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

const imageBuffer = fs.readFileSync('diagram.png');
const base64Image = imageBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Explain this architecture diagram' },
      {
        inline_data: {
          mime_type: 'image/png',
          data: base64Image
        }
      }
    ]
  }]
});
const videoBuffer = fs.readFileSync('tutorial.mp4');
const base64Video = videoBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Summarize this video tutorial' },
      {
        inline_data: {
          mime_type: 'video/mp4',
          data: base64Video
        }
      }
    ]
  }]
});
const audioBuffer = fs.readFileSync('recording.mp3');
const base64Audio = audioBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Transcribe and analyze this audio' },
      {
        inline_data: {
          mime_type: 'audio/mp3',
          data: base64Audio
        }
      }
    ]
  }]
});
Backend Differences
Google AI API vs Vertex AI
| Feature  | Google AI API        | Vertex AI             |
|----------|----------------------|-----------------------|
| Auth     | API Key              | ADC / Service Account |
| Best for | Development, Testing | Production            |
| SLA      | No SLA               | Enterprise SLA        |
| Pricing  | Pay-per-use          | Enterprise pricing    |
| Labels   | Not supported        | Supported             |
| Quotas   | Standard             | Configurable          |
API Preprocessing
The provider automatically adapts requests to the active backend:
// From google-llm.ts:253-270
private preprocessRequest(llmRequest: LlmRequest): void {
  if (this.apiBackend === GoogleLLMVariant.GEMINI_API) {
    // Using an API key from Google AI Studio doesn't support labels
    if (llmRequest.config) {
      llmRequest.config.labels = undefined;
    }

    if (llmRequest.contents) {
      for (const content of llmRequest.contents) {
        if (!content.parts) continue;
        for (const part of content.parts) {
          this.removeDisplayNameIfPresent(part.inlineData);
          this.removeDisplayNameIfPresent(part.fileData);
        }
      }
    }
  }
}
Error Handling
Rate Limit Errors
import { RateLimitError } from '@iqai/adk';

try {
  const response = await agent.ask('Hello');
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log('Rate limited!');
    console.log('Provider:', error.provider); // 'google'
    console.log('Model:', error.model);
    console.log('Retry after:', error.retryAfter);
  }
}
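When a call is rate limited, the error's retryAfter value can drive a simple retry loop. A minimal sketch of a generic wrapper; the withRetry helper and the assumption that retryAfter is in seconds are illustrative, not part of the @iqai/adk API:

```typescript
// Hypothetical retry wrapper: retries a call when the thrown error
// carries a `retryAfter` hint (assumed here to be in seconds),
// rethrowing once attempts are exhausted or no hint is present.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (attempt >= maxAttempts || error?.retryAfter === undefined) {
        throw error;
      }
      // Back off for the server-suggested delay before retrying.
      await new Promise((r) => setTimeout(r, error.retryAfter * 1000));
    }
  }
}
```

Usage: `const response = await withRetry(() => agent.ask('Hello'));`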
Best Practices
Use Gemini 2.5 Flash for most applications (best balance)
Use Gemini 2.0 Flash for latest features
Use Gemini Pro for stable production workloads
Leverage 1M-2M context for large documents, codebases
Enable caching for any repeated context
Cache system instructions, tools, document context
Use longer TTL (24 hours) for stable prompts
Monitor cache hits via logs
Use Gemini for vision tasks (image analysis)
Process video for content understanding
Analyze audio for transcription + sentiment
Combine modalities (image + text, video + audio)
Use Vertex AI for production (not Google AI API)
Set up proper service account with minimal permissions
Configure quotas and rate limits
Enable logging and monitoring
Advanced Features
Large Context Processing
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withCacheConfig({ ttlSeconds: 86400 })
  .build();

// Process an entire codebase
const codebase = readLargeCodebase(); // 500K tokens

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [{ text: `Analyze this codebase:\n\n${codebase}\n\nWhat are the main architectural patterns?` }]
  }]
});
Grounding with Google Search
// Vertex AI only
// Vertex AI only
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withConfig({
    tools: [{
      googleSearchRetrieval: {
        dynamicRetrievalConfig: {
          mode: 'MODE_DYNAMIC',
          dynamicThreshold: 0.7
        }
      }
    }]
  })
  .build();

const response = await agent.ask('What are the latest TypeScript features?');
// Gemini will search Google and ground its response
Limitations
No Live Connections : Gemini models do not support live/bidirectional connections. The connect() method will throw an error.
Backend Differences : Some features (labels, display names) only work on Vertex AI, not Google AI API. The provider automatically handles these differences.
Next Steps
OpenAI Provider Compare with GPT models
Anthropic Provider Explore Claude models
Multimodal Tools Build vision and audio tools
Vertex AI Setup Deploy to production on Vertex AI