Overview

The Google provider (GoogleLlm) gives access to Gemini models through either the Google AI API (using API keys) or Vertex AI (for production deployments). It supports context windows of up to 2 million tokens with intelligent context caching. Source: packages/adk/src/models/google-llm.ts:113

Supported Models

The Google provider matches these model patterns:
// From google-llm.ts:129-136
static override supportedModels(): string[] {
  return [
    "gemini-.*",
    "google/.*",
    "projects/.+/locations/.+/endpoints/.+",
    "projects/.+/locations/.+/publishers/google/models/gemini.+",
  ];
}
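The entries above are regular expressions matched against the model string. As a standalone illustration (not part of the ADK API; how ADK anchors the patterns is an implementation detail), this sketch shows which model IDs the patterns accept:

```typescript
// Standalone sketch: replicate the pattern matching that routes a model
// string to the Google provider. The pattern list is copied from
// supportedModels() above.
const patterns = [
  "gemini-.*",
  "google/.*",
  "projects/.+/locations/.+/endpoints/.+",
  "projects/.+/locations/.+/publishers/google/models/gemini.+",
];

function isGoogleModel(model: string): boolean {
  // Anchor each pattern so it must match the entire model string.
  return patterns.some((p) => new RegExp(`^${p}$`).test(model));
}

console.log(isGoogleModel("gemini-2.5-flash")); // true
console.log(
  isGoogleModel(
    "projects/my-proj/locations/us-central1/publishers/google/models/gemini-2.5-pro",
  ),
); // true
console.log(isGoogleModel("gpt-4o")); // false
```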

Model Examples

import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();
Best for: Fast, multimodal reasoning with long context
  • Context: 1M tokens (experimental: 2M)
  • Output: 8K tokens
  • Multimodal: Yes (text, image, video, audio)

Configuration

Google AI API (Development)

For development and testing, use the Google AI API with an API key:
.env
GOOGLE_API_KEY=AIzaSy...
Get your API key from Google AI Studio.
import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

Vertex AI (Production)

For production deployments, use Vertex AI:
.env
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
Authentication: Vertex AI uses Application Default Credentials (ADC):
# Authenticate locally
gcloud auth application-default login

# Or set service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
The provider automatically detects the backend:
// From google-llm.ts:284-309
get apiClient(): GoogleGenAI {
  if (!this._apiClient) {
    const useVertexAI = process.env.GOOGLE_GENAI_USE_VERTEXAI === "true";
    const apiKey = process.env.GOOGLE_API_KEY;
    const project = process.env.GOOGLE_CLOUD_PROJECT;
    const location = process.env.GOOGLE_CLOUD_LOCATION;

    if (useVertexAI && project && location) {
      this._apiClient = new GoogleGenAI({
        vertexai: true,
        project,
        location,
      });
    } else if (apiKey) {
      this._apiClient = new GoogleGenAI({
        apiKey,
      });
    } else {
      throw new Error(
        "Google API Key or Vertex AI configuration is required. " +
          "Set GOOGLE_API_KEY or GOOGLE_GENAI_USE_VERTEXAI=true with GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION.",
      );
    }
  }
  return this._apiClient;
}

Configuration Options

model (string, default: "gemini-2.5-flash")
  The Gemini model to use
maxOutputTokens (number)
  Maximum tokens to generate
temperature (number, default: 1.0)
  Controls randomness (0.0 - 2.0)
topP (number, default: 0.95)
  Nucleus sampling parameter (0.0 - 1.0)
topK (number)
  Top-K sampling parameter (Google-specific)
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withConfig({
    maxOutputTokens: 8192,
    temperature: 0.7,
    topP: 0.9,
    topK: 40
  })
  .build();

Context Caching

Gemini’s context caching reduces costs by up to 75% for repeated context by caching system instructions, tools, and conversation history.
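To see where the "up to 75%" figure comes from, here is a back-of-envelope calculation. The rates below are placeholder assumptions, not published pricing; the point is only that billing cached input tokens at 25% of the normal rate cuts the cost of the repeated portion by 75%:

```typescript
// Back-of-envelope cost sketch. Rates are placeholders, not real pricing.
const inputRatePerMTok = 1.0; // $ per million uncached input tokens (assumed)
const cachedRatePerMTok = 0.25; // cached tokens billed at 25% of the rate (assumed)
const contextTokens = 500_000; // repeated context sent with every request
const requests = 100;

const withoutCache = (contextTokens / 1e6) * inputRatePerMTok * requests;
const withCache = (contextTokens / 1e6) * cachedRatePerMTok * requests;

console.log(withoutCache); // 50
console.log(withCache); // 12.5 -> 75% cheaper on the cached portion
```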

Enable Caching

import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withInstruction('You are a helpful coding assistant...')
  .withCacheConfig({
    ttlSeconds: 3600 // 1 hour
  })
  .build();

Cache Manager

The provider uses GeminiContextCacheManager to handle caching:
// From google-llm.ts:152-164
if (llmRequest.cacheConfig) {
  this.logger.debug("Handling context caching");
  cacheManager = new GeminiContextCacheManager(this.logger, this.apiClient);
  cacheMetadata = await cacheManager.handleContextCaching(llmRequest);

  if (cacheMetadata) {
    if (cacheMetadata.cacheName) {
      this.logger.debug(`Using cache: ${cacheMetadata.cacheName}`);
    } else {
      this.logger.debug("Cache fingerprint only, no active cache");
    }
  }
}

What Gets Cached?

With cacheConfig enabled, these are cached:
  1. System instructions
  2. Tools (function declarations)
  3. Initial conversation turns
The cache is keyed by a fingerprint of these elements.
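A fingerprint of this kind can be sketched as a hash over the stable, cacheable parts of the request. This is illustrative only: the field names below are assumptions, and the real logic lives in GeminiContextCacheManager and may differ.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: derive a cache key from the request parts that stay
// stable across turns. The interface shape is an assumption, not the ADK's
// actual internal representation.
interface CacheableParts {
  systemInstruction: string;
  toolDeclarations: object[];
  initialTurns: object[];
}

function cacheFingerprint(parts: CacheableParts): string {
  const payload = JSON.stringify([
    parts.systemInstruction,
    parts.toolDeclarations,
    parts.initialTurns,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}

const a = cacheFingerprint({
  systemInstruction: "You are a helpful coding assistant.",
  toolDeclarations: [{ name: "search" }],
  initialTurns: [],
});
const b = cacheFingerprint({
  systemInstruction: "You are a helpful coding assistant.",
  toolDeclarations: [{ name: "search" }],
  initialTurns: [],
});
console.log(a === b); // true: identical context maps to the same cache entry
```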

Cache TTL

Set cache duration in seconds:
.withCacheConfig({ ttlSeconds: 300 })
Use for: Quick interactions, testing

Streaming

Basic Streaming

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

for await (const chunk of agent.run('Write a story', { stream: true })) {
  process.stdout.write(chunk.text || '');
}

Streaming Architecture

Google uses a response aggregator for intelligent chunk handling:
// From google-llm.ts:175-201
if (stream) {
  const responses = await this.apiClient.models.generateContentStream({
    model,
    contents,
    config,
  });

  const aggregator = new StreamingResponseAggregator();

  for await (const resp of responses) {
    for await (const llmResponse of aggregator.processResponse(resp)) {
      yield llmResponse;
    }
  }

  // Get final aggregated response
  const closeResult = aggregator.close();
  if (closeResult) {
    // Populate cache metadata in the final aggregated response for streaming
    if (cacheMetadata && cacheManager) {
      cacheManager.populateCacheMetadataInResponse(
        closeResult,
        cacheMetadata,
      );
    }
    yield closeResult;
  }
}

Response Aggregator

The StreamingResponseAggregator (google-llm.ts:32-108) separates thought and regular text:
class StreamingResponseAggregator {
  private thoughtText = "";
  private text = "";
  private lastUsageMetadata: any = null;

  async *processResponse(
    response: GenerateContentResponse
  ): AsyncGenerator<LlmResponse, void, unknown> {
    const llmResponse = LlmResponse.create(response);
    this.lastUsageMetadata = llmResponse.usageMetadata;

    if (llmResponse.content?.parts?.[0]?.text) {
      const part0 = llmResponse.content.parts[0];
      if (part0.thought) {
        this.thoughtText += part0.text;
      } else {
        this.text += part0.text;
      }
      llmResponse.partial = true;
    }
    // ... merge and yield logic
  }
}

Function Calling

With ADK Tools

import { AgentBuilder, BaseTool } from '@iqai/adk';
import { z } from 'zod/v4';

class SearchTool extends BaseTool {
  name = 'search';
  description = 'Search the web';
  inputSchema = z.object({
    query: z.string().describe('Search query'),
    numResults: z.number().default(10)
  });

  async execute(input: { query: string; numResults: number }) {
    // Call search API
    return {
      results: [
        { title: 'Result 1', url: 'https://...' },
        { title: 'Result 2', url: 'https://...' }
      ]
    };
  }
}

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withTools(new SearchTool())
  .withCacheConfig({ ttlSeconds: 3600 }) // Cache tools!
  .build();

const response = await agent.ask('Search for TypeScript tutorials');

Tool Format

Gemini uses Google’s native function calling format (compatible with ADK):
{
  name: "search",
  description: "Search the web",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      numResults: { type: "number", default: 10 }
    },
    required: ["query"]
  }
}

Multimodal Support

Gemini models are natively multimodal: they can process text, images, video, and audio.

Image Input

import { AgentBuilder } from '@iqai/adk';
import fs from 'fs';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

const imageBuffer = fs.readFileSync('diagram.png');
const base64Image = imageBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Explain this architecture diagram' },
      {
        inlineData: {
          mimeType: 'image/png',
          data: base64Image
        }
      }
    ]
  }]
});

Video Input

const videoBuffer = fs.readFileSync('tutorial.mp4');
const base64Video = videoBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Summarize this video tutorial' },
      {
        inlineData: {
          mimeType: 'video/mp4',
          data: base64Video
        }
      }
    ]
  }]
});

Audio Input

const audioBuffer = fs.readFileSync('recording.mp3');
const base64Audio = audioBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Transcribe and analyze this audio' },
      {
        inlineData: {
          mimeType: 'audio/mp3',
          data: base64Audio
        }
      }
    ]
  }]
});

Backend Differences

Google AI API vs Vertex AI

Feature  | Google AI API        | Vertex AI
---------|----------------------|----------------------
Auth     | API key              | ADC / Service Account
Best for | Development, testing | Production
SLA      | No SLA               | Enterprise SLA
Pricing  | Pay-per-use          | Enterprise pricing
Labels   | Not supported        | Supported
Quotas   | Standard             | Configurable

API Preprocessing

The provider automatically adapts requests based on backend:
// From google-llm.ts:253-270
private preprocessRequest(llmRequest: LlmRequest): void {
  if (this.apiBackend === GoogleLLMVariant.GEMINI_API) {
    // Using API key from Google AI Studio doesn't support labels
    if (llmRequest.config) {
      llmRequest.config.labels = undefined;
    }

    if (llmRequest.contents) {
      for (const content of llmRequest.contents) {
        if (!content.parts) continue;
        for (const part of content.parts) {
          this.removeDisplayNameIfPresent(part.inlineData);
          this.removeDisplayNameIfPresent(part.fileData);
        }
      }
    }
  }
}

Error Handling

Rate Limit Errors

import { RateLimitError } from '@iqai/adk';

try {
  const response = await agent.ask('Hello');
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log('Rate limited!');
    console.log('Provider:', error.provider); // 'google'
    console.log('Model:', error.model);
    console.log('Retry after:', error.retryAfter);
  }
}
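A common pattern is to wrap calls in a retry helper that honors the server-suggested delay before falling back to exponential backoff. The helper below is a generic sketch, not part of ADK, and the retryAfter field access is an assumption about the error shape; adapt it to RateLimitError's actual fields in your ADK version.

```typescript
// Generic retry sketch with exponential backoff. Not part of ADK.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // Prefer the server-suggested delay (seconds) when present,
        // otherwise back off exponentially: 1s, 2s, 4s, ...
        const retryAfter = (error as { retryAfter?: number }).retryAfter;
        const delayMs = retryAfter ? retryAfter * 1000 : 2 ** attempt * 1000;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Usage: `const response = await withRetry(() => agent.ask('Hello'));`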

Best Practices

Model selection
  • Use Gemini 2.5 Flash for most applications (best balance)
  • Use Gemini 2.0 Flash for the latest features
  • Use Gemini Pro for stable production workloads
Long context and caching
  • Leverage the 1M-2M token context for large documents and codebases
  • Enable caching for any repeated context
  • Cache system instructions, tools, and document context
  • Use a longer TTL (e.g. 24 hours) for stable prompts
  • Monitor cache hits via logs
Multimodal
  • Use Gemini for vision tasks (image analysis)
  • Process video for content understanding
  • Analyze audio for transcription and sentiment
  • Combine modalities (image + text, video + audio)
Production
  • Use Vertex AI for production (not the Google AI API)
  • Set up a service account with minimal permissions
  • Configure quotas and rate limits
  • Enable logging and monitoring

Advanced Features

Large Context Processing

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withCacheConfig({ ttlSeconds: 86400 })
  .build();

// Process entire codebase
const codebase = readLargeCodebase(); // 500K tokens

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [{ text: `Analyze this codebase:\n\n${codebase}\n\nWhat are the main architectural patterns?` }]
  }]
});

Google Search Grounding

// Vertex AI only
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withConfig({
    tools: [{
      googleSearchRetrieval: {
        dynamicRetrievalConfig: {
          mode: 'MODE_DYNAMIC',
          dynamicThreshold: 0.7
        }
      }
    }]
  })
  .build();

const response = await agent.ask('What are the latest TypeScript features?');
// Gemini will search Google and ground its response

Limitations

No Live Connections: Gemini models do not support live/bidirectional connections. The connect() method will throw an error.
Backend Differences: Some features (labels, display names) only work on Vertex AI, not Google AI API. The provider automatically handles these differences.

Next Steps

OpenAI Provider

Compare with GPT models

Anthropic Provider

Explore Claude models

Multimodal Tools

Build vision and audio tools

Vertex AI Setup

Deploy to production on Vertex AI
