Overview

The Google provider (GoogleLlm) gives access to Gemini models through either the Google AI API (using API keys) or Vertex AI (for production deployments). It supports context windows of up to 2 million tokens with intelligent context caching. Source: packages/adk/src/models/google-llm.ts:113

Supported Models

The Google provider matches these model patterns:
// From google-llm.ts:129-136
static override supportedModels(): string[] {
  return [
    "gemini-.*",
    "google/.*",
    "projects/.+/locations/.+/endpoints/.+",
    "projects/.+/locations/.+/publishers/google/models/gemini.+",
  ];
}
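The entries above are regular expressions matched against the model string. As a standalone illustration (not part of the ADK API; how ADK anchors the patterns is an implementation detail), this sketch shows which model IDs the patterns accept:

```typescript
// Standalone sketch: replicate the pattern matching that routes a model
// string to the Google provider. The pattern list is copied from
// supportedModels() above.
const patterns = [
  "gemini-.*",
  "google/.*",
  "projects/.+/locations/.+/endpoints/.+",
  "projects/.+/locations/.+/publishers/google/models/gemini.+",
];

function isGoogleModel(model: string): boolean {
  // Anchor each pattern so it must match the entire model string.
  return patterns.some((p) => new RegExp(`^${p}$`).test(model));
}

console.log(isGoogleModel("gemini-2.5-flash")); // true
console.log(
  isGoogleModel(
    "projects/my-proj/locations/us-central1/publishers/google/models/gemini-2.5-pro",
  ),
); // true
console.log(isGoogleModel("gpt-4o")); // false
```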

Model Examples

import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();
Best for: Fast, multimodal reasoning with long context
  • Context: 1M tokens (experimental: 2M)
  • Output: 8K tokens
  • Multimodal: Yes (text, image, video, audio)

Configuration

Google AI API (Development)

For development and testing, use the Google AI API with an API key:
.env
GOOGLE_API_KEY=AIzaSy...
Get your API key from Google AI Studio.
import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

Vertex AI (Production)

For production deployments, use Vertex AI:
.env
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
Authentication: Vertex AI uses Application Default Credentials (ADC):
# Authenticate locally
gcloud auth application-default login

# Or set service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
The provider automatically detects the backend:
// From google-llm.ts:284-309
get apiClient(): GoogleGenAI {
  if (!this._apiClient) {
    const useVertexAI = process.env.GOOGLE_GENAI_USE_VERTEXAI === "true";
    const apiKey = process.env.GOOGLE_API_KEY;
    const project = process.env.GOOGLE_CLOUD_PROJECT;
    const location = process.env.GOOGLE_CLOUD_LOCATION;

    if (useVertexAI && project && location) {
      this._apiClient = new GoogleGenAI({
        vertexai: true,
        project,
        location,
      });
    } else if (apiKey) {
      this._apiClient = new GoogleGenAI({
        apiKey,
      });
    } else {
      throw new Error(
        "Google API Key or Vertex AI configuration is required. " +
          "Set GOOGLE_API_KEY or GOOGLE_GENAI_USE_VERTEXAI=true with GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION.",
      );
    }
  }
  return this._apiClient;
}

Configuration Options

model (string, default: "gemini-2.5-flash")
  The Gemini model to use
maxOutputTokens (number)
  Maximum tokens to generate
temperature (number, default: 1.0)
  Controls randomness (0.0 - 2.0)
topP (number, default: 0.95)
  Nucleus sampling parameter (0.0 - 1.0)
topK (number)
  Top-K sampling parameter (Google-specific)
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withConfig({
    maxOutputTokens: 8192,
    temperature: 0.7,
    topP: 0.9,
    topK: 40
  })
  .build();

Context Caching

Gemini’s context caching reduces costs by up to 75% for repeated context by caching system instructions, tools, and conversation history.
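To see where the "up to 75%" figure comes from, here is a back-of-envelope calculation. The rates below are placeholder assumptions, not published pricing; the point is only that billing cached input tokens at 25% of the normal rate cuts the cost of the repeated portion by 75%:

```typescript
// Back-of-envelope cost sketch. Rates are placeholders, not real pricing.
const inputRatePerMTok = 1.0; // $ per million uncached input tokens (assumed)
const cachedRatePerMTok = 0.25; // cached tokens billed at 25% of the rate (assumed)
const contextTokens = 500_000; // repeated context sent with every request
const requests = 100;

const withoutCache = (contextTokens / 1e6) * inputRatePerMTok * requests;
const withCache = (contextTokens / 1e6) * cachedRatePerMTok * requests;

console.log(withoutCache); // 50
console.log(withCache); // 12.5 -> 75% cheaper on the cached portion
```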

Enable Caching

import { AgentBuilder } from '@iqai/adk';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withInstruction('You are a helpful coding assistant...')
  .withCacheConfig({
    ttlSeconds: 3600 // 1 hour
  })
  .build();

Cache Manager

The provider uses GeminiContextCacheManager to handle caching:
// From google-llm.ts:152-164
if (llmRequest.cacheConfig) {
  this.logger.debug("Handling context caching");
  cacheManager = new GeminiContextCacheManager(this.logger, this.apiClient);
  cacheMetadata = await cacheManager.handleContextCaching(llmRequest);

  if (cacheMetadata) {
    if (cacheMetadata.cacheName) {
      this.logger.debug(`Using cache: ${cacheMetadata.cacheName}`);
    } else {
      this.logger.debug("Cache fingerprint only, no active cache");
    }
  }
}

What Gets Cached?

With cacheConfig enabled, these are cached:
  1. System instructions
  2. Tools (function declarations)
  3. Initial conversation turns
The cache is keyed by a fingerprint of these elements.
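A fingerprint of this kind can be sketched as a hash over the stable, cacheable parts of the request. This is illustrative only: the field names below are assumptions, and the real logic lives in GeminiContextCacheManager and may differ.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: derive a cache key from the request parts that stay
// stable across turns. The interface shape is an assumption, not the ADK's
// actual internal representation.
interface CacheableParts {
  systemInstruction: string;
  toolDeclarations: object[];
  initialTurns: object[];
}

function cacheFingerprint(parts: CacheableParts): string {
  const payload = JSON.stringify([
    parts.systemInstruction,
    parts.toolDeclarations,
    parts.initialTurns,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}

const a = cacheFingerprint({
  systemInstruction: "You are a helpful coding assistant.",
  toolDeclarations: [{ name: "search" }],
  initialTurns: [],
});
const b = cacheFingerprint({
  systemInstruction: "You are a helpful coding assistant.",
  toolDeclarations: [{ name: "search" }],
  initialTurns: [],
});
console.log(a === b); // true: identical context maps to the same cache entry
```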

Cache TTL

Set cache duration in seconds:
.withCacheConfig({ ttlSeconds: 300 })
Use for: Quick interactions, testing

Streaming

Basic Streaming

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

for await (const chunk of agent.run('Write a story', { stream: true })) {
  process.stdout.write(chunk.text || '');
}

Streaming Architecture

Google uses a response aggregator for intelligent chunk handling:
// From google-llm.ts:175-201
if (stream) {
  const responses = await this.apiClient.models.generateContentStream({
    model,
    contents,
    config,
  });

  const aggregator = new StreamingResponseAggregator();

  for await (const resp of responses) {
    for await (const llmResponse of aggregator.processResponse(resp)) {
      yield llmResponse;
    }
  }

  // Get final aggregated response
  const closeResult = aggregator.close();
  if (closeResult) {
    // Populate cache metadata in the final aggregated response for streaming
    if (cacheMetadata && cacheManager) {
      cacheManager.populateCacheMetadataInResponse(
        closeResult,
        cacheMetadata,
      );
    }
    yield closeResult;
  }
}

Response Aggregator

The StreamingResponseAggregator (google-llm.ts:32-108) separates thought and regular text:
class StreamingResponseAggregator {
  private thoughtText = "";
  private text = "";
  private lastUsageMetadata: any = null;

  async *processResponse(
    response: GenerateContentResponse
  ): AsyncGenerator<LlmResponse, void, unknown> {
    const llmResponse = LlmResponse.create(response);
    this.lastUsageMetadata = llmResponse.usageMetadata;

    if (llmResponse.content?.parts?.[0]?.text) {
      const part0 = llmResponse.content.parts[0];
      if (part0.thought) {
        this.thoughtText += part0.text;
      } else {
        this.text += part0.text;
      }
      llmResponse.partial = true;
    }
    // ... merge and yield logic
  }
}

Function Calling

With ADK Tools

import { AgentBuilder, BaseTool } from '@iqai/adk';
import { z } from 'zod/v4';

class SearchTool extends BaseTool {
  name = 'search';
  description = 'Search the web';
  inputSchema = z.object({
    query: z.string().describe('Search query'),
    numResults: z.number().default(10)
  });

  async execute(input: { query: string; numResults: number }) {
    // Call search API
    return {
      results: [
        { title: 'Result 1', url: 'https://...' },
        { title: 'Result 2', url: 'https://...' }
      ]
    };
  }
}

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withTools(new SearchTool())
  .withCacheConfig({ ttlSeconds: 3600 }) // Cache tools!
  .build();

const response = await agent.ask('Search for TypeScript tutorials');

Tool Format

Gemini uses Google’s native function calling format (compatible with ADK):
{
  name: "search",
  description: "Search the web",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      numResults: { type: "number", default: 10 }
    },
    required: ["query"]
  }
}

Multimodal Support

Gemini models are natively multimodal: they can process text, images, video, and audio.

Image Input

import { AgentBuilder } from '@iqai/adk';
import fs from 'fs';

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .build();

const imageBuffer = fs.readFileSync('diagram.png');
const base64Image = imageBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Explain this architecture diagram' },
      {
        inlineData: {
          mimeType: 'image/png',
          data: base64Image
        }
      }
    ]
  }]
});

Video Input

const videoBuffer = fs.readFileSync('tutorial.mp4');
const base64Video = videoBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Summarize this video tutorial' },
      {
        inlineData: {
          mimeType: 'video/mp4',
          data: base64Video
        }
      }
    ]
  }]
});

Audio Input

const audioBuffer = fs.readFileSync('recording.mp3');
const base64Audio = audioBuffer.toString('base64');

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [
      { text: 'Transcribe and analyze this audio' },
      {
        inlineData: {
          mimeType: 'audio/mp3',
          data: base64Audio
        }
      }
    ]
  }]
});

Backend Differences

Google AI API vs Vertex AI

Feature  | Google AI API        | Vertex AI
---------|----------------------|----------------------
Auth     | API key              | ADC / Service Account
Best for | Development, testing | Production
SLA      | No SLA               | Enterprise SLA
Pricing  | Pay-per-use          | Enterprise pricing
Labels   | Not supported        | Supported
Quotas   | Standard             | Configurable

API Preprocessing

The provider automatically adapts requests based on backend:
// From google-llm.ts:253-270
private preprocessRequest(llmRequest: LlmRequest): void {
  if (this.apiBackend === GoogleLLMVariant.GEMINI_API) {
    // Using API key from Google AI Studio doesn't support labels
    if (llmRequest.config) {
      llmRequest.config.labels = undefined;
    }

    if (llmRequest.contents) {
      for (const content of llmRequest.contents) {
        if (!content.parts) continue;
        for (const part of content.parts) {
          this.removeDisplayNameIfPresent(part.inlineData);
          this.removeDisplayNameIfPresent(part.fileData);
        }
      }
    }
  }
}

Error Handling

Rate Limit Errors

import { RateLimitError } from '@iqai/adk';

try {
  const response = await agent.ask('Hello');
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log('Rate limited!');
    console.log('Provider:', error.provider); // 'google'
    console.log('Model:', error.model);
    console.log('Retry after:', error.retryAfter);
  }
}
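A common pattern is to wrap calls in a retry helper that honors the server-suggested delay before falling back to exponential backoff. The helper below is a generic sketch, not part of ADK, and the retryAfter field access is an assumption about the error shape; adapt it to RateLimitError's actual fields in your ADK version.

```typescript
// Generic retry sketch with exponential backoff. Not part of ADK.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // Prefer the server-suggested delay (seconds) when present,
        // otherwise back off exponentially: 1s, 2s, 4s, ...
        const retryAfter = (error as { retryAfter?: number }).retryAfter;
        const delayMs = retryAfter ? retryAfter * 1000 : 2 ** attempt * 1000;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Usage: `const response = await withRetry(() => agent.ask('Hello'));`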

Best Practices

Model selection
  • Use Gemini 2.5 Flash for most applications (best balance)
  • Use Gemini 2.0 Flash for the latest features
  • Use Gemini Pro for stable production workloads
Long context and caching
  • Leverage the 1M-2M token context for large documents and codebases
  • Enable caching for any repeated context
  • Cache system instructions, tools, and document context
  • Use a longer TTL (e.g. 24 hours) for stable prompts
  • Monitor cache hits via logs
Multimodal
  • Use Gemini for vision tasks (image analysis)
  • Process video for content understanding
  • Analyze audio for transcription and sentiment
  • Combine modalities (image + text, video + audio)
Production
  • Use Vertex AI for production (not the Google AI API)
  • Set up a service account with minimal permissions
  • Configure quotas and rate limits
  • Enable logging and monitoring

Advanced Features

Large Context Processing

const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withCacheConfig({ ttlSeconds: 86400 })
  .build();

// Process entire codebase
const codebase = readLargeCodebase(); // 500K tokens

const response = await agent.ask({
  contents: [{
    role: 'user',
    parts: [{ text: `Analyze this codebase:\n\n${codebase}\n\nWhat are the main architectural patterns?` }]
  }]
});

Google Search Grounding

// Vertex AI only
const agent = AgentBuilder
  .withModel('gemini-2.5-flash')
  .withConfig({
    tools: [{
      googleSearchRetrieval: {
        dynamicRetrievalConfig: {
          mode: 'MODE_DYNAMIC',
          dynamicThreshold: 0.7
        }
      }
    }]
  })
  .build();

const response = await agent.ask('What are the latest TypeScript features?');
// Gemini will search Google and ground its response

Limitations

No Live Connections: Gemini models do not support live/bidirectional connections. The connect() method will throw an error.
Backend Differences: Some features (labels, display names) only work on Vertex AI, not Google AI API. The provider automatically handles these differences.

Next Steps

OpenAI Provider

Compare with GPT models

Anthropic Provider

Explore Claude models

Multimodal Tools

Build vision and audio tools

Vertex AI Setup

Deploy to production on Vertex AI
