Google Gemini Integration
The Google Gemini integration brings Google’s most advanced AI models to n8n, offering powerful multimodal capabilities that can understand and generate text, analyze images, process audio, and even work with video content.
Available Nodes
Google Gemini Node
Direct access to Gemini for text, image, audio, video, and document operations
Google Gemini Chat Model
Use Gemini with AI Agent for advanced workflows with tools and memory
Prerequisites
Before you begin, you’ll need:
- A Google Cloud account or Google AI Studio account
- A Google Gemini API key
- (Optional) Google Cloud project with Vertex AI enabled for production use
Setup
Get Your API Key
Option 1: Google AI Studio (Recommended for getting started)
- Go to Google AI Studio
- Click Get API Key
- Create a new key or use an existing one
- Copy the API key
Option 2: Vertex AI (for production use)
- Go to Google Cloud Console
- Enable the Vertex AI API
- Create credentials for the API
- Copy the API key
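Once you have a key, you can sanity-check it outside n8n with a direct REST call. This sketch only builds the request; the endpoint and model name are the public v1beta defaults, and the key is a placeholder:

```python
import json

# Quick check that an API key works, against the public Gemini REST
# endpoint. Model name and API version are assumptions; adjust as needed.
API_KEY = "YOUR_API_KEY"  # placeholder, not a real key
MODEL = "gemini-1.5-flash"

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent?key={API_KEY}"
)
payload = {"contents": [{"parts": [{"text": "Say hello in one word."}]}]}

# To actually send the request, POST the payload as JSON, e.g.:
#   import urllib.request
#   req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
print(json.dumps(payload))
```

A 200 response with a `candidates` array confirms the key is valid; a 400/403 usually means the key is wrong or the API isn’t enabled.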
Configure in n8n
- Add a Google Gemini node to your workflow
- Click Credential to connect with
- Select Create New Credential
- Enter your API key
- (Optional) Set custom host URL for Vertex AI
- Click Save
Google Gemini Node
The Google Gemini node provides comprehensive access to Gemini’s multimodal capabilities across multiple resources.
Available Resources
- Text
- Image
- Audio
- Video
- Document
- Media File
- File Search
Text
Send messages to Gemini and receive intelligent responses.
Operations:
- Message: Send prompts and get responses from Gemini
  - Multi-turn conversations
  - System instructions support
  - Tool/function calling
  - JSON mode for structured output
  - Safety settings configuration
Available Roles:
- User: Send messages as the user
- Model: Set Gemini’s response style
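The roles above map directly onto the contents array of a Gemini request. A minimal sketch, assuming the v1beta generateContent payload shape (build_chat_payload is a hypothetical helper, not part of the node):

```python
# Build a multi-turn conversation payload in the Gemini REST format.
# Role names "user" and "model" match the roles listed above.
def build_chat_payload(turns):
    """turns: list of (role, text) tuples; role is 'user' or 'model'."""
    return {
        "contents": [
            {"role": role, "parts": [{"text": text}]} for role, text in turns
        ]
    }

payload = build_chat_payload(
    [
        ("user", "What is n8n?"),
        ("model", "n8n is a workflow automation tool."),
        ("user", "Does it support Gemini?"),
    ]
)
```

Alternating user/model turns is how prior conversation history is replayed to the model.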
Gemini Models
Google offers several Gemini models with different capabilities:
| Model | Best For | Context Window | Key Features |
|---|---|---|---|
| gemini-2.0-flash | Latest, fastest | 1M tokens | Multimodal, fast responses, cost-effective |
| gemini-1.5-pro | Advanced reasoning | 2M tokens | Best quality, longest context, video understanding |
| gemini-1.5-flash | Balanced performance | 1M tokens | Fast, multimodal, good quality |
| gemini-1.0-pro | Legacy tasks | 32K tokens | Text-only, baseline model |
Gemini 1.5 Pro supports a context window of up to 2 million tokens, one of the longest available, allowing it to process entire codebases, long videos, and large document collections.
Advanced Features
Tool Use (Function Calling)
Connect tools to Gemini for dynamic interactions:
- Connect tool nodes to the Tools input
- Gemini will automatically decide when to use tools
- Tools execute and return results
- Gemini incorporates results in its response
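As a sketch of what a tool looks like at the API level, here is a hypothetical get_weather function declaration in Gemini’s function-calling format (JSON Schema parameters; n8n builds this for you from connected tool nodes):

```python
# A single function declaration in the Gemini function-calling format.
# The get_weather tool and its schema are illustrative, not a real API.
weather_tool = {
    "functionDeclarations": [
        {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        }
    ]
}

# Attach the declaration to a request; the model may respond with a
# functionCall part naming get_weather and supplying arguments.
payload = {
    "contents": [{"role": "user", "parts": [{"text": "Weather in Berlin?"}]}],
    "tools": [weather_tool],
}
```

Clear names and descriptions matter: the model decides when to call a tool based on them.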
Built-in Tools
Gemini supports built-in tools for specific capabilities:
- Code Execution: Allow Gemini to write and run Python code
JSON Mode
Request structured JSON output.
Safety Settings
Control content safety thresholds.
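Assuming the v1beta REST request format, the JSON mode and safety settings described above map onto a request body like this (categories and thresholds shown are example values):

```python
# One request body combining JSON mode (generationConfig.responseMimeType)
# and per-category safety thresholds (safetySettings). Field names follow
# the v1beta generateContent format; values are illustrative.
payload = {
    "contents": [{"parts": [{"text": "List three EU capitals as JSON."}]}],
    "generationConfig": {
        "responseMimeType": "application/json",  # JSON mode
    },
    "safetySettings": [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_ONLY_HIGH",
        },
    ],
}
```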
System Instructions
Set Gemini’s behavior and context.
Google Gemini Chat Model
The Google Gemini Chat Model node is designed for use with LangChain components, particularly the AI Agent.
Setup with AI Agent
- Add an AI Agent node to your workflow
- Connect the Google Gemini Chat Model node to the agent’s Chat Model input
Select Model
Choose the appropriate Gemini model:
- gemini-2.0-flash: Latest, fastest, great for most tasks
- gemini-1.5-pro: Maximum capability, longest context
- gemini-1.5-flash: Balanced speed and quality
Model Parameters
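Common sampling parameters for Gemini models, expressed as a generationConfig sketch (names follow the Gemini generationConfig fields; values are illustrative, not recommendations):

```python
# Typical sampling controls exposed by the Gemini chat model nodes.
generation_config = {
    "temperature": 0.7,       # randomness: 0 = deterministic, ~1 = creative
    "topP": 0.95,             # nucleus sampling cutoff
    "topK": 40,               # sample only from the top-K candidate tokens
    "maxOutputTokens": 1024,  # hard cap on response length
}
```

Lower the temperature for extraction and classification tasks; raise it for brainstorming and creative writing.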
Common Use Cases
1. Video Content Analysis
Analyze video content automatically.
2. Multimodal Customer Support
Handle text, image, and document queries.
3. Document Processing Pipeline
Extract and process document data.
4. Audio Transcription Workflow
Transcribe and analyze audio.
5. RAG with File Search
Build a retrieval-augmented generation system.
Best Practices
Choose the Right Model
- gemini-2.0-flash: Fast responses, most tasks
- gemini-1.5-pro: Complex reasoning, long context
- gemini-1.5-flash: Balanced performance
Leverage Multimodal Capabilities
- Combine text, images, audio, and video in single prompts
- Use video understanding for long-form content
- Process documents with visual elements effectively
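For example, combining text and an image in one prompt comes down to mixing part types in a single contents entry. This sketch assumes the v1beta inlineData/base64 encoding and uses placeholder image bytes:

```python
import base64

# Mix a text part and an inline image part in one prompt. Real requests
# embed the base64-encoded bytes of an actual file.
fake_image_bytes = b"\x89PNG..."  # placeholder, not a valid image
payload = {
    "contents": [
        {
            "parts": [
                {"text": "Describe this image."},
                {
                    "inlineData": {
                        "mimeType": "image/png",
                        "data": base64.b64encode(fake_image_bytes).decode(),
                    }
                },
            ]
        }
    ]
}
```

Larger files should go through the File API first and be referenced by URI instead of inlined.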
Optimize Context Usage
- Gemini supports massive context (up to 2M tokens)
- Use for long documents and entire codebases
- Consider chunking only for processing speed
Use Built-in Tools
- Enable code execution for math and data analysis
- Use Google Search for current information
- Combine with custom tools for powerful agents
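At the API level, enabling built-in tools is a matter of adding entries to the tools array. Field names here assume the v1beta REST format, and support varies by model:

```python
# Enable built-in tools by listing them in the request's tools array.
payload = {
    "contents": [{"parts": [{"text": "What is 2**32? Verify with code."}]}],
    "tools": [
        {"codeExecution": {}},   # lets the model write and run Python
        # {"googleSearch": {}},  # search grounding (model-dependent)
    ],
}
```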
Troubleshooting
Rate Limits
If you encounter rate limits:
- Implement exponential backoff
- Reduce request frequency
- Upgrade to a higher quota tier
- Use batch processing
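A minimal backoff wrapper might look like this; RateLimitError and the wrapped call are hypothetical stand-ins for whatever request function and HTTP 429 handling your workflow uses:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever your client raises on HTTP 429."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry call() with exponential backoff plus random jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # wait ~1x, ~2x, ~4x base_delay, with jitter to avoid thundering herd
            time.sleep((2 ** attempt + random.random()) * base_delay)
```

Jitter matters when many workflow executions retry at once: without it, they all hit the API again at the same instant.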
Context Length Errors
If inputs are too long:
- Check total token count
- Use Gemini 1.5 Pro for longer context (2M tokens)
- Chunk inputs if necessary
- Remove redundant information
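If you do need to chunk, a rough character-based split works as a first pass; the 4-characters-per-token ratio is a heuristic, not an exact count (use a real tokenizer for precise limits):

```python
# Split text into chunks that fit a rough token budget, using a
# ~4 characters-per-token heuristic (an assumption, not exact).
def chunk_text(text, max_tokens=100_000, chars_per_token=4):
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# 1,000,000 chars at 400,000 chars per chunk -> 3 chunks
chunks = chunk_text("x" * 1_000_000, max_tokens=100_000)
```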
Media Processing Errors
If media files fail to process:
- Verify the file format is supported
- Check file size limits
- Upload large files using the File API first
- Ensure proper encoding
Tool Calling Issues
If tools aren’t working:
- Verify tool connections
- Check tool descriptions are clear
- Test tools independently
- Review tool output format
Safety Filter Blocks
If responses are filtered:
- Review safety settings
- Adjust thresholds if appropriate
- Rephrase prompts
- Check content guidelines