
Getting Started with Gemini

This guide walks you through the fundamentals of using Gemini models for text generation on Google Cloud.

Prerequisites

1. Google Cloud Project: You need an active Google Cloud project with billing enabled. Create a project if you don’t have one.
2. Enable APIs: Enable the Vertex AI API in your project.
3. Authentication: Set up authentication for your environment using Application Default Credentials or a service account.
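The API-enablement and authentication steps can be done from the command line. These gcloud commands are a typical local setup; the project ID is a placeholder:

```shell
# Point gcloud at your project (replace with your actual project ID)
gcloud config set project your-project-id

# Enable the Vertex AI API
gcloud services enable aiplatform.googleapis.com

# Set up Application Default Credentials for local development
gcloud auth application-default login
```

In production environments (e.g. Cloud Run, GKE), attach a service account to the workload instead of using interactive login.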

Installation

Install the Google Gen AI SDK for Python:
pip install --upgrade google-genai

Basic Setup

Import Libraries

import os
from google import genai
from google.genai.types import GenerateContentConfig
from IPython.display import Markdown, display

Authenticate (Colab Only)

If you’re using Google Colab, authenticate your session:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

Initialize the Client

PROJECT_ID = "your-project-id"
LOCATION = "global"  # or "us-central1", "europe-west1", etc.

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION
)

Your First Request

Send a simple text generation request:
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Explain quantum computing in simple terms."
)

print(response.text)
The response.text property returns the generated text. For finer-grained details such as finish reason and safety ratings, inspect response.candidates[0].

Choosing a Model

Select the appropriate model for your use case:
MODEL_ID = "gemini-3.1-pro-preview"
Best for:
  • Complex reasoning tasks
  • Advanced code generation
  • Agentic workflows
  • Maximum quality

Configuration Parameters

Temperature

Controls randomness in the output (0.0 to 2.0):
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Write a creative story about a robot.",
    config=GenerateContentConfig(
        temperature=1.5  # Higher = more creative
    )
)
For Gemini 3 models, we strongly recommend keeping the default temperature=1.0, as their reasoning capabilities are optimized for this value.

Top-P (Nucleus Sampling)

Controls diversity by considering only the top probability mass:
config = GenerateContentConfig(
    temperature=1.0,
    top_p=0.95  # Sample only from tokens covering the top 95% of probability mass
)

Max Output Tokens

Limit the length of generated responses:
config = GenerateContentConfig(
    max_output_tokens=2048  # Maximum tokens in response
)

Complete Example

from google.genai.types import GenerateContentConfig, ThinkingConfig, ThinkingLevel

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Explain the theory of relativity.",
    config=GenerateContentConfig(
        temperature=1.0,
        top_p=0.95,
        max_output_tokens=8000,
        thinking_config=ThinkingConfig(
            thinking_level=ThinkingLevel.MEDIUM
        )
    )
)

print(response.text)

Thinking Levels

Gemini 3.1 Pro supports configurable reasoning depth:
from google.genai.types import ThinkingConfig, ThinkingLevel

config = GenerateContentConfig(
    thinking_config=ThinkingConfig(
        thinking_level=ThinkingLevel.LOW
    )
)
With ThinkingLevel.LOW:
  • Token budget: 1-1,000 tokens
  • Best for: Simple queries, chat, fast responses
  • Latency: Minimal

Streaming Responses

Stream responses incrementally as they are generated for a more responsive user experience:
for chunk in client.models.generate_content_stream(
    model="gemini-3.1-pro-preview",
    contents="Write a short story about space exploration.",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_level=ThinkingLevel.LOW
        )
    )
):
    if chunk.text:  # some chunks (e.g. the final one) may carry no text
        print(chunk.text, end="")

System Instructions

Provide persistent instructions that apply to all requests:
system_instruction = """
You are a helpful AI assistant specializing in Python programming.
Always provide code examples with comments.
Explain complex concepts in simple terms.
"""

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="How do I read a CSV file?",
    config=GenerateContentConfig(
        system_instruction=system_instruction
    )
)

Multi-Turn Conversations

Create chat sessions to maintain conversation context:
chat = client.chats.create(
    model="gemini-3.1-pro-preview",
    config=GenerateContentConfig(
        temperature=1.0
    )
)

# First message
response = chat.send_message("What is a binary search tree?")
print("Assistant:", response.text)

# Follow-up message (maintains context)
response = chat.send_message("Can you show me an implementation?")
print("Assistant:", response.text)

# View conversation history
for message in chat.history:
    print(f"{message.role}: {message.parts[0].text[:100]}...")

Error Handling

Handle common errors gracefully. The Gen AI SDK raises exceptions from the google.genai.errors module:
from google.genai import errors

try:
    response = client.models.generate_content(
        model="gemini-3.1-pro-preview",
        contents="Your prompt here"
    )
    print(response.text)

except errors.ClientError as e:
    # 4xx errors: invalid arguments, permission issues, quota exhaustion
    print(f"Client error {e.code}: {e.message}")

except errors.ServerError as e:
    # 5xx errors: transient server-side failures
    print(f"Server error {e.code}: {e.message}")

except Exception as e:
    print(f"Unexpected error: {e}")
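For transient failures such as quota or timeout errors, retrying with exponential backoff often resolves the problem. This is an illustrative sketch; the helper name and delay values are our own choices, not part of the SDK:

```python
import random
import time

def generate_with_retry(request_fn, max_attempts=4, base_delay=1.0):
    """Call request_fn, retrying with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))

# Usage:
# result = generate_with_retry(
#     lambda: client.models.generate_content(
#         model="gemini-3.1-pro-preview",
#         contents="Your prompt here",
#     )
# )
```

In production you would typically narrow the except clause to the retryable error types (e.g. quota and timeout errors) rather than retrying every exception.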

Response Structure

Understand the response object:
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Hello!"
)

# Access the text
print(response.text)

# Access detailed candidate information
candidate = response.candidates[0]
print(f"Finish reason: {candidate.finish_reason}")
print(f"Safety ratings: {candidate.safety_ratings}")

# Access usage metadata
print(f"Input tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")

Safety Settings

Configure content filtering:
from google.genai.types import SafetySetting, HarmCategory, HarmBlockThreshold

safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
    )
]

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Your content here",
    config=GenerateContentConfig(
        safety_settings=safety_settings
    )
)

# Check if response was blocked
if response.candidates[0].finish_reason == "SAFETY":
    print("Response was blocked due to safety filters")
    for rating in response.candidates[0].safety_ratings:
        print(f"{rating.category}: {rating.probability}")

Best Practices

Use Clear Prompts

Be specific and provide context for better results

Handle Errors

Implement proper error handling for production apps

Monitor Usage

Track token consumption to manage costs

Set Limits

Use max_output_tokens to control response length

Next Steps

Multimodal

Learn to process images, video, and audio

Function Calling

Connect Gemini to external tools and APIs

Grounding

Ground responses in real-time data

Context Caching

Optimize costs with context caching
