Gemini supports two powerful tuning approaches to customize model behavior for your specific use cases: Supervised Fine-Tuning (SFT) for task-specific adaptation and Direct Preference Optimization (DPO) for aligning models with human preferences.

Overview

Fine-tuning allows you to adapt Gemini’s base capabilities to your specific domain, style, or task requirements:

Supervised Fine-Tuning

Train models on labeled examples to specialize in specific tasks like Q&A, summarization, or classification

Preference Optimization

Align model outputs with human preferences by learning from ranked response pairs

Supervised Fine-Tuning (SFT)

Supervised fine-tuning uses labeled training data to refine the base model’s capabilities toward your specific tasks. Each training example demonstrates the desired output for a given input.

When to Use SFT

  • Adapting models to domain-specific terminology
  • Teaching specific output formats or structures
  • Improving performance on specialized tasks
  • Reducing prompt engineering complexity

Data Preparation

Training data should be in JSONL format with input-output pairs:
{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Context: The Normans were an ethnic group...\nQuestion: In what country is Normandy located?"}]
    }
  ],
  "completion": {
    "role": "model",
    "parts": [{"text": "France"}]
  }
}
Ensure your training data is high-quality, well-labeled, and directly relevant to your target task. Low-quality data can adversely affect performance and introduce bias.
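A quick sanity check before uploading can catch malformed records early. The helper below is a minimal sketch (function names are ours); it builds records in the format shown above and verifies the required fields are present and non-empty:

```python
import json

def make_sft_record(prompt: str, answer: str) -> dict:
    """Build one SFT training record in the format shown above."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]}
        ],
        "completion": {
            "role": "model",
            "parts": [{"text": answer}]
        },
    }

def validate_record(record: dict) -> bool:
    """Check that the fields the tuning service expects are present and non-empty."""
    try:
        assert record["contents"][0]["role"] == "user"
        assert record["contents"][0]["parts"][0]["text"].strip()
        assert record["completion"]["role"] == "model"
        assert record["completion"]["parts"][0]["text"].strip()
        return True
    except (KeyError, IndexError, AssertionError):
        return False

records = [
    make_sft_record(
        "Context: The Normans were an ethnic group...\n"
        "Question: In what country is Normandy located?",
        "France",
    )
]
with open("train_data.jsonl", "w") as f:
    for r in records:
        assert validate_record(r)
        f.write(json.dumps(r) + "\n")
```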

Training Process

1. Initialize the SDK

Set up your Google Cloud project and initialize the Gemini client:
from google import genai
from google.genai import types

PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
2. Upload Training Data

Upload your JSONL files to Google Cloud Storage:
gsutil cp train_data.jsonl gs://your-bucket/tuning/
gsutil cp validation_data.jsonl gs://your-bucket/tuning/
3. Create Tuning Job

Start the supervised fine-tuning job:
# Create the supervised tuning job
tuning_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/tuning/train_data.jsonl",
    ),
    config=types.CreateTuningJobConfig(
        validation_dataset=types.TuningValidationDataset(
            gcs_uri="gs://your-bucket/tuning/validation_data.jsonl",
        ),
        tuned_model_display_name="gemini-qa-tuned",
        epoch_count=3,
        learning_rate_multiplier=1.0,
    ),
)

print(f"Tuning job created: {tuning_job.name}")
4. Monitor Training

Check the status of your tuning job:
# Poll until the job reaches a terminal state
job = client.tunings.get(name=tuning_job.name)
print(f"Status: {job.state}")  # e.g. JOB_STATE_RUNNING, JOB_STATE_SUCCEEDED
5. Deploy and Use

Once training completes, use your fine-tuned model:
# Refresh the job to pick up the tuned model, then query it
tuning_job = client.tunings.get(name=tuning_job.name)
response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents="Context: ...\nQuestion: What is the capital of France?"
)

print(response.text)

Evaluation

Evaluate your fine-tuned model on a held-out test set:
import pandas as pd

test_df = pd.read_csv("test_data.csv")
predictions = []

for _, row in test_df.iterrows():
    prompt = f"Context: {row['context']}\nQuestion: {row['question']}"
    response = client.models.generate_content(
        model=tuning_job.tuned_model.endpoint,
        contents=prompt
    )
    predictions.append(response.text)

# Calculate metrics
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(test_df['answer'], predictions)
print(f"Accuracy: {accuracy:.2%}")
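Exact string comparison is strict for generative output ("France." vs. "france" would count as a miss). A light normalization step before scoring is common; the helpers below are a sketch (names are ours):

```python
import string

def normalize(text: str) -> str:
    """Lowercase, strip whitespace, and drop punctuation for lenient matching."""
    text = text.lower().strip()
    return text.translate(str.maketrans("", "", string.punctuation))

def exact_match(predictions, references) -> float:
    """Fraction of predictions that match the reference after normalization."""
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(references)

print(exact_match(["France.", " paris"], ["france", "Paris"]))  # → 1.0
```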

Direct Preference Optimization (DPO)

DPO teaches Gemini to generate better responses by learning from human preferences. Instead of labeled “correct” answers, you provide pairs of responses where humans preferred one over the other.

Understanding DPO

Rather than teaching the “right answer,” DPO shows the model two responses and indicates which style or approach humans found more helpful:
Prompt: "Explain quantum computing"
✅ Preferred: Clear, concise explanation with analogies
❌ Rejected: Overly technical jargon without context

Data Format for DPO

DPO requires preference pairs in a specific format:
{
  "contents": [
    {"role": "user", "parts": [{"text": "Explain how photosynthesis works"}]}
  ],
  "completions": [
    {
      "score": 1.0,
      "completion": {
        "role": "model",
        "parts": [{"text": "Photosynthesis is the process where plants convert sunlight into energy..."}]
      }
    },
    {
      "score": 0.0,
      "completion": {
        "role": "model",
        "parts": [{"text": "Plants use light. It makes food."}]
      }
    }
  ]
}
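Building these records by hand is error-prone; a small helper (a sketch, assuming the score convention above: 1.0 for the preferred completion, 0.0 for the rejected one) keeps the structure consistent:

```python
def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Build one preference pair: the chosen completion scores 1.0, the rejected 0.0."""
    def completion(text: str, score: float) -> dict:
        return {
            "score": score,
            "completion": {"role": "model", "parts": [{"text": text}]},
        }
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "completions": [completion(chosen, 1.0), completion(rejected, 0.0)],
    }

record = make_dpo_record(
    "Explain how photosynthesis works",
    "Photosynthesis is the process where plants convert sunlight into energy...",
    "Plants use light. It makes food.",
)
```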

Training with DPO

1. Prepare Preference Data

Transform your preference dataset:
import json

from datasets import load_dataset

# Load a public preference dataset (e.g., UltraFeedback)
dataset = load_dataset("zhengr/ultrafeedback_binarized")

# Transform the first 1,000 preference pairs to the Gemini format
train_transformed = []
for example in dataset["train_prefs"].select(range(1000)):
    result = transform_to_gemini_format(example)  # user-defined mapper
    if result:
        train_transformed.append(result)

# Save as JSONL
with open("dpo_train.jsonl", "w") as f:
    for item in train_transformed:
        f.write(json.dumps(item) + "\n")
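The `transform_to_gemini_format` mapper is user-defined. A minimal sketch, assuming UltraFeedback-style rows where `chosen` and `rejected` are chat lists whose last message holds the assistant reply (adjust for your dataset's schema):

```python
def transform_to_gemini_format(example: dict):
    """Map one preference example to the Gemini DPO record format shown earlier.

    Returns None for malformed or empty rows so callers can skip them.
    """
    try:
        prompt = example["prompt"]
        chosen = example["chosen"][-1]["content"]
        rejected = example["rejected"][-1]["content"]
    except (KeyError, IndexError, TypeError):
        return None  # skip malformed rows
    if not (prompt and chosen and rejected):
        return None
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "completions": [
            {"score": 1.0,
             "completion": {"role": "model", "parts": [{"text": chosen}]}},
            {"score": 0.0,
             "completion": {"role": "model", "parts": [{"text": rejected}]}},
        ],
    }
```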
2. Upload to Cloud Storage

gsutil cp dpo_train.jsonl gs://your-bucket/dpo/
gsutil cp dpo_validation.jsonl gs://your-bucket/dpo/
3. Start DPO Training

# Create the DPO tuning job (the preference-tuning selector below is an
# assumption; check the SDK reference for the current parameter name)
dpo_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/dpo/dpo_train.jsonl",
    ),
    config=types.CreateTuningJobConfig(
        validation_dataset=types.TuningValidationDataset(
            gcs_uri="gs://your-bucket/dpo/dpo_validation.jsonl",
        ),
        tuned_model_display_name="gemini-dpo-tuned",
        tuning_method="dpo",  # assumption: selector name varies by SDK version
        epoch_count=2,
        learning_rate_multiplier=1.0,
    ),
)
4. Compare Before and After

Test the model’s improved behavior:
# Test with base model
base_response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Explain quantum computing"
)

# Test with the DPO-tuned model (refresh the job to get its endpoint)
dpo_job = client.tunings.get(name=dpo_job.name)
tuned_response = client.models.generate_content(
    model=dpo_job.tuned_model.endpoint,
    contents="Explain quantum computing"
)

print("Base model:", base_response.text)
print("DPO-tuned:", tuned_response.text)

Best Practices

Data Quality

  • Use high-quality, diverse training examples
  • Aim for 500-10,000+ examples for production
  • Balance your dataset across different scenarios

Evaluation Strategy

  • Always use a separate validation set
  • Test on real-world scenarios
  • Monitor for overfitting

Hyperparameter Tuning

  • Start with default learning rates
  • Use 2-5 epochs for most tasks
  • Experiment with batch sizes

Cost Optimization

  • Start with small datasets to validate approach
  • Use the token estimation tool to calculate costs
  • Consider using Gemini Flash for faster, cheaper tuning
Fine-tuning can introduce or amplify biases present in your training data. Always evaluate outputs for fairness, safety, and alignment with your values before deploying to production.

Multimodal Fine-Tuning

Gemini supports fine-tuning on multimodal data including images:
# Multimodal training example
example = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {"text": "Describe this image"},
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64_image_data
                }}
            ]
        }
    ],
    "completion": {
        "role": "model",
        "parts": [{"text": "This image shows a manufacturing defect..."}]
    }
}
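The `base64_image_data` value above is the raw image bytes encoded as base64. A small helper (the file name in the usage comment is illustrative; set `mime_type` to match your image format):

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its bytes as a base64 string
    suitable for the inline_data field above."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# usage: base64_image_data = encode_image("defect_photo.jpg")
```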

Cost Estimation

Estimate token volume before training to project costs. A simple approach is to count tokens across your training file with count_tokens and multiply by the per-token tuning price from the pricing page (the price below is a placeholder):
import json

# Count tokens across the training file with the model's tokenizer
total_tokens = 0
with open("train_data.jsonl") as f:
    for line in f:
        record = json.loads(line)
        parts = record["contents"][0]["parts"] + record["completion"]["parts"]
        text = "\n".join(p["text"] for p in parts if "text" in p)
        total_tokens += client.models.count_tokens(
            model="gemini-2.0-flash-001", contents=text
        ).total_tokens

price_per_million_tokens = 3.00  # placeholder: see the pricing page
epochs = 3
cost = total_tokens * epochs / 1_000_000 * price_per_million_tokens

print(f"Estimated tokens per epoch: {total_tokens:,}")
print(f"Estimated cost: ${cost:.2f}")

Next Steps

Model Evaluation

Evaluate your fine-tuned models with custom metrics

Deployment

Deploy tuned models to production endpoints

Function Calling

Combine tuning with function calling capabilities

Pricing

View detailed pricing for tuning operations
