Gemini supports two powerful tuning approaches to customize model behavior for your specific use cases: Supervised Fine-Tuning (SFT) for task-specific adaptation and Direct Preference Optimization (DPO) for aligning models with human preferences.

Overview

Fine-tuning allows you to adapt Gemini’s base capabilities to your specific domain, style, or task requirements:

Supervised Fine-Tuning

Train models on labeled examples to specialize in specific tasks like Q&A, summarization, or classification

Preference Optimization

Align model outputs with human preferences by learning from ranked response pairs

Supervised Fine-Tuning (SFT)

Supervised fine-tuning uses labeled training data to refine the base model’s capabilities toward your specific tasks. Each training example demonstrates the desired output for a given input.

When to Use SFT

  • Adapting models to domain-specific terminology
  • Teaching specific output formats or structures
  • Improving performance on specialized tasks
  • Reducing prompt engineering complexity

Data Preparation

Training data should be in JSONL format with input-output pairs:
{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Context: The Normans were an ethnic group...\nQuestion: In what country is Normandy located?"}]
    }
  ],
  "completion": {
    "role": "model",
    "parts": [{"text": "France"}]
  }
}
Ensure your training data is high-quality, well-labeled, and directly relevant to your target task. Low-quality data can adversely affect performance and introduce bias.
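A quick sanity check before uploading can catch malformed records early. The helper below is a minimal sketch (function names are ours); it builds records in the format shown above and verifies the required fields are present and non-empty:

```python
import json

def make_sft_record(prompt: str, answer: str) -> dict:
    """Build one SFT training record in the format shown above."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]}
        ],
        "completion": {
            "role": "model",
            "parts": [{"text": answer}]
        },
    }

def validate_record(record: dict) -> bool:
    """Check that the fields the tuning service expects are present and non-empty."""
    try:
        assert record["contents"][0]["role"] == "user"
        assert record["contents"][0]["parts"][0]["text"].strip()
        assert record["completion"]["role"] == "model"
        assert record["completion"]["parts"][0]["text"].strip()
        return True
    except (KeyError, IndexError, AssertionError):
        return False

records = [
    make_sft_record(
        "Context: The Normans were an ethnic group...\n"
        "Question: In what country is Normandy located?",
        "France",
    )
]
with open("train_data.jsonl", "w") as f:
    for r in records:
        assert validate_record(r)
        f.write(json.dumps(r) + "\n")
```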

Training Process

1. Initialize the SDK

Set up your Google Cloud project and initialize the Gemini client:
from google import genai
from google.genai import types

PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
2. Upload Training Data

Upload your JSONL files to Google Cloud Storage:
gsutil cp train_data.jsonl gs://your-bucket/tuning/
gsutil cp validation_data.jsonl gs://your-bucket/tuning/
3. Create Tuning Job

Start the supervised fine-tuning job:
# Create the supervised tuning job
tuning_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/tuning/train_data.jsonl",
    ),
    config=types.CreateTuningJobConfig(
        validation_dataset=types.TuningValidationDataset(
            gcs_uri="gs://your-bucket/tuning/validation_data.jsonl",
        ),
        tuned_model_display_name="gemini-qa-tuned",
        epoch_count=3,
        learning_rate_multiplier=1.0,
    ),
)

print(f"Tuning job created: {tuning_job.name}")
4. Monitor Training

Check the status of your tuning job:
# Poll until the job reaches a terminal state
job = client.tunings.get(name=tuning_job.name)
print(f"Status: {job.state}")  # e.g. JOB_STATE_RUNNING, JOB_STATE_SUCCEEDED
5. Deploy and Use

Once training completes, use your fine-tuned model:
# Refresh the job to pick up the tuned model, then query it
tuning_job = client.tunings.get(name=tuning_job.name)
response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents="Context: ...\nQuestion: What is the capital of France?"
)

print(response.text)

Evaluation

Evaluate your fine-tuned model on a held-out test set:
import pandas as pd

test_df = pd.read_csv("test_data.csv")
predictions = []

for _, row in test_df.iterrows():
    prompt = f"Context: {row['context']}\nQuestion: {row['question']}"
    response = client.models.generate_content(
        model=tuning_job.tuned_model.endpoint,
        contents=prompt
    )
    predictions.append(response.text)

# Calculate metrics
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(test_df['answer'], predictions)
print(f"Accuracy: {accuracy:.2%}")
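Exact string comparison is strict for generative output ("France." vs. "france" would count as a miss). A light normalization step before scoring is common; the helpers below are a sketch (names are ours):

```python
import string

def normalize(text: str) -> str:
    """Lowercase, strip whitespace, and drop punctuation for lenient matching."""
    text = text.lower().strip()
    return text.translate(str.maketrans("", "", string.punctuation))

def exact_match(predictions, references) -> float:
    """Fraction of predictions that match the reference after normalization."""
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(references)

print(exact_match(["France.", " paris"], ["france", "Paris"]))  # → 1.0
```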

Direct Preference Optimization (DPO)

DPO teaches Gemini to generate better responses by learning from human preferences. Instead of labeled “correct” answers, you provide pairs of responses where humans preferred one over the other.

Understanding DPO

Rather than teaching the “right answer,” DPO shows the model two responses and indicates which style or approach humans found more helpful:
Prompt: "Explain quantum computing"
✅ Preferred: Clear, concise explanation with analogies
❌ Rejected: Overly technical jargon without context

Data Format for DPO

DPO requires preference pairs in a specific format:
{
  "contents": [
    {"role": "user", "parts": [{"text": "Explain how photosynthesis works"}]}
  ],
  "completions": [
    {
      "score": 1.0,
      "completion": {
        "role": "model",
        "parts": [{"text": "Photosynthesis is the process where plants convert sunlight into energy..."}]
      }
    },
    {
      "score": 0.0,
      "completion": {
        "role": "model",
        "parts": [{"text": "Plants use light. It makes food."}]
      }
    }
  ]
}
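Building these records by hand is error-prone; a small helper (a sketch, assuming the score convention above: 1.0 for the preferred completion, 0.0 for the rejected one) keeps the structure consistent:

```python
def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Build one preference pair: the chosen completion scores 1.0, the rejected 0.0."""
    def completion(text: str, score: float) -> dict:
        return {
            "score": score,
            "completion": {"role": "model", "parts": [{"text": text}]},
        }
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "completions": [completion(chosen, 1.0), completion(rejected, 0.0)],
    }

record = make_dpo_record(
    "Explain how photosynthesis works",
    "Photosynthesis is the process where plants convert sunlight into energy...",
    "Plants use light. It makes food.",
)
```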

Training with DPO

1. Prepare Preference Data

Transform your preference dataset:
import json

from datasets import load_dataset

# Load a public preference dataset (e.g., UltraFeedback)
dataset = load_dataset("zhengr/ultrafeedback_binarized")

# Transform the first 1,000 preference pairs to the Gemini format
train_transformed = []
for example in dataset["train_prefs"].select(range(1000)):
    result = transform_to_gemini_format(example)  # user-defined mapper
    if result:
        train_transformed.append(result)

# Save as JSONL
with open("dpo_train.jsonl", "w") as f:
    for item in train_transformed:
        f.write(json.dumps(item) + "\n")
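The `transform_to_gemini_format` mapper is user-defined. A minimal sketch, assuming UltraFeedback-style rows where `chosen` and `rejected` are chat lists whose last message holds the assistant reply (adjust for your dataset's schema):

```python
def transform_to_gemini_format(example: dict):
    """Map one preference example to the Gemini DPO record format shown earlier.

    Returns None for malformed or empty rows so callers can skip them.
    """
    try:
        prompt = example["prompt"]
        chosen = example["chosen"][-1]["content"]
        rejected = example["rejected"][-1]["content"]
    except (KeyError, IndexError, TypeError):
        return None  # skip malformed rows
    if not (prompt and chosen and rejected):
        return None
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "completions": [
            {"score": 1.0,
             "completion": {"role": "model", "parts": [{"text": chosen}]}},
            {"score": 0.0,
             "completion": {"role": "model", "parts": [{"text": rejected}]}},
        ],
    }
```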
2. Upload to Cloud Storage

gsutil cp dpo_train.jsonl gs://your-bucket/dpo/
gsutil cp dpo_validation.jsonl gs://your-bucket/dpo/
3. Start DPO Training

# Create the DPO tuning job (the preference-tuning selector below is an
# assumption; check the SDK reference for the current parameter name)
dpo_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/dpo/dpo_train.jsonl",
    ),
    config=types.CreateTuningJobConfig(
        validation_dataset=types.TuningValidationDataset(
            gcs_uri="gs://your-bucket/dpo/dpo_validation.jsonl",
        ),
        tuned_model_display_name="gemini-dpo-tuned",
        tuning_method="dpo",  # assumption: selector name varies by SDK version
        epoch_count=2,
        learning_rate_multiplier=1.0,
    ),
)
4. Compare Before and After

Test the model’s improved behavior:
# Test with base model
base_response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Explain quantum computing"
)

# Test with the DPO-tuned model (refresh the job to get its endpoint)
dpo_job = client.tunings.get(name=dpo_job.name)
tuned_response = client.models.generate_content(
    model=dpo_job.tuned_model.endpoint,
    contents="Explain quantum computing"
)

print("Base model:", base_response.text)
print("DPO-tuned:", tuned_response.text)

Best Practices

Data Quality

  • Use high-quality, diverse training examples
  • Aim for 500-10,000+ examples for production
  • Balance your dataset across different scenarios

Evaluation Strategy

  • Always use a separate validation set
  • Test on real-world scenarios
  • Monitor for overfitting

Hyperparameter Tuning

  • Start with default learning rates
  • Use 2-5 epochs for most tasks
  • Experiment with batch sizes

Cost Optimization

  • Start with small datasets to validate approach
  • Use the token estimation tool to calculate costs
  • Consider using Gemini Flash for faster, cheaper tuning
Fine-tuning can introduce or amplify biases present in your training data. Always evaluate outputs for fairness, safety, and alignment with your values before deploying to production.

Multimodal Fine-Tuning

Gemini supports fine-tuning on multimodal data including images:
# Multimodal training example
example = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {"text": "Describe this image"},
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    "data": base64_image_data
                }}
            ]
        }
    ],
    "completion": {
        "role": "model",
        "parts": [{"text": "This image shows a manufacturing defect..."}]
    }
}
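The `base64_image_data` value above is the raw image bytes encoded as base64. A small helper (the file name in the usage comment is illustrative; set `mime_type` to match your image format):

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its bytes as a base64 string
    suitable for the inline_data field above."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# usage: base64_image_data = encode_image("defect_photo.jpg")
```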

Cost Estimation

Estimate token volume before training to project costs. A simple approach is to count tokens across your training file with count_tokens and multiply by the per-token tuning price from the pricing page (the price below is a placeholder):
import json

# Count tokens across the training file with the model's tokenizer
total_tokens = 0
with open("train_data.jsonl") as f:
    for line in f:
        record = json.loads(line)
        parts = record["contents"][0]["parts"] + record["completion"]["parts"]
        text = "\n".join(p["text"] for p in parts if "text" in p)
        total_tokens += client.models.count_tokens(
            model="gemini-2.0-flash-001", contents=text
        ).total_tokens

price_per_million_tokens = 3.00  # placeholder: see the pricing page
epochs = 3
cost = total_tokens * epochs / 1_000_000 * price_per_million_tokens

print(f"Estimated tokens per epoch: {total_tokens:,}")
print(f"Estimated cost: ${cost:.2f}")

Next Steps

Model Evaluation

Evaluate your fine-tuned models with custom metrics

Deployment

Deploy tuned models to production endpoints

Function Calling

Combine tuning with function calling capabilities

Pricing

View detailed pricing for tuning operations
