Batch prediction allows you to process large volumes of requests asynchronously, which is more cost-effective than real-time inference for bulk operations.
Vertex AI batch prediction
This form of batch prediction, which reads from BigQuery or Cloud Storage sources, is available only when the client is configured for Vertex AI; it is not supported by the Gemini Developer API.
Create a batch job
Create a batch prediction job with automatic configuration:
# Specify model and source file only
# Destination and job display name will be auto-populated
job = client.batches.create(
    model='gemini-2.5-flash',
    src='bq://my-project.my-dataset.my-table',  # or "gs://path/to/input/data"
)
print(job)
Vertex AI supports multiple input sources:
# BigQuery source
job = client.batches.create(
    model='gemini-2.5-flash',
    src='bq://my-project.my-dataset.my-table',
)

# GCS source
job = client.batches.create(
    model='gemini-2.5-flash',
    src='gs://my-bucket/inputs/requests.jsonl',
)
Gemini API batch prediction
Create with inline requests
Create a batch job with requests defined inline:
# Create a batch job with inlined requests
batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=[{
        "contents": [{
            "parts": [{
                "text": "Hello!",
            }],
            "role": "user",
        }],
        "config": {"response_modalities": ["text"]},
    }],
)
print(batch_job)
Create a batch job using an uploaded file:
from google.genai import types

# Upload the request file
file = client.files.upload(
    file='myrequests.json',
    config=types.UploadFileConfig(display_name='test-json')
)

# Create a batch job using the uploaded file's resource name
# (file.name, e.g. "files/abc123", not its display name)
batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=file.name,
)
The JSONL file should contain one request per line:
{"key":"request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}], "generation_config": {"response_modalities": ["TEXT"]}}}
{"key":"request_2", "request": {"contents": [{"parts": [{"text": "Explain how Crypto works in a few words"}]}]}}
Get batch job status
Check the status of a batch job:
# Get a job by name
job = client.batches.get(name=job.name)
print(job.state)
print(job)
Poll for completion
Wait for a batch job to complete:
import time
completed_states = set([
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
    'JOB_STATE_PAUSED',
])

while job.state not in completed_states:
    print(job.state)
    time.sleep(30)
    job = client.batches.get(name=job.name)

if job.state == 'JOB_STATE_SUCCEEDED':
    print("Batch job completed successfully!")
    print(f"Output location: {job.output_info.gcs_output_directory}")
else:
    print(f"Batch job ended with state: {job.state}")
List batch jobs
List all batch jobs:
from google.genai import types
for job in client.batches.list(config=types.ListBatchJobsConfig(page_size=10)):
    print(job)
Navigate through pages of batch jobs:
pager = client.batches.list(config=types.ListBatchJobsConfig(page_size=10))
print(pager.page_size)
print(pager[0])
pager.next_page()
print(pager[0])
Async listing
from google.genai import types
async for job in await client.aio.batches.list(
    config=types.ListBatchJobsConfig(page_size=10)
):
    print(job)
With pagination:
async_pager = await client.aio.batches.list(
    config=types.ListBatchJobsConfig(page_size=10)
)
print(async_pager.page_size)
print(async_pager[0])
await async_pager.next_page()
print(async_pager[0])
Delete batch jobs
Delete a completed batch job:
# Delete the job resource
delete_job = client.batches.delete(name=job.name)
print(delete_job)
Working with GCS paths
Vertex AI batch jobs use Google Cloud Storage for input and output:
# Input from GCS
job = client.batches.create(
    model='gemini-2.5-flash',
    src='gs://my-bucket/inputs/requests.jsonl',
)

# Check the job
job = client.batches.get(name=job.name)

# Output will be written to GCS
if job.state == 'JOB_STATE_SUCCEEDED':
    output_uri = job.output_info.gcs_output_directory
    print(f"Results available at: {output_uri}")
Batch job states
Batch jobs progress through these states:
- JOB_STATE_PENDING - Job is queued
- JOB_STATE_RUNNING - Job is processing
- JOB_STATE_SUCCEEDED - Job completed successfully
- JOB_STATE_FAILED - Job failed with errors
- JOB_STATE_CANCELLED - Job was cancelled
- JOB_STATE_PAUSED - Job is temporarily paused
Best practices
- Use batch prediction for large volumes of requests (100+)
- Monitor job state regularly to detect failures early
- Store input data in GCS for Vertex AI jobs
- Set up proper IAM permissions for GCS access
- Delete completed jobs to clean up resources
- Use appropriate polling intervals (30-60 seconds)
- Validate your input format before submitting large jobs
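The last point can be automated with a small pre-submission check, sketched here with only the standard library; the required fields reflect the JSONL format shown earlier (`request.contents`), and the function name is illustrative:

```python
import json

def validate_jsonl(path):
    """Return a list of (line_number, error) problems found in a JSONL request file."""
    problems = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((i, f"invalid JSON: {e}"))
                continue
            if 'contents' not in obj.get('request', {}):
                problems.append((i, "missing 'request.contents'"))
    return problems
```

Run it before `client.batches.create` and refuse to submit if the returned list is non-empty.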