Context caching lets you store frequently used content (such as large documents, files, or system instructions) and reuse it across multiple requests. This improves response times and reduces the cost of repeatedly sending the same input tokens.
Benefits
- Faster responses: Cached content doesn’t need to be reprocessed
- Lower costs: Cached tokens are billed at a reduced rate in subsequent requests
- Better performance: Ideal for long documents, knowledge bases, and system instructions
Creating Cached Content
Create a cache with content you want to reuse:
```python
from google.genai import types

# Assumes `client` is an initialized genai.Client; on the Gemini Developer API,
# `file1` and `file2` are files previously uploaded with client.files.upload().
if client.vertexai:
    file_uris = [
        'gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf',
        'gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf',
    ]
else:
    file_uris = [file1.uri, file2.uri]

cached_content = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role='user',
                parts=[
                    types.Part.from_uri(
                        file_uri=file_uris[0], mime_type='application/pdf'
                    ),
                    types.Part.from_uri(
                        file_uri=file_uris[1], mime_type='application/pdf'
                    ),
                ],
            )
        ],
        system_instruction='What is the sum of the two pdfs?',
        display_name='test cache',
        ttl='3600s',
    ),
)
```
Cache Configuration
When creating a cache, you can specify:
- contents: The content to cache (documents, files, text, etc.)
- system_instruction: Optional system instructions included in the cache
- display_name: A human-readable name for the cache
- ttl: Time-to-live for the cache (e.g., '3600s' for 1 hour)
Time-to-Live (TTL)
The TTL determines how long the cache remains available:
- Format: String of whole seconds with an 's' suffix (e.g., '3600s' for 1 hour, '86400s' for 1 day)
- Minimum: 60 seconds
- Maximum: 7 days
- After expiration: The cache is automatically deleted
```python
from google.genai import types

# Cache for 1 hour
cached_content = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        contents=[...],
        ttl='3600s',  # 1 hour
    ),
)

# Cache for 1 day
cached_content = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        contents=[...],
        ttl='86400s',  # 24 hours
    ),
)
```
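Because ttl is just a string of whole seconds, a small helper can keep call sites readable. This is a sketch; `ttl_from` is an illustrative name, not part of the SDK:

```python
from datetime import timedelta

def ttl_from(delta: timedelta) -> str:
    """Format a timedelta as the 's'-suffixed seconds string the ttl field expects."""
    return f'{int(delta.total_seconds())}s'

print(ttl_from(timedelta(hours=1)))  # 3600s
print(ttl_from(timedelta(days=1)))   # 86400s
```

This lets you write `ttl=ttl_from(timedelta(hours=2))` instead of hand-counting seconds.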
Retrieving Cached Content
Get a cached content object by its name:
```python
cached_content = client.caches.get(name=cached_content.name)
```
Using Cached Content
Reference the cached content in your generate content requests:
```python
from google.genai import types

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='Summarize the pdfs',
    config=types.GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)
print(response.text)
```
The model will use the cached content as context without reprocessing it.
Multiple Requests with Same Cache
You can reuse the same cached content across multiple requests:
```python
from google.genai import types

# First request
response1 = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='What are the main topics in the pdfs?',
    config=types.GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)

# Second request with a different question
response2 = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='List the key findings from the pdfs',
    config=types.GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)

# Third request
response3 = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='Compare the methodologies in both pdfs',
    config=types.GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)
```
Each request benefits from the cached content without reprocessing the PDFs.
Listing Caches
View all your cached content:
```python
caches = client.caches.list()
for cache in caches:
    print(f"Name: {cache.name}")
    print(f"Display name: {cache.display_name}")
    print(f"Expires at: {cache.expire_time}")
```
Updating Cache TTL
Extend the lifetime of a cache:
```python
from google.genai import types

updated_cache = client.caches.update(
    name=cached_content.name,
    config=types.UpdateCachedContentConfig(
        ttl='7200s',  # Extend to 2 hours
    ),
)
```
Deleting Cached Content
Manually delete a cache before it expires:
```python
client.caches.delete(name=cached_content.name)
```
Best Practices
- Cache large content: Only cache content that’s large enough to benefit from caching (typically > 10K tokens)
- Set appropriate TTL: Balance between cache availability and cost
- Reuse caches: Use the same cache across multiple requests to maximize benefits
- Monitor expiration: Track cache expiration times and recreate as needed
- Cache stable content: Best for content that doesn’t change frequently
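For the "monitor expiration" practice, one approach is to compare a cache's expire_time against the current time before reusing it. The sketch below assumes expire_time is a timezone-aware datetime; `expires_within` is an illustrative helper, not an SDK method:

```python
from datetime import datetime, timedelta, timezone

def expires_within(expire_time: datetime, window: timedelta) -> bool:
    """True if the cache will expire within the given window (i.e. should be refreshed)."""
    return expire_time - datetime.now(timezone.utc) <= window

# Synthetic example: a cache expiring 10 minutes from now should be refreshed
# if we want at least 30 minutes of headroom.
soon = datetime.now(timezone.utc) + timedelta(minutes=10)
print(expires_within(soon, timedelta(minutes=30)))  # True
```

In practice, when this returns True you would either extend the TTL with client.caches.update or recreate the cache.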
Common Use Cases
Long Documents
Cache large documents for Q&A:
```python
from google.genai import types

# Create a cache with the document
cached_content = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role='user',
                parts=[
                    types.Part.from_uri(
                        file_uri='gs://path/to/large-document.pdf',
                        mime_type='application/pdf',
                    )
                ],
            )
        ],
        display_name='Product documentation',
        ttl='86400s',  # Cache for 24 hours
    ),
)

# Ask multiple questions against the same cache
# (`questions` is a list of question strings defined elsewhere)
for question in questions:
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=question,
        config=types.GenerateContentConfig(
            cached_content=cached_content.name,
        ),
    )
    print(response.text)
```
System Instructions
Cache system instructions for consistent behavior:
```python
from google.genai import types

# Cache the system instruction
cached_content = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        contents=[],
        system_instruction='You are a helpful coding assistant that follows best practices...',
        display_name='Coding assistant persona',
        ttl='86400s',
    ),
)

# Use it for multiple conversations
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='How do I write a Python decorator?',
    config=types.GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)
```
Knowledge Base
Cache multiple documents as a knowledge base:
```python
from google.genai import types

# Cache multiple documents
# (`document_uris` is a list of document URIs defined elsewhere)
cached_content = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role='user',
                parts=[
                    types.Part.from_uri(
                        file_uri=uri,
                        mime_type='application/pdf',
                    )
                    for uri in document_uris
                ],
            )
        ],
        display_name='Company knowledge base',
        ttl='604800s',  # 7 days
    ),
)
```
Cost Optimization
Caching reduces costs significantly for repeated content:
- First request: Normal token pricing while the cache is created
- Subsequent requests: Discounted pricing for cached tokens
- Storage: The cache itself may incur a per-hour storage charge while it is alive, so keep the TTL close to your actual reuse window
- Threshold: Caching is typically cost-effective when content is reused several times within the cache's lifetime
Cached tokens are counted separately from prompt and output tokens, and are billed at a lower rate than regular prompt tokens.
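The break-even point depends on three rates: the regular input price, the cached-token price, and the storage price over the cache's lifetime. The sketch below uses made-up placeholder prices (not real Gemini pricing) purely to show the arithmetic:

```python
# Hypothetical prices, for illustration only -- check current Gemini pricing:
PROMPT_PRICE = 0.30 / 1_000_000    # $ per regular input token
CACHED_PRICE = 0.075 / 1_000_000   # $ per cached input token
STORAGE_PRICE = 1.00 / 1_000_000   # $ per cached token per hour of storage

def cost_without_cache(tokens: int, requests: int) -> float:
    """Every request resends the full content at the regular rate."""
    return tokens * PROMPT_PRICE * requests

def cost_with_cache(tokens: int, requests: int, hours: float) -> float:
    """One full-price pass to build the cache, then cached-rate reads plus storage."""
    return (tokens * PROMPT_PRICE
            + tokens * CACHED_PRICE * requests
            + tokens * STORAGE_PRICE * hours)

def breakeven_requests(tokens: int, hours: float) -> int:
    """Smallest request count at which caching becomes cheaper than resending."""
    n = 1
    while cost_with_cache(tokens, n, hours) >= cost_without_cache(tokens, n):
        n += 1
    return n

# With these made-up prices, a 100K-token cache kept alive for one hour
# pays for itself after a handful of requests:
print(breakeven_requests(100_000, 1.0))
```

Note how the storage term raises the break-even count: the longer the cache sits idle, the more reuses it takes to come out ahead, which is why matching the TTL to your reuse window matters.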