Streaming allows the model to send responses back incrementally as they are generated, rather than waiting for the complete response. This reduces perceived latency, which makes it ideal for interactive applications.

Synchronous Streaming

Basic Text Streaming

The generate_content_stream method returns an iterator that yields chunks as they arrive:
from google import genai

client = genai.Client(api_key='your-api-key')

for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Tell me a story in 300 words.'
):
    print(chunk.text, end='')
Notice the end='' argument to print(): it stops print() from adding a newline after each chunk, so the output appears as one continuous stream.

Streaming with Images

You can stream responses for multimodal inputs. For example, stream a response about an image stored in Google Cloud Storage:
from google.genai import types

for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents=[
        'What is this image about?',
        types.Part.from_uri(
            file_uri='gs://generativeai-downloads/images/scones.jpg',
            mime_type='image/jpeg',
        ),
    ],
):
    print(chunk.text, end='')

Asynchronous Streaming

For async applications, use the aio client to stream responses asynchronously:

Basic Async Streaming

import asyncio
from google import genai

client = genai.Client(api_key='your-api-key')

async def stream_content():
    async for chunk in await client.aio.models.generate_content_stream(
        model='gemini-2.5-flash',
        contents='Tell me a story in 300 words.'
    ):
        print(chunk.text, end='')

asyncio.run(stream_content())
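
Async streaming also makes it easy to run several generations concurrently instead of one after another. A minimal sketch of that pattern; drain and generate_many are helper names introduced here, not SDK functions:

```python
import asyncio

async def drain(stream) -> str:
    """Collect an async chunk stream into one string."""
    parts = []
    async for chunk in stream:
        if chunk.text:  # chunk.text can be None on some chunks
            parts.append(chunk.text)
    return ''.join(parts)

async def generate_many(client, prompts, model='gemini-2.5-flash'):
    # Each generate_content_stream call must be awaited before iteration.
    streams = [
        await client.aio.models.generate_content_stream(model=model, contents=p)
        for p in prompts
    ]
    # Drain all streams concurrently.
    return await asyncio.gather(*(drain(s) for s in streams))

# Usage (requires a configured genai.Client):
# stories = asyncio.run(generate_many(client, [
#     'A story about the sea.',
#     'A story about the sky.',
# ]))
```

Collecting each stream into a string before printing avoids interleaving output from concurrent generations.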

Async Non-Streaming

You can also use async without streaming:
import asyncio

async def generate_content():
    response = await client.aio.models.generate_content(
        model='gemini-2.5-flash',
        contents='Tell me a story in 300 words.'
    )
    print(response.text)

asyncio.run(generate_content())

Processing Stream Chunks

Each chunk in the stream has the same structure as a complete response:
for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Explain quantum computing'
):
    # Access text directly
    print(chunk.text, end='')

    # Or iterate over the chunk's parts; this is equivalent to the line
    # above, so use one approach or the other, not both:
    # for part in chunk.parts or []:
    #     if part.text:
    #         print(part.text, end='')

    # Usage metadata is populated on the final chunk
    if chunk.usage_metadata and chunk.usage_metadata.total_token_count:
        print(f"\nTokens used: {chunk.usage_metadata.total_token_count}")

Chat Streaming

Streaming works seamlessly with chat sessions:
chat = client.chats.create(model='gemini-2.5-flash')

for chunk in chat.send_message_stream('tell me a story'):
    print(chunk.text, end='')

print("\n---")

for chunk in chat.send_message_stream('summarize it in one sentence'):
    print(chunk.text, end='')

Streaming with Configuration

You can apply all standard configuration options to streaming:
from google.genai import types

for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Write a poem about coding',
    config=types.GenerateContentConfig(
        temperature=0.9,
        max_output_tokens=500,
        system_instruction='You are a creative poet who writes in rhyme.',
    ),
):
    print(chunk.text, end='')

Buffering Strategies

There are several strategies for handling streamed content. The simplest is to display each chunk immediately as it arrives (best for chatbots):
for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Tell me a story'
):
    print(chunk.text, end='', flush=True)
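
If you also need the complete response afterwards (for logging, caching, or post-processing), a second strategy is to display chunks as they arrive while buffering them. A minimal sketch; stream_and_collect is a helper name introduced here, not an SDK function:

```python
def stream_and_collect(chunks):
    """Print each chunk immediately and return the full text at the end."""
    pieces = []
    for chunk in chunks:
        text = chunk.text or ''  # chunk.text can be None on some chunks
        print(text, end='', flush=True)
        pieces.append(text)
    return ''.join(pieces)

# Usage (requires a configured client):
# full_text = stream_and_collect(client.models.generate_content_stream(
#     model='gemini-2.5-flash',
#     contents='Tell me a story',
# ))
```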

Use Cases

Chatbots

Real-time responses for better user engagement

Content Writing

Show progress for long-form content generation

Code Generation

Display code as it’s being generated

Summarization

Progressive summaries for long documents

Best Practices

  • Use streaming for responses that take more than 2-3 seconds
  • Add flush=True to print() for immediate display in terminals
  • Use async streaming in web applications for better concurrency
  • Buffer content if you need to process the complete response
  • Handle network interruptions gracefully with try-except blocks
  • Consider user experience - streaming improves perceived latency
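
To see how much streaming improves perceived latency, you can time the first chunk separately from the whole response. A sketch of that measurement; time_first_chunk is a name introduced here, not an SDK function:

```python
import time

def time_first_chunk(chunks):
    """Print a stream; return (time_to_first_chunk, total_time) in seconds."""
    start = time.monotonic()
    first = None
    for chunk in chunks:
        if first is None:
            first = time.monotonic() - start  # perceived latency
        if chunk.text:
            print(chunk.text, end='', flush=True)
    return first, time.monotonic() - start

# Usage (requires a configured client):
# ttfc, total = time_first_chunk(client.models.generate_content_stream(
#     model='gemini-2.5-flash',
#     contents='Tell me a story',
# ))
# print(f"\nFirst chunk after {ttfc:.2f}s; full response after {total:.2f}s")
```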

Error Handling

from google.genai import errors

try:
    for chunk in client.models.generate_content_stream(
        model='gemini-2.5-flash',
        contents='Tell me a story'
    ):
        print(chunk.text, end='', flush=True)
except errors.APIError as e:
    print(f"\n\nAPI error during streaming: {e}")
except Exception as e:
    print(f"\n\nStreaming error: {e}")
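
Because an error can interrupt a stream midway, it is often worth keeping whatever text has already arrived. A sketch of that pattern; stream_safely is a helper name introduced here, not an SDK function:

```python
def stream_safely(chunks):
    """Stream chunks, returning any text received even if the stream fails."""
    pieces = []
    try:
        for chunk in chunks:
            if chunk.text:
                print(chunk.text, end='', flush=True)
                pieces.append(chunk.text)
    except Exception as e:
        print(f"\n[stream interrupted: {e}]")
    return ''.join(pieces)

# Usage (requires a configured client):
# partial_or_full = stream_safely(client.models.generate_content_stream(
#     model='gemini-2.5-flash',
#     contents='Tell me a story',
# ))
```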
