Streaming allows the model to send responses back incrementally as they are generated, rather than waiting for the complete response. This reduces perceived latency, which makes it ideal for interactive applications.

Synchronous Streaming

Basic Text Streaming

The generate_content_stream method returns an iterator that yields chunks as they arrive:
from google import genai

client = genai.Client(api_key='your-api-key')

for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Tell me a story in 300 words.'
):
    print(chunk.text, end='')
Notice the end='' argument to print(): it stops print() from adding a newline after each chunk, so the output appears as one continuous stream.

Streaming with Images

You can stream responses for multimodal inputs. For example, stream a response about an image stored in Google Cloud Storage:
from google.genai import types

for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents=[
        'What is this image about?',
        types.Part.from_uri(
            file_uri='gs://generativeai-downloads/images/scones.jpg',
            mime_type='image/jpeg',
        ),
    ],
):
    print(chunk.text, end='')

Asynchronous Streaming

For async applications, use the aio client to stream responses asynchronously:

Basic Async Streaming

import asyncio
from google import genai

client = genai.Client(api_key='your-api-key')

async def stream_content():
    async for chunk in await client.aio.models.generate_content_stream(
        model='gemini-2.5-flash',
        contents='Tell me a story in 300 words.'
    ):
        print(chunk.text, end='')

asyncio.run(stream_content())
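
Async streaming also makes it easy to run several generations concurrently instead of one after another. A minimal sketch of that pattern; drain and generate_many are helper names introduced here, not SDK functions:

```python
import asyncio

async def drain(stream) -> str:
    """Collect an async chunk stream into one string."""
    parts = []
    async for chunk in stream:
        if chunk.text:  # chunk.text can be None on some chunks
            parts.append(chunk.text)
    return ''.join(parts)

async def generate_many(client, prompts, model='gemini-2.5-flash'):
    # Each generate_content_stream call must be awaited before iteration.
    streams = [
        await client.aio.models.generate_content_stream(model=model, contents=p)
        for p in prompts
    ]
    # Drain all streams concurrently.
    return await asyncio.gather(*(drain(s) for s in streams))

# Usage (requires a configured genai.Client):
# stories = asyncio.run(generate_many(client, [
#     'A story about the sea.',
#     'A story about the sky.',
# ]))
```

Collecting each stream into a string before printing avoids interleaving output from concurrent generations.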

Async Non-Streaming

You can also use async without streaming:
import asyncio

async def generate_content():
    response = await client.aio.models.generate_content(
        model='gemini-2.5-flash',
        contents='Tell me a story in 300 words.'
    )
    print(response.text)

asyncio.run(generate_content())

Processing Stream Chunks

Each chunk in the stream has the same structure as a complete response:
for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Explain quantum computing'
):
    # Access text directly
    print(chunk.text, end='')

    # Or iterate over the chunk's parts; this is equivalent to the line
    # above, so use one approach or the other, not both:
    # for part in chunk.parts or []:
    #     if part.text:
    #         print(part.text, end='')

    # Usage metadata is populated on the final chunk
    if chunk.usage_metadata and chunk.usage_metadata.total_token_count:
        print(f"\nTokens used: {chunk.usage_metadata.total_token_count}")

Chat Streaming

Streaming works seamlessly with chat sessions:
chat = client.chats.create(model='gemini-2.5-flash')

for chunk in chat.send_message_stream('tell me a story'):
    print(chunk.text, end='')

print("\n---")

for chunk in chat.send_message_stream('summarize it in one sentence'):
    print(chunk.text, end='')

Streaming with Configuration

You can apply all standard configuration options to streaming:
from google.genai import types

for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Write a poem about coding',
    config=types.GenerateContentConfig(
        temperature=0.9,
        max_output_tokens=500,
        system_instruction='You are a creative poet who writes in rhyme.',
    ),
):
    print(chunk.text, end='')

Buffering Strategies

There are several strategies for handling streamed content. The simplest is to display each chunk immediately as it arrives (best for chatbots):
for chunk in client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents='Tell me a story'
):
    print(chunk.text, end='', flush=True)
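
If you also need the complete response afterwards (for logging, caching, or post-processing), a second strategy is to display chunks as they arrive while buffering them. A minimal sketch; stream_and_collect is a helper name introduced here, not an SDK function:

```python
def stream_and_collect(chunks):
    """Print each chunk immediately and return the full text at the end."""
    pieces = []
    for chunk in chunks:
        text = chunk.text or ''  # chunk.text can be None on some chunks
        print(text, end='', flush=True)
        pieces.append(text)
    return ''.join(pieces)

# Usage (requires a configured client):
# full_text = stream_and_collect(client.models.generate_content_stream(
#     model='gemini-2.5-flash',
#     contents='Tell me a story',
# ))
```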

Use Cases

Chatbots

Real-time responses for better user engagement

Content Writing

Show progress for long-form content generation

Code Generation

Display code as it’s being generated

Summarization

Progressive summaries for long documents

Best Practices

  • Use streaming for responses that take more than 2-3 seconds
  • Add flush=True to print() for immediate display in terminals
  • Use async streaming in web applications for better concurrency
  • Buffer content if you need to process the complete response
  • Handle network interruptions gracefully with try-except blocks
  • Consider user experience - streaming improves perceived latency
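
To see how much streaming improves perceived latency, you can time the first chunk separately from the whole response. A sketch of that measurement; time_first_chunk is a name introduced here, not an SDK function:

```python
import time

def time_first_chunk(chunks):
    """Print a stream; return (time_to_first_chunk, total_time) in seconds."""
    start = time.monotonic()
    first = None
    for chunk in chunks:
        if first is None:
            first = time.monotonic() - start  # perceived latency
        if chunk.text:
            print(chunk.text, end='', flush=True)
    return first, time.monotonic() - start

# Usage (requires a configured client):
# ttfc, total = time_first_chunk(client.models.generate_content_stream(
#     model='gemini-2.5-flash',
#     contents='Tell me a story',
# ))
# print(f"\nFirst chunk after {ttfc:.2f}s; full response after {total:.2f}s")
```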

Error Handling

from google.genai import errors

try:
    for chunk in client.models.generate_content_stream(
        model='gemini-2.5-flash',
        contents='Tell me a story'
    ):
        print(chunk.text, end='', flush=True)
except errors.APIError as e:
    print(f"\n\nAPI error during streaming: {e}")
except Exception as e:
    print(f"\n\nStreaming error: {e}")
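
Because an error can interrupt a stream midway, it is often worth keeping whatever text has already arrived. A sketch of that pattern; stream_safely is a helper name introduced here, not an SDK function:

```python
def stream_safely(chunks):
    """Stream chunks, returning any text received even if the stream fails."""
    pieces = []
    try:
        for chunk in chunks:
            if chunk.text:
                print(chunk.text, end='', flush=True)
                pieces.append(chunk.text)
    except Exception as e:
        print(f"\n[stream interrupted: {e}]")
    return ''.join(pieces)

# Usage (requires a configured client):
# partial_or_full = stream_safely(client.models.generate_content_stream(
#     model='gemini-2.5-flash',
#     contents='Tell me a story',
# ))
```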
