Gradio supports streaming for both outputs and inputs, allowing you to create real-time, interactive applications.

Streaming outputs

In some cases, you may want to stream a sequence of outputs rather than show a single output all at once. For example, you might have an image generation model and want to show the image generated at each step, leading up to the final image. Or you might have a chatbot that streams its response one token at a time instead of returning it all at once. To stream outputs, supply a generator function to Gradio instead of a regular function. Creating generators in Python is very simple: instead of returning a single value, a function should yield a series of values. Usually the yield statement is placed inside some kind of loop. Here’s an example of a generator that simply counts up to a given number:
def my_generator(x):
    for i in range(x):
        yield i
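Calling a generator function doesn’t run its body; it returns an iterator that produces one value each time it reaches a yield. Iterating over the generator above shows this:

```python
def my_generator(x):
    for i in range(x):
        yield i

# Each iteration resumes the function at the yield statement
# and produces the next value.
values = list(my_generator(3))  # → [0, 1, 2]
```

Gradio consumes the generator in the same way, updating the output component once per yielded value.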
You supply a generator to Gradio the same way you would a regular function. For example, here’s a (fake) image generation model that generates noise for several steps before outputting an image:
import gradio as gr
import numpy as np
import time

def fake_diffusion(steps):
    for i in range(steps):
        time.sleep(1)
        image = np.random.random((600, 600, 3))
        yield image
    image = "https://gradio-builds.s3.amazonaws.com/demo-files/base.png"
    yield image

demo = gr.Interface(fake_diffusion, inputs=gr.Slider(1, 10, 3), outputs="image")
demo.launch()
The time.sleep(1) creates an artificial pause between steps so that you are able to observe the steps of the iterator (in a real image generation model, this probably wouldn’t be necessary).

Streaming media outputs

Gradio can stream audio and video directly from your generator function. This lets your user hear your audio or see your video nearly as soon as it’s yielded by your function. To enable media streaming:
  1. Set streaming=True in your gr.Audio or gr.Video output component
  2. Write a Python generator that yields the next “chunk” of audio or video
  3. Set autoplay=True so that the media starts playing automatically
For audio, the next “chunk” can be an .mp3 or .wav file or a bytes sequence of audio. For video, the next “chunk” must be either an .mp4 file or an H.264-encoded file with a .ts extension.
For smooth playback, make sure chunks have a consistent length and are longer than 1 second.
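The chunking itself is independent of Gradio. As a sketch, here is a hypothetical helper (not part of Gradio’s API) that splits raw audio bytes into equal-duration slices, so that every yield covers the same amount of playback time:

```python
def audio_chunks(raw_bytes, bytes_per_second, chunk_seconds=2):
    """Yield fixed-size slices of raw audio so that every chunk
    covers the same duration (chunk_seconds of audio)."""
    step = bytes_per_second * chunk_seconds
    for start in range(0, len(raw_bytes), step):
        yield raw_bytes[start:start + step]
```

Slicing at a fixed multiple of the byte rate gives the consistent, longer-than-one-second chunks recommended above (only the final chunk may be shorter).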

Streaming audio example

import gradio as gr
from time import sleep

def keep_repeating(audio_file):
    for _ in range(10):
        sleep(0.5)
        yield audio_file

gr.Interface(
    keep_repeating,
    gr.Audio(sources=["microphone"], type="filepath"),
    gr.Audio(streaming=True, autoplay=True)
).launch()

Streaming video example

import gradio as gr
from time import sleep

def keep_repeating(video_file):
    for _ in range(10):
        sleep(0.5)
        yield video_file

gr.Interface(
    keep_repeating,
    gr.Video(sources=["webcam"], format="mp4"),
    gr.Video(streaming=True, autoplay=True)
).launch()

Streaming inputs

Gradio also allows you to stream images from a user’s camera or audio chunks from their microphone into your event handler. This can be used to create real-time object detection apps or conversational chat applications. Currently, the gr.Image and gr.Audio components support input streaming via the stream event. Here’s the simplest possible streaming app, which returns the webcam stream unmodified:
import gradio as gr

def stream_frames(image):
    return image

with gr.Blocks() as demo:
    input_img = gr.Image(sources=["webcam"], type="numpy")
    output_img = gr.Image(streaming=True)
    input_img.stream(stream_frames, input_img, output_img)

demo.launch()
Try it out! The stream event is triggered when the user starts recording. Under the hood, the webcam takes a photo every 0.1 seconds and sends it to the server.

Stream event parameters

There are two unique keyword arguments for the stream event:
  • time_limit: The amount of time the Gradio server will spend processing the event. Media streams are naturally unbounded so it’s important to set a time limit so that one user does not hog the Gradio queue. The time limit only counts the time spent processing the stream, not the time spent waiting in the queue. The orange bar displayed at the bottom of the input image represents the remaining time. When the time limit expires, the user will automatically rejoin the queue.
  • stream_every: The frequency (in seconds) with which the stream will capture input and send it to the server. For demos like image detection or manipulation, setting a smaller value is desired to get a “real-time” effect. For demos like speech transcription, a higher value is useful so that the transcription algorithm has more context of what’s being said.
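As a quick sanity check of how these two parameters interact: with time_limit=30 and stream_every=0.1 (illustrative values, matching the filter demo below), a single user’s turn is capped at roughly 300 captures before they rejoin the queue:

```python
time_limit = 30     # seconds of stream processing allotted per user turn
stream_every = 0.1  # one capture sent to the server every 100 ms

# Approximate number of inputs your handler will receive per turn
max_captures = round(time_limit / stream_every)
print(max_captures)  # → 300
```

Budgeting like this helps you pick a stream_every your handler can actually keep up with: if processing one frame takes longer than stream_every, inputs will queue up.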

Real-time image filters

Here’s a demo where a user can choose a filter to apply to their webcam stream:
import gradio as gr
import numpy as np
import cv2

def transform_cv2(frame, transform):
    if transform == "cartoon":
        # prepare color
        img_color = cv2.pyrDown(cv2.pyrDown(frame))
        for _ in range(6):
            img_color = cv2.bilateralFilter(img_color, 9, 9, 7)
        img_color = cv2.pyrUp(cv2.pyrUp(img_color))

        # prepare edges
        img_edges = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        img_edges = cv2.adaptiveThreshold(
            cv2.medianBlur(img_edges, 7),
            255,
            cv2.ADAPTIVE_THRESH_MEAN_C,
            cv2.THRESH_BINARY,
            9,
            2,
        )
        img_edges = cv2.cvtColor(img_edges, cv2.COLOR_GRAY2RGB)
        # combine color and edges
        img = cv2.bitwise_and(img_color, img_edges)
        return img
    elif transform == "edges":
        # perform edge detection
        img = cv2.cvtColor(cv2.Canny(frame, 100, 200), cv2.COLOR_GRAY2BGR)
        return img
    else:
        return np.flipud(frame)

with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            transform = gr.Dropdown(
                choices=["cartoon", "edges", "flip"],
                value="flip",
                label="Transformation",
            )
            input_img = gr.Image(sources=["webcam"], type="numpy")
        with gr.Column():
            output_img = gr.Image(streaming=True)
        input_img.stream(
            transform_cv2,
            [input_img, transform],
            [output_img],
            time_limit=30,
            stream_every=0.1,
        )

demo.launch()
If you change the filter value, it will immediately take effect in the output stream. This is an important difference between stream events and other Gradio events: the input values of a stream can be changed while the stream is being processed.

Unified streaming components

For some image streaming demos, you don’t need to display separate input and output components. Your app would look cleaner if you could just display the modified output stream. You can do this by specifying the input image component as the output of the stream event:
import gradio as gr
import numpy as np

def apply_filter(image, transform):
    # Apply your filter here
    return np.flipud(image)

with gr.Blocks() as demo:
    transform = gr.Dropdown(["flip", "edges"], value="flip")
    img = gr.Image(sources=["webcam"], type="numpy", streaming=True)
    img.stream(apply_filter, [img, transform], img)

demo.launch()

Maintaining state with streaming

Your streaming function should be stateless: it should take the current input and return its corresponding output. However, there are cases where you may want to keep track of past inputs or outputs. For example, you may want to keep a buffer of the previous k inputs to improve the accuracy of your transcription demo. You can do this with Gradio’s gr.State() component:
import gradio as gr

def transcribe_handler(current_audio, state, transcript):
    # `transcribe` is a placeholder for your own speech-to-text function
    next_text = transcribe(current_audio, history=state)
    state.append(current_audio)
    state = state[-3:]  # Keep only last 3
    return state, transcript + next_text

with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            mic = gr.Audio(sources=["microphone"])
            state = gr.State(value=[])
        with gr.Column():
            transcript = gr.Textbox(label="Transcript")
    mic.stream(
        transcribe_handler,
        [mic, state, transcript],
        [state, transcript],
        time_limit=10,
        stream_every=1
    )

demo.launch()
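The buffering pattern used in transcribe_handler (append the newest input, then keep only the last few) can be factored out and checked on its own. Here, update_buffer is a hypothetical helper for illustration, not a Gradio API:

```python
def update_buffer(buffer, item, k=3):
    """Append the newest item and keep only the last k entries,
    mirroring how transcribe_handler trims its state."""
    return (buffer + [item])[-k:]

state = []
for chunk in ["a", "b", "c", "d"]:
    state = update_buffer(state, chunk)
# state is now ["b", "c", "d"]
```

Because the handler returns the trimmed list as an output bound to gr.State, the buffer persists across stream callbacks without growing unboundedly.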