Streaming outputs
In some cases, you may want to stream a sequence of outputs rather than show a single output at once. For example, you might have an image generation model and want to show the image produced at each step, leading up to the final image. Or you might have a chatbot that streams its response one token at a time instead of returning it all at once. To stream outputs, pass a generator function to Gradio instead of a regular function. Creating generators in Python is simple: instead of returning a single value, the function yields a series of values. Usually the yield statement sits inside a loop.
Here’s an example of a generator that simply counts up to a given number:
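A minimal sketch of such a counting generator follows. The wiring into gr.Interface is included to show that a generator is passed to Gradio exactly like a regular function. The names my_generator, slow_count and build_demo, the optional delay parameter, and the input/output component choices are illustrative assumptions, not part of the original guide. Gradio is imported inside build_demo only so the generator can be exercised without Gradio installed.

```python
import time


def my_generator(x, delay=0.0):
    """Count from 0 up to x - 1, yielding each value as soon as it is ready."""
    for i in range(int(x)):
        if delay:
            time.sleep(delay)  # optional artificial pause between steps
        yield i


def build_demo():
    # Hypothetical helper: builds the UI; only the gr.* calls are Gradio APIs
    import gradio as gr

    def slow_count(x):
        # A generator function, so Gradio shows every yielded value in turn
        yield from my_generator(x, delay=1.0)

    return gr.Interface(
        slow_count,
        gr.Number(value=5, precision=0, label="Count to"),
        gr.Number(label="Current value"),
    )
```

Calling build_demo().launch() starts the app, and the output box updates once per second until the last value is yielded.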
Adding a time.sleep(1) call inside the loop creates an artificial pause between steps so that you can observe each step of the iterator (in a real image generation model, this probably wouldn’t be necessary).
Streaming media outputs
Gradio can stream audio and video directly from your generator function. This lets your user hear your audio or see your video nearly as soon as it’s yielded by your function.
To enable media streaming:
- Set streaming=True in your gr.Audio or gr.Video output component
- Write a Python generator that yields the next “chunk” of audio or video
- Set autoplay=True so that the media starts playing automatically
For audio, the next “chunk” can be either an .mp3 or .wav file or a bytes sequence of audio. For video, the next “chunk” has to be either an .mp4 file or an h.264-encoded file with a .ts extension.
Streaming audio example
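A minimal sketch of a streaming audio output, assuming each chunk is a short .wav file written to a temporary directory. To stay self-contained it synthesizes sine tones with the standard-library wave module instead of loading real audio; write_tone_chunk, stream_tones, build_demo and the chunk lengths are hypothetical names and values chosen for illustration. Only gr.Interface, gr.Number and gr.Audio(streaming=True, autoplay=True) are Gradio APIs, following the steps listed above.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16_000  # samples per second for every generated chunk


def write_tone_chunk(path, freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit WAV file containing a sine tone (hypothetical helper)."""
    n_samples = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * t / sample_rate)))
        for t in range(n_samples)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


def stream_tones(num_chunks, chunk_seconds=1.0):
    """Yield the path of one freshly written .wav chunk per step."""
    out_dir = tempfile.mkdtemp()
    for i in range(int(num_chunks)):
        path = os.path.join(out_dir, f"chunk_{i:03d}.wav")
        write_tone_chunk(path, freq_hz=220 + 110 * i, seconds=chunk_seconds)
        yield path  # sent to the browser as soon as it is yielded


def build_demo():
    # Hypothetical helper; Gradio is imported here so stream_tones runs without it
    import gradio as gr

    return gr.Interface(
        stream_tones,
        gr.Number(value=5, precision=0, label="Number of chunks"),
        gr.Audio(streaming=True, autoplay=True),
    )
```

Each yielded path is one “chunk”; a rising tone makes it easy to hear when playback moves from one chunk to the next.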
Streaming video example
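A minimal sketch of a streaming video output, assuming the video has already been cut into h.264 .ts segments (for example with ffmpeg) and saved with zero-padded names such as seg_000.ts, seg_001.ts, and so on. The segment directory, naming scheme, stream_segments and build_demo are hypothetical; gr.Blocks, gr.Button and gr.Video(streaming=True, autoplay=True) are the Gradio pieces.

```python
import glob
import os
import time


def stream_segments(segment_dir, pattern="*.ts", delay=0.0):
    """Yield pre-cut video segments from segment_dir in filename order (hypothetical helper)."""
    # Zero-padded names make lexical order match playback order
    for path in sorted(glob.glob(os.path.join(segment_dir, pattern))):
        if delay:
            time.sleep(delay)  # optional pacing between chunks
        yield path


def build_demo(segment_dir):
    # Hypothetical helper; Gradio is imported here so stream_segments runs without it
    import gradio as gr

    def play():
        # A generator function, so each yielded path is streamed as one chunk
        yield from stream_segments(segment_dir)

    with gr.Blocks() as demo:
        video = gr.Video(streaming=True, autoplay=True)
        gr.Button("Play").click(play, inputs=None, outputs=video)
    return demo
```

Sorting the paths is the only ordering guarantee here, which is why the zero-padded naming assumption matters: seg_10.ts would otherwise sort before seg_2.ts.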
Streaming inputs
Gradio also allows you to stream images from a user’s camera or audio chunks from their microphone into your event handler. This can be used to create real-time object detection apps or conversational chat applications. Currently, the gr.Image and gr.Audio components support input streaming via the stream event.
Here’s the simplest possible streaming app, which returns the webcam stream unmodified:
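A minimal sketch of that pass-through app, assuming a webcam input gr.Image and a separate output gr.Image wired with the stream event. The handler name passthrough and the helper build_demo are hypothetical, and the time_limit and stream_every values are arbitrary choices for the two parameters described in the next section.

```python
def passthrough(frame):
    """Return the webcam frame unmodified (hypothetical handler)."""
    return frame


def build_demo():
    # Hypothetical helper; Gradio is imported here so passthrough can be tested alone
    import gradio as gr

    with gr.Blocks() as demo:
        with gr.Row():
            input_img = gr.Image(label="Input", sources=["webcam"])
            output_img = gr.Image(label="Output")
        # Each captured frame is sent to passthrough; its return value updates output_img
        input_img.stream(passthrough, input_img, output_img, time_limit=15, stream_every=0.1)
    return demo
```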
Stream event parameters
There are two unique keyword arguments for the stream event:
- time_limit: The amount of time the Gradio server will spend processing the event. Media streams are naturally unbounded, so it’s important to set a time limit so that one user does not hog the Gradio queue. The time limit counts only the time spent processing the stream, not the time spent waiting in the queue. The orange bar displayed at the bottom of the input image represents the remaining time. When the time limit expires, the user automatically rejoins the queue.
- stream_every: The interval (in seconds) at which the stream captures input and sends it to the server. For demos like image detection or manipulation, a smaller value is preferable to get a “real-time” effect. For demos like speech transcription, a larger value is useful because it gives the transcription algorithm more context about what’s being said.
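A sketch of how the two parameters might be set for a speech-style demo, assuming gr.Audio(type="numpy") delivers each streamed chunk as a (sample_rate, samples) pair. With stream_every=1.0, each call should receive roughly one second of audio, which is the kind of context a transcription model benefits from. The handler describe_chunk and the helper build_demo are hypothetical stand-ins for a real transcription function.

```python
def describe_chunk(audio):
    """Report how much audio arrived in the latest streamed chunk (hypothetical handler)."""
    if audio is None:  # no microphone data yet
        return "waiting for audio..."
    sample_rate, samples = audio  # gr.Audio(type="numpy") delivers (rate, array)
    return f"received {len(samples) / sample_rate:.2f} s of audio"


def build_demo():
    # Hypothetical helper; Gradio is imported here so describe_chunk runs without it
    import gradio as gr

    with gr.Blocks() as demo:
        mic = gr.Audio(sources=["microphone"], type="numpy", streaming=True)
        info = gr.Textbox(label="Last chunk")
        # Larger chunks (1 s) give more context per call; after 30 s of
        # processing the session is released and the user rejoins the queue
        mic.stream(describe_chunk, mic, info, time_limit=30, stream_every=1.0)
    return demo
```

For an image demo the same call would typically use a much smaller stream_every, such as 0.1, trading context for responsiveness.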
Real-time image filters
Consider a demo where a user can choose a filter to apply to their webcam stream. If the user changes the filter value, the change immediately takes effect in the output stream. This is an important difference between stream events and other Gradio events: the input values of a stream can change while the stream is being processed.
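A minimal sketch of such a filter demo, assuming webcam frames arrive as uint8 RGB NumPy arrays (gr.Image with type="numpy"). The filter list and apply_filter are hypothetical and use plain NumPy rather than an image library. The unified flag on the hypothetical build_demo helper passes the webcam component as its own output, the pattern described under “Unified streaming components”.

```python
import numpy as np

FILTERS = ["none", "flip vertical", "flip horizontal", "invert", "grayscale"]  # hypothetical


def apply_filter(frame, filter_name):
    """Apply the currently selected filter to one RGB webcam frame."""
    if frame is None:
        return None
    if filter_name == "flip vertical":
        return np.flipud(frame)
    if filter_name == "flip horizontal":
        return np.fliplr(frame)
    if filter_name == "invert":
        return 255 - frame  # assumes uint8 pixels in [0, 255]
    if filter_name == "grayscale":
        gray = frame[..., :3].mean(axis=-1).astype(frame.dtype)
        return np.stack([gray, gray, gray], axis=-1)
    return frame  # "none" or an unknown name: pass the frame through


def build_demo(unified=False):
    # Hypothetical helper; Gradio is imported here so apply_filter runs without it
    import gradio as gr

    with gr.Blocks() as demo:
        filter_name = gr.Dropdown(FILTERS, value="none", label="Filter")
        with gr.Row():
            webcam = gr.Image(sources=["webcam"], type="numpy", label="Webcam")
            # Unified layout: the webcam component also displays the filtered stream
            output = webcam if unified else gr.Image(label="Filtered")
        # filter_name is read on every call, so changing it mid-stream takes effect at once
        webcam.stream(apply_filter, [webcam, filter_name], output, time_limit=30, stream_every=0.1)
    return demo
```

Because the dropdown is listed as an input to the stream event, its current value is sent with every frame rather than captured once when streaming starts.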
Unified streaming components
For some image streaming demos, you don’t need to display separate input and output components; the app looks cleaner if it displays only the modified output stream. You can do this by specifying the input image component as the output of the stream event.
Maintaining state with streaming
Your streaming function should be stateless: it should take the current input and return its corresponding output. However, there are cases where you may want to keep track of past inputs or outputs. For example, you may want to keep a buffer of the previous k inputs to improve the accuracy of your transcription demo.
You can do this with Gradio’s gr.State() component:
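A minimal sketch of a k-input buffer held in gr.State. Since a transcription model is out of scope here, it applies the same buffering idea to webcam frames instead, averaging the last K frames to smooth out noise. K, smooth and build_demo are hypothetical. The state is passed to the stream event as an input and returned as an output, so Gradio stores the updated buffer for the next call in that session.

```python
import numpy as np

K = 5  # hypothetical buffer size: how many recent frames to keep


def smooth(frame, history):
    """Average the last K frames; history is the per-session list held by gr.State."""
    if frame is None:
        return None, history
    history = (history + [frame])[-K:]  # append the new frame, keep only the last K
    averaged = np.mean(np.stack(history), axis=0).astype(frame.dtype)
    return averaged, history


def build_demo():
    # Hypothetical helper; Gradio is imported here so smooth runs without it
    import gradio as gr

    with gr.Blocks() as demo:
        history = gr.State([])  # each session starts with an empty buffer
        with gr.Row():
            webcam = gr.Image(sources=["webcam"], type="numpy", label="Webcam")
            output = gr.Image(label="Smoothed")
        webcam.stream(smooth, [webcam, history], [output, history], time_limit=30, stream_every=0.1)
    return demo
```

Building a new list instead of appending in place keeps smooth free of side effects, so the only state is what Gradio passes in and receives back.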