Gradio supports passing batch functions, which are functions that take in a list of inputs and return a list of predictions. Batching can significantly improve performance when your demo is handling many concurrent requests.

What are batch functions?

Batch functions take in a list of inputs and return a list of predictions, processing several requests in a single call instead of one at a time. For example, here's a batched function that takes in two lists of inputs (a list of words and a list of ints) and returns a list of trimmed words as output:
import time

def trim_words(words, lens):
    trimmed_words = []
    time.sleep(5)  # simulate a fixed per-batch processing cost
    for w, l in zip(words, lens):
        trimmed_words.append(w[:int(l)])
    return [trimmed_words]  # one inner list per output component

Why use batch functions?

The advantage of using batched functions is that if you enable queuing, the Gradio server can automatically batch incoming requests and process them in parallel, potentially speeding up your demo. In the example above, 16 requests could be processed in parallel (for a total inference time of 5 seconds), instead of each request being processed separately (for a total inference time of 80 seconds).
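The arithmetic above can be sketched directly. This is a hypothetical illustration (the names PER_BATCH_SECONDS, NUM_REQUESTS, and MAX_BATCH_SIZE are ours, not Gradio's), assuming each call to the batch function costs a fixed 5 seconds regardless of batch size:

```python
# Illustrative model: one call to the batch function takes 5 seconds,
# whether it processes 1 request or 16.
PER_BATCH_SECONDS = 5
NUM_REQUESTS = 16
MAX_BATCH_SIZE = 16

# Without batching, each request triggers its own call.
sequential_time = NUM_REQUESTS * PER_BATCH_SECONDS       # 16 * 5 = 80 seconds

# With batching, requests are grouped into batches of up to MAX_BATCH_SIZE.
num_batches = -(-NUM_REQUESTS // MAX_BATCH_SIZE)         # ceiling division: 1 batch
batched_time = num_batches * PER_BATCH_SECONDS           # 1 * 5 = 5 seconds

print(sequential_time, batched_time)  # 80 5
```

The speedup only materializes when the per-call cost is roughly independent of batch size, which is typical for GPU inference but not for purely sequential CPU work.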

Using batch functions with Interface

With the gr.Interface class, you can enable batching by setting batch=True and specifying a max_batch_size:
import gradio as gr

def trim_words(words, lens):
    trimmed_words = []
    for w, l in zip(words, lens):
        trimmed_words.append(w[:int(l)])
    return [trimmed_words]

demo = gr.Interface(
    fn=trim_words,
    inputs=["textbox", "number"],
    outputs=["textbox"],
    batch=True,
    max_batch_size=16
)

demo.launch()

Using batch functions with Blocks

With the gr.Blocks class, you can specify batch parameters in the event listener:
import gradio as gr

def trim_words(words, lens):
    trimmed_words = []
    for w, l in zip(words, lens):
        trimmed_words.append(w[:int(l)])
    return [trimmed_words]

with gr.Blocks() as demo:
    with gr.Row():
        word = gr.Textbox(label="Word")
        length = gr.Number(label="Length")
        output = gr.Textbox(label="Output")
    with gr.Row():
        run = gr.Button("Process")

    event = run.click(
        trim_words,
        [word, length],
        output,
        batch=True,
        max_batch_size=16
    )

demo.launch()

Batch processing with Hugging Face models

Many Hugging Face transformers and diffusers models work very naturally with Gradio’s batch mode. Here’s an example using a diffusion model to generate images in batches:
import torch
from diffusers import DiffusionPipeline
import gradio as gr

generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
if torch.cuda.is_available():
    generator = generator.to("cuda")

def generate(prompts):
    # The pipeline accepts a list of prompts and returns one image per prompt
    images = generator(list(prompts)).images
    return [images]  # one inner list for the single image output

demo = gr.Interface(
    generate,
    "textbox",
    "image",
    batch=True,
    max_batch_size=4  # Set based on your CPU/GPU memory
)

demo.launch()

Set max_batch_size based on your available CPU/GPU memory: larger batch sizes can improve throughput but require more memory.

How batching works

When batching is enabled:
  1. Gradio collects incoming requests in a queue
  2. When a request is ready to be processed, Gradio checks if there are other pending requests
  3. If there are pending requests, Gradio batches them together (up to max_batch_size)
  4. The batch is sent to your function as lists of inputs
  5. Your function processes the entire batch and returns a list of outputs
  6. Gradio distributes the outputs back to the individual requests
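The steps above can be sketched as a simplified queue-processing loop. This is a hypothetical model for intuition, not Gradio's actual implementation (the function process_queue and the request-id bookkeeping are ours):

```python
from collections import deque

def trim_words(words, lens):
    # Same batched function as above: lists in, list of output lists out.
    return [[w[:int(l)] for w, l in zip(words, lens)]]

def process_queue(requests, batch_fn, max_batch_size):
    """Toy model of the six steps: requests is a list of (request_id, (word, length))."""
    queue = deque(requests)                              # 1. collect requests in a queue
    results = {}
    while queue:
        batch = []
        while queue and len(batch) < max_batch_size:     # 2-3. gather pending requests
            batch.append(queue.popleft())                #      up to max_batch_size
        ids = [req_id for req_id, _ in batch]
        words = [w for _, (w, _) in batch]               # 4. regroup into per-parameter lists
        lens = [l for _, (_, l) in batch]
        [outputs] = batch_fn(words, lens)                # 5. one call processes the whole batch
        for req_id, out in zip(ids, outputs):            # 6. route outputs back to requests
            results[req_id] = out
    return results

print(process_queue([(1, ("gradio", 4)), (2, ("batch", 3))], trim_words, 16))
# {1: 'grad', 2: 'bat'}
```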

Important considerations

Your function must accept lists of inputs and return lists of outputs when batching is enabled. Make sure your function signature matches this pattern.
  • The function receives each input parameter as a list
  • The function must return a list containing one output list per output component (even with a single output component, wrap the results in an outer list, as in return [trimmed_words] above)
  • The length of each output list should match the length of the input lists
  • Batching works best with functions that can process multiple inputs more efficiently than processing them one at a time
Batch processing is particularly effective for GPU-accelerated models, where the overhead of transferring data to the GPU can be amortized across multiple inputs.
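To make the return-format rule concrete, here is a hypothetical batch function with two output components (the name trim_and_count and the second "count" output are illustrative, not from the guide above): it returns one inner list per output, both the same length as the inputs.

```python
def trim_and_count(words, lens):
    # Batched function with two outputs: the trimmed words and their lengths.
    trimmed = [w[:int(l)] for w, l in zip(words, lens)]
    counts = [len(t) for t in trimmed]
    return [trimmed, counts]  # one inner list per output component

out = trim_and_count(["gradio", "batching"], [4, 5])
print(out)  # [['grad', 'batch'], [4, 5]]
```

Wired into gr.Interface, this function would pair with outputs=["textbox", "number"] and batch=True.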