What are batch functions?
Batch functions are functions that take in a list of inputs and return a list of predictions. Instead of processing requests one at a time, batch functions allow you to process multiple requests simultaneously. For example, here's a batched function that takes in two lists of inputs (a list of words and a list of ints) and returns a list of trimmed words as output:
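A minimal sketch of such a function (the `time.sleep(5)` stands in for a model that takes roughly 5 seconds per batch, and the outer list in the return value holds one inner list per output component):

```python
import time

def trim_words(words, lengths):
    # Each parameter arrives as a list covering the whole batch.
    time.sleep(5)  # stand-in for ~5 seconds of model inference per batch
    trimmed = [word[:int(length)] for word, length in zip(words, lengths)]
    # One output component, so return a list containing one list of results.
    return [trimmed]
```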
Why use batch functions?
The advantage of using batched functions is that if you enable queuing, the Gradio server can automatically batch incoming requests and process them in parallel, potentially speeding up your demo. In the example above, 16 requests could be processed in parallel (for a total inference time of 5 seconds), instead of each request being processed separately (for a total inference time of 80 seconds).
Using batch functions with Interface
With the gr.Interface class, you can enable batching by setting batch=True and specifying a max_batch_size:
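A sketch of an Interface wired up this way, reusing a hypothetical trim_words batch function:

```python
import gradio as gr

def trim_words(words, lengths):
    # Batched function: inputs arrive as lists, outputs go back as a list of lists.
    trimmed = [word[:int(length)] for word, length in zip(words, lengths)]
    return [trimmed]

demo = gr.Interface(
    fn=trim_words,
    inputs=["textbox", "number"],
    outputs=["textbox"],
    batch=True,          # enable batch mode
    max_batch_size=16,   # process at most 16 requests per batch
)

if __name__ == "__main__":
    demo.launch()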
Using batch functions with Blocks
With the gr.Blocks class, you can specify the batch parameters in the event listener:
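A sketch with Blocks, passing the same parameters to the event listener (the reverse-text function here is made up for illustration):

```python
import gradio as gr

def reverse_batch(texts):
    # One input and one output component, both received and returned as lists.
    return [[text[::-1] for text in texts]]

with gr.Blocks() as demo:
    inp = gr.Textbox(label="Text")
    out = gr.Textbox(label="Reversed")
    btn = gr.Button("Reverse")
    # batch=True and max_batch_size go on the event listener itself
    btn.click(reverse_batch, inputs=inp, outputs=out, batch=True, max_batch_size=16)

if __name__ == "__main__":
    demo.launch()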
Batch processing with Hugging Face models
Many Hugging Face transformers and diffusers models work very naturally with Gradio's batch mode. Here's an example using a diffusion model to generate images in batches:
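A sketch of what that can look like. The model id and pipeline settings below are assumptions for illustration, and running it requires a GPU plus the torch and diffusers packages:

```python
import gradio as gr
import torch
from diffusers import DiffusionPipeline

# Model id is an assumption; any text-to-image diffusers pipeline works.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate(prompts):
    # diffusers pipelines accept a list of prompts and return a batch of images
    images = pipe(prompts).images
    return [images]  # one output component, so one inner list

demo = gr.Interface(
    fn=generate,
    inputs="textbox",
    outputs="image",
    batch=True,
    max_batch_size=4,
)

if __name__ == "__main__":
    demo.launch()
```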
How batching works
When batching is enabled:
- Gradio collects incoming requests in a queue
- When a request is ready to be processed, Gradio checks if there are other pending requests
- If there are pending requests, Gradio batches them together (up to max_batch_size)
- The batch is sent to your function as lists of inputs
- Your function processes the entire batch and returns a list of outputs
- Gradio distributes the outputs back to the individual requests
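The collection step above can be sketched in plain Python. This is an illustration of the idea, not Gradio's actual implementation; collect_batch and the queue are made up:

```python
from queue import Empty, Queue

def collect_batch(requests: Queue, max_batch_size: int) -> list:
    """Gather the next request plus any other pending ones, up to max_batch_size."""
    batch = [requests.get()]  # wait for the first request to be ready
    while len(batch) < max_batch_size:
        try:
            batch.append(requests.get_nowait())  # pull other pending requests
        except Empty:
            break  # queue drained: process what we have
    return batch

# Five requests are pending; with max_batch_size=4 they are split into 4 + 1.
pending = Queue()
for word in ["alpha", "beta", "gamma", "delta", "epsilon"]:
    pending.put(word)

print(collect_batch(pending, 4))  # → ['alpha', 'beta', 'gamma', 'delta']
print(collect_batch(pending, 4))  # → ['epsilon']
```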
Important considerations
- The function receives each input parameter as a list
- The function must return outputs as a list (or list of lists for multiple outputs)
- The length of each output list should match the length of the input lists
- Batching works best with functions that can process multiple inputs more efficiently than processing them one at a time
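As a sketch of those shape rules, here is a made-up batch function with one input and two output components:

```python
def word_stats(words):
    # The single input parameter arrives as one list covering the batch.
    lengths = [len(word) for word in words]
    uppercased = [word.upper() for word in words]
    # Two output components -> a list of two lists,
    # each the same length as the input list.
    return [lengths, uppercased]

print(word_stats(["hi", "gradio"]))  # → [[2, 6], ['HI', 'GRADIO']]
```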
Batch processing is particularly effective for GPU-accelerated models, where the overhead of transferring data to the GPU can be amortized across multiple inputs.