Queue system

Every Gradio app comes with a built-in queuing system that can scale to thousands of concurrent users. Because many of your event listeners may involve heavy processing, Gradio automatically creates a queue in the backend for every event listener in your app to process incoming events.

Configuring the queue

By default, each event listener has its own queue, which handles one request at a time. You can configure this with two arguments, concurrency_limit and concurrency_id, described in the sections that follow.

Concurrency limit

The concurrency_limit parameter sets the maximum number of concurrent executions for an event listener. By default, the limit is 1 unless configured otherwise in Blocks.queue(). You can also set it to None for no limit (i.e., an unlimited number of concurrent executions).

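import gradio as gr

# Placeholder for your own image-generation function (not part of Gradio).
def image_gen(prompt):
    return None  # replace with a real model call that returns an image

with gr.Blocks() as demo:
    prompt = gr.Textbox()
    image = gr.Image()
    generate_btn = gr.Button("Generate Image")
    generate_btn.click(image_gen, prompt, image, concurrency_limit=5)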

In the code above, up to 5 requests can be processed simultaneously for this event listener. Additional requests will be queued until a slot becomes available.

To allow unlimited concurrency for an event listener, set concurrency_limit=None. This is useful if your function calls an external API that handles rate limiting itself.
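The effect of a concurrency limit can be illustrated without Gradio. The following is a minimal sketch that models a limit with asyncio.Semaphore from the standard library; it is not Gradio's actual implementation, and run_requests and handle_request are hypothetical names chosen for this illustration.

```python
import asyncio


async def run_requests(num_requests, concurrency_limit):
    """Run fake jobs and return the peak number that ran at the same time."""
    # None means "no limit", mirroring concurrency_limit=None.
    gate = None
    if concurrency_limit is not None:
        gate = asyncio.Semaphore(concurrency_limit)
    running = 0
    peak = 0

    async def handle_request():
        nonlocal running, peak
        if gate is not None:
            await gate.acquire()  # wait here until a slot is free
        try:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for heavy processing
            running -= 1
        finally:
            if gate is not None:
                gate.release()  # free the slot for a waiting request

    await asyncio.gather(*(handle_request() for _ in range(num_requests)))
    return peak


peak_limited = asyncio.run(run_requests(12, concurrency_limit=5))
peak_unlimited = asyncio.run(run_requests(12, concurrency_limit=None))
print(peak_limited, peak_unlimited)
```

With a limit of 5, the remaining requests wait until a running one finishes; with None, all 12 start immediately.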

Shared queues with concurrency ID

If you want to manage multiple event listeners using a shared queue, you can use the concurrency_id argument. This allows event listeners to share a queue by assigning them the same ID. For example, if your setup has only 2 GPUs but multiple functions require GPU access, you can create a shared queue for all those functions:

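import gradio as gr

# Placeholders for your own model functions (not part of Gradio).
def image_gen_1(prompt):
    return None  # replace with a call to model 1
def image_gen_2(prompt):
    return None  # replace with a call to model 2
def image_gen_3(prompt):
    return None  # replace with a call to model 3

with gr.Blocks() as demo:
    prompt = gr.Textbox()
    image = gr.Image()
    generate_btn_1 = gr.Button("Generate Image via model 1")
    generate_btn_2 = gr.Button("Generate Image via model 2")
    generate_btn_3 = gr.Button("Generate Image via model 3")
    # The limit is set on one listener; the others join the same queue by ID.
    generate_btn_1.click(image_gen_1, prompt, image, concurrency_limit=2, concurrency_id="gpu_queue")
    generate_btn_2.click(image_gen_2, prompt, image, concurrency_id="gpu_queue")
    generate_btn_3.click(image_gen_3, prompt, image, concurrency_id="gpu_queue")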

In this example, all three event listeners share a queue identified by "gpu_queue". The queue processes at most 2 requests at a time across all three listeners, as set by the concurrency_limit on the first listener.
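The shared-queue behavior can be sketched in plain Python as well. The example below is an illustrative model only, assuming one semaphore per concurrency ID; it does not reflect how Gradio implements queues internally, and simulate_shared_queue, listener, and gates are hypothetical names.

```python
import asyncio


async def simulate_shared_queue():
    """Return (peak concurrency, finished listener names) for a shared queue."""
    # One gate per concurrency ID; listeners using the same ID share it.
    gates = {"gpu_queue": asyncio.Semaphore(2)}
    running = 0
    peak = 0

    async def listener(name, concurrency_id):
        nonlocal running, peak
        async with gates[concurrency_id]:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for GPU work
            running -= 1
        return name

    # Two incoming requests for each of the three listeners.
    names = ["image_gen_1", "image_gen_2", "image_gen_3"] * 2
    finished = await asyncio.gather(*(listener(n, "gpu_queue") for n in names))
    return peak, finished


peak, finished = asyncio.run(simulate_shared_queue())
print(peak, len(finished))
```

Although six requests arrive across three different listeners, no more than two run at once because they all wait on the same gate.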

Default concurrency settings

The default concurrency limit for all queues can be set globally using the default_concurrency_limit parameter in Blocks.queue().

import gradio as gr

with gr.Blocks() as demo:
    # Your components and event listeners here
    pass

demo.queue(default_concurrency_limit=10)
demo.launch()
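How a global default interacts with a per-listener setting can be summarized with a small resolver. This is a sketch under the assumption that an explicitly set concurrency_limit (including None) takes precedence over the global default; effective_limit and the DEFAULT sentinel are hypothetical names for this illustration, not Gradio functions.

```python
DEFAULT = "default"  # sentinel meaning "use the global default"


def effective_limit(listener_limit=DEFAULT, default_concurrency_limit=1):
    """Return the limit a listener would run with in this simplified model."""
    if listener_limit == DEFAULT:
        # Nothing set on the listener: fall back to the global default.
        return default_concurrency_limit
    # An explicit value, including None for "unlimited", is used as given.
    return listener_limit


print(effective_limit())                               # no settings at all
print(effective_limit(default_concurrency_limit=10))   # global default only
print(effective_limit(5, default_concurrency_limit=10))     # listener override
print(effective_limit(None, default_concurrency_limit=10))  # unlimited listener
```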

The queuing system lets you control how many requests each event listener processes at once, so you can match your app's concurrency to the resources you have available.
