The `vllm run-batch` command processes a batch of prompts using vLLM's OpenAI-compatible API and writes the results to a file. It supports both local files and HTTP URLs for input and output.
Usage
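The general shape of an invocation is (flag spellings follow the Options section below):

```bash
vllm run-batch -i INPUT_FILE -o OUTPUT_FILE --model MODEL [options]
```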
Basic example
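A minimal run over a local input file (the model name is illustrative; use any model vLLM can load):

```bash
vllm run-batch -i input.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```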
Input format
The input file should be in JSONL format, with one request per line following the OpenAI Batch API format.
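For example, a single request line might look like the following (the `custom_id` and model name are illustrative):

```json
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
```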
Output format

Results are written in JSONL format, with one response object per input request.
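An abbreviated result line might look like this (field contents are illustrative): each line mirrors the OpenAI Batch output object, carrying the request's `custom_id`, a `response`, and an `error` field.

```json
{"id": "vllm-abc123", "custom_id": "request-1", "response": {"status_code": 200, "body": {"choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi there!"}}]}}, "error": null}
```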
Options

Required arguments
`-i`, `--input-file`: Input file path (local file or HTTP URL) containing batch requests in JSONL format.

`-o`, `--output-file`: Output file path (local file or HTTP URL) where results will be written in JSONL format.

`--model`: Name or path of the model to use for batch processing.
Optional arguments
`--enable-metrics`: Enable the Prometheus metrics server.

`--port`: Port for the Prometheus metrics server (used only when `--enable-metrics` is set).

`--url`: Host address for the Prometheus metrics server.
Examples
Local file processing
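Reading requests from a local JSONL file and writing results back to disk (paths and model name are illustrative):

```bash
vllm run-batch \
    -i /data/batch_requests.jsonl \
    -o /data/batch_results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```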
With metrics enabled
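Exposing Prometheus metrics while the batch runs (the port shown is illustrative):

```bash
vllm run-batch \
    -i input.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --enable-metrics \
    --port 8000
```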
HTTP input/output
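Reading from and writing to HTTP endpoints (URLs are illustrative; the output URL must accept uploads, e.g. a presigned URL):

```bash
vllm run-batch \
    -i https://example.com/batches/input.jsonl \
    -o https://example.com/batches/output.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```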
Use cases
Offline batch processing
Process large batches of prompts without running a server
Data processing pipelines
Integrate with data pipelines using JSONL files
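As a sketch of pipeline integration (file names, prompts, and the model name are illustrative), the request file can be generated and results consumed with a few lines of Python:

```python
import json

# Build a batch input file from a list of prompts (OpenAI Batch API format).
prompts = ["Summarize the plot of Hamlet.", "Translate 'hello' to French."]
with open("input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "meta-llama/Meta-Llama-3-8B-Instruct",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")

# After `vllm run-batch -i input.jsonl -o results.jsonl --model ...` completes,
# index the result lines by custom_id for downstream processing.
def load_results(path):
    results = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            results[record["custom_id"]] = record.get("response")
    return results
```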
Benchmarking
Measure throughput on representative workloads
Evaluation
Run model evaluations on test datasets
Related
- Offline inference - Python API for batch processing
- `vllm serve` - Start an OpenAI-compatible API server
- OpenAI Batch API - OpenAI Batch API format specification