The vllm run-batch command processes a batch of requests using vLLM's OpenAI-compatible API and writes the results to a file. Both the input and output files may be local paths or HTTP URLs.

Usage

vllm run-batch -i INPUT.jsonl -o OUTPUT.jsonl --model <model> [OPTIONS]

Basic example

# Run batch processing with local files
vllm run-batch \
  -i prompts.jsonl \
  -o results.jsonl \
  --model meta-llama/Llama-2-7b-hf

Input format

The input file should be in JSONL format with one request per line, following the OpenAI Batch API format:
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Llama-2-7b-hf", "messages": [{"role": "user", "content": "Hello!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/completions", "body": {"model": "meta-llama/Llama-2-7b-hf", "prompt": "Once upon a time"}}
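An input file in this format can be generated programmatically. A minimal Python sketch, assuming your prompts live in a list (the prompts, file name, and model are placeholders, not part of the CLI):

```python
import json

# Placeholder prompts; replace with your own workload.
prompts = ["Hello!", "Once upon a time"]

# One OpenAI Batch API request per line, targeting /v1/chat/completions.
with open("prompts.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "meta-llama/Llama-2-7b-hf",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

The custom_id values are yours to choose; they are echoed back in the output file so you can match responses to requests.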

Output format

Results are written in JSONL format, one response object per request:
{"id": "batch_req_123", "custom_id": "request-1", "response": {"id": "cmpl-123", "object": "chat.completion", "created": 1234567890, "model": "meta-llama/Llama-2-7b-hf", "choices": [...]}}
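Because each output line carries the original custom_id, results can be collected by matching on it. A minimal parsing sketch, assuming records shaped like the example above (the exact response body depends on the endpoint each request targeted):

```python
import json

def collect_results(lines):
    """Map each custom_id to the first choice's message content."""
    results = {}
    for line in lines:
        record = json.loads(line)
        # Per the output format above, the completion sits under "response".
        choice = record["response"]["choices"][0]
        results[record["custom_id"]] = choice["message"]["content"]
    return results

# Example with a single output record shaped like the one above.
sample = json.dumps({
    "id": "batch_req_123",
    "custom_id": "request-1",
    "response": {
        "id": "cmpl-123",
        "object": "chat.completion",
        "choices": [{"message": {"role": "assistant", "content": "Hi there!"}}],
    },
})
print(collect_results([sample]))
```

In practice you would read the lines from the output file produced by vllm run-batch rather than from an in-memory sample.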

Options

Required arguments

-i, --input-file
string
required
Input file path (local file or HTTP URL) containing batch requests in JSONL format.
-o, --output-file
string
required
Output file path (local file or HTTP URL) where results will be written in JSONL format.
--model
string
required
Model name or path to use for batch processing.

Optional arguments

--enable-metrics
boolean
default:"false"
Enable Prometheus metrics server.
--port
integer
default:"8000"
Port for the Prometheus metrics server (used when --enable-metrics is set).
--url
string
default:"0.0.0.0"
Host address for Prometheus metrics server.

Examples

Local file processing

vllm run-batch \
  -i /path/to/requests.jsonl \
  -o /path/to/results.jsonl \
  --model facebook/opt-125m

With metrics enabled

vllm run-batch \
  -i prompts.jsonl \
  -o results.jsonl \
  --model meta-llama/Llama-2-7b-hf \
  --enable-metrics \
  --port 9090

HTTP input/output

vllm run-batch \
  -i https://example.com/requests.jsonl \
  -o https://example.com/results.jsonl \
  --model meta-llama/Llama-2-7b-hf

Use cases

Offline batch processing

Process large batches of prompts without running a server

Data processing pipelines

Integrate with data pipelines using JSONL files

Benchmarking

Measure throughput on representative workloads

Evaluation

Run model evaluations on test datasets
