The `vllm run-batch` command processes a batch of prompts using vLLM's OpenAI-compatible API and writes the results to a file. It supports both local files and HTTP URLs for input and output.
Usage
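The general shape of an invocation is (flag spellings follow the Options section below):

```bash
vllm run-batch -i INPUT_FILE -o OUTPUT_FILE --model MODEL [options]
```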
Basic example
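A minimal run over a local input file (the model name is illustrative; use any model vLLM can load):

```bash
vllm run-batch -i input.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```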
Input format
The input file should be in JSONL format, with one request per line following the OpenAI Batch API format.
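For example, a single request line might look like the following (the `custom_id` and model name are illustrative):

```json
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100}}
```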
Output format

Results are written in JSONL format, with one response object per input request.
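An abbreviated result line might look like this (field contents are illustrative): each line mirrors the OpenAI Batch output object, carrying the request's `custom_id`, a `response`, and an `error` field.

```json
{"id": "vllm-abc123", "custom_id": "request-1", "response": {"status_code": 200, "body": {"choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi there!"}}]}}, "error": null}
```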
Options

Required arguments
`-i`, `--input-file`: Input file path (local file or HTTP URL) containing batch requests in JSONL format.

`-o`, `--output-file`: Output file path (local file or HTTP URL) where results will be written in JSONL format.

`--model`: Name or path of the model to use for batch processing.
Optional arguments
`--enable-metrics`: Enable the Prometheus metrics server.

`--port`: Port for the Prometheus metrics server (used only when `--enable-metrics` is set).

`--url`: Host address for the Prometheus metrics server.
Examples
Local file processing
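Reading requests from a local JSONL file and writing results back to disk (paths and model name are illustrative):

```bash
vllm run-batch \
    -i /data/batch_requests.jsonl \
    -o /data/batch_results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```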
With metrics enabled
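Exposing Prometheus metrics while the batch runs (the port shown is illustrative):

```bash
vllm run-batch \
    -i input.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --enable-metrics \
    --port 8000
```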
HTTP input/output
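Reading from and writing to HTTP endpoints (URLs are illustrative; the output URL must accept uploads, e.g. a presigned URL):

```bash
vllm run-batch \
    -i https://example.com/batches/input.jsonl \
    -o https://example.com/batches/output.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```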
Use cases
Offline batch processing
Process large batches of prompts without running a server
Data processing pipelines
Integrate with data pipelines using JSONL files
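As a sketch of pipeline integration (file names, prompts, and the model name are illustrative), the request file can be generated and results consumed with a few lines of Python:

```python
import json

# Build a batch input file from a list of prompts (OpenAI Batch API format).
prompts = ["Summarize the plot of Hamlet.", "Translate 'hello' to French."]
with open("input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "meta-llama/Meta-Llama-3-8B-Instruct",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")

# After `vllm run-batch -i input.jsonl -o results.jsonl --model ...` completes,
# index the result lines by custom_id for downstream processing.
def load_results(path):
    results = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            results[record["custom_id"]] = record.get("response")
    return results
```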
Benchmarking
Measure throughput on representative workloads
Evaluation
Run model evaluations on test datasets
Related
- Offline inference - Python API for batch processing
- `vllm serve` - Start an OpenAI-compatible API server
- OpenAI Batch API - OpenAI Batch API format specification