The `vllm serve` command launches an OpenAI-compatible API server. The arguments described here control server behavior, authentication, CORS, SSL, and other HTTP server settings.
Server arguments are separate from engine arguments. The server arguments configure the API server wrapper, while engine arguments configure the underlying inference engine.
## Configuration methods
You can configure the server using:

- Command-line arguments
- A YAML configuration file
### YAML configuration file
You can load arguments from a YAML config file with `--config`. The precedence order is: command line > config file > defaults. If an argument appears in both the command line and the config file, the command-line value takes precedence.
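As a sketch of the precedence rule, the config file keys mirror the CLI flag names (the model name and option values below are illustrative):

```shell
# Write an illustrative config file; keys mirror the CLI flag names.
cat > config.yaml <<'EOF'
host: 0.0.0.0
port: 8000
uvicorn-log-level: debug
EOF

# --port 9000 on the command line overrides port: 8000 from the config file.
vllm serve meta-llama/Llama-3.1-8B-Instruct --config config.yaml --port 9000
```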
## Server arguments
### Basic server settings
- `--host`: Host IP address to bind the server to.
- `--port`: Port number to run the server on.
- `--uds`: Unix domain socket path. If set, `--host` and `--port` are ignored.
- `--uvicorn-log-level`: Log level for the uvicorn server. Options: `critical`, `error`, `warning`, `info`, `debug`, `trace`.
- `--disable-uvicorn-access-log`: Disable uvicorn access logging.
- Comma-separated list of endpoint paths to exclude from access logs, useful for reducing log noise from health checks.
### Authentication
- `--api-key`: API key required in the request header. When set, clients must include the key in every request.
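Assuming a server started with `--api-key my-secret-token` (the token value is illustrative), clients pass the key as a Bearer token:

```shell
# The Authorization header must carry the same key the server was started with.
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer my-secret-token"
```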
### CORS configuration
- `--allow-credentials`: Allow credentials in CORS requests.
- `--allowed-origins`: Allowed origins for CORS.
- `--allowed-methods`: Allowed HTTP methods for CORS.
- `--allowed-headers`: Allowed headers for CORS.
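As a sketch, the list-valued CORS flags take JSON arrays (the origin shown is illustrative):

```shell
# Restrict CORS to a single origin with credentials enabled.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --allow-credentials \
  --allowed-origins '["https://example.com"]' \
  --allowed-methods '["GET", "POST"]' \
  --allowed-headers '["*"]'
```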
### SSL/TLS configuration
- `--ssl-keyfile`: Path to the SSL key file for HTTPS.
- `--ssl-certfile`: Path to the SSL certificate file for HTTPS.
- `--ssl-ca-certs`: Path to a CA certificates file.
- `--enable-ssl-refresh`: Automatically refresh the SSL context when certificate files change.
- `--ssl-cert-reqs`: Whether a client certificate is required (see the Python `ssl` module's `CERT_*` constants).
### Chat template configuration
- `--chat-template`: Path to a chat template file, or the template itself as an inline string.
- Trust chat templates provided in requests.
- `--response-role`: The role name to return when `add_generation_prompt=true`.

### LoRA configuration
- `--lora-modules`: LoRA modules to load at startup. Accepts both an older `name=path` format and a newer JSON format.
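A sketch of both formats (the adapter name and paths are illustrative):

```shell
# Old key=value format
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --lora-modules my-lora=/path/to/lora

# New JSON format, which also allows setting base_model_name
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --lora-modules '{"name": "my-lora", "path": "/path/to/lora", "base_model_name": "meta-llama/Llama-3.1-8B-Instruct"}'
```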
### Tool calling
- `--enable-auto-tool-choice`: Enable automatic tool choice for supported models. Requires `--tool-call-parser` to be specified.
- `--tool-call-parser`: Tool call parser for the model. Built-in parsers include `hermes`, `mistral`, `internlm`, and `llama3_json`.
- `--tool-parser-plugin`: Plugin for registering a custom tool call parser.
### Logging configuration
- `--max-log-len`: Maximum number of prompt characters or prompt ID numbers to print in logs. `None` means unlimited.
- Log model outputs (generations). Requires `--enable-log-requests`.
- Log the stack trace of error responses.
### Advanced server settings
- Run the API server in the same process as the model serving engine.
- `--root-path`: FastAPI `root_path` when the app is behind a path-based routing proxy.
- `--middleware`: Additional ASGI middleware to apply to the app.
- `--enable-request-id-headers`: Add an `X-Request-Id` header to responses.
- `--disable-fastapi-docs`: Disable FastAPI's OpenAPI schema, Swagger UI, and ReDoc endpoints.
- `--h11-max-incomplete-event-size`: Maximum size (in bytes) of an incomplete HTTP event for the h11 parser. Default: 4 MB. Helps mitigate header abuse.
- `--h11-max-header-count`: Maximum number of HTTP headers allowed. Helps mitigate header abuse.
### Data parallel settings
- `--headless`: Run in headless mode for multi-node data parallelism.
- `--api-server-count`: Number of API server processes to run. Defaults to `data_parallel_size` if not specified.

## Usage examples
### Basic server
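A minimal launch (the model name is illustrative):

```shell
# Bind to all interfaces on the default-style port.
vllm serve meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 8000
```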
### Server with authentication
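A sketch with an API key (the token value is illustrative; in practice, load it from a secret store rather than the command line):

```shell
# Clients must now send "Authorization: Bearer my-secret-token".
vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key my-secret-token
```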
### HTTPS server
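A sketch of an HTTPS launch (the certificate paths are illustrative):

```shell
# Serve over TLS using a key/certificate pair.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --ssl-keyfile /path/to/key.pem \
  --ssl-certfile /path/to/cert.pem
```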
### Server with custom chat template
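A sketch using a template file (the file path is illustrative):

```shell
# Override the model's built-in chat template with a local Jinja file.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --chat-template ./my_template.jinja
```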
### Server with tool calling
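A sketch using the `hermes` parser with a Hermes-family model (the model name is illustrative):

```shell
# --enable-auto-tool-choice requires a matching --tool-call-parser.
vllm serve NousResearch/Hermes-2-Pro-Llama-3-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```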
### Multi-API server deployment
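A sketch combining data parallelism with multiple API server processes (the sizes are illustrative; `--data-parallel-size` is an engine argument):

```shell
# Run two data-parallel engine replicas behind two API server processes.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --data-parallel-size 2 \
  --api-server-count 2
```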
## See also
- Engine arguments - Configuration for the inference engine
- Environment variables - Runtime environment configuration
- Optimization guide - Performance tuning strategies