The models endpoint lists all models currently available on the vLLM server.
List models
Lists all available models.
Returns an array of model objects, each with the following fields:
id: The model identifier that can be referenced in API requests.
created: Unix timestamp of when the model was loaded.
owned_by: Organization that owns the model (typically "vllm").
root: The base model identifier.
Example
curl http://localhost:8000/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm",
      "root": "facebook/opt-125m",
      "parent": null
    }
  ]
}
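As a quick offline illustration of the response shape, the list payload above can be parsed like this (the served_model_ids helper is illustrative, not part of vLLM):

```python
import json

# Sample /v1/models response body, matching the example above.
payload = json.loads("""
{"object": "list",
 "data": [{"id": "facebook/opt-125m", "object": "model",
           "created": 1677610602, "owned_by": "vllm",
           "root": "facebook/opt-125m", "parent": null}]}
""")

def served_model_ids(response):
    # Collect the "id" field of every model object in the list.
    return [model["id"] for model in response["data"]]

print(served_model_ids(payload))  # ['facebook/opt-125m']
```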
Retrieve model
Retrieves information about a specific model.
Path parameters
model: The model ID to retrieve.
The response contains the same model object fields as the list endpoint, including:
created: Unix timestamp of when the model was loaded.
owned_by: Organization that owns the model.
Example
curl http://localhost:8000/v1/models/facebook/opt-125m
{
  "id": "facebook/opt-125m",
  "object": "model",
  "created": 1677610602,
  "owned_by": "vllm"
}
Using with Python
import requests

# List all models
response = requests.get("http://localhost:8000/v1/models")
models = response.json()["data"]
for model in models:
    print(f"Model: {model['id']}")

# Get specific model
model_id = "facebook/opt-125m"
response = requests.get(f"http://localhost:8000/v1/models/{model_id}")
model_info = response.json()
print(model_info)
Using with OpenAI Python client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# List models
models = client.models.list()
for model in models.data:
    print(model.id)

# Retrieve model
model = client.models.retrieve("facebook/opt-125m")
print(model)
Multiple model serving
To serve multiple models from the same vLLM instance, use the --served-model-name flag:
vllm serve facebook/opt-125m \
    --served-model-name opt-125m gpt-125m
This makes the model available under multiple names:
curl http://localhost:8000/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "opt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm"
    },
    {
      "id": "gpt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm"
    }
  ]
}
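Since each served name is advertised as its own model ID, a client can check whether a requested alias is available before using it as the model parameter. A minimal offline sketch against the response above (the is_served helper is illustrative, not part of vLLM):

```python
import json

# Sample /v1/models response when serving under two names, as above.
payload = json.loads("""
{"object": "list",
 "data": [{"id": "opt-125m", "object": "model",
           "created": 1677610602, "owned_by": "vllm"},
          {"id": "gpt-125m", "object": "model",
           "created": 1677610602, "owned_by": "vllm"}]}
""")

def is_served(response, name):
    # True if `name` is one of the model IDs the server advertises.
    return any(model["id"] == name for model in response["data"])

print(is_served(payload, "opt-125m"))   # True
print(is_served(payload, "gpt-125m"))   # True
# The original Hugging Face name is no longer listed once aliases are set:
print(is_served(payload, "facebook/opt-125m"))  # False
```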