The models endpoint lists all models currently available on the vLLM server.

List models

GET /v1/models
Lists all available models.

Response format

object
string
Always “list”.
data
array
Array of model objects.
id
string
The model identifier that can be referenced in API requests.
object
string
Always “model”.
created
integer
Unix timestamp of when the model was loaded.
owned_by
string
Organization that owns the model (typically “vllm”).
root
string
The base model identifier.
parent
string | null
Parent model, if any.

Example

curl http://localhost:8000/v1/models

{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm",
      "root": "facebook/opt-125m",
      "parent": null
    }
  ]
}
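The created field is a standard Unix timestamp, so converting it for display takes only the Python standard library. For example, using the value from the sample response above:

```python
from datetime import datetime, timezone

# "created" value from the sample response above
created = 1677610602
loaded_at = datetime.fromtimestamp(created, tz=timezone.utc)
print(loaded_at.isoformat())  # → 2023-02-28T18:56:42+00:00
```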

Retrieve model

GET /v1/models/{model}
Retrieves information about a specific model.

Path parameters

model
string
required
The model ID to retrieve.

Response format

id
string
The model identifier.
object
string
Always “model”.
created
integer
Unix timestamp of when the model was loaded.
owned_by
string
Organization that owns the model.

Example

curl http://localhost:8000/v1/models/facebook/opt-125m

{
  "id": "facebook/opt-125m",
  "object": "model",
  "created": 1677610602,
  "owned_by": "vllm"
}
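Note that model IDs frequently contain a slash, as in facebook/opt-125m. The curl example above passes the slash through literally; if your HTTP client or a proxy in front of the server requires path segments to be encoded, you can percent-encode the ID first. A small sketch using the standard library (whether the server expects the raw or the encoded form depends on its routing, so treat this as an option to try, not a requirement):

```python
from urllib.parse import quote

model_id = "facebook/opt-125m"
# safe="" also encodes "/", which quote() normally leaves alone
encoded = quote(model_id, safe="")
url = f"http://localhost:8000/v1/models/{encoded}"
print(url)  # → http://localhost:8000/v1/models/facebook%2Fopt-125m
```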

Using with Python

import requests

# List all models
response = requests.get("http://localhost:8000/v1/models")
models = response.json()["data"]

for model in models:
    print(f"Model: {model['id']}")

# Get specific model
model_id = "facebook/opt-125m"
response = requests.get(f"http://localhost:8000/v1/models/{model_id}")
model_info = response.json()
print(model_info)

Using with OpenAI Python client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# List models
models = client.models.list()
for model in models.data:
    print(model.id)

# Retrieve model
model = client.models.retrieve("facebook/opt-125m")
print(model)

Serving a model under multiple names

A single vLLM instance serves one model, but the --served-model-name flag lets you expose it under one or more alternative names:

vllm serve facebook/opt-125m \
  --served-model-name opt-125m gpt-125m

Each name then appears as its own entry in the models list:

curl http://localhost:8000/v1/models

{
  "object": "list",
  "data": [
    {
      "id": "opt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm"
    },
    {
      "id": "gpt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm"
    }
  ]
}
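Because every name appears as a separate model entry, a client can check that the name it intends to use is actually served before issuing requests. A minimal sketch, parsing the sample response shown above:

```python
import json

# Sample /v1/models response from above, with both names listed
payload = """
{
  "object": "list",
  "data": [
    {"id": "opt-125m", "object": "model", "created": 1677610602, "owned_by": "vllm"},
    {"id": "gpt-125m", "object": "model", "created": 1677610602, "owned_by": "vllm"}
  ]
}
"""

served = {m["id"] for m in json.loads(payload)["data"]}
requested = "gpt-125m"
if requested not in served:
    raise ValueError(f"{requested!r} is not served; available: {sorted(served)}")
print(f"{requested} is available")
```

In a real client you would fetch the payload from GET /v1/models instead of embedding it; the membership check is the same.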
