The models endpoint lists all models currently available on the vLLM server.
List models
Lists all available models.
Returns an array of model objects, each with the following fields:
id: The model identifier that can be referenced in API requests.
created: Unix timestamp of when the model was loaded.
owned_by: Organization that owns the model (typically "vllm").
root: The base model identifier.
Example
curl http://localhost:8000/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm",
      "root": "facebook/opt-125m",
      "parent": null
    }
  ]
}
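As a quick offline illustration of the response shape, the list payload above can be parsed like this (the served_model_ids helper is illustrative, not part of vLLM):

```python
import json

# Sample /v1/models response body, matching the example above.
payload = json.loads("""
{"object": "list",
 "data": [{"id": "facebook/opt-125m", "object": "model",
           "created": 1677610602, "owned_by": "vllm",
           "root": "facebook/opt-125m", "parent": null}]}
""")

def served_model_ids(response):
    # Collect the "id" field of every model object in the list.
    return [model["id"] for model in response["data"]]

print(served_model_ids(payload))  # ['facebook/opt-125m']
```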
Retrieve model
Retrieves information about a specific model.
Path parameters
model: The model ID to retrieve.
The response contains the same model object fields as the list endpoint, including:
created: Unix timestamp of when the model was loaded.
owned_by: Organization that owns the model.
Example
curl http://localhost:8000/v1/models/facebook/opt-125m
{
  "id": "facebook/opt-125m",
  "object": "model",
  "created": 1677610602,
  "owned_by": "vllm"
}
Using with Python
import requests

# List all models
response = requests.get("http://localhost:8000/v1/models")
models = response.json()["data"]
for model in models:
    print(f"Model: {model['id']}")

# Get specific model
model_id = "facebook/opt-125m"
response = requests.get(f"http://localhost:8000/v1/models/{model_id}")
model_info = response.json()
print(model_info)
Using with OpenAI Python client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# List models
models = client.models.list()
for model in models.data:
    print(model.id)

# Retrieve model
model = client.models.retrieve("facebook/opt-125m")
print(model)
Multiple model serving
To serve multiple models from the same vLLM instance, use the --served-model-name flag:
vllm serve facebook/opt-125m \
    --served-model-name opt-125m gpt-125m
This makes the model available under multiple names:
curl http://localhost:8000/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "opt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm"
    },
    {
      "id": "gpt-125m",
      "object": "model",
      "created": 1677610602,
      "owned_by": "vllm"
    }
  ]
}
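Since each served name is advertised as its own model ID, a client can check whether a requested alias is available before using it as the model parameter. A minimal offline sketch against the response above (the is_served helper is illustrative, not part of vLLM):

```python
import json

# Sample /v1/models response when serving under two names, as above.
payload = json.loads("""
{"object": "list",
 "data": [{"id": "opt-125m", "object": "model",
           "created": 1677610602, "owned_by": "vllm"},
          {"id": "gpt-125m", "object": "model",
           "created": 1677610602, "owned_by": "vllm"}]}
""")

def is_served(response, name):
    # True if `name` is one of the model IDs the server advertises.
    return any(model["id"] == name for model in response["data"])

print(is_served(payload, "opt-125m"))   # True
print(is_served(payload, "gpt-125m"))   # True
# The original Hugging Face name is no longer listed once aliases are set:
print(is_served(payload, "facebook/opt-125m"))  # False
```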