Models
The models endpoint provides information about available models. This endpoint is compatible with OpenAI’s /v1/models API.
List Models
Retrieve a list of all available models.
Request
```shell
curl http://localhost:30000/v1/models
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

models = client.models.list()
for model in models.data:
    print(f"Model ID: {model.id}")
    print(f"Created: {model.created}")
    print(f"Owned by: {model.owned_by}")
    print()
```
Response
Returns a list object whose `data` field is an array of model objects. Each model object contains:

- `id`: Model identifier (e.g., `"meta-llama/Llama-3.1-8B-Instruct"`).
- `created`: Unix timestamp when the model was added.
- `owned_by`: Organization that owns the model (always `"sglang"`).
- `max_model_len`: Maximum context length supported by the model.
Example Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/Llama-3.1-8B-Instruct",
      "object": "model",
      "created": 1234567890,
      "owned_by": "sglang",
      "root": null,
      "parent": null,
      "max_model_len": 131072
    }
  ]
}
```
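The response can also be consumed without the OpenAI SDK. A minimal sketch that pulls the `id` and `max_model_len` fields out of a payload shaped like the example above (the helper name is ours):

```python
import json

def list_model_lengths(payload: str) -> dict:
    """Map model id -> max_model_len from a /v1/models JSON response."""
    body = json.loads(payload)
    return {m["id"]: m.get("max_model_len") for m in body["data"]}

# Payload shaped like the example response above.
example = """
{"object": "list", "data": [
  {"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model",
   "created": 1234567890, "owned_by": "sglang", "max_model_len": 131072}
]}
"""
print(list_model_lengths(example))
# {'meta-llama/Llama-3.1-8B-Instruct': 131072}
```

Using `.get("max_model_len")` keeps the helper working against servers that omit this non-standard field.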
Retrieve Model
Get information about a specific model.
Request
```shell
curl http://localhost:30000/v1/models/meta-llama/Llama-3.1-8B-Instruct
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

model = client.models.retrieve("meta-llama/Llama-3.1-8B-Instruct")
print(f"Model: {model.id}")
print(f"Max length: {model.max_model_len}")
```
Response
Returns a single model object with the same fields as the list entries:

- `id`: Model identifier.
- `created`: Unix timestamp when the model was added.
- `owned_by`: Organization that owns the model.
- `max_model_len`: Maximum context length supported by the model.
Example Response
```json
{
  "id": "meta-llama/Llama-3.1-8B-Instruct",
  "object": "model",
  "created": 1234567890,
  "owned_by": "sglang",
  "root": null,
  "parent": null,
  "max_model_len": 131072
}
```
LoRA Adapters
When using LoRA adapters, you can reference them with the `base-model:adapter-name` syntax:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Using a LoRA adapter
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct:my-lora-adapter",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
Multi-Model Serving
SGLang supports serving multiple models simultaneously using different methods:
Data Parallelism (DP)
Multiple replicas of the same model for higher throughput:
```shell
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --dp-size 4
```
Multiple LoRA Adapters
Serve a base model with multiple LoRA adapters:
```shell
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --lora-paths adapter1,adapter2,adapter3
```
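Each adapter loaded this way is then addressed with the `base-model:adapter-name` form described above. A quick sketch of the identifiers such a server would accept (the adapter names mirror the hypothetical `--lora-paths` values):

```python
base = "meta-llama/Llama-3.1-8B-Instruct"
adapters = ["adapter1", "adapter2", "adapter3"]

# One model identifier per loaded adapter, e.g. "...-Instruct:adapter1".
model_ids = [f"{base}:{name}" for name in adapters]
for model_id in model_ids:
    print(model_id)
```

Any of these identifiers can be passed as the `model` argument of a chat completion request.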
Examples
List All Models
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

models = client.models.list()
print(f"Available models: {len(models.data)}")
for model in models.data:
    max_len = model.max_model_len or "Unknown"
    print(f"- {model.id} (max context: {max_len})")
```
Check Model Capabilities
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

model = client.models.retrieve("meta-llama/Llama-3.1-8B-Instruct")

# Check if the model supports long context
if model.max_model_len and model.max_model_len >= 100000:
    print(f"{model.id} supports long context ({model.max_model_len} tokens)")
else:
    print(f"{model.id} has limited context ({model.max_model_len} tokens)")
```
Verify Model Before Request
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

try:
    model = client.models.retrieve("meta-llama/Llama-3.1-8B-Instruct")
    print(f"Model {model.id} is available")
    # Now make a request
    response = client.chat.completions.create(
        model=model.id,
        messages=[{"role": "user", "content": "Hello!"}]
    )
except Exception as e:
    print(f"Model not available: {e}")
```
Error Handling
Model Not Found
If you request a model that doesn’t exist:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

try:
    model = client.models.retrieve("nonexistent-model")
except Exception as e:
    print(f"Error: {e}")
    # Error: Model 'nonexistent-model' not found
```
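The retrieve-or-raise pattern above can be wrapped in a small boolean check. A sketch (the helper name is ours; `client` can be any OpenAI-compatible client object):

```python
def model_available(client, model_id: str) -> bool:
    """Return True if the server serves `model_id`, False otherwise."""
    try:
        client.models.retrieve(model_id)
        return True
    except Exception:
        return False
```

With this, checking `"nonexistent-model"` simply yields `False` instead of raising.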
Supported Models
SGLang supports a wide range of models including:
Language Models
- Llama: Llama 2, Llama 3, Llama 3.1, Llama 3.2
- Qwen: Qwen, Qwen2, Qwen2.5
- Mistral: Mistral 7B, Mixtral 8x7B, Mixtral 8x22B
- DeepSeek: DeepSeek V2, DeepSeek V3
- Gemma: Gemma 2B, Gemma 7B, Gemma 2
Vision-Language Models
- Llama 3.2 Vision: 11B, 90B
- Qwen2-VL: 2B, 7B, 72B
- InternVL: 2, 2.5
- LLaVA: 1.5, 1.6, OneVision
Other Models
- Embedding Models: BGE, E5, etc.
- Reasoning Models: GPT-OSS models with reasoning support
For a complete list of supported models, see the supported models documentation.
See Also