
Models

The models endpoint provides information about available models. This endpoint is compatible with OpenAI’s /v1/models API.

List Models

Retrieve a list of all available models.

Request

curl http://localhost:30000/v1/models

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

models = client.models.list()
for model in models.data:
    print(f"Model ID: {model.id}")
    print(f"Created: {model.created}")
    print(f"Owned by: {model.owned_by}")
    print()

Response

object (string): Always "list".
data (array): Array of model objects. Each model object has the following fields:
  • id (string): Model identifier (e.g., "meta-llama/Llama-3.1-8B-Instruct").
  • object (string): Always "model".
  • created (integer): Unix timestamp when the model was added.
  • owned_by (string): Organization that owns the model (always "sglang").
  • root (string | null): Root model identifier.
  • parent (string | null): Parent model identifier.
  • max_model_len (integer | null): Maximum context length supported by the model.

Example Response

{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/Llama-3.1-8B-Instruct",
      "object": "model",
      "created": 1234567890,
      "owned_by": "sglang",
      "root": null,
      "parent": null,
      "max_model_len": 131072
    }
  ]
}

Retrieve Model

Get information about a specific model.

Request

curl http://localhost:30000/v1/models/meta-llama/Llama-3.1-8B-Instruct

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="EMPTY"
)

model = client.models.retrieve("meta-llama/Llama-3.1-8B-Instruct")
print(f"Model: {model.id}")
print(f"Max length: {model.max_model_len}")

Response

id (string): Model identifier.
object (string): Always "model".
created (integer): Unix timestamp when the model was added.
owned_by (string): Organization that owns the model.
root (string | null): Root model identifier.
parent (string | null): Parent model identifier.
max_model_len (integer | null): Maximum context length.

Example Response

{
  "id": "meta-llama/Llama-3.1-8B-Instruct",
  "object": "model",
  "created": 1234567890,
  "owned_by": "sglang",
  "root": null,
  "parent": null,
  "max_model_len": 131072
}

LoRA Adapters

When using LoRA adapters, you can reference them using the syntax base-model:adapter-name:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Using a LoRA adapter
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct:my-lora-adapter",
    messages=[{"role": "user", "content": "Hello!"}]
)

Multi-Model Serving

SGLang supports serving multiple models simultaneously using different methods:

Data Parallelism (DP)

Multiple replicas of the same model for higher throughput:
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --dp-size 4

Multiple LoRA Adapters

Serve a base model with multiple LoRA adapters:
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --lora-paths adapter1,adapter2,adapter3
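When the adapter list is built dynamically, the launch command can be assembled programmatically before being handed to a process launcher. A minimal sketch, assuming the flags shown above (the adapter paths here are placeholders):

```python
import sys

def build_launch_cmd(model_path: str, lora_paths: list[str],
                     dp_size: int = 1) -> list[str]:
    """Assemble an argv list for python -m sglang.launch_server."""
    cmd = [sys.executable, "-m", "sglang.launch_server",
           "--model-path", model_path]
    if lora_paths:
        # --lora-paths takes a comma-separated list of adapters.
        cmd += ["--lora-paths", ",".join(lora_paths)]
    if dp_size > 1:
        cmd += ["--dp-size", str(dp_size)]
    return cmd

cmd = build_launch_cmd("meta-llama/Llama-3.1-8B-Instruct",
                       ["adapter1", "adapter2", "adapter3"])
print(" ".join(cmd[1:]))
```

The resulting list can be passed directly to subprocess.Popen, avoiding shell-quoting issues.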

Examples

List All Models

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

models = client.models.list()
print(f"Available models: {len(models.data)}")

for model in models.data:
    max_len = model.max_model_len or "Unknown"
    print(f"- {model.id} (max context: {max_len})")

Check Model Capabilities

model = client.models.retrieve("meta-llama/Llama-3.1-8B-Instruct")

# Check if model supports long context
if model.max_model_len and model.max_model_len >= 100000:
    print(f"{model.id} supports long context ({model.max_model_len} tokens)")
else:
    print(f"{model.id} has limited context ({model.max_model_len or 'unknown'} tokens)")

Verify Model Before Request

try:
    model = client.models.retrieve("meta-llama/Llama-3.1-8B-Instruct")
    print(f"Model {model.id} is available")
    
    # Now make a request
    response = client.chat.completions.create(
        model=model.id,
        messages=[{"role": "user", "content": "Hello!"}]
    )
except Exception as e:
    print(f"Model not available: {e}")

Error Handling

Model Not Found

If you request a model that doesn’t exist:
from openai import NotFoundError

try:
    model = client.models.retrieve("nonexistent-model")
except NotFoundError as e:
    print(f"Error: {e}")
    # Error: Model 'nonexistent-model' not found

Supported Models

SGLang supports a wide range of models including:

Language Models

  • Llama: Llama 2, Llama 3, Llama 3.1, Llama 3.2
  • Qwen: Qwen, Qwen2, Qwen2.5
  • Mistral: Mistral 7B, Mixtral 8x7B, Mixtral 8x22B
  • DeepSeek: DeepSeek V2, DeepSeek V3
  • Gemma: Gemma 2B, Gemma 7B, Gemma 2

Vision-Language Models

  • Llama 3.2 Vision: 11B, 90B
  • Qwen2-VL: 2B, 7B, 72B
  • InternVL: 2, 2.5
  • LLaVA: 1.5, 1.6, OneVision

Other Models

  • Embedding Models: BGE, E5, etc.
  • Reasoning Models: GPT-OSS models with reasoning support

For a complete list of supported models, see the supported models documentation.

See Also