SGLang supports a wide range of models across different categories, including large language models (LLMs), multimodal models, and specialized models for specific tasks.

Model Categories

SGLang organizes supported models into the following categories:

Large Language Models

These models accept text input and produce text output. Many feature mixture-of-experts (MoE) architectures for improved scaling and efficiency.

Example Launch Command

python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.2-1B-Instruct \
  --host 0.0.0.0 \
  --port 30000
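Once the server is up, it can be queried over its OpenAI-compatible REST API. The sketch below builds a chat-completion request against the default host/port from the launch command above; the helper name `build_chat_request` and the exact prompt are illustrative, not part of SGLang itself.

```python
import json
import urllib.request

# Endpoint exposed by the server started with the launch command above
# (assumes the default host 0.0.0.0 and port 30000).
BASE_URL = "http://localhost:30000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

if __name__ == "__main__":
    # Requires the server from the launch command to be running.
    body = build_chat_request("meta-llama/Llama-3.2-1B-Instruct", "Hello!")
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The request body follows the standard OpenAI chat schema, so existing OpenAI client libraries can also be pointed at the server's base URL.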

Supported Model Families

Leading Open Models

| Model Family | Example Model | Parameters | Key Features |
|---|---|---|---|
| DeepSeek | deepseek-ai/DeepSeek-R1 | Up to 671B (MoE) | Advanced reasoning with RL, MLA attention. Optimized for SGLang |
| Kimi K2 | moonshotai/Kimi-K2-Instruct | 1T total, 32B active | 128K-256K context, agentic intelligence, INT4 quantization |
| GPT-OSS | openai/gpt-oss-120b | 20B, 120B | OpenAI's latest for complex reasoning and agentic tasks |
| Qwen | Qwen/Qwen3.5-397B-A17B | 0.6B to 397B | Hybrid attention, MoE variants. Optimized for SGLang |
| Llama | meta-llama/Llama-4-Scout-17B-16E-Instruct | 7B to 400B | Meta's flagship open models. Optimized for SGLang |

Enterprise & Research Models

| Model Family | Example Model | Parameters | Key Features |
|---|---|---|---|
| Mistral/Mixtral | mistralai/Mistral-7B-Instruct-v0.2 | 7B to 8x22B (MoE) | High-quality open models with MoE variants |
| Gemma | google/gemma-3-1b-it | 1B to 27B | Google's efficient multilingual models, 128K context |
| Phi | microsoft/Phi-4-multimodal-instruct | 1.3B to 5.6B | Microsoft's compact high-performance models |
| MiniCPM | openbmb/MiniCPM3-4B | 4B | Edge-optimized, GPT-3.5-level performance |
| OLMo/OLMoE | allenai/OLMo-3-1125-32B | 7B to 32B | Allen AI's fully open language models |
| Granite | ibm-granite/granite-3.1-8b-instruct | 8B+ | IBM's enterprise-focused models |
| Grok | xai-org/grok-1 | 314B | xAI's large-scale model |
| Command-R/A | CohereLabs/c4ai-command-r-v01 | Various | Cohere's RAG and tool-use optimized models |

Specialized & Regional Models

| Model Family | Region/Focus | Example Model |
|---|---|---|
| ChatGLM/GLM-4 | Chinese/English | THUDM/chatglm2-6b, ZhipuAI/glm-4-9b-chat |
| InternLM 2 | Multilingual | internlm/internlm2-7b (200K context) |
| ExaONE 3 | Korean/English | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct |
| Baichuan 2 | Chinese/English | baichuan-inc/Baichuan2-13B-Chat |
| ERNIE-4.5 | Chinese/Multilingual | baidu/ERNIE-4.5-21B-A3B-PT (MoE) |
| Hunyuan-Large | Multilingual | tencent/Tencent-Hunyuan-Large (389B MoE) |
| Orion | Multilingual | OrionStarAI/Orion-14B-Base |

Compact & Edge Models

| Model Family | Parameters | Key Features |
|---|---|---|
| SmolLM | 135M-1.7B | Ultra-small for mobile/edge devices |
| MiniMax-M2 | Various | SOTA for coding & agentic workflows |
| Arcee AFM | 4.5B | Real-world reliability, edge deployment |
| Trinity | Various | Arcee's MoE family |

Architecture Innovations

| Model Family | Innovation | Example Model |
|---|---|---|
| Kimi Linear | Hybrid linear attention (6× faster) | moonshotai/Kimi-Linear-48B-A3B-Instruct |
| Falcon-H1 | Hybrid Mamba-Transformer | tiiuae/Falcon-H1-34B-Instruct |
| Nemotron Nano | Hybrid Mamba-Transformer | nvidia/NVIDIA-Nemotron-Nano-9B-v2 |
| MiMo | Multiple-Token Prediction | XiaomiMiMo/MiMo-7B-RL |

Additional Supported Models

SGLang also supports many other model architectures including:
  • XVERSE MoE - 255B total, 36B active parameters
  • DBRX - Databricks’ 132B MoE model
  • Llama Nemotron - NVIDIA’s enterprise AI agents (up to 253B)
  • StarCoder2 - Code generation models (3B-15B)
  • Jet-Nemotron - Hybrid architecture language models
  • StableLM - StabilityAI’s 3B-7B models
  • GPT-J/GPT-2/GPT-BigCode - EleutherAI and compatibility models
  • Persimmon - Adept’s 8B chat model
  • Solar - Upstage’s 10.7B instruction model
  • Tele FLM - BAAI’s 52B-1T multilingual model
  • Ling - InclusionAI’s 16.8B-290B MoE models

Finding Model Architectures

To check whether a specific model architecture is supported, search the SGLang repository on GitHub with a query of the form:

repo:sgl-project/sglang path:/^python\/sglang\/srt\/models\// YourModelArchitecture

For example, to search for Qwen3ForCausalLM:

repo:sgl-project/sglang path:/^python\/sglang\/srt\/models\// Qwen3ForCausalLM
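The architecture name to search for is the class name listed in the `architectures` field of a model's Hugging Face `config.json`. A small sketch of pulling that name out of a config file (the helper name and the sample JSON are illustrative):

```python
import json

def architectures_from_config(config_text: str) -> list[str]:
    """Return the architecture class names declared in a
    Hugging Face config.json, or [] if none are listed."""
    return json.loads(config_text).get("architectures", [])

# Illustrative config.json snippet for a Qwen3 checkpoint.
sample = '{"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}'
print(architectures_from_config(sample))  # ['Qwen3ForCausalLM']
```

Each name this returns can be dropped into the GitHub search query above in place of `YourModelArchitecture`.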

Model-Specific Documentation

For detailed usage instructions and optimizations for specific models, see the model-specific documentation pages.

Total Supported Architectures

SGLang currently supports 166+ model architectures out of the box, with continuous additions in each release.