Overview
The Qwen family includes:
- Qwen 3.5 - Latest generation with hybrid attention and MoE
- Qwen 3 - Dense and MoE variants with reasoning capabilities
- Qwen 2.5 - Previous generation, highly capable
- Qwen 2 - Foundation models
- Qwen-VL - Vision-language multimodal models
- Qwen-Audio - Audio-enabled models
Quick Start
Basic Dense Model
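A minimal launch sketch for a dense checkpoint (standard SGLang CLI; adjust the model ID and port for your deployment):

```shell
# Serve a dense Qwen model with an OpenAI-compatible API on port 30000
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-4B \
  --host 0.0.0.0 \
  --port 30000
```

The server then exposes the usual OpenAI-compatible routes such as /v1/chat/completions.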
Large MoE Model (Qwen 3.5)
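For a large MoE checkpoint, add tensor parallelism. The model ID below is a placeholder; substitute the actual Hugging Face ID from the Qwen 3.5 model card:

```shell
# NOTE: "Qwen/Qwen3.5-MoE" is a placeholder model ID, not a released checkpoint name
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.5-MoE \
  --tp 8 \
  --port 30000
```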
Qwen 3.5 Architecture
Qwen 3.5 features cutting-edge architectural innovations:
Key Features
- Hybrid Attention: Gated Delta Networks (linear, O(n) complexity) combined with full attention every 4th layer
- MoE with Shared Experts: Top-8 active out of 64 routed experts plus a dedicated shared expert
- Multimodal: DeepStack Vision Transformer with Conv3d for native image and video understanding
Launch Qwen 3.5 (Dense)
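A launch sketch for a dense Qwen 3.5 model; the model ID is illustrative:

```shell
# NOTE: placeholder model ID; use the released Qwen 3.5 dense checkpoint
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.5-Dense \
  --tp 2 \
  --port 30000
```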
AMD GPU Support (MI300X / MI325X / MI35X)
On AMD Instinct GPUs, use the Triton attention backend and set SGLANG_USE_AITER=1 to enable AMD's optimized aiter kernels for MoE and GEMM operations.
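A launch sketch for MI300X-class GPUs (the `--attention-backend` flag name follows recent SGLang releases; verify against `--help`):

```shell
# aiter kernels + Triton attention backend on AMD Instinct GPUs
SGLANG_USE_AITER=1 python -m sglang.launch_server \
  --model-path Qwen/Qwen3-14B \
  --attention-backend triton \
  --tp 4 \
  --port 30000
```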
Configuration Tips for Large Models
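A sketch of commonly adjusted flags for large deployments (flag names as in recent SGLang releases; verify with `python -m sglang.launch_server --help`):

```shell
# --tp                    tensor-parallel degree
# --mem-fraction-static   fraction of GPU memory reserved for weights + KV cache
# --context-length        maximum context window
# --chunked-prefill-size  prefill chunk size (bounds peak memory)
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-235B-A22B \
  --tp 8 \
  --mem-fraction-static 0.85 \
  --context-length 32768 \
  --chunked-prefill-size 4096 \
  --port 30000
```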
Qwen 3 Models
Qwen 3 offers a range of sizes from 0.6B to 235B (MoE):
Available Models
| Model | Parameters | Type | Use Case |
|---|---|---|---|
| Qwen3-0.6B | 0.6B | Dense | Edge/mobile devices |
| Qwen3-1.7B | 1.7B | Dense | Lightweight deployment |
| Qwen3-4B | 4B | Dense | Balanced performance |
| Qwen3-8B | 8B | Dense | General purpose |
| Qwen3-14B | 14B | Dense | Advanced tasks |
| Qwen3-30B-A3B | 30B total, 3B active | MoE | Efficient large model |
| Qwen3-235B-A22B | 235B total, 22B active | MoE | Largest Qwen 3 |
Launch Examples
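Launch sketches for a small dense model and an MoE model (checkpoint IDs from the table above):

```shell
# Small dense model on a single GPU
python -m sglang.launch_server --model-path Qwen/Qwen3-1.7B --port 30000

# MoE model across 4 GPUs with tensor parallelism
python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --tp 4 --port 30000
```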
Reasoning and Tool Calling
Qwen models support advanced reasoning and tool calling capabilities:
Enable Reasoning Parser
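A sketch of enabling both parsers at launch. The parser names `qwen3` and `qwen25` are assumptions based on recent SGLang releases; check the supported values in your version:

```shell
# Separate <think> reasoning tokens and parse tool calls in model output
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-14B \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen25 \
  --port 30000
```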
Using Reasoning in Requests
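A request sketch against the OpenAI-compatible endpoint. The `reasoning_content` field name follows SGLang's separated-reasoning convention and should be verified against your server version:

```shell
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-14B",
    "messages": [{"role": "user", "content": "What is 17 * 23?"}]
  }'
# With the parser enabled, the reply separates
#   choices[0].message.reasoning_content  (the chain of thought)
# from
#   choices[0].message.content            (the final answer)
```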
With the reasoning parser enabled, reasoning tokens are returned separately from the final answer.
Qwen 2.5 & Qwen 2 Models
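Launch sketches for previous-generation checkpoints:

```shell
# Qwen 2.5 instruct model
python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000

# Qwen 2 instruct model
python -m sglang.launch_server --model-path Qwen/Qwen2-7B-Instruct --port 30000
```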
Previous generation Qwen models are also fully supported.
Qwen-VL (Vision-Language Models)
Qwen-VL models process both images and text. See the Multimodal Models guide for complete details.
Quick Launch
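A minimal vision-language launch sketch:

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-VL-7B-Instruct \
  --port 30000
```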
FP8 Mode (Memory Efficient)
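The same model with online FP8 weight quantization, which roughly halves weight memory:

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-VL-7B-Instruct \
  --quantization fp8 \
  --port 30000
```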
Image Request Example
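An image request sketch using the standard OpenAI vision content format (the image URL is illustrative):

```shell
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        {"type": "text", "text": "Describe this image."}
      ]
    }]
  }'
```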
Video Input Support
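Recent SGLang releases accept video for Qwen-VL models via a `video_url` content part; treat the field name as an assumption and verify against your version:

```shell
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
        {"type": "text", "text": "Summarize this video."}
      ]
    }]
  }'
```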
Qwen-Audio Models
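A launch sketch for the audio-enabled model:

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen2-Audio-7B-Instruct \
  --port 30000
```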
Qwen2-Audio processes audio input alongside text.
Qwen Classification & Reward Models
SGLang supports specialized Qwen variants:
Classification Models
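A launch sketch for a sequence-classification checkpoint. The `--is-embedding` flag routes the model through SGLang's pooling path in recent releases; the model ID is a placeholder you should replace:

```shell
# <classification-model> is a placeholder for a Qwen*ForSequenceClassification checkpoint
python -m sglang.launch_server \
  --model-path <classification-model> \
  --is-embedding \
  --port 30000
```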
Reward Models
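A reward-model launch sketch. Reward models are served through the same pooling path; the scoring route (`/classify` in recent SGLang releases) should be verified for your version:

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-Math-RM-72B \
  --is-embedding \
  --tp 4 \
  --port 30000
```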
Qwen3-Omni (Omnimodal)
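A launch sketch for the omni-modal MoE model; the checkpoint ID is assumed and should be taken from the official model card:

```shell
# NOTE: model ID assumed; substitute the released Qwen3-Omni checkpoint
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --tp 2 \
  --port 30000
```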
Qwen3-Omni is an omni-modal MoE model supporting text, images, audio, and video.
Performance Optimization
Expert Parallelism (EP)
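An expert-parallel launch sketch, sharding MoE experts across GPUs instead of replicating them (the `--ep-size` flag name follows recent SGLang releases; older versions used `--enable-ep-moe`):

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-235B-A22B \
  --tp 8 \
  --ep-size 8 \
  --port 30000
```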
For large MoE models, use expert parallelism.
Quantization
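A quantized launch sketch using online FP8:

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-14B \
  --quantization fp8 \
  --port 30000
```

Pre-quantized checkpoints (e.g. AWQ or GPTQ variants) can instead be served directly by their model ID.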
Reduce memory usage with quantization.
Chunked Prefill
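A chunked-prefill sketch: long prompts are processed in fixed-size chunks to bound peak activation memory (flag names per recent SGLang releases):

```shell
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-14B \
  --chunked-prefill-size 4096 \
  --context-length 131072 \
  --port 30000
```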
For long-context scenarios, enable chunked prefill.
Accuracy Evaluation
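A sketch of pointing lm-eval at the running server via its OpenAI-compatible completions route (argument names follow lm-eval's `local-completions` backend; the task choice is illustrative):

```shell
lm_eval --model local-completions \
  --model_args model=Qwen/Qwen3-14B,base_url=http://localhost:30000/v1/completions,num_concurrent=8 \
  --tasks gsm8k \
  --batch_size 8
```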
Evaluate model accuracy using lm-eval.
Supported Qwen Architectures
SGLang supports the following Qwen model architectures:
- Qwen3ForCausalLM - Qwen 3 dense models
- Qwen3_5ForCausalLM - Qwen 3.5 dense models
- Qwen3NextForCausalLM - Qwen 3 Next generation
- Qwen3MoeForCausalLM - Qwen 3 MoE models
- Qwen3OmniMoeForCausalLM - Qwen 3 Omni models
- Qwen2ForCausalLM - Qwen 2 dense models
- Qwen2MoeForCausalLM - Qwen 2 MoE models
- Qwen2_5_VLForConditionalGeneration - Qwen 2.5 VL
- Qwen3VLForConditionalGeneration - Qwen 3 VL
- Qwen3VLMoeForConditionalGeneration - Qwen 3 VL MoE
- Qwen2AudioForConditionalGeneration - Qwen 2 Audio
- Qwen2ForSequenceClassification - Classification
- Qwen3ForSequenceClassification - Classification
- Qwen2ForRewardModel - Reward models
- Qwen3ForRewardModel - Reward models
