Quick Installation
Install with pip or uv (Recommended)
It is recommended to use uv for faster installation. This will install SGLang with CUDA 12.9 support by default, which is compatible with most recent NVIDIA GPUs.
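A typical sequence, assuming the `all` extra bundles the server dependencies (the exact extras may vary by release):

```shell
# Install uv, then use it to install SGLang (CUDA 12.9 build by default).
pip install --upgrade uv
uv pip install "sglang[all]"
```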
Installation Methods
Method 1: Install with pip or uv
Standard Installation
The simplest way to install SGLang:
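A minimal sketch using plain pip; the `all` extra is an assumption and may differ by release:

```shell
# Standard installation from PyPI.
pip install "sglang[all]"
```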
CUDA 13 Installation (B300/GB300)
For CUDA 13 support on B300/GB300 GPUs, Docker is recommended. If you don’t have Docker access:
Install sgl_kernel for CUDA 13
Install the appropriate wheel from sgl-project whl releases:
Replace X.Y.Z with the sgl_kernel version shown by uv pip show sgl_kernel.
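To find the installed sgl_kernel version to substitute for X.Y.Z:

```shell
# Show the installed sgl_kernel version.
uv pip show sgl_kernel
```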
Troubleshooting Common Issues
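For the CUDA_HOME error covered in this section, pointing the variable at your CUDA toolkit usually helps; the path below is an assumption, adjust it to your system:

```shell
# Assumed toolkit location; verify with `which nvcc` or `ls /usr/local`.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
```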
CUDA_HOME not set: If you encounter OSError: CUDA_HOME environment variable is not set, set CUDA_HOME to your CUDA toolkit path.
FlashInfer errors: Reinstall FlashInfer.
Blackwell GPU (B300/GB300) ptxas error: Use the CUDA 13 installation described above.
Method 2: Install from Source
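A from-source sketch; the editable-install path and extras follow the repository's layout at the time of writing and may differ by release:

```shell
# Clone the repository and install the Python package in editable mode.
git clone https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python[all]"
```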
For development or to use the latest features, install from source. For development, use the dev docker image lmsysorg/sglang:dev; see the development guide for details.
Method 3: Using Docker
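A typical single-node invocation; the model path, port, and shared-memory size below are illustrative:

```shell
# Run the server container; expose it on port 30000.
docker run --gpus all --shm-size 32g -p 30000:30000 \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path Qwen/Qwen3-0.6B --host 0.0.0.0 --port 30000
```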
Docker images are available at lmsysorg/sglang. The runtime variant is ~40% smaller than the full image because it excludes build tools and development dependencies, making it ideal for production deployments.
Method 4: Using Kubernetes
Check out OME, a Kubernetes operator for enterprise-grade LLM serving.
Single Node Deployment
For models that fit on one node:
Multi-Node Deployment
For large models requiring multiple GPU nodes:
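A sketch of a two-node tensor-parallel launch; the flag names (`--dist-init-addr`, `--nnodes`, `--node-rank`, `--tp`) reflect common SGLang usage but should be checked against your installed version:

```shell
# On node 0 (replace MODEL and NODE0_IP with real values):
python3 -m sglang.launch_server --model-path MODEL \
  --tp 16 --dist-init-addr NODE0_IP:50000 --nnodes 2 --node-rank 0

# On node 1:
python3 -m sglang.launch_server --model-path MODEL \
  --tp 16 --dist-init-addr NODE0_IP:50000 --nnodes 2 --node-rank 1
```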
Method 5: Using Docker Compose
Docker Compose Setup
Copy compose.yml
Download the compose.yml to your local machine.
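Then, from the directory containing compose.yml:

```shell
# Start the service in the background.
docker compose up -d
```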
Method 6: Using SkyPilot
Deploy on Kubernetes or 12+ clouds with SkyPilot.
SkyPilot Deployment
Install SkyPilot
Follow SkyPilot’s documentation to install and configure cloud access.
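Once SkyPilot is configured, a launch looks roughly like this; the cluster name and task file name below are illustrative:

```shell
# Launch a cluster from a SkyPilot task YAML.
sky launch -c sglang-serve sglang.yaml
```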
Method 7: AWS SageMaker
Deploy on AWS SageMaker
AWS provides SGLang DLCs with routine security patching. To host with your own container:
Build Docker container
Build with sagemaker.Dockerfile and the serve script.
Deploy model
Use deploy_and_serve_endpoint.py to deploy. See sagemaker-python-sdk for more details.
Customize server parameters using environment variables with the SM_SGLANG_ prefix. For example, SM_SGLANG_MODEL_PATH=Qwen/Qwen3-0.6B and SM_SGLANG_REASONING_PARSER=qwen3.
Hardware-Specific Installation
AMD GPUs (ROCm)
Install on AMD GPUs
Google TPU
Install on TPU
SGLang supports TPUs through the SGLang-JAX backend. See the TPU documentation for feature support and optimized models.
Intel CPUs
Install on Intel Xeon CPUs
See the CPU Server documentation for Intel Xeon CPU deployment instructions.
Ascend NPUs
Install on Ascend NPUs
See the Ascend NPU documentation for Huawei Ascend NPU installation.
Dependencies
SGLang has the following core dependencies (from pyproject.toml):
Core Dependencies
- Python: >=3.10
- PyTorch: 2.9.1 (CUDA 12.9 by default)
- FlashInfer: 0.6.4 (attention kernel backend)
- Transformers: 4.57.1
- FastAPI: Web server framework
- OpenAI: 2.6.1 (API compatibility)
- sgl-kernel: 0.3.21 (custom CUDA kernels)
- Diffusion: For image/video generation models
- Tracing: OpenTelemetry integration
- Test: Development and testing tools
Common Notes
FlashInfer: Default attention kernel backend. Only supports sm75 and above (T4, A10, A100, L4, L40S, H100, B200, etc.). If you encounter FlashInfer issues on supported GPUs, switch to an alternative backend:
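One way to select a different backend is at server launch; `--attention-backend triton` is a commonly available option, but verify the flag and accepted values with `--help` on your installed version:

```shell
# Replace MODEL with a real model path; use the Triton attention backend.
python3 -m sglang.launch_server --model-path MODEL \
  --attention-backend triton
```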
