You can install SGLang using one of the methods below. This page primarily applies to common NVIDIA GPU platforms. For other platforms, please refer to the dedicated pages for AMD GPUs, Intel Xeon CPUs, TPU, Ascend NPUs, and Intel XPU.

Quick Installation

Step 1: Install with pip or uv (Recommended)

It is recommended to use uv for faster installation:
pip install --upgrade pip
pip install uv
uv pip install sglang
This will install SGLang with CUDA 12.9 support by default, which is compatible with most recent NVIDIA GPUs.
Step 2: Verify Installation

After installation, verify that SGLang is working:
python -c "import sglang; print(sglang.__version__)"

Installation Methods

Method 1: Install with pip or uv

The simplest way to install SGLang:
pip install --upgrade pip
pip install uv
uv pip install sglang
For CUDA 13 support on B300/GB300 GPUs, Docker is recommended. If you don’t have Docker access:
Step 1: Install PyTorch with CUDA 13

# Replace X.Y.Z with the version required by your SGLang install
uv pip install torch==X.Y.Z torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
Step 2: Install SGLang

uv pip install sglang
Step 3: Install sgl_kernel for CUDA 13

Install the appropriate wheel from sgl-project whl releases:
uv pip install "https://github.com/sgl-project/whl/releases/download/vX.Y.Z/sgl_kernel-X.Y.Z+cu130-cp310-abi3-manylinux2014_x86_64.whl"
Replace X.Y.Z with the sgl_kernel version from uv pip show sgl_kernel.
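To make the wheel-naming pattern explicit, the URL above decomposes as shown below. X.Y.Z remains a placeholder for the version reported by uv pip show sgl_kernel; cu130 and cp310-abi3 identify the CUDA 13, Python 3.10+ abi3 build.

```python
# Illustration of the sgl_kernel wheel URL pattern used above.
# "X.Y.Z" stays a placeholder; substitute the version from `uv pip show sgl_kernel`.
version = "X.Y.Z"
url = (
    "https://github.com/sgl-project/whl/releases/download/"
    f"v{version}/sgl_kernel-{version}+cu130-cp310-abi3-manylinux2014_x86_64.whl"
)
print(url)
```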
CUDA_HOME not set: If you encounter OSError: CUDA_HOME environment variable is not set, try one of these solutions:
Set CUDA_HOME explicitly:
export CUDA_HOME=/usr/local/cuda-<your-cuda-version>
Or reinstall FlashInfer and clear its cache:
pip3 install --upgrade flashinfer-python --force-reinstall --no-deps
rm -rf ~/.cache/flashinfer
Blackwell GPU (B300/GB300) ptxas error: Point Triton at the CUDA ptxas:
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas

Method 2: Install from Source

For development or to use the latest features:
# Use the last release branch
git clone -b v0.5.6.post2 https://github.com/sgl-project/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"
For development, use the dev docker image lmsysorg/sglang:dev. See the development guide for details.

Method 3: Using Docker

Docker images are available at lmsysorg/sglang.
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
Replace <secret> with your Hugging Face hub token.
The runtime variant is ~40% smaller than the full image by excluding build tools and development dependencies, making it ideal for production deployments.

Method 4: Using Kubernetes

Check out OME, a Kubernetes operator for enterprise-grade LLM serving.
For models that fit on one node:
kubectl apply -f docker/k8s-sglang-service.yaml
For large models requiring multiple GPU nodes:
# Modify the model path and arguments in the file first
kubectl apply -f docker/k8s-sglang-distributed-sts.yaml

Method 5: Using Docker Compose

For production, use the k8s-sglang-service.yaml instead.
Step 1: Copy compose.yml

Download the compose.yml to your local machine.
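As a sketch of what such a file contains, the settings below mirror the docker run command from Method 3 (image, port, Hugging Face cache mount, HF_TOKEN, and shared memory); the official compose.yml in the SGLang repository is authoritative, and the exact fields there may differ.

```yaml
# Sketch only — prefer the official compose.yml from the SGLang repo.
services:
  sglang:
    image: lmsysorg/sglang:latest
    ports:
      - "30000:30000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      HF_TOKEN: <secret>
    shm_size: 32gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```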
Step 2: Launch the service

docker compose up -d

Method 6: Using SkyPilot

Deploy on Kubernetes or 12+ clouds with SkyPilot.
Step 1: Install SkyPilot

Follow SkyPilot’s documentation to install and configure cloud access.
Step 2: Create sglang.yaml

sglang.yaml
envs:
  HF_TOKEN: null

resources:
  image_id: docker:lmsysorg/sglang:latest
  accelerators: A100
  ports: 30000

run: |
  conda deactivate
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 30000
Step 3: Deploy

# Deploy on any cloud or Kubernetes
HF_TOKEN=<secret> sky launch -c sglang --env HF_TOKEN sglang.yaml

# Get the HTTP API endpoint
sky status --endpoint 30000 sglang
For autoscaling and failure recovery, check out the SkyServe + SGLang guide.

Method 7: AWS SageMaker

AWS provides SGLang DLCs (Deep Learning Containers) with routine security patching. To host with your own container:
Step 1: Build Docker container

Build with sagemaker.Dockerfile and the serve script.
Step 2: Push to AWS ECR

build-and-push.sh
#!/bin/bash
AWS_ACCOUNT="<YOUR_AWS_ACCOUNT>"
AWS_REGION="<YOUR_AWS_REGION>"
REPOSITORY_NAME="<YOUR_REPOSITORY_NAME>"
IMAGE_TAG="<YOUR_IMAGE_TAG>"

ECR_REGISTRY="${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com"
IMAGE_URI="${ECR_REGISTRY}/${REPOSITORY_NAME}:${IMAGE_TAG}"

# Login to ECR
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ECR_REGISTRY}

# Build and push
docker build -t ${IMAGE_URI} -f sagemaker.Dockerfile .
docker push ${IMAGE_URI}
Step 3: Deploy model

Use deploy_and_serve_endpoint.py to deploy. See sagemaker-python-sdk for more details.
Customize server parameters using environment variables with the SM_SGLANG_ prefix. For example, SM_SGLANG_MODEL_PATH=Qwen/Qwen3-0.6B and SM_SGLANG_REASONING_PARSER=qwen3.
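The naming convention can be sketched as follows. The mapping function here is hypothetical and for illustration only — the real translation from environment variables to server flags happens inside the serving container.

```python
# Hypothetical sketch of the SM_SGLANG_* naming convention: each variable
# maps to the server flag with the prefix stripped, lowercased, and
# underscores replaced by dashes.
def env_to_flag(name: str) -> str:
    return "--" + name.removeprefix("SM_SGLANG_").lower().replace("_", "-")

print(env_to_flag("SM_SGLANG_MODEL_PATH"))        # --model-path
print(env_to_flag("SM_SGLANG_REASONING_PARSER"))  # --reasoning-parser
```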

Hardware-Specific Installation

AMD GPUs (ROCm)

For AMD GPUs like MI300X:
git clone -b v0.5.6.post2 https://github.com/sgl-project/sglang.git
cd sglang

# Compile sgl-kernel
pip install --upgrade pip
cd sgl-kernel
python setup_rocm.py install

# Install sglang
cd ..
rm -rf python/pyproject.toml && mv python/pyproject_other.toml python/pyproject.toml
pip install -e "python[all_hip]"
See the AMD GPU documentation for system optimization and tuning guides.

Google TPU

SGLang supports TPUs through the SGLang-JAX backend:
pip install sglang-jax
See the TPU documentation for feature support and optimized models.

Intel CPUs

See the CPU Server documentation for Intel Xeon CPU deployment instructions.

Ascend NPUs

See the Ascend NPU documentation for Huawei Ascend NPU installation.

Dependencies

SGLang has the following core dependencies (from pyproject.toml):
  • Python: >=3.10
  • PyTorch: 2.9.1 (CUDA 12.9 by default)
  • FlashInfer: 0.6.4 (attention kernel backend)
  • Transformers: 4.57.1
  • FastAPI: Web server framework
  • OpenAI: 2.6.1 (API compatibility)
  • sgl-kernel: 0.3.21 (custom CUDA kernels)
Optional dependencies:
  • Diffusion: For image/video generation models
  • Tracing: OpenTelemetry integration
  • Test: Development and testing tools
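As a quick sanity check against the pins above, you can query what is actually installed in the current environment. The distribution names below are assumptions about how these packages are published; anything absent is simply flagged.

```python
# Sketch: report installed versions of a few of the core dependencies.
# Distribution names are assumptions; missing packages are flagged.
from importlib.metadata import PackageNotFoundError, version

for dist in ["torch", "flashinfer-python", "transformers", "sgl-kernel"]:
    try:
        print(dist, version(dist))
    except PackageNotFoundError:
        print(dist, "not installed")
```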

Common Notes

FlashInfer: Default attention kernel backend. It only supports sm75 and above (T4, A10, A100, L4, L40S, H100, B200, etc.). If you encounter FlashInfer issues on a supported GPU, switch to alternative backends:
python -m sglang.launch_server --model-path <model> --attention-backend triton --sampling-backend pytorch
Shared Memory: Docker and Kubernetes deployments require sufficient shared memory (--shm-size 32g for Docker, update /dev/shm size for Kubernetes).
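For Kubernetes, the usual fix is a memory-backed emptyDir volume mounted at /dev/shm. A sketch, with the volume name, size, and container name as placeholders:

```yaml
# Sketch (assumption): enlarge /dev/shm for a pod via a memory-backed emptyDir.
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: 32Gi
containers:
  - name: sglang
    volumeMounts:
      - name: dshm
        mountPath: /dev/shm
```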

Next Steps

After installation:
  1. Launch your first server
  2. Send API requests
  3. Explore server arguments
  4. Learn about model support