Overview
SGLang provides official Docker images optimized for both production inference and development. This guide covers Docker-based deployment options, including standalone containers, Docker Compose, and custom builds.

Quick Start
Using Pre-built Images
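For the default stable image:

```shell
docker pull lmsysorg/sglang:latest
```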
Images are pulled from Docker Hub; the `latest` tag tracks the most recent stable release.

Run a Single Container
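A single-container launch might look like the following sketch; the model path, port, and shared-memory size are illustrative, not prescriptive:

```shell
docker run --gpus all --shm-size 32g -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN=$HF_TOKEN \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30000
```

Note that `--gpus all` requires the NVIDIA Container Toolkit on the host.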
A single `docker run` command is enough to deploy a model.

Docker Compose Deployment
Use Docker Compose for declarative container management. Create a compose.yaml and run `docker compose up -d`:
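A minimal compose.yaml might look like this sketch (the service name, model, and port are illustrative):

```yaml
services:
  sglang:
    image: lmsysorg/sglang:latest
    command: >
      python3 -m sglang.launch_server
      --model-path meta-llama/Llama-3.1-8B-Instruct
      --host 0.0.0.0 --port 30000
    ports:
      - "30000:30000"
    volumes:
      - ${HOME}/.cache/huggingface:/root/.cache/huggingface
    environment:
      HF_TOKEN: ${HF_TOKEN}
    shm_size: 32g
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

`docker compose up -d` then starts the service in the background.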
Available Docker Images
SGLang provides several specialized images:

| Image Tag | Description | Use Case |
|---|---|---|
| `lmsysorg/sglang:latest` | Latest stable release | Production inference |
| `lmsysorg/sglang:deepep` | DeepEP-enabled build | Multi-node MoE models |
| `lmsysorg/sglang:v<version>` | Specific version | Version pinning |
Specialized Dockerfiles
The repository includes Dockerfiles for specific hardware platforms and serving roles:

- `rocm.Dockerfile`: AMD GPUs with ROCm support
- `xpu.Dockerfile`: Intel GPUs
- `npu.Dockerfile`: Ascend NPUs
- `xeon.Dockerfile`: Intel Xeon CPUs
- `diffusion.Dockerfile`: Diffusion model serving
- `gateway.Dockerfile`: Model routing gateway
Building Custom Images
Build from Source
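A from-source build might look like the following; the assumption that the Dockerfile sits at the repository root is worth verifying — pass `-f` with the actual path if your checkout keeps it elsewhere:

```shell
git clone https://github.com/sgl-project/sglang.git
cd sglang
docker build -t sglang:custom .
```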
The standard CUDA image is built from the repository's main Dockerfile.

Build Arguments
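Build arguments are passed with `--build-arg`. The ARG name below (`CUDA_VERSION`) is an assumption — confirm the available names against the `ARG` lines in the Dockerfile you are building:

```shell
docker build \
  --build-arg CUDA_VERSION=12.4.1 \
  -t sglang:cu124 .
```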
Build arguments customize the image (for example, the CUDA base version) without editing the Dockerfile.

Multi-Stage Build Targets
The Dockerfile supports multiple build targets, selected with `docker build --target`.

Runtime Image (Default)
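Because the runtime image is the default target, a plain build suffices and no `--target` flag is needed:

```shell
docker build -t sglang:runtime .
```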
The runtime image is production-ready, with JIT compilation support.

Framework Development Image
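A development build selects the development stage; the stage name `dev` below is an assumption — check the `FROM ... AS` lines in the Dockerfile for the real name:

```shell
docker build --target dev -t sglang:dev .
```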
The development image additionally includes tools such as vim, tmux, gdb, and nsight.

ROCm Image for AMD GPUs
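A ROCm build uses the dedicated Dockerfile; the file path (repository root) and any architecture-selection build arguments are assumptions to verify against the repository:

```shell
docker build -f rocm.Dockerfile -t sglang:rocm .
```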
The ROCm Dockerfile targets AMD MI300X and MI350X accelerators.

Container Configuration
Volume Mounts
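A typical production mount persists the Hugging Face cache; the log-directory mount below is hypothetical and the model name illustrative:

```shell
docker run --gpus all --shm-size 32g \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v /var/log/sglang:/var/log/sglang \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct
```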
In production, mount the model cache as a volume so weights persist across container restarts.

Network Configuration
Host Network (Recommended for RDMA)
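With host networking the server binds directly to host ports, avoiding NAT overhead; this is typically required when RDMA NICs must be reachable (model name illustrative):

```shell
docker run --gpus all --network host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
```

No `-p` flags are needed, since the container shares the host's network namespace.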
Bridge Network
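On the default bridge network, publish the server port explicitly and bind to 0.0.0.0 inside the container so the mapped port is reachable:

```shell
docker run --gpus all -p 30000:30000 \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30000
```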
Environment Variables
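Variables are passed with `-e`; `HF_TOKEN` and `CUDA_VISIBLE_DEVICES` are standard, and values here are illustrative:

```shell
docker run --gpus all \
  -e HF_TOKEN=$HF_TOKEN \
  -e CUDA_VISIBLE_DEVICES=0,1 \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct
```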
Common variables include `HF_TOKEN` for gated model downloads and `CUDA_VISIBLE_DEVICES` for GPU selection.

Resource Limits
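CPU and memory caps go on `docker run` directly, and GPU selection uses the `--gpus` device syntax (limits shown are illustrative):

```shell
docker run --cpus 16 --memory 64g \
  --gpus '"device=0,1"' \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct
```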
Health Checks
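A Compose health check can poll the server's /health endpoint. This sketch assumes curl is available in the image; the long start_period allows for model loading:

```yaml
services:
  sglang:
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:30000/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 120s
```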
Health checks let a container orchestrator detect and restart an unresponsive server.

Logging and Monitoring
View Logs
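The container name `sglang` below is illustrative; substitute your own:

```shell
docker logs -f sglang          # follow live output
docker logs --tail 200 sglang  # show the last 200 lines
```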
Container Stats
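`docker stats` covers CPU and memory; GPU utilization comes from the host (container name illustrative):

```shell
docker stats sglang   # live CPU / memory / network usage
nvidia-smi            # GPU utilization and memory, run on the host
```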
Troubleshooting
Container Won’t Start
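Start by inspecting the exit code and logs (container name illustrative):

```shell
docker ps -a                                          # find the exited container
docker logs sglang                                    # read its error output
docker inspect --format '{{.State.ExitCode}}' sglang  # check how it exited
```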
CUDA Errors
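First confirm that the NVIDIA Container Toolkit can expose GPUs to containers at all:

```shell
docker run --rm --gpus all lmsysorg/sglang:latest nvidia-smi
```

If this fails, fix the host driver or toolkit installation before debugging SGLang itself.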
Out of Memory
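Two common mitigations: enlarge shared memory, and lower the fraction of GPU memory SGLang reserves via `--mem-fraction-static` (the value 0.8 is illustrative; the default varies by version):

```shell
docker run --gpus all --shm-size 32g \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct \
    --mem-fraction-static 0.8
```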
Permission Denied
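Mounted host directories must be writable by the user the container runs as; fixing ownership on the host is usually enough:

```shell
sudo chown -R "$(id -u):$(id -g)" ~/.cache/huggingface
```

Alternatively, run the container as your own user with `--user "$(id -u):$(id -g)"`.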
Production Best Practices
1. Use Specific Version Tags
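Pin an exact release rather than the moving `latest` tag (keep the placeholder until you choose a version):

```shell
docker pull lmsysorg/sglang:v<version>
```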
2. Implement Health Checks
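Health-check flags can be attached directly to `docker run` (this assumes curl is present in the image and that the server exposes /health):

```shell
docker run --gpus all -p 30000:30000 \
  --health-cmd 'curl -f http://localhost:30000/health || exit 1' \
  --health-interval 30s --health-retries 3 --health-start-period 120s \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30000
```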
3. Configure Resource Limits
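In Compose, `mem_limit` and `cpus` cap the service (values are illustrative):

```yaml
services:
  sglang:
    mem_limit: 64g
    cpus: 16
```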
4. Enable Auto-Restart
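A restart policy brings the server back after crashes or host reboots:

```yaml
services:
  sglang:
    restart: unless-stopped
```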
5. Secure Secrets
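Keep tokens out of images, shell history, and committed compose files; an env file is a simple baseline (the token value is a placeholder):

```shell
# .env (excluded from version control), containing e.g.:
#   HF_TOKEN=hf_xxxxxxxx
docker run --gpus all --env-file .env \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct
```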
Next Steps
- Kubernetes Deployment - Deploy on Kubernetes
- Multi-Node Setup - Distributed inference across nodes
- Cloud Platforms - Deploy on AWS, GCP, Azure
