Overview
Kubernetes provides robust orchestration for SGLang deployments, enabling auto-scaling, self-healing, and declarative configuration. This guide covers single-node and distributed multi-node deployments.
Prerequisites
- Kubernetes cluster version ≥1.26
- NVIDIA GPU Operator or device plugin installed
- `kubectl` configured for cluster access
- Storage class for persistent volumes (for model caching)
GPU Support Setup
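If GPU resources are not yet advertised to the scheduler, one common route is the upstream device-plugin DaemonSet. The URL and version tag below are assumptions; check the NVIDIA/k8s-device-plugin releases first:

```bash
# Deploy the NVIDIA device plugin as a DaemonSet (version tag is an assumption)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml

# Confirm that nodes now advertise nvidia.com/gpu as an allocatable resource
kubectl describe nodes | grep nvidia.com/gpu
```

On clusters running the NVIDIA GPU Operator this step is handled automatically.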
Install the NVIDIA device plugin if it is not already available.
Single-Node Deployment
Basic Deployment
Deploy a single-replica SGLang server. Save the manifest as `sglang-deployment.yaml` and apply it:
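A minimal sketch of such a manifest; the image tag, model path, and resource sizes are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sglang
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sglang
  template:
    metadata:
      labels:
        app: sglang
    spec:
      containers:
      - name: sglang
        image: lmsysorg/sglang:v0.4.0        # pin a real tag; this value is an assumption
        command: ["python3", "-m", "sglang.launch_server"]
        args:
        - --model-path=meta-llama/Llama-3.1-8B-Instruct
        - --host=0.0.0.0
        - --port=30000
        ports:
        - containerPort: 30000
        resources:
          limits:
            nvidia.com/gpu: 1
        volumeMounts:
        - name: model-cache
          mountPath: /root/.cache/huggingface
      volumes:
      - name: model-cache
        emptyDir: {}
```

Apply it with `kubectl apply -f sglang-deployment.yaml`.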
Verify Deployment
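The deployment can be checked roughly as follows (the label and health path assume a manifest labeled `app: sglang` serving on port 30000):

```bash
kubectl get pods -l app=sglang            # expect STATUS=Running
kubectl logs deploy/sglang --tail=50      # watch for server startup messages

# Forward the serving port and probe the health endpoint
kubectl port-forward deploy/sglang 30000:30000 &
curl http://localhost:30000/health
```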
Multi-Node Distributed Deployment
Using StatefulSet
For multi-node tensor parallelism across nodes, use a StatefulSet so each pod keeps a stable network identity.
LeaderWorkerSet (LWS) Deployment
LeaderWorkerSet is the recommended approach for multi-node distributed inference.
Prerequisites
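The controller can be installed from the upstream release manifests; the version tag below is an assumption, so check the kubernetes-sigs/lws releases page:

```bash
# Install the LeaderWorkerSet CRDs and controller (version is an assumption)
kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/v0.5.0/manifests.yaml

# Verify the controller is running
kubectl get pods -n lws-system
```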
Install the LeaderWorkerSet controller.
Basic LWS Configuration
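A sketch of a two-node LWS group for tensor parallelism. The image tag, model, GPU counts, and ports are assumptions; `LWS_WORKER_INDEX` and `LWS_LEADER_ADDRESS` are environment variables injected by the LWS controller:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: sglang
spec:
  replicas: 1                        # number of leader/worker groups
  leaderWorkerTemplate:
    size: 2                          # pods per group: 1 leader + 1 worker
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      spec:
        containers:
        - name: sglang
          image: lmsysorg/sglang:v0.4.0
          command: ["python3", "-m", "sglang.launch_server"]
          args:
          - --model-path=meta-llama/Llama-3.1-70B-Instruct
          - --tp=16                                  # total GPUs across both nodes
          - --nnodes=2
          - --node-rank=$(LWS_WORKER_INDEX)          # injected by LWS (0 for leader)
          - --dist-init-addr=$(LWS_LEADER_ADDRESS):20000
          - --host=0.0.0.0
          - --port=30000
          resources:
            limits:
              nvidia.com/gpu: 8
    workerTemplate:
      spec:
        containers:
        - name: sglang
          image: lmsysorg/sglang:v0.4.0
          command: ["python3", "-m", "sglang.launch_server"]
          args:
          - --model-path=meta-llama/Llama-3.1-70B-Instruct
          - --tp=16
          - --nnodes=2
          - --node-rank=$(LWS_WORKER_INDEX)
          - --dist-init-addr=$(LWS_LEADER_ADDRESS):20000
          resources:
            limits:
              nvidia.com/gpu: 8
```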
Deploy with LWS
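Assuming the LWS manifest is saved as `sglang-lws.yaml` (the filename is arbitrary):

```bash
kubectl apply -f sglang-lws.yaml
kubectl get pods -w        # wait for the leader and worker pods to reach Running
```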
Monitor LWS Deployment
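The group can be inspected through the LWS resource and its controller-applied pod labels:

```bash
kubectl get lws sglang
kubectl get pods -l leaderworkerset.sigs.k8s.io/name=sglang
kubectl logs sglang-0 --tail=100     # leader pod (index 0) carries the server logs
```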
RDMA/InfiniBand Configuration
For high-performance multi-node setups with RDMA, configure the following.
Prerequisites
- Verify InfiniBand devices on nodes:
- Check RDMA accessibility:
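The two checks above can be run with the standard InfiniBand userspace tools (assumed present on the nodes):

```bash
# List InfiniBand HCAs and port/link state
ibstat

# Enumerate RDMA-capable devices
ibv_devinfo

# Confirm the RDMA device files exist
ls /dev/infiniband
```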
RDMA-Enabled Deployment
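A pod-template fragment illustrating the RDMA-relevant settings; the HCA prefix and interface name are assumptions that must match the node's hardware:

```yaml
# Fragment of a pod template for RDMA-capable nodes
spec:
  hostNetwork: true                   # required for high-performance inter-node traffic
  hostIPC: true
  containers:
  - name: sglang
    image: lmsysorg/sglang:v0.4.0     # tag is an assumption
    securityContext:
      privileged: true                # grants access to /dev/infiniband
    env:
    - name: NCCL_IB_DISABLE
      value: "0"                      # allow NCCL to use InfiniBand
    - name: NCCL_IB_HCA
      value: "mlx5"                   # HCA name prefix; match ibstat output
    - name: NCCL_SOCKET_IFNAME
      value: "eth0"                   # bootstrap interface; adjust per node
    resources:
      limits:
        nvidia.com/gpu: 8
```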
Storage Configuration
Persistent Volume for Model Cache
Using hostPath (Development)
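For development, a node-local directory can back the cache; the host path below is an assumption:

```yaml
# Pod-template fragment: host directory mounted as the Hugging Face cache
volumes:
- name: model-cache
  hostPath:
    path: /data/models              # directory on the node
    type: DirectoryOrCreate
containers:
- name: sglang
  volumeMounts:
  - name: model-cache
    mountPath: /root/.cache/huggingface
```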
Using NFS (Production)
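For production, an NFS-backed ReadWriteMany volume lets every pod share one model cache; the server address and export path are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-cache
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: nfs.example.internal    # placeholder
    path: /exports/models
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""              # bind to the static PV above
  resources:
    requests:
      storage: 500Gi
```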
Resource Management
Resource Requests and Limits
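A container-level fragment; the sizes are illustrative. Note that extended resources such as `nvidia.com/gpu` must have requests equal to limits:

```yaml
resources:
  requests:
    cpu: "8"
    memory: 64Gi
    nvidia.com/gpu: 1
  limits:
    cpu: "16"
    memory: 128Gi
    nvidia.com/gpu: 1
```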
Node Selection
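Pods can be pinned to specific GPU models via node labels; the label below is published by GPU feature discovery, and the value is an assumption:

```yaml
nodeSelector:
  nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
```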
Tolerations
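If GPU nodes carry a taint (a common pattern on managed clusters), pods need a matching toleration:

```yaml
tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule
```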
Monitoring and Observability
Enable Metrics
Add a metrics endpoint to your deployment.
Prometheus Integration
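Assuming the server is started with SGLang's `--enable-metrics` flag so Prometheus metrics are served on the HTTP port, and that the Prometheus Operator is installed, scraping can be wired up with a ServiceMonitor; the port name is an assumption that must match your Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sglang
spec:
  selector:
    matchLabels:
      app: sglang                # must match the Service's labels
  endpoints:
  - port: http                   # named Service port exposing 30000
    path: /metrics
    interval: 15s
```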
Scaling
Horizontal Pod Autoscaler
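A sketch of an `autoscaling/v2` HPA; the pod metric name is hypothetical and would require a metrics adapter exporting it from Prometheus:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sglang
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sglang
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: sglang_num_running_requests   # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "32"
```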
Troubleshooting
Pod Stuck in Pending
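Scheduling failures usually show up in pod events; compare requested GPUs against what the nodes actually advertise:

```bash
kubectl describe pod <pod-name>               # check Events: for FailedScheduling reasons
kubectl describe nodes | grep -A6 "Allocated resources"
kubectl describe nodes | grep nvidia.com/gpu  # allocatable vs. requested GPUs
```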
NCCL Communication Failures
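Turning on NCCL debug output in the pod environment is usually the first step; the interface name below is an assumption:

```yaml
env:
- name: NCCL_DEBUG
  value: "INFO"            # prints transport/topology choices to pod logs
- name: NCCL_SOCKET_IFNAME
  value: "eth0"            # pin NCCL to the interface that routes between nodes
```

Also confirm that worker pods can reach the address and port passed to `--dist-init-addr`.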
RDMA Issues
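Verify that RDMA devices are actually visible inside the container, not just on the host:

```bash
kubectl exec -it <pod-name> -- ibv_devinfo
kubectl exec -it <pod-name> -- ls /dev/infiniband
```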
Out of Memory
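Besides raising the pod memory limit, the GPU memory reserved for the KV cache can be reduced with SGLang's `--mem-fraction-static` flag (the default varies by version):

```yaml
args:
- --model-path=meta-llama/Llama-3.1-8B-Instruct   # model is illustrative
- --mem-fraction-static=0.80                      # lower this if the server OOMs at startup
```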
Best Practices
- Use StatefulSet for multi-node: StatefulSets provide stable network identities
- Enable hostNetwork for RDMA: Required for high-performance inter-node communication
- Set privileged mode for InfiniBand: Necessary for RDMA device access
- Use ReadWriteMany PVCs: Enable model sharing across pods
- Configure health probes: Implement both liveness and readiness probes
- Set resource limits: Prevent resource contention
- Use specific image tags: Avoid `latest` in production
- Monitor NCCL environment: Tune based on network topology
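For the health-probe recommendation above, a sketch against SGLang's `/health` endpoint; the long initial delay accounts for model loading and should be tuned per model size:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 30000
  initialDelaySeconds: 120
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 30000
  initialDelaySeconds: 60
  periodSeconds: 5
```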
Next Steps
- Multi-Node Configuration - Advanced multi-node setups
- Cloud Platforms - Managed Kubernetes on cloud providers
- Docker Deployment - Container-based deployment
