Overview
The NVIDIA NIM Operator supports comprehensive resource management for NIM workloads, including traditional Kubernetes resource requests/limits and Dynamic Resource Allocation (DRA) for GPU resources.

Resource Requirements
Basic Resource Configuration
Configure CPU, memory, and GPU resources using the resources field in your NIMService spec:
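As a sketch, a NIMService requesting CPU, memory, and one GPU could look like the following. The metadata and image values are illustrative placeholders; the resources block follows the standard Kubernetes schema:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-nim                # illustrative name
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.1-8b-instruct   # illustrative image
    tag: latest
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
      nvidia.com/gpu: 1
    limits:
      cpu: "8"
      memory: 32Gi
      nvidia.com/gpu: 1          # GPU requests and limits must match
```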
The resources field supports traditional Kubernetes resources (CPU, memory) and custom device plugin resources (nvidia.com/gpu). For DRA-based GPU allocation, use the draResources field instead.

GPU Resource Configuration
Device Plugin (Traditional)
The traditional approach uses the NVIDIA device plugin to allocate GPUs by requesting nvidia.com/gpu in the resources field.

Dynamic Resource Allocation (DRA)
DRA provides fine-grained GPU resource allocation with attribute-based selection. This is the recommended approach for production deployments. Four configuration modes are available:

- Auto-Creation (Simple)
- Auto-Creation (Advanced)
- Reference Existing Claim
- Reference Claim Template
Auto-creation lets you create a DRA resource claim with minimal configuration; the operator automatically generates a ResourceClaimTemplate with default settings.
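A minimal auto-creation sketch follows. The draResources field is named in this document, but its sub-fields are not shown in this chunk, so the structure below (claimCreationSpec, deviceClassName, count) is an assumption; gpu.nvidia.com is the device class published by the NVIDIA DRA driver:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-nim-dra            # illustrative name
spec:
  draResources:
    - claimCreationSpec:         # assumed sub-field: triggers auto-creation
        devices:
          - deviceClassName: gpu.nvidia.com
            count: 1
```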
DRA Attribute Selectors
DRA supports attribute-based device selection. Each attribute selector specifies:

- Device attribute name (e.g., productName, memory, cuda.computeCapability)
- Comparison operator: Equal, NotEqual, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual
- Value to compare against, one of:
  - boolValue: true/false
  - intValue: a numeric value
  - stringValue: a string value (max 64 chars)
  - versionValue: a semantic version (semver 2.0.0)
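Putting those parts together, an attribute selector could be written as follows. The key/operator/value shape mirrors the list above; the exact field names and the surrounding structure are assumptions, and the product name is a placeholder:

```yaml
# Sketch: key/op/value follow the description above; field names assumed.
attributes:
  - key: productName
    op: Equal
    value:
      stringValue: "NVIDIA H100 80GB HBM3"   # placeholder product name
  - key: cuda.computeCapability
    op: GreaterThanOrEqual
    value:
      versionValue: "9.0"
```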
DRA Capacity Selectors
Filter devices by resource capacity. Each capacity selector specifies:

- Resource name (e.g., memory, gpu.nvidia.com/bandwidth)
- Comparison operator
- A Kubernetes resource quantity to compare against (e.g., 80Gi, 1TB)

Workload Size Examples
Small Workload (1 GPU)
Ideal for testing and development.

Medium Workload (4 GPUs)
Production workload for medium-sized models.

Large Workload (8 GPUs)
Large-scale production deployment.

Multi-Node Workload
For models requiring multiple nodes with tensor/pipeline parallelism.

Shared Memory Configuration
NIM containers require shared memory for fast model I/O. Configure the maximum size of the shared memory volume (an emptyDir with medium=Memory); a common recommendation is 50% of total GPU memory.
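Combining a workload size with the shared-memory setting, a sketch for a medium (4-GPU) deployment follows. sharedMemorySizeLimit is an assumed field name for the setting described above, and the size assumes four 80 GiB GPUs (50% of 320 GiB):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-nim-medium         # illustrative name
spec:
  resources:
    requests:
      cpu: "16"
      memory: 64Gi
      nvidia.com/gpu: 4
    limits:
      cpu: "16"
      memory: 64Gi
      nvidia.com/gpu: 4
  sharedMemorySizeLimit: 160Gi   # assumed field name; ~50% of 4x80Gi GPU memory
```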
Best Practices
Resource Limits vs Requests
- Requests: Guaranteed resources for scheduling
- Limits: Maximum resources the container can use
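For example, CPU and memory may differ between requests and limits, but GPU counts must be equal, because extended resources such as nvidia.com/gpu cannot be overcommitted:

```yaml
resources:
  requests:
    cpu: "4"            # guaranteed; used by the scheduler
    memory: 16Gi
    nvidia.com/gpu: 2
  limits:
    cpu: "8"            # hard cap; the container may burst up to this
    memory: 32Gi
    nvidia.com/gpu: 2   # must equal the request for extended resources
```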
GPU Selection Strategies
- By Product Name
- By Memory
- By Compute Capability
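Each strategy maps to a DRA selector. Sketches for all three follow, using the attribute-selector shape described earlier; the capacity-selector field names (name, quantity) and the surrounding structure are assumptions, and the product name is a placeholder:

```yaml
# By product name (attribute selector)
- key: productName
  op: Equal
  value:
    stringValue: "NVIDIA A100-SXM4-80GB"   # placeholder product name

# By memory (capacity selector: resource name, operator, quantity; names assumed)
- name: memory
  op: GreaterThanOrEqual
  quantity: 80Gi

# By compute capability (attribute selector with a version value)
- key: cuda.computeCapability
  op: GreaterThanOrEqual
  value:
    versionValue: "8.0"
```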
Troubleshooting
Pod Pending Due to Insufficient Resources
Check resource availability, for example with kubectl describe nodes (inspect the Allocatable and Allocated resources sections).

DRA Claim Not Satisfied
View the claim status, for example with kubectl get resourceclaims -A and kubectl describe resourceclaim <name>.

GPU Not Detected
Verify that the NVIDIA device plugin (or the DRA driver) pods are running, for example with kubectl get pods -n <driver-namespace>.

Related Resources
- Scaling Configuration - Configure autoscaling based on resource utilization
- Storage Configuration - Configure persistent storage for model caches
- Multi-Node Deployment - Deploy models across multiple nodes