Overview
The NVIDIA NIM Operator supports multiple storage backends for model caching, including PersistentVolumeClaims (PVC), HostPath, EmptyDir, and NIMCache volumes. Proper storage configuration is critical for model performance and deployment efficiency.
Storage Types
NIMCache Volume (Recommended)
NIMCache provides pre-cached models optimized for specific GPU configurations. This is the recommended approach for production deployments.
Create NIMCache
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-8b-cache
  namespace: nim-service
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-8b:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: "fast-ssd"
      size: "50Gi"
      volumeAccessMode: ReadWriteOnce
Reference in NIMService
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: ''  # Optional: specific engine profile
storage.nimCache.name
Name of the NIMCache resource to use
storage.nimCache.profile
Specific engine profile to use from the NIMCache (e.g., for different GPU configurations)
PersistentVolumeClaim (PVC)
Use PVC for persistent model storage without NIMCache:
Let the operator create and manage the PVC:
storage:
  pvc:
    create: true
    storageClass: "standard"
    size: "100Gi"
    volumeAccessMode: ReadWriteOnce
    annotations:
      volume.beta.kubernetes.io/storage-class: "fast-ssd"
Reference a pre-created PVC:
storage:
  pvc:
    create: false
    name: existing-nim-cache-pvc
    subPath: "models/llama-3-8b"  # Optional subdirectory
For multi-node deployments, use ReadWriteMany:
storage:
  pvc:
    create: true
    storageClass: "nfs-client"  # Must support RWX
    size: "200Gi"
    volumeAccessMode: ReadWriteMany
storage.pvc.create
Whether to create a new PVC (true) or use an existing one (false)
storage.pvc.name
Name of the existing PVC when create: false
storage.pvc.storageClass
StorageClass to use for PVC creation. Leave empty for the default StorageClass.
storage.pvc.size
Size of the PVC (e.g., 50Gi, 1Ti)
storage.pvc.volumeAccessMode
Access mode: ReadWriteOnce, ReadWriteMany, or ReadOnlyMany
storage.pvc.subPath
Subdirectory within the PVC to mount
storage.pvc.annotations
Custom annotations for the PVC
HostPath
Use node-local storage (not recommended for production):
storage:
  hostPath: /mnt/nim-models
  readOnly: false
HostPath ties pods to the specific nodes that hold the data and, on OpenShift, involves granting the hostmount-anyuid SCC. Use it only for development or single-node clusters.
storage.hostPath
Absolute path on the host filesystem
EmptyDir
Temporary storage (data is lost when pod is deleted):
storage:
  emptyDir:
    sizeLimit: 50Gi
EmptyDir is useful for testing or when models are downloaded at startup. All model data is ephemeral.
storage.emptyDir.sizeLimit
Maximum size of the emptyDir volume
Shared Memory Configuration
NIM containers use shared memory for fast model I/O:
storage:
  sharedMemorySizeLimit: 16Gi
storage.sharedMemorySizeLimit
Size of the /dev/shm mount (emptyDir with medium: Memory)
Recommended Shared Memory Size
Small models (< 10B params): 8-16Gi
Medium models (10B-70B): 32-64Gi
Large models (> 70B): 64-128Gi
General rule: Allocate 50-70% of total GPU memory
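As a rough illustration of the 50-70% rule, the sketch below computes both bounds from a total GPU memory figure. The helper name shm_range_gi is hypothetical (it is not part of the NIM Operator), and it assumes whole-Gi inputs, so results round down.

```shell
# shm_range_gi TOTAL_GPU_MEM_GI
# Prints the 50% and 70% bounds of total GPU memory in whole Gi (rounded down).
# Hypothetical helper for illustration only.
shm_range_gi() {
  local total_gi="$1"
  local low=$(( total_gi * 50 / 100 ))
  local high=$(( total_gi * 70 / 100 ))
  echo "${low}Gi-${high}Gi"
}

shm_range_gi 24   # one GPU with 24 Gi of memory
```

For a single 24 Gi GPU this prints 12Gi-16Gi, which sits inside the 8-16Gi range suggested for small models. Treat the result as a starting point for storage.sharedMemorySizeLimit rather than a fixed requirement.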
Storage Classes
Fast Local Storage (Recommended)
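Use local NVMe SSDs for best performance. The kubernetes.io/no-provisioner StorageClass below does not provision volumes dynamically, so matching local PersistentVolumes must already exist: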
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  storage:
    pvc:
      create: true
      storageClass: fast-local-ssd
      size: 100Gi
Network Storage (NFS)
For multi-node deployments with shared cache:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.company.com
  share: /exports/nim-cache
volumeBindingMode: Immediate
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  multiNode:
    parallelism:
      pipeline: 2
      tensor: 8
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 200Gi
      volumeAccessMode: ReadWriteMany
Cloud Storage Classes
AWS EBS
GCP Persistent Disk
Azure Disk
storage:
  pvc:
    create: true
    storageClass: gp3  # AWS EBS gp3
    size: 100Gi
    volumeAccessMode: ReadWriteOnce
Read-Only Storage
Mount storage as read-only to prevent accidental modifications:
storage:
  pvc:
    create: false
    name: shared-model-cache
  readOnly: true
storage.readOnly
Mount the model storage volume as read-only
Useful when:
Sharing a pre-populated model cache across multiple NIMService instances
Enforcing immutable infrastructure
Using a centralized model repository
Complete Storage Examples
Production with NIMCache
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-70b-cache
  namespace: production
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-70b:1.2.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "4"
  storage:
    pvc:
      create: true
      storageClass: fast-ssd
      size: 200Gi
      volumeAccessMode: ReadWriteOnce
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
  namespace: production
spec:
  storage:
    nimCache:
      name: llama-3-70b-cache
    sharedMemorySizeLimit: 64Gi
  resources:
    limits:
      nvidia.com/gpu: 4
Multi-Node with NFS
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: deepseek-r1
  namespace: nim-service
spec:
  replicas: 1
  multiNode:
    parallelism:
      pipeline: 2
      tensor: 8
  storage:
    pvc:
      create: true
      storageClass: nfs-client  # RWX-capable storage
      size: 500Gi
      volumeAccessMode: ReadWriteMany
    sharedMemorySizeLimit: 128Gi
  resources:
    limits:
      nvidia.com/gpu: 8
Development with EmptyDir
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-dev
  namespace: development
spec:
  storage:
    emptyDir:
      sizeLimit: 50Gi
    sharedMemorySizeLimit: 8Gi
  resources:
    limits:
      nvidia.com/gpu: 1
Shared Read-Only Cache
# Pre-populate cache
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: shared-llama-cache
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-8b:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      name: shared-cache-pvc
      storageClass: nfs-client
      size: 100Gi
      volumeAccessMode: ReadWriteMany
---
# Service 1 - read-only
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-service-1
spec:
  storage:
    pvc:
      create: false
      name: shared-cache-pvc
    readOnly: true
---
# Service 2 - read-only
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-service-2
spec:
  storage:
    pvc:
      create: false
      name: shared-cache-pvc
    readOnly: true
Storage Best Practices
Storage Recommendations
Use NIMCache for production - Pre-cached models are faster and more reliable
Choose the right storage class - Local SSDs for single-node, NFS for multi-node
Size appropriately - Model size + 20-30% overhead for runtime files
Use ReadWriteMany for multi-node - Required for LeaderWorkerSet deployments
Configure shared memory - Set to 50-70% of total GPU memory
Monitor storage usage - Set up alerts for PVC capacity
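To make the "model size + 20-30% overhead" guideline concrete, here is a minimal sketch that rounds the result up to a whole Gi. The helper name pvc_size_gi and its default of 30% are assumptions made for illustration, not operator behavior.

```shell
# pvc_size_gi MODEL_SIZE_GI [OVERHEAD_PCT]
# Prints the model size plus overhead, rounded up to a whole Gi.
# OVERHEAD_PCT defaults to 30 (the upper end of the 20-30% guideline).
# Hypothetical helper for illustration only.
pvc_size_gi() {
  local model_gi="$1"
  local overhead_pct="${2:-30}"
  echo $(( (model_gi * (100 + overhead_pct) + 99) / 100 ))
}

pvc_size_gi 16       # 16 Gi of model files with 30% overhead
pvc_size_gi 140 20   # 140 Gi of model files with 20% overhead
```

The two calls print 21 and 168. These are lower bounds for the model files alone; the larger figures in the sizing guide leave extra headroom on top of that.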
Storage Sizing Guide
Model Size      PVC Size  Shared Memory  Storage Type
1B-7B params    50Gi      8-16Gi         RWO, local
8B-13B params   100Gi     16-32Gi        RWO, local
30B-70B params  200Gi     32-64Gi        RWO/RWX
70B+ params     500Gi+    64-128Gi       RWX, network
Multi-node      1Ti+      128Gi+         RWX, NFS
Volume Access Modes
ReadWriteOnce (RWO): Single-node deployments, best performance
ReadWriteMany (RWX): Multi-node deployments, required for LeaderWorkerSet
ReadOnlyMany (ROX): Shared read-only caches across services
Use local NVMe SSDs for fastest model loading
Pre-cache models with NIMCache to avoid download delays
Allocate sufficient shared memory for model runtime operations
Use volumeBindingMode: WaitForFirstConsumer for topology-aware scheduling
Volume Mounts
The operator automatically creates volume mounts:
# Automatically created mounts:
volumeMounts:
  - name: model-store
    mountPath: /model-store
    subPath: ""  # From pvc.subPath if specified
  - name: dshm
    mountPath: /dev/shm
The following environment variable is set automatically:
env:
  - name: NIM_CACHE_PATH
    value: /model-store
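One way to confirm the mounts and the environment variable from inside a running pod is sketched below. The function name verify_nim_mounts is hypothetical, and the sketch assumes the container image provides sh and df.

```shell
# verify_nim_mounts POD NAMESPACE
# Prints NIM_CACHE_PATH as seen by the container, then the disk usage of
# the model store and the shared-memory mount.
# Hypothetical helper; assumes sh and df exist in the container image.
verify_nim_mounts() {
  local pod="$1"
  local ns="$2"
  kubectl exec "$pod" -n "$ns" -- sh -c \
    'echo "NIM_CACHE_PATH=$NIM_CACHE_PATH"; df -h /model-store /dev/shm'
}

# Usage (requires cluster access):
#   verify_nim_mounts <pod-name> <namespace>
```

The size reported for /dev/shm should match storage.sharedMemorySizeLimit when that field is set.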
Storage Backends
NFS Server Setup
Example PersistentVolume and PersistentVolumeClaim backed by an NFS server for shared storage:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-nim-cache
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.company.com
    path: /exports/nim-cache
  mountOptions:
    - hard
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-nim-cache
  namespace: nim-service
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Ti
  volumeName: nfs-pv-nim-cache
S3-Compatible Storage (via CSI)
Use an S3 CSI driver for object storage:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-csi
provisioner: s3.csi.aws.com
parameters:
  bucket: nim-model-cache
  region: us-west-2
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  storage:
    pvc:
      create: true
      storageClass: s3-csi
      size: 1Ti
Troubleshooting
PVC Not Bound
Check PVC status:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Common issues:
No available PersistentVolume matches the claim
StorageClass not found
Insufficient storage capacity
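The checks for these common issues can be bundled into a small helper, sketched below. The function name diagnose_pvc is hypothetical; the kubectl subcommands and the status.phase field are standard Kubernetes.

```shell
# diagnose_pvc PVC_NAME NAMESPACE
# Prints the PVC phase. If the claim is not Bound, also lists the
# StorageClasses, PersistentVolumes, and events related to the claim.
# Hypothetical helper for illustration only.
diagnose_pvc() {
  local pvc="$1"
  local ns="$2"
  local phase
  phase="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}')"
  echo "phase: ${phase}"
  if [ "$phase" != "Bound" ]; then
    kubectl get storageclass    # is the requested StorageClass present?
    kubectl get pv              # is there a matching PV with enough capacity?
    kubectl get events -n "$ns" --field-selector "involvedObject.name=${pvc}"
  fi
}

# Usage (requires cluster access):
#   diagnose_pvc <pvc-name> <namespace>
```

Note that a claim using a StorageClass with volumeBindingMode: WaitForFirstConsumer stays Pending until a pod that uses it is scheduled, which is expected rather than an error.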
Permission Denied Errors
Error: permission denied: /model-store
Solution: Ensure the storage path has the correct ownership:
# If using hostPath or a local volume
sudo chown -R 1000:2000 /path/to/storage
sudo chmod -R 755 /path/to/storage
Alternatively, configure fsGroup in the pod security context (the operator does this automatically).
Model Not Found
Error: Model not found in /model-store
Solutions:
Verify NIMCache was created and completed
Check PVC is mounted correctly
Verify storage.nimCache.name matches NIMCache resource
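The first and third checks can be scripted roughly as follows. This is a sketch under assumptions: the function name check_cache_link is hypothetical, the short resource names nimservice and nimcache are assumed to resolve to the operator's CRDs, and the NIMCache status is assumed to expose a state field at .status.state.

```shell
# check_cache_link NIMSERVICE_NAME NIMCACHE_NAME NAMESPACE
# Fails if the NIMService does not reference the given NIMCache;
# otherwise prints the NIMCache state.
# Assumptions: resource short names and the .status.state path.
check_cache_link() {
  local service="$1"
  local cache="$2"
  local ns="$3"
  local ref
  ref="$(kubectl get nimservice "$service" -n "$ns" \
    -o jsonpath='{.spec.storage.nimCache.name}')"
  if [ "$ref" != "$cache" ]; then
    echo "mismatch: NIMService references '${ref}', expected '${cache}'" >&2
    return 1
  fi
  echo "state: $(kubectl get nimcache "$cache" -n "$ns" -o jsonpath='{.status.state}')"
}

# Usage (requires cluster access):
#   check_cache_link llama-3-8b llama-3-8b-cache nim-service
```

If the state path differs in your operator version, kubectl describe on the NIMCache shows the actual status fields.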
Out of Storage Space
Monitor PVC usage:
kubectl exec -it <pod-name> -n <namespace> -- df -h /model-store
Increase PVC size (if StorageClass supports expansion):
kubectl patch pvc <pvc-name> -n <namespace> -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
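Whether a StorageClass supports expansion can be checked before patching. The sketch below reads the standard allowVolumeExpansion field; the function name sc_allows_expansion is hypothetical.

```shell
# sc_allows_expansion STORAGE_CLASS
# Succeeds (exit 0) only if the StorageClass sets allowVolumeExpansion: true.
# Hypothetical helper for illustration only.
sc_allows_expansion() {
  local sc="$1"
  local allowed
  allowed="$(kubectl get storageclass "$sc" -o jsonpath='{.allowVolumeExpansion}')"
  [ "$allowed" = "true" ]
}

# Usage (requires cluster access):
#   sc_allows_expansion gp3 && echo "expansion supported"
```

When the field is unset or false, the patch is rejected and the data typically has to be migrated to a new, larger PVC.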
Multi-Node Mount Failures
Error: Multi-Attach error for volume "pvc-xxx" Volume is already exclusively attached
Solution: Use the ReadWriteMany access mode:
storage:
  pvc:
    volumeAccessMode: ReadWriteMany  # Not ReadWriteOnce
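To see which access modes an existing PVC was actually created with, a check along these lines may help. The function name pvc_is_rwx is hypothetical; spec.accessModes is a standard PVC field.

```shell
# pvc_is_rwx PVC_NAME NAMESPACE
# Succeeds (exit 0) only if the PVC lists ReadWriteMany among its access modes.
# Hypothetical helper for illustration only.
pvc_is_rwx() {
  local pvc="$1"
  local ns="$2"
  local modes
  modes="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.accessModes[*]}')"
  case " ${modes} " in
    *" ReadWriteMany "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires cluster access):
#   pvc_is_rwx <pvc-name> <namespace> || echo "PVC is not ReadWriteMany"
```

Access modes cannot be changed on an existing PVC, so a claim created as ReadWriteOnce has to be recreated with ReadWriteMany on an RWX-capable StorageClass.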