
Overview

The NVIDIA NIM Operator supports multiple storage backends for model caching, including PersistentVolumeClaims (PVC), HostPath, EmptyDir, and NIMCache volumes. Proper storage configuration is critical for model performance and deployment efficiency.

Storage Types

NIMCache (Recommended)

NIMCache provides pre-cached models optimized for specific GPU configurations. This is the recommended approach for production deployments.

Step 1: Create the NIMCache

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-8b-cache
  namespace: nim-service
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-8b:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: "fast-ssd"
      size: "50Gi"
      volumeAccessMode: ReadWriteOnce
Step 2: Reference the cache in a NIMService

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: ''  # Optional: specific engine profile
storage.nimCache.name (string): Name of the NIMCache resource to use.
storage.nimCache.profile (string): Specific engine profile to use from the NIMCache (e.g., for different GPU configurations).
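As a sketch, a NIMService can be pinned to one cached profile. The profile value below is a placeholder, not a real profile ID; in practice, copy an exact profile name from the NIMCache status (for example, via `kubectl get nimcache llama-3-8b-cache -o yaml`):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b-pinned
spec:
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: "example-profile-id"  # placeholder: use a profile name reported by the NIMCache
```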

PersistentVolumeClaim (PVC)

Use a PVC for persistent model storage without NIMCache. The operator can create and manage the PVC for you:
storage:
  pvc:
    create: true
    storageClass: "standard"
    size: "100Gi"
    volumeAccessMode: ReadWriteOnce
    annotations:
      volume.beta.kubernetes.io/storage-class: "fast-ssd"
storage.pvc.create (boolean): Whether to create a new PVC (true) or use an existing one (false).
storage.pvc.name (string): Name of the existing PVC when create: false.
storage.pvc.storageClass (string): StorageClass to use for PVC creation. Leave empty to use the default StorageClass.
storage.pvc.size (string): Size of the PVC (e.g., 50Gi, 1Ti).
storage.pvc.volumeAccessMode (string): Access mode: ReadWriteOnce, ReadWriteMany, or ReadOnlyMany.
storage.pvc.subPath (string): Subdirectory within the PVC to mount.
storage.pvc.annotations (object): Custom annotations for the PVC.
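For illustration, a NIMService can also mount an existing, pre-populated PVC instead of creating one. The claim name and subPath below are hypothetical:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  storage:
    pvc:
      create: false
      name: preloaded-model-cache  # hypothetical existing PVC
      subPath: llama-3-8b          # mount only this subdirectory
```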

HostPath

Use node-local storage (not recommended for production):
storage:
  hostPath: /mnt/nim-models
  readOnly: false
HostPath requires pods to be scheduled on specific nodes and, on OpenShift, requires the hostmount-anyuid SecurityContextConstraint (SCC). Use it only for development or single-node clusters.
storage.hostPath (string): Absolute path on the host filesystem.

EmptyDir

Temporary storage (data is lost when pod is deleted):
storage:
  emptyDir:
    sizeLimit: 50Gi
EmptyDir is useful for testing or when models are downloaded at startup. All model data is ephemeral.
storage.emptyDir.sizeLimit (Quantity): Maximum size of the emptyDir volume.

Shared Memory Configuration

NIM containers use shared memory for fast model I/O:
storage:
  sharedMemorySizeLimit: 16Gi
storage.sharedMemorySizeLimit (Quantity): Size of the /dev/shm mount (an emptyDir with medium: Memory).

Recommended shared memory size:
  • Small models (< 10B params): 8-16Gi
  • Medium models (10B-70B params): 32-64Gi
  • Large models (> 70B params): 64-128Gi
As a general rule, allocate 50-70% of total GPU memory.
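The 50-70% rule above can be sketched as a small helper. The function name and the 60% default (the midpoint of the guideline) are our own, not part of the operator:

```python
def recommended_shm_gib(total_gpu_mem_gib: float, fraction: float = 0.6) -> int:
    """Suggest a /dev/shm size as a fraction (50-70%) of total GPU memory.

    fraction defaults to 0.6, the midpoint of the 50-70% guideline.
    """
    if not 0.5 <= fraction <= 0.7:
        raise ValueError("fraction should stay within the 50-70% guideline")
    return round(total_gpu_mem_gib * fraction)

# Example: 4 x 80GB GPUs = 320 GiB total GPU memory
print(f"{recommended_shm_gib(320)}Gi")  # 192Gi
```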

Storage Classes

Use local NVMe SSDs for best performance:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  storage:
    pvc:
      create: true
      storageClass: fast-local-ssd
      size: 100Gi

Network Storage (NFS)

For multi-node deployments with shared cache:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.company.com
  share: /exports/nim-cache
volumeBindingMode: Immediate
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  multiNode:
    parallelism:
      pipeline: 2
      tensor: 8
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 200Gi
      volumeAccessMode: ReadWriteMany

Cloud Storage Classes

On managed Kubernetes, use the storage class your cloud provider offers, for example AWS EBS gp3:
storage:
  pvc:
    create: true
    storageClass: gp3  # AWS EBS gp3
    size: 100Gi
    volumeAccessMode: ReadWriteOnce

Read-Only Storage

Mount storage as read-only to prevent accidental modifications:
storage:
  pvc:
    create: false
    name: shared-model-cache
  readOnly: true
storage.readOnly (boolean, default: false): Mount the model storage volume as read-only.
Useful when:
  • Sharing a pre-populated model cache across multiple NIMService instances
  • Enforcing immutable infrastructure
  • Using a centralized model repository

Complete Storage Examples

Production with NIMCache

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-70b-cache
  namespace: production
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-70b:1.2.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "4"
  storage:
    pvc:
      create: true
      storageClass: fast-ssd
      size: 200Gi
      volumeAccessMode: ReadWriteOnce
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
  namespace: production
spec:
  storage:
    nimCache:
      name: llama-3-70b-cache
    sharedMemorySizeLimit: 64Gi
  resources:
    limits:
      nvidia.com/gpu: 4

Multi-Node with NFS

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: deepseek-r1
  namespace: nim-service
spec:
  replicas: 1
  multiNode:
    parallelism:
      pipeline: 2
      tensor: 8
  storage:
    pvc:
      create: true
      storageClass: nfs-client  # RWX-capable storage
      size: 500Gi
      volumeAccessMode: ReadWriteMany
    sharedMemorySizeLimit: 128Gi
  resources:
    limits:
      nvidia.com/gpu: 8

Development with EmptyDir

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-dev
  namespace: development
spec:
  storage:
    emptyDir:
      sizeLimit: 50Gi
    sharedMemorySizeLimit: 8Gi
  resources:
    limits:
      nvidia.com/gpu: 1

Shared Read-Only Cache

# Pre-populate cache
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: shared-llama-cache
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-8b:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      name: shared-cache-pvc
      storageClass: nfs-client
      size: 100Gi
      volumeAccessMode: ReadWriteMany
---
# Service 1 - read-only
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-service-1
spec:
  storage:
    pvc:
      create: false
      name: shared-cache-pvc
    readOnly: true
---
# Service 2 - read-only
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-service-2
spec:
  storage:
    pvc:
      create: false
      name: shared-cache-pvc
    readOnly: true

Storage Best Practices

Storage Recommendations
  1. Use NIMCache for production - Pre-cached models are faster and more reliable
  2. Choose the right storage class - Local SSDs for single-node, NFS for multi-node
  3. Size appropriately - Model size + 20-30% overhead for runtime files
  4. Use ReadWriteMany for multi-node - Required for LeaderWorkerSet deployments
  5. Configure shared memory - Set to 50-70% of total GPU memory
  6. Monitor storage usage - Set up alerts for PVC capacity

Storage Sizing Guide

Model Size     | PVC Size | Shared Memory | Storage Type
1B-7B params   | 50Gi     | 8-16Gi        | RWO, local
8B-13B params  | 100Gi    | 16-32Gi       | RWO, local
30B-70B params | 200Gi    | 32-64Gi       | RWO/RWX
70B+ params    | 500Gi+   | 64-128Gi      | RWX, network
Multi-node     | 1Ti+     | 128Gi+        | RWX, NFS
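The sizing guide can be expressed as a lookup helper. The thresholds and return values simply transcribe the table above; the function itself is our own sketch:

```python
def storage_sizing(params_billion: float, multi_node: bool = False) -> tuple:
    """Return (pvc_size, shared_memory, storage_type) from the sizing guide."""
    if multi_node:
        return ("1Ti+", "128Gi+", "RWX, NFS")
    if params_billion <= 7:
        return ("50Gi", "8-16Gi", "RWO, local")
    if params_billion <= 13:
        return ("100Gi", "16-32Gi", "RWO, local")
    if params_billion <= 70:
        return ("200Gi", "32-64Gi", "RWO/RWX")
    return ("500Gi+", "64-128Gi", "RWX, network")

print(storage_sizing(8))    # ('100Gi', '16-32Gi', 'RWO, local')
print(storage_sizing(405))  # ('500Gi+', '64-128Gi', 'RWX, network')
```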

Volume Access Modes

  • ReadWriteOnce (RWO): Single-node deployments, best performance
  • ReadWriteMany (RWX): Multi-node deployments, required for LeaderWorkerSet
  • ReadOnlyMany (ROX): Shared read-only caches across services

Performance Optimization

  1. Use local NVMe SSDs for fastest model loading
  2. Pre-cache models with NIMCache to avoid download delays
  3. Allocate sufficient shared memory for model runtime operations
  4. Use volumeBindingMode: WaitForFirstConsumer for topology-aware scheduling

Volume Mounts

The operator automatically creates volume mounts:
# Automatically created mounts:
volumeMounts:
- name: model-store
  mountPath: /model-store
  subPath: ""  # From pvc.subPath if specified
- name: dshm
  mountPath: /dev/shm
The NIM_CACHE_PATH environment variable is also set automatically:
env:
- name: NIM_CACHE_PATH
  value: /model-store

Storage Backends

NFS Server Setup

Example NFS server for shared storage:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-nim-cache
spec:
  capacity:
    storage: 1Ti
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs-server.company.com
    path: /exports/nim-cache
  mountOptions:
  - hard
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-nim-cache
  namespace: nim-service
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Ti
  volumeName: nfs-pv-nim-cache

S3-Compatible Storage (via CSI)

Use S3 CSI driver for object storage:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-csi
provisioner: s3.csi.aws.com
parameters:
  bucket: nim-model-cache
  region: us-west-2
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  storage:
    pvc:
      create: true
      storageClass: s3-csi
      size: 1Ti

Troubleshooting

PVC Not Bound

Check PVC status:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Common issues:
  • No available PersistentVolume matches the claim
  • StorageClass not found
  • Insufficient storage capacity

Permission Denied Errors

Error: permission denied: /model-store
Solution: Ensure the storage directory has the correct ownership and permissions:
# If using hostPath or local volume
sudo chown -R 1000:2000 /path/to/storage
sudo chmod -R 755 /path/to/storage
Or configure fsGroup in security context (done automatically by operator).
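If you manage a pod spec yourself (outside the operator), a securityContext along these lines gives the container write access to the mounted volume. The UID/GID values mirror the chown example above and may differ in your images:

```yaml
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000  # kubelet changes group ownership of mounted volumes to this GID
```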

Model Not Found

Error: Model not found in /model-store
Solutions:
  1. Verify NIMCache was created and completed
  2. Check PVC is mounted correctly
  3. Verify storage.nimCache.name matches NIMCache resource

Out of Storage Space

Monitor PVC usage:
kubectl exec -it <pod-name> -n <namespace> -- df -h /model-store
Increase PVC size (if StorageClass supports expansion):
kubectl patch pvc <pvc-name> -n <namespace> -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

Multi-Node Mount Failures

Error: Multi-Attach error for volume "pvc-xxx" Volume is already exclusively attached
Solution: Use ReadWriteMany access mode:
storage:
  pvc:
    volumeAccessMode: ReadWriteMany  # Not ReadWriteOnce
