
Overview

The NVIDIA NIM Operator supports multiple storage backends for model caching, including PersistentVolumeClaims (PVC), HostPath, EmptyDir, and NIMCache volumes. Proper storage configuration is critical for model performance and deployment efficiency.

Storage Types

NIMCache (Recommended)

NIMCache provides pre-cached models optimized for specific GPU configurations. This is the recommended approach for production deployments.

Step 1: Create the NIMCache

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-8b-cache
  namespace: nim-service
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-8b:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: "fast-ssd"
      size: "50Gi"
      volumeAccessMode: ReadWriteOnce
Step 2: Reference the cache in a NIMService

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: ''  # Optional: specific engine profile
storage.nimCache.name (string): Name of the NIMCache resource to use.
storage.nimCache.profile (string): Specific engine profile to use from the NIMCache (e.g., for different GPU configurations).
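As a sketch, a NIMService can be pinned to one cached profile. The profile value below is a placeholder, not a real profile ID; in practice, copy an exact profile name from the NIMCache status (for example, via `kubectl get nimcache llama-3-8b-cache -o yaml`):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b-pinned
spec:
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: "example-profile-id"  # placeholder: use a profile name reported by the NIMCache
```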

PersistentVolumeClaim (PVC)

Use a PVC for persistent model storage without NIMCache. The operator can create and manage the PVC for you:
storage:
  pvc:
    create: true
    storageClass: "standard"
    size: "100Gi"
    volumeAccessMode: ReadWriteOnce
    annotations:
      volume.beta.kubernetes.io/storage-class: "fast-ssd"
storage.pvc.create (boolean): Whether to create a new PVC (true) or use an existing one (false).
storage.pvc.name (string): Name of the existing PVC when create: false.
storage.pvc.storageClass (string): StorageClass to use for PVC creation. Leave empty to use the default StorageClass.
storage.pvc.size (string): Size of the PVC (e.g., 50Gi, 1Ti).
storage.pvc.volumeAccessMode (string): Access mode: ReadWriteOnce, ReadWriteMany, or ReadOnlyMany.
storage.pvc.subPath (string): Subdirectory within the PVC to mount.
storage.pvc.annotations (object): Custom annotations for the PVC.
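For illustration, a NIMService can also mount an existing, pre-populated PVC instead of creating one. The claim name and subPath below are hypothetical:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  storage:
    pvc:
      create: false
      name: preloaded-model-cache  # hypothetical existing PVC
      subPath: llama-3-8b          # mount only this subdirectory
```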

HostPath

Use node-local storage (not recommended for production):
storage:
  hostPath: /mnt/nim-models
  readOnly: false
HostPath requires pods to be scheduled on specific nodes and, on OpenShift, requires the hostmount-anyuid SecurityContextConstraint (SCC). Use it only for development or single-node clusters.
storage.hostPath (string): Absolute path on the host filesystem.

EmptyDir

Temporary storage (data is lost when pod is deleted):
storage:
  emptyDir:
    sizeLimit: 50Gi
EmptyDir is useful for testing or when models are downloaded at startup. All model data is ephemeral.
storage.emptyDir.sizeLimit (Quantity): Maximum size of the emptyDir volume.

Shared Memory Configuration

NIM containers use shared memory for fast model I/O:
storage:
  sharedMemorySizeLimit: 16Gi
storage.sharedMemorySizeLimit (Quantity): Size of the /dev/shm mount (an emptyDir with medium: Memory).

Recommended shared memory size:
  • Small models (< 10B params): 8-16Gi
  • Medium models (10B-70B params): 32-64Gi
  • Large models (> 70B params): 64-128Gi
As a general rule, allocate 50-70% of total GPU memory.
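The 50-70% rule above can be sketched as a small helper. The function name and the 60% default (the midpoint of the guideline) are our own, not part of the operator:

```python
def recommended_shm_gib(total_gpu_mem_gib: float, fraction: float = 0.6) -> int:
    """Suggest a /dev/shm size as a fraction (50-70%) of total GPU memory.

    fraction defaults to 0.6, the midpoint of the 50-70% guideline.
    """
    if not 0.5 <= fraction <= 0.7:
        raise ValueError("fraction should stay within the 50-70% guideline")
    return round(total_gpu_mem_gib * fraction)

# Example: 4 x 80GB GPUs = 320 GiB total GPU memory
print(f"{recommended_shm_gib(320)}Gi")  # 192Gi
```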

Storage Classes

Use local NVMe SSDs for best performance:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  storage:
    pvc:
      create: true
      storageClass: fast-local-ssd
      size: 100Gi

Network Storage (NFS)

For multi-node deployments with shared cache:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.company.com
  share: /exports/nim-cache
volumeBindingMode: Immediate
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  multiNode:
    parallelism:
      pipeline: 2
      tensor: 8
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 200Gi
      volumeAccessMode: ReadWriteMany

Cloud Storage Classes

On managed Kubernetes, use the storage class your cloud provider offers, for example AWS EBS gp3:
storage:
  pvc:
    create: true
    storageClass: gp3  # AWS EBS gp3
    size: 100Gi
    volumeAccessMode: ReadWriteOnce

Read-Only Storage

Mount storage as read-only to prevent accidental modifications:
storage:
  pvc:
    create: false
    name: shared-model-cache
  readOnly: true
storage.readOnly (boolean, default: false): Mount the model storage volume as read-only.
Useful when:
  • Sharing a pre-populated model cache across multiple NIMService instances
  • Enforcing immutable infrastructure
  • Using a centralized model repository

Complete Storage Examples

Production with NIMCache

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-70b-cache
  namespace: production
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-70b:1.2.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "4"
  storage:
    pvc:
      create: true
      storageClass: fast-ssd
      size: 200Gi
      volumeAccessMode: ReadWriteOnce
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
  namespace: production
spec:
  storage:
    nimCache:
      name: llama-3-70b-cache
    sharedMemorySizeLimit: 64Gi
  resources:
    limits:
      nvidia.com/gpu: 4

Multi-Node with NFS

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: deepseek-r1
  namespace: nim-service
spec:
  replicas: 1
  multiNode:
    parallelism:
      pipeline: 2
      tensor: 8
  storage:
    pvc:
      create: true
      storageClass: nfs-client  # RWX-capable storage
      size: 500Gi
      volumeAccessMode: ReadWriteMany
    sharedMemorySizeLimit: 128Gi
  resources:
    limits:
      nvidia.com/gpu: 8

Development with EmptyDir

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-dev
  namespace: development
spec:
  storage:
    emptyDir:
      sizeLimit: 50Gi
    sharedMemorySizeLimit: 8Gi
  resources:
    limits:
      nvidia.com/gpu: 1

Shared Read-Only Cache

# Pre-populate cache
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: shared-llama-cache
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama-3-8b:1.0.0
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      name: shared-cache-pvc
      storageClass: nfs-client
      size: 100Gi
      volumeAccessMode: ReadWriteMany
---
# Service 1 - read-only
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-service-1
spec:
  storage:
    pvc:
      create: false
      name: shared-cache-pvc
    readOnly: true
---
# Service 2 - read-only
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-service-2
spec:
  storage:
    pvc:
      create: false
      name: shared-cache-pvc
    readOnly: true

Storage Best Practices

Storage Recommendations
  1. Use NIMCache for production - Pre-cached models are faster and more reliable
  2. Choose the right storage class - Local SSDs for single-node, NFS for multi-node
  3. Size appropriately - Model size + 20-30% overhead for runtime files
  4. Use ReadWriteMany for multi-node - Required for LeaderWorkerSet deployments
  5. Configure shared memory - Set to 50-70% of total GPU memory
  6. Monitor storage usage - Set up alerts for PVC capacity

Storage Sizing Guide

Model Size     | PVC Size | Shared Memory | Storage Type
1B-7B params   | 50Gi     | 8-16Gi        | RWO, local
8B-13B params  | 100Gi    | 16-32Gi       | RWO, local
30B-70B params | 200Gi    | 32-64Gi       | RWO/RWX
70B+ params    | 500Gi+   | 64-128Gi      | RWX, network
Multi-node     | 1Ti+     | 128Gi+        | RWX, NFS
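The sizing guide can be expressed as a lookup helper. The thresholds and return values simply transcribe the table above; the function itself is our own sketch:

```python
def storage_sizing(params_billion: float, multi_node: bool = False) -> tuple:
    """Return (pvc_size, shared_memory, storage_type) from the sizing guide."""
    if multi_node:
        return ("1Ti+", "128Gi+", "RWX, NFS")
    if params_billion <= 7:
        return ("50Gi", "8-16Gi", "RWO, local")
    if params_billion <= 13:
        return ("100Gi", "16-32Gi", "RWO, local")
    if params_billion <= 70:
        return ("200Gi", "32-64Gi", "RWO/RWX")
    return ("500Gi+", "64-128Gi", "RWX, network")

print(storage_sizing(8))    # ('100Gi', '16-32Gi', 'RWO, local')
print(storage_sizing(405))  # ('500Gi+', '64-128Gi', 'RWX, network')
```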

Volume Access Modes

  • ReadWriteOnce (RWO): Single-node deployments, best performance
  • ReadWriteMany (RWX): Multi-node deployments, required for LeaderWorkerSet
  • ReadOnlyMany (ROX): Shared read-only caches across services

Performance Optimization

  1. Use local NVMe SSDs for fastest model loading
  2. Pre-cache models with NIMCache to avoid download delays
  3. Allocate sufficient shared memory for model runtime operations
  4. Use volumeBindingMode: WaitForFirstConsumer for topology-aware scheduling

Volume Mounts

The operator automatically creates volume mounts:
# Automatically created mounts:
volumeMounts:
- name: model-store
  mountPath: /model-store
  subPath: ""  # From pvc.subPath if specified
- name: dshm
  mountPath: /dev/shm
The NIM_CACHE_PATH environment variable is also set automatically:
env:
- name: NIM_CACHE_PATH
  value: /model-store

Storage Backends

NFS Server Setup

Example NFS server for shared storage:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-nim-cache
spec:
  capacity:
    storage: 1Ti
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs-server.company.com
    path: /exports/nim-cache
  mountOptions:
  - hard
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-nim-cache
  namespace: nim-service
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Ti
  volumeName: nfs-pv-nim-cache

S3-Compatible Storage (via CSI)

Use S3 CSI driver for object storage:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-csi
provisioner: s3.csi.aws.com
parameters:
  bucket: nim-model-cache
  region: us-west-2
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
spec:
  storage:
    pvc:
      create: true
      storageClass: s3-csi
      size: 1Ti

Troubleshooting

PVC Not Bound

Check PVC status:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Common issues:
  • No available PersistentVolume matches the claim
  • StorageClass not found
  • Insufficient storage capacity

Permission Denied Errors

Error: permission denied: /model-store
Solution: Ensure the storage directory has the correct ownership and permissions:
# If using hostPath or local volume
sudo chown -R 1000:2000 /path/to/storage
sudo chmod -R 755 /path/to/storage
Or configure fsGroup in security context (done automatically by operator).
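If you manage a pod spec yourself (outside the operator), a securityContext along these lines gives the container write access to the mounted volume. The UID/GID values mirror the chown example above and may differ in your images:

```yaml
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000  # kubelet changes group ownership of mounted volumes to this GID
```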

Model Not Found

Error: Model not found in /model-store
Solutions:
  1. Verify NIMCache was created and completed
  2. Check PVC is mounted correctly
  3. Verify storage.nimCache.name matches NIMCache resource

Out of Storage Space

Monitor PVC usage:
kubectl exec -it <pod-name> -n <namespace> -- df -h /model-store
Increase PVC size (if StorageClass supports expansion):
kubectl patch pvc <pvc-name> -n <namespace> -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

Multi-Node Mount Failures

Error: Multi-Attach error for volume "pvc-xxx" Volume is already exclusively attached
Solution: Use ReadWriteMany access mode:
storage:
  pvc:
    volumeAccessMode: ReadWriteMany  # Not ReadWriteOnce
