Overview
NIMCache manages the caching of NVIDIA Inference Microservice (NIM) models. It creates a Kubernetes Job to download and cache model files to persistent storage, making them available for NIMService deployments.
API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NIMCache
Spec Fields
Source Configuration
spec.source
Model source configuration. Exactly one of ngc, dataStore, or hf must be defined.
spec.source.ngc
NVIDIA NGC model source
spec.source.ngc.authSecret
Name of secret containing NGC_API_KEY
spec.source.ngc.modelPuller
Container image for pulling the model. Immutable after creation.
spec.source.ngc.pullSecret
Secret for pulling the modelPuller image
spec.source.ngc.model
Model specification for optimized NIMs. Mutually exclusive with modelEndpoint.
spec.source.ngc.model.profiles
Specific model profiles to cache. When provided, other profile selection parameters are ignored.
spec.source.ngc.model.precision
Model quantization precision (e.g., fp16, int8)
spec.source.ngc.model.engine
Backend engine (e.g., tensorrt_llm, vllm)
spec.source.ngc.model.tensorParallelism
Minimum GPUs required for model computations
spec.source.ngc.model.qosProfile
QoS profile type (e.g., throughput, latency)
spec.source.ngc.model.gpus
GPU specifications for caching optimized models
spec.source.ngc.model.gpus[].product
GPU product (e.g., h100, a100, l40s)
spec.source.ngc.model.gpus[].ids
Device IDs for specific GPU SKUs
spec.source.ngc.model.lora
Whether the model uses LoRA adapters
spec.source.ngc.model.buildable
Whether the model can be optimized for any GPU
spec.source.ngc.modelEndpoint
Model endpoint for Universal NIMs. Mutually exclusive with model.
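The authSecret referenced above is an ordinary Kubernetes Secret whose data holds the NGC API key. A minimal sketch, assuming the operator reads the NGC_API_KEY entry as documented (the Secret name and key value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ngc-api-key        # referenced by spec.source.ngc.authSecret
  namespace: nim-service
type: Opaque
stringData:
  NGC_API_KEY: "<your NGC API key>"  # placeholder; supply a real key
```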
spec.source.dataStore
NVIDIA NeMo DataStore source
spec.source.dataStore.endpoint
HuggingFace endpoint from NeMo DataStore. Pattern: ^https?://.*/v1/hf/?$
spec.source.dataStore.namespace
Namespace within NeMo DataStore
spec.source.dataStore.modelName
Model name. Mutually exclusive with datasetName.
spec.source.dataStore.datasetName
Dataset name. Mutually exclusive with modelName.
spec.source.dataStore.authSecret
Name of secret containing HF_TOKEN (min length: 1)
spec.source.dataStore.modelPuller
Container image for pulling data (min length: 1)
spec.source.dataStore.pullSecret
Secret for pulling the modelPuller image (min length: 1)
spec.source.dataStore.revision
Revision to cache (commit hash, branch, or tag; min length: 1)
spec.source.hf
HuggingFace Hub source
spec.source.hf.endpoint
HuggingFace endpoint. Pattern: ^https?://.*$
spec.source.hf.namespace
Namespace within HuggingFace Hub (min length: 1)
spec.source.hf.modelName
Model name. Mutually exclusive with datasetName.
spec.source.hf.datasetName
Dataset name. Mutually exclusive with modelName.
spec.source.hf.authSecret
Name of secret containing HF_TOKEN (min length: 1)
spec.source.hf.modelPuller
Container image for pulling data (min length: 1)
spec.source.hf.pullSecret
Secret for pulling the modelPuller image (min length: 1)
spec.source.hf.revision
Revision to cache (commit hash, branch, or tag; min length: 1)
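As with the NGC source, authSecret names a Secret holding the credential. A sketch, assuming the operator reads the HF_TOKEN entry as documented (names and value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token           # referenced by spec.source.hf.authSecret
  namespace: nim-service
type: Opaque
stringData:
  HF_TOKEN: "<your HuggingFace access token>"  # placeholder
```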
Storage Configuration
spec.storage
Target storage for cached models
spec.storage.pvc
PersistentVolumeClaim for model storage
spec.storage.pvc.create
Whether to create a new PVC
spec.storage.pvc.name
PVC name (required if create is false)
spec.storage.pvc.size
Requested size of the PVC (e.g., 100Gi)
spec.storage.pvc.storageClass
StorageClass for PVC creation
spec.storage.pvc.volumeAccessMode
Volume access mode
spec.storage.pvc.annotations
PVC annotations
spec.storage.hostPath
Deprecated: Use spec.storage.pvc instead. Host path for model storage.
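To reuse a PVC that already exists instead of having the operator provision one, set create to false and supply the name, as the field reference above requires. A sketch (the PVC name is a placeholder):

```yaml
storage:
  pvc:
    create: false               # do not provision a new PVC
    name: existing-model-pvc    # placeholder; must already exist in the namespace
```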
Resource Requirements
spec.resources
Minimum resources required for the caching job
spec.resources.cpu
Minimum CPU (e.g., 4, 4000m)
spec.resources.memory
Minimum memory (e.g., 8Gi, 8192Mi)
Scheduling
spec.tolerations
Tolerations for running the caching job
spec.nodeSelector
Node selector labels for the caching job. Defaults to {"feature.node.kubernetes.io/pci-10de.present": "true"} if not specified.
Security
spec.userID
User ID for the caching job (default: 1000)
spec.groupID
Group ID for the caching job (default: 2000)
spec.runtimeClassName
RuntimeClass for the caching job
Proxy and Certificates
spec.certConfig
Deprecated: Use spec.proxy instead. Custom certificate configuration.
spec.certConfig.name
Name of the ConfigMap containing certificates
spec.certConfig.mountPath
Path where certificates should be mounted
spec.proxy
Proxy configuration for the caching job
spec.proxy.httpProxy
HTTP proxy URL for the caching job
spec.proxy.httpsProxy
HTTPS proxy URL for the caching job
spec.proxy.noProxy
Comma-separated list of hosts to exclude from proxying
spec.proxy.certConfigMap
Name of ConfigMap containing custom CA certificates
Additional Configuration
spec.env
Additional environment variables for the caching job
spec.initContainers
Init containers to run before the caching job. Each entry uses the NIMContainerSpec fields below.
spec.initContainers[].name
Container name
spec.initContainers[].image
Container image
spec.initContainers[].command
Container command
spec.initContainers[].args
Container arguments
spec.initContainers[].env
Environment variables
spec.initContainers[].workingDir
Working directory
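Using only the NIMContainerSpec fields listed above, an init container might look like the following sketch (the image, command, and mount path are illustrative placeholders, not values prescribed by the operator):

```yaml
initContainers:
  - name: wait-for-storage
    image: busybox:1.36                  # placeholder image
    command: ["sh", "-c"]
    # Block until the cache volume is mounted; path is a placeholder
    args: ["until [ -d /model-store ]; do sleep 2; done"]
    workingDir: /
```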
Status Fields
status.state
Current state of the NIMCache. Values: NotReady, PVC-Created, Started, Ready, InProgress, Pending, Failed
status.pvc
Name of the PVC used for caching
status.profiles
List of cached model profiles
status.profiles[].release
Release version
status.profiles[].config
Profile configuration parameters
status.conditions
Standard Kubernetes conditions. Possible condition types:
NIM_CACHE_JOB_CREATED: Caching job is created
NIM_CACHE_JOB_COMPLETED: Caching job completed
NIM_CACHE_JOB_PENDING: Caching job is pending
NIM_CACHE_PVC_CREATED: PVC is created
NIM_CACHE_RECONCILE_FAILED: Reconciliation failed
Example
NGC Optimized NIM
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-cache
  namespace: nim-service
spec:
  # NGC source for optimized NIM
  source:
    ngc:
      authSecret: ngc-api-key
      modelPuller: nvcr.io/nim-model-puller:latest
      pullSecret: ngc-secret
      model:
        # Explicit profiles to cache; when profiles is set, the selection
        # parameters below (precision, tensorParallelism, gpus) are ignored
        profiles:
          - llama3-8b-fp16-tp1
        precision: fp16
        tensorParallelism: "1"
        gpus:
          - product: a100
  # Storage configuration
  storage:
    pvc:
      create: true
      storageClass: fast-ssd
      size: 100Gi
      volumeAccessMode: ReadWriteOnce
  # Resource requirements
  resources:
    cpu: 4
    memory: 16Gi
  # Scheduling
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```
Universal NIM from NGC
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: universal-nim-cache
  namespace: nim-service
spec:
  # NGC source for universal NIM
  source:
    ngc:
      authSecret: ngc-api-key
      modelPuller: nvcr.io/nim-model-puller:latest
      modelEndpoint: https://api.ngc.nvidia.com/v1/models/org/team/model
  storage:
    pvc:
      create: true
      size: 50Gi
  resources:
    cpu: 2
    memory: 8Gi
```
HuggingFace Model
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: hf-model-cache
  namespace: nim-service
spec:
  # HuggingFace source
  source:
    hf:
      endpoint: https://huggingface.co
      namespace: meta-llama
      modelName: Llama-2-7b-hf
      authSecret: hf-token
      modelPuller: ghcr.io/huggingface/huggingface-cli:latest
      pullSecret: ghcr-secret
      revision: main
  storage:
    pvc:
      create: true
      size: 30Gi
  resources:
    cpu: 2
    memory: 8Gi
  # Proxy configuration
  proxy:
    httpsProxy: http://proxy.example.com:3128
    noProxy: localhost,127.0.0.1
    certConfigMap: custom-ca-certs
```
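NeMo DataStore Model

The field reference above also documents a NeMo DataStore source, which the examples do not cover. A sketch using only the documented spec.source.dataStore fields (the endpoint, namespace, model name, and secret names are placeholders):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: datastore-model-cache
  namespace: nim-service
spec:
  # NeMo DataStore source
  source:
    dataStore:
      endpoint: https://datastore.example.com/v1/hf/  # must match ^https?://.*/v1/hf/?$
      namespace: my-org                               # placeholder namespace
      modelName: my-model                             # mutually exclusive with datasetName
      authSecret: hf-token
      modelPuller: ghcr.io/huggingface/huggingface-cli:latest
      pullSecret: ghcr-secret
      revision: main
  storage:
    pvc:
      create: true
      size: 30Gi
  resources:
    cpu: 2
    memory: 8Gi
```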