
Overview

NIMCache manages the caching of NVIDIA Inference Microservice (NIM) models. It creates a Kubernetes Job that downloads and caches model files to persistent storage, making them available for NIMService deployments.

API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NIMCache

Spec Fields

Source Configuration

spec.source
NIMSource
required
Model source configuration. Exactly one of ngc, dataStore, or hf must be defined.
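The NGC and HuggingFace variants are shown in the examples at the end of this page. For the dataStore variant, a minimal sketch follows; the field names under dataStore here are assumptions, not confirmed by this page, so verify them against the NIMSource API reference before use:

```yaml
# Hedged sketch: field names under dataStore are assumed, not confirmed by this page.
source:
  dataStore:
    endpoint: https://datastore.example.com/v1    # assumed: NVIDIA DataStore API endpoint
    modelName: my-finetuned-model                 # assumed: model identifier in DataStore
    authSecret: datastore-api-key                 # assumed: Secret holding DataStore credentials
    modelPuller: nvcr.io/nim-model-puller:latest  # puller image, as in the examples below
```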

Storage Configuration

spec.storage
NIMCacheStorage
required
Target storage for cached models
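The examples below all create a new PVC. A cache can plausibly also reuse an existing claim; a minimal sketch, assuming the pvc block accepts a name field together with create: false (check the NIMCacheStorage schema to confirm):

```yaml
storage:
  pvc:
    create: false           # assumed: do not provision a new claim
    name: shared-model-pvc  # assumed field: name of a pre-existing PVC
```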

Resource Requirements

spec.resources
Resources
Minimum resources required for the caching job

Scheduling

spec.tolerations
[]Toleration
Tolerations for running the caching job
spec.nodeSelector
map[string]string
Node selector labels for the caching job. Defaults to {"feature.node.kubernetes.io/pci-10de.present": "true"} if not specified.

Security

spec.userID
int64
User ID for the caching job (default: 1000)
spec.groupID
int64
Group ID for the caching job (default: 2000)
spec.runtimeClassName
string
RuntimeClass for the caching job
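Taken together, the security fields above sit directly under spec. For example, to run the caching job as a non-default user and group with a specific RuntimeClass (the values here are illustrative, and a RuntimeClass named "nvidia" is assumed to exist in the cluster):

```yaml
spec:
  userID: 2000              # overrides the default of 1000
  groupID: 3000             # overrides the default of 2000
  runtimeClassName: nvidia  # assumes a RuntimeClass named "nvidia" exists
```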

Proxy and Certificates

spec.certConfig
CertConfig
Deprecated: Use spec.proxy instead. Custom certificate configuration.
spec.proxy
ProxySpec
Proxy configuration for the caching job

Additional Configuration

spec.env
[]EnvVar
Additional environment variables for the caching job
spec.initContainers
[]NIMContainerSpec
Init containers to run before the caching job
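As a sketch, extra environment variables and init containers are declared much like ordinary Kubernetes container fields. The NIMContainerSpec fields used below (name, image, command) are assumed to mirror a standard container spec and should be verified against the API reference; the variable name and wait logic are purely illustrative:

```yaml
spec:
  env:
    - name: NIM_LOG_LEVEL     # illustrative variable name
      value: debug
  initContainers:
    - name: wait-for-secret   # assumed NIMContainerSpec fields mirroring core Container
      image: busybox:1.36
      command: ["sh", "-c", "until test -f /secrets/token; do sleep 2; done"]
```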

Status Fields

status.state
string
Current state of the NIMCache. Values: NotReady, PVC-Created, Started, Ready, InProgress, Pending, Failed
status.pvc
string
Name of the PVC used for caching
status.profiles
[]NIMProfile
List of cached model profiles
status.conditions
[]Condition
Standard Kubernetes conditions. Possible condition types:
  • NIM_CACHE_JOB_CREATED: Caching job is created
  • NIM_CACHE_JOB_COMPLETED: Caching job completed
  • NIM_CACHE_JOB_PENDING: Caching job is pending
  • NIM_CACHE_PVC_CREATED: PVC is created
  • NIM_CACHE_RECONCILE_FAILED: Reconciliation failed
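For orientation, a populated status on a successfully cached model might look like the illustrative block below. The PVC name, profile entry, and timestamp are made up, and the condition fields are assumed to follow the standard Kubernetes condition shape:

```yaml
status:
  state: Ready
  pvc: meta-llama3-8b-cache-pvc   # illustrative PVC name
  profiles:
    - name: llama3-8b-fp16-tp1    # illustrative cached profile
  conditions:
    - type: NIM_CACHE_JOB_COMPLETED
      status: "True"
      reason: JobCompleted
      lastTransitionTime: "2024-05-01T12:00:00Z"
```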

Example

NGC Optimized NIM

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-cache
  namespace: nim-service
spec:
  # NGC source for optimized NIM
  source:
    ngc:
      authSecret: ngc-api-key
      modelPuller: nvcr.io/nim-model-puller:latest
      pullSecret: ngc-secret
      model:
        profiles:
          - llama3-8b-fp16-tp1
        precision: fp16
        tensorParallelism: "1"
        gpus:
          - product: a100
  
  # Storage configuration
  storage:
    pvc:
      create: true
      storageClass: fast-ssd
      size: 100Gi
      volumeAccessMode: ReadWriteOnce
  
  # Resource requirements
  resources:
    cpu: 4
    memory: 16Gi
  
  # Scheduling
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
  
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule

Universal NIM from NGC

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: universal-nim-cache
  namespace: nim-service
spec:
  # NGC source for universal NIM
  source:
    ngc:
      authSecret: ngc-api-key
      modelPuller: nvcr.io/nim-model-puller:latest
      modelEndpoint: https://api.ngc.nvidia.com/v1/models/org/team/model
  
  storage:
    pvc:
      create: true
      size: 50Gi
  
  resources:
    cpu: 2
    memory: 8Gi

HuggingFace Model

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: hf-model-cache
  namespace: nim-service
spec:
  # HuggingFace source
  source:
    hf:
      endpoint: https://huggingface.co
      namespace: meta-llama
      modelName: Llama-2-7b-hf
      authSecret: hf-token
      modelPuller: ghcr.io/huggingface/huggingface-cli:latest
      pullSecret: ghcr-secret
      revision: main
  
  storage:
    pvc:
      create: true
      size: 30Gi
  
  resources:
    cpu: 2
    memory: 8Gi
  
  # Proxy configuration
  proxy:
    httpsProxy: http://proxy.example.com:3128
    noProxy: localhost,127.0.0.1
    certConfigMap: custom-ca-certs
