Overview
NIMCache manages the caching of NVIDIA Inference Microservice (NIM) models. It creates a Kubernetes Job to download and cache model files to persistent storage, making them available for NIMService deployments.
API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NIMCache
Spec Fields
Source Configuration
spec.source
Model source configuration. Exactly one of ngc, dataStore, or hf must be defined.
spec.source.ngc
NVIDIA NGC model source
spec.source.ngc.authSecret
Name of secret containing NGC_API_KEY
spec.source.ngc.modelPuller
Container image for pulling the model. Immutable after creation.
spec.source.ngc.pullSecret
Secret for pulling the modelPuller image
spec.source.ngc.model
Model specification for optimized NIMs. Mutually exclusive with modelEndpoint.
spec.source.ngc.model.profiles
Specific model profiles to cache. When provided, other profile selection parameters are ignored.
spec.source.ngc.model.precision
Model quantization precision (e.g., fp16, int8)
spec.source.ngc.model.engine
Backend engine (e.g., tensorrt_llm, vllm)
spec.source.ngc.model.tensorParallelism
Minimum GPUs required for model computations
spec.source.ngc.model.qosProfile
QoS profile type (e.g., throughput, latency)
spec.source.ngc.model.gpus
GPU specifications for caching optimized models
spec.source.ngc.model.gpus[].product
GPU product (e.g., h100, a100, l40s)
spec.source.ngc.model.gpus[].ids
Device IDs for specific GPU SKUs
spec.source.ngc.model.lora
Whether the model uses LoRA adapters
spec.source.ngc.model.buildable
Whether the model can be optimized for any GPU
spec.source.ngc.modelEndpoint
Model endpoint for Universal NIMs. Mutually exclusive with model.
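The authSecret referenced above is an ordinary Kubernetes Secret whose data holds the NGC API key. A minimal sketch, assuming the operator reads the NGC_API_KEY entry as documented (the Secret name and key value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ngc-api-key        # referenced by spec.source.ngc.authSecret
  namespace: nim-service
type: Opaque
stringData:
  NGC_API_KEY: "<your NGC API key>"  # placeholder; supply a real key
```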
spec.source.dataStore
NVIDIA NeMo DataStore source
spec.source.dataStore.endpoint
HuggingFace endpoint from NeMo DataStore. Pattern: ^https?://.*/v1/hf/?$
spec.source.dataStore.namespace
Namespace within NeMo DataStore
spec.source.dataStore.modelName
Model name. Mutually exclusive with datasetName.
spec.source.dataStore.datasetName
Dataset name. Mutually exclusive with modelName.
spec.source.dataStore.authSecret
Name of secret containing HF_TOKEN (min length: 1)
spec.source.dataStore.modelPuller
Container image for pulling data (min length: 1)
spec.source.dataStore.pullSecret
Secret for pulling the modelPuller image (min length: 1)
spec.source.dataStore.revision
Revision to cache (commit hash, branch, or tag; min length: 1)
spec.source.hf
HuggingFace Hub source
spec.source.hf.endpoint
HuggingFace endpoint. Pattern: ^https?://.*$
spec.source.hf.namespace
Namespace within HuggingFace Hub (min length: 1)
spec.source.hf.modelName
Model name. Mutually exclusive with datasetName.
spec.source.hf.datasetName
Dataset name. Mutually exclusive with modelName.
spec.source.hf.authSecret
Name of secret containing HF_TOKEN (min length: 1)
spec.source.hf.modelPuller
Container image for pulling data (min length: 1)
spec.source.hf.pullSecret
Secret for pulling the modelPuller image (min length: 1)
spec.source.hf.revision
Revision to cache (commit hash, branch, or tag; min length: 1)
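As with the NGC source, authSecret names a Secret holding the credential. A sketch, assuming the operator reads the HF_TOKEN entry as documented (names and value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token           # referenced by spec.source.hf.authSecret
  namespace: nim-service
type: Opaque
stringData:
  HF_TOKEN: "<your HuggingFace access token>"  # placeholder
```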
Storage Configuration
spec.storage
Target storage for cached models
spec.storage.pvc
PersistentVolumeClaim for model storage
spec.storage.pvc.create
Whether to create a new PVC
spec.storage.pvc.name
PVC name (required if create is false)
spec.storage.pvc.size
Requested size of the PVC (e.g., 100Gi)
spec.storage.pvc.storageClass
StorageClass for PVC creation
spec.storage.pvc.volumeAccessMode
Volume access mode
spec.storage.pvc.annotations
PVC annotations
spec.storage.hostPath
Deprecated: Use spec.storage.pvc instead. Host path for model storage.
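To reuse a PVC that already exists instead of having the operator provision one, set create to false and supply the name, as the field reference above requires. A sketch (the PVC name is a placeholder):

```yaml
storage:
  pvc:
    create: false               # do not provision a new PVC
    name: existing-model-pvc    # placeholder; must already exist in the namespace
```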
Resource Requirements
spec.resources
Minimum resources required for the caching job
spec.resources.cpu
Minimum CPU (e.g., 4, 4000m)
spec.resources.memory
Minimum memory (e.g., 8Gi, 8192Mi)
Scheduling
spec.tolerations
Tolerations for running the caching job
spec.nodeSelector
Node selector labels for the caching job. Defaults to {"feature.node.kubernetes.io/pci-10de.present": "true"} if not specified.
Security
spec.userID
User ID for the caching job (default: 1000)
spec.groupID
Group ID for the caching job (default: 2000)
spec.runtimeClassName
RuntimeClass for the caching job
Proxy and Certificates
spec.certConfig
Deprecated: Use spec.proxy instead. Custom certificate configuration.
spec.certConfig.name
Name of the ConfigMap containing certificates
spec.certConfig.mountPath
Path where certificates should be mounted
spec.proxy
Proxy configuration for the caching job
spec.proxy.httpProxy
HTTP proxy URL for the caching job
spec.proxy.httpsProxy
HTTPS proxy URL for the caching job
spec.proxy.noProxy
Comma-separated list of hosts to exclude from proxying
spec.proxy.certConfigMap
Name of ConfigMap containing custom CA certificates
Additional Configuration
spec.env
Additional environment variables for the caching job
spec.initContainers
Init containers to run before the caching job. Each entry uses the NIMContainerSpec fields below.
spec.initContainers[].name
Container name
spec.initContainers[].image
Container image
spec.initContainers[].command
Container command
spec.initContainers[].args
Container arguments
spec.initContainers[].env
Environment variables
spec.initContainers[].workingDir
Working directory
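Using only the NIMContainerSpec fields listed above, an init container might look like the following sketch (the image, command, and mount path are illustrative placeholders, not values prescribed by the operator):

```yaml
initContainers:
  - name: wait-for-storage
    image: busybox:1.36                  # placeholder image
    command: ["sh", "-c"]
    # Block until the cache volume is mounted; path is a placeholder
    args: ["until [ -d /model-store ]; do sleep 2; done"]
    workingDir: /
```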
Status Fields
status.state
Current state of the NIMCache. Values: NotReady, PVC-Created, Started, Ready, InProgress, Pending, Failed
status.pvc
Name of the PVC used for caching
status.profiles
List of cached model profiles
status.profiles[].release
Release version
status.profiles[].config
Profile configuration parameters
status.conditions
Standard Kubernetes conditions. Possible condition types:
NIM_CACHE_JOB_CREATED: Caching job is created
NIM_CACHE_JOB_COMPLETED: Caching job completed
NIM_CACHE_JOB_PENDING: Caching job is pending
NIM_CACHE_PVC_CREATED: PVC is created
NIM_CACHE_RECONCILE_FAILED: Reconciliation failed
Example
NGC Optimized NIM
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-cache
  namespace: nim-service
spec:
  # NGC source for optimized NIM
  source:
    ngc:
      authSecret: ngc-api-key
      modelPuller: nvcr.io/nim-model-puller:latest
      pullSecret: ngc-secret
      model:
        # Explicit profiles to cache; when profiles is set, the selection
        # parameters below (precision, tensorParallelism, gpus) are ignored
        profiles:
          - llama3-8b-fp16-tp1
        precision: fp16
        tensorParallelism: "1"
        gpus:
          - product: a100
  # Storage configuration
  storage:
    pvc:
      create: true
      storageClass: fast-ssd
      size: 100Gi
      volumeAccessMode: ReadWriteOnce
  # Resource requirements
  resources:
    cpu: 4
    memory: 16Gi
  # Scheduling
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```
Universal NIM from NGC
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: universal-nim-cache
  namespace: nim-service
spec:
  # NGC source for universal NIM
  source:
    ngc:
      authSecret: ngc-api-key
      modelPuller: nvcr.io/nim-model-puller:latest
      modelEndpoint: https://api.ngc.nvidia.com/v1/models/org/team/model
  storage:
    pvc:
      create: true
      size: 50Gi
  resources:
    cpu: 2
    memory: 8Gi
```
HuggingFace Model
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: hf-model-cache
  namespace: nim-service
spec:
  # HuggingFace source
  source:
    hf:
      endpoint: https://huggingface.co
      namespace: meta-llama
      modelName: Llama-2-7b-hf
      authSecret: hf-token
      modelPuller: ghcr.io/huggingface/huggingface-cli:latest
      pullSecret: ghcr-secret
      revision: main
  storage:
    pvc:
      create: true
      size: 30Gi
  resources:
    cpu: 2
    memory: 8Gi
  # Proxy configuration
  proxy:
    httpsProxy: http://proxy.example.com:3128
    noProxy: localhost,127.0.0.1
    certConfigMap: custom-ca-certs
```
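NeMo DataStore Model

The field reference above also documents a NeMo DataStore source, which the examples do not cover. A sketch using only the documented spec.source.dataStore fields (the endpoint, namespace, model name, and secret names are placeholders):

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: datastore-model-cache
  namespace: nim-service
spec:
  # NeMo DataStore source
  source:
    dataStore:
      endpoint: https://datastore.example.com/v1/hf/  # must match ^https?://.*/v1/hf/?$
      namespace: my-org                               # placeholder namespace
      modelName: my-model                             # mutually exclusive with datasetName
      authSecret: hf-token
      modelPuller: ghcr.io/huggingface/huggingface-cli:latest
      pullSecret: ghcr-secret
      revision: main
  storage:
    pvc:
      create: true
      size: 30Gi
  resources:
    cpu: 2
    memory: 8Gi
```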