The NVIDIA NIM Operator extends Kubernetes with nine custom resource definitions (CRDs) for managing AI inference and NeMo microservices.

NIM resources

Resources for deploying and managing NVIDIA Inference Microservices.

NIMService

Deploys a NIM inference service for serving AI models.

Purpose

The primary resource for deploying optimized inference services. Supports both standalone and KServe deployment platforms, with options for single-node or multi-node (tensor/pipeline parallelism) configurations.
Key capabilities:
  • Multiple inference platform support (standalone, KServe)
  • Auto-scaling with HorizontalPodAutoscaler
  • Multi-node deployments using LeaderWorkerSet
  • Model caching via NIMCache integration
  • Ingress and Gateway API routing
  • Prometheus metrics and ServiceMonitor
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3-8b-instruct
    tag: "1.0.0"
  authSecret: ngc-secret
  storage:
    nimCache:
      name: llama-cache
      profile: "tensorrt-llm-h100-fp16"
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
See the NIMService API reference for all available fields.
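Multi-node deployments split a single model across several pods with tensor or pipeline parallelism, coordinated through LeaderWorkerSet. The sketch below is a hypothetical configuration: the `multiNode` block and its `size` and `gpusPerPod` field names are assumptions for illustration, not confirmed schema — verify them against the NIMService API reference.

```yaml
# Hypothetical multi-node NIMService sketch; the multiNode field names
# (size, gpusPerPod) are assumptions -- verify against the API reference.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3-70b-instruct
    tag: "1.0.0"
  authSecret: ngc-secret
  multiNode:
    size: 2          # pods per LeaderWorkerSet replica (assumed field)
    gpusPerPod: 8    # GPUs requested by each pod (assumed field)
  expose:
    service:
      type: ClusterIP
      port: 8000
```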

NIMCache

Manages model caching to persistent storage.

Purpose

Automates downloading and caching AI models from NGC, NeMo DataStore, or Hugging Face to persistent volumes. Models are optimized and profiled for specific GPU configurations.
Key capabilities:
  • Multiple model sources (NGC, NeMo DataStore, Hugging Face)
  • Profile-based model selection
  • GPU-specific optimizations
  • Proxy and custom certificate support
  • Job-based caching with TTL
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-cache
spec:
  source:
    ngc:
      authSecret: ngc-secret
      modelPuller: nvcr.io/nim/meta/llama-3-8b-instruct:1.0.0
      pullSecret: ngc-secret
      model:
        profiles:
          - "tensorrt-llm-h100-fp16"
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 50Gi
See the NIMCache API reference for all available fields.
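Besides NGC, a NIMCache can pull from the other listed sources. The sketch below assumes a Hugging Face source expressed as an `hf` block with `endpoint`, `modelName`, and `authSecret` fields; treat these field names as assumptions and confirm the exact schema in the NIMCache API reference.

```yaml
# Hypothetical Hugging Face source for a NIMCache; the field names under
# source.hf are assumptions -- confirm against the NIMCache API reference.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: hf-model-cache
spec:
  source:
    hf:
      endpoint: https://huggingface.co
      modelName: meta-llama/Meta-Llama-3-8B-Instruct  # example model ID
      authSecret: hf-token-secret                     # secret holding an HF access token
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 50Gi
```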

NIMBuild

Builds optimized TensorRT-LLM engines from cached models.

Purpose

Compiles inference engines from model weights cached by a NIMCache. Building engines tuned to the target GPU hardware can significantly improve inference performance.
Key capabilities:
  • TensorRT-LLM engine optimization
  • GPU-specific compilation
  • Profile-based building
  • Integration with NIMCache and NIMService
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMBuild
metadata:
  name: llama-3-8b-engine
spec:
  nimCache:
    name: llama-cache
    profile: "tensorrt-llm-h100-fp16"
  modelName: llama-3-8b-optimized
  image:
    repository: nvcr.io/nvidia/nim-llm
    tag: "1.2.0"
    pullSecrets:
      - ngc-secret
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: 64Gi
See the NIMBuild API reference for all available fields.

NIMPipeline

Orchestrates multiple NIM services as a pipeline.

Purpose

Creates and manages a collection of related NIM services with ordered dependencies. Useful for chaining multiple models or building multi-step inference workflows.
Key capabilities:
  • Multiple service orchestration
  • Service dependency management
  • Conditional service enablement
  • Automatic endpoint configuration
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
spec:
  services:
    - name: embeddings
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
          tag: "1.0.0"
        # ... NIMService spec ...
    - name: llm
      enabled: true
      dependencies:
        - name: embeddings
          port: 8000
          envName: EMBEDDING_URL
      spec:
        # ... NIMService spec ...
See the NIMPipeline API reference for all available fields.

NeMo microservice resources

Resources for deploying NVIDIA NeMo microservices.

NemoCustomizer

Deploys the NeMo Customizer service for model fine-tuning.

Purpose

Provides a service for customizing and fine-tuning foundation models using techniques like LoRA. Integrates with NeMo DataStore, Entitystore, and MLflow for managing training jobs and artifacts.
Key capabilities:
  • Model fine-tuning and customization
  • Training job orchestration (Volcano, Run.ai)
  • MLflow integration for experiment tracking
  • Weights & Biases support
  • PostgreSQL for metadata storage
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoCustomizer
metadata:
  name: customizer
spec:
  image:
    repository: nvcr.io/nvidia/nemo/customizer
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: customizer
    credentials:
      secretName: postgres-secret
      user: customizer
      passwordKey: password
  datastore:
    endpoint: http://datastore.default.svc:3000/v1
  entitystore:
    endpoint: http://entitystore.default.svc:8000
  # ... additional config ...
See the NemoCustomizer API reference for all available fields.

NemoGuardrail

Deploys the NeMo Guardrails service for content filtering.

Purpose

Provides programmable guardrails for LLM applications, applying safety controls, content filtering, and output validation to inference requests.
Key capabilities:
  • Configurable safety rails
  • Input/output filtering
  • NIM endpoint integration
  • ConfigMap or PVC-based configuration
  • Optional PostgreSQL for conversation history
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoGuardrail
metadata:
  name: guardrails
spec:
  image:
    repository: nvcr.io/nvidia/nemo/guardrails
    tag: "24.12"
  nimEndpoint:
    baseURL: http://nim-service.default.svc:8000/v1
  configStore:
    configMap:
      name: guardrails-config
  expose:
    service:
      type: ClusterIP
      port: 8000
See the NemoGuardrail API reference for all available fields.
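The `configStore.configMap` reference in the example points at a ConfigMap holding the guardrails configuration. A minimal sketch, assuming the service reads a `config.yml` key in the format used by the open-source NeMo Guardrails toolkit — the key name, model entry, and rail flows here are illustrative, not a tested setup:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: guardrails-config
data:
  config.yml: |
    # Illustrative NeMo Guardrails configuration; the engine/model values
    # and rail flows are examples -- adapt to your deployment.
    models:
      - type: main
        engine: nim
        model: meta/llama-3-8b-instruct
    rails:
      input:
        flows:
          - self check input
      output:
        flows:
          - self check output
```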

NemoEvaluator

Deploys the NeMo Evaluator service for model evaluation.

Purpose

Provides automated evaluation of model performance using various benchmarks and metrics. Integrates with Argo Workflows for running evaluation jobs.
Key capabilities:
  • Multiple evaluation frameworks (LM Eval Harness, MT-Bench, BFCL, etc.)
  • Argo Workflows integration
  • Vector database support (Milvus)
  • Datastore and Entitystore integration
  • PostgreSQL for results storage
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEvaluator
metadata:
  name: evaluator
spec:
  image:
    repository: nvcr.io/nvidia/nemo/evaluator
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: evaluator
    # ... credentials ...
  argoWorkflows:
    endpoint: https://argo-server.argo.svc:2746
    serviceAccount: evaluator-sa
  vectorDB:
    endpoint: http://milvus.default.svc:19530
  # ... additional config ...
See the NemoEvaluator API reference for all available fields.

NemoDatastore

Deploys the NeMo DataStore service for dataset management.

Purpose

Provides Git-based dataset and artifact storage using Gitea. Stores training datasets and model artifacts, and supports Git LFS for large files.
Key capabilities:
  • Git-based repository management
  • Large file storage (Git LFS) with object storage (S3, MinIO)
  • PostgreSQL backend
  • API access for programmatic operations
  • Integration with Customizer and other services
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoDatastore
metadata:
  name: datastore
spec:
  image:
    repository: nvcr.io/nvidia/nemo/datastore
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: gitea
    # ... credentials ...
  objectStoreConfig:
    endpoint: minio.default.svc:9000
    bucketName: nemo-datastore
    region: us-east-1
    ssl: false
    # ... credentials ...
  pvc:
    create: true
    size: 100Gi
See the NemoDatastore API reference for all available fields.

NemoEntitystore

Deploys the NeMo Entitystore service for entity management.

Purpose

Provides storage and retrieval of entity information. Manages metadata about models, datasets, experiments, and other artifacts in the NeMo ecosystem.
Key capabilities:
  • Entity relationship management
  • RESTful API access
  • PostgreSQL backend
  • Integration with DataStore and Customizer
  • Health monitoring
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEntitystore
metadata:
  name: entitystore
spec:
  image:
    repository: nvcr.io/nvidia/nemo/entitystore
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: entitystore
    credentials:
      secretName: postgres-secret
      user: entitystore
      passwordKey: password
  datastore:
    endpoint: http://datastore.default.svc:3000
  expose:
    service:
      type: ClusterIP
      port: 8000
See the NemoEntitystore API reference for all available fields.

Common fields

All custom resources share common configuration fields:
image:
  repository: nvcr.io/nim/meta/llama-3-8b-instruct
  tag: "1.0.0"
  pullPolicy: IfNotPresent
  pullSecrets:
    - ngc-secret
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    nvidia.com/gpu: 1
nodeSelector:
  nvidia.com/gpu.product: NVIDIA-H100-80GB-HBM3
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
affinity:
  nodeAffinity:
    # ... node affinity rules
expose:
  service:
    type: ClusterIP
    port: 8000
  router:
    ingress:
      ingressClass: nginx
    gateway:
      httpRoutesEnabled: true
scale:
  enabled: true
  hpa:
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

Status fields

All resources report their status with:
  • state - Current state (Pending, Ready, NotReady, Failed)
  • conditions - Detailed condition information
  • availableReplicas - Number of ready replicas
  • Resource-specific fields - Additional status information
status:
  state: Ready
  availableReplicas: 3
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2024-03-01T10:00:00Z"
      reason: DeploymentReady
      message: All pods are ready

Next steps

API reference

Detailed API documentation for each CRD

Examples

Example configurations for each resource type
