The NVIDIA NIM Operator extends Kubernetes with nine custom resource definitions (CRDs) for managing AI inference and NeMo microservices.

NIM resources

Resources for deploying and managing NVIDIA Inference Microservices.

NIMService

Deploys a NIM inference service for serving AI models.

Purpose

The primary resource for deploying optimized inference services. Supports both standalone and KServe deployment platforms, with options for single-node or multi-node (tensor/pipeline parallelism) configurations.
Key capabilities:
  • Multiple inference platform support (standalone, KServe)
  • Auto-scaling with HorizontalPodAutoscaler
  • Multi-node deployments using LeaderWorkerSet
  • Model caching via NIMCache integration
  • Ingress and Gateway API routing
  • Prometheus metrics and ServiceMonitor
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3-8b-instruct
    tag: "1.0.0"
  authSecret: ngc-secret
  storage:
    nimCache:
      name: llama-cache
      profile: "tensorrt-llm-h100-fp16"
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
See the NIMService API reference for all available fields.
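Multi-node deployments split a single model across several pods with tensor or pipeline parallelism, coordinated through LeaderWorkerSet. The sketch below is a hypothetical configuration: the `multiNode` block and its `size` and `gpusPerPod` field names are assumptions for illustration, not confirmed schema — verify them against the NIMService API reference.

```yaml
# Hypothetical multi-node NIMService sketch; the multiNode field names
# (size, gpusPerPod) are assumptions -- verify against the API reference.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3-70b-instruct
    tag: "1.0.0"
  authSecret: ngc-secret
  multiNode:
    size: 2          # pods per LeaderWorkerSet replica (assumed field)
    gpusPerPod: 8    # GPUs requested by each pod (assumed field)
  expose:
    service:
      type: ClusterIP
      port: 8000
```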

NIMCache

Manages model caching to persistent storage.

Purpose

Automates downloading and caching AI models from NGC, NeMo DataStore, or Hugging Face to persistent volumes. Models are optimized and profiled for specific GPU configurations.
Key capabilities:
  • Multiple model sources (NGC, NeMo DataStore, Hugging Face)
  • Profile-based model selection
  • GPU-specific optimizations
  • Proxy and custom certificate support
  • Job-based caching with TTL
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-cache
spec:
  source:
    ngc:
      authSecret: ngc-secret
      modelPuller: nvcr.io/nim/meta/llama-3-8b-instruct:1.0.0
      pullSecret: ngc-secret
      model:
        profiles:
          - "tensorrt-llm-h100-fp16"
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 50Gi
See the NIMCache API reference for all available fields.
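Besides NGC, a NIMCache can pull from the other listed sources. The sketch below assumes a Hugging Face source expressed as an `hf` block with `endpoint`, `modelName`, and `authSecret` fields; treat these field names as assumptions and confirm the exact schema in the NIMCache API reference.

```yaml
# Hypothetical Hugging Face source for a NIMCache; the field names under
# source.hf are assumptions -- confirm against the NIMCache API reference.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: hf-model-cache
spec:
  source:
    hf:
      endpoint: https://huggingface.co
      modelName: meta-llama/Meta-Llama-3-8B-Instruct  # example model ID
      authSecret: hf-token-secret                     # secret holding an HF access token
  storage:
    pvc:
      create: true
      storageClass: nfs-client
      size: 50Gi
```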

NIMBuild

Builds optimized TensorRT-LLM engines from cached models.

Purpose

Compiles inference engines from model weights cached by a NIMCache. Building engines tuned to the target GPU hardware can significantly improve inference performance.
Key capabilities:
  • TensorRT-LLM engine optimization
  • GPU-specific compilation
  • Profile-based building
  • Integration with NIMCache and NIMService
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMBuild
metadata:
  name: llama-3-8b-engine
spec:
  nimCache:
    name: llama-cache
    profile: "tensorrt-llm-h100-fp16"
  modelName: llama-3-8b-optimized
  image:
    repository: nvcr.io/nvidia/nim-llm
    tag: "1.2.0"
    pullSecrets:
      - ngc-secret
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: 64Gi
See the NIMBuild API reference for all available fields.

NIMPipeline

Orchestrates multiple NIM services as a pipeline.

Purpose

Creates and manages a collection of related NIM services with ordered dependencies. Useful for chaining multiple models or building multi-step inference workflows.
Key capabilities:
  • Multiple service orchestration
  • Service dependency management
  • Conditional service enablement
  • Automatic endpoint configuration
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
spec:
  services:
    - name: embeddings
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
          tag: "1.0.0"
        # ... NIMService spec ...
    - name: llm
      enabled: true
      dependencies:
        - name: embeddings
          port: 8000
          envName: EMBEDDING_URL
      spec:
        # ... NIMService spec ...
See the NIMPipeline API reference for all available fields.

NeMo microservice resources

Resources for deploying NVIDIA NeMo microservices.

NemoCustomizer

Deploys the NeMo Customizer service for model fine-tuning.

Purpose

Provides a service for customizing and fine-tuning foundation models using techniques like LoRA. Integrates with NeMo DataStore, Entitystore, and MLflow for managing training jobs and artifacts.
Key capabilities:
  • Model fine-tuning and customization
  • Training job orchestration (Volcano, Run.ai)
  • MLflow integration for experiment tracking
  • Weights & Biases support
  • PostgreSQL for metadata storage
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoCustomizer
metadata:
  name: customizer
spec:
  image:
    repository: nvcr.io/nvidia/nemo/customizer
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: customizer
    credentials:
      secretName: postgres-secret
      user: customizer
      passwordKey: password
  datastore:
    endpoint: http://datastore.default.svc:3000/v1
  entitystore:
    endpoint: http://entitystore.default.svc:8000
  # ... additional config ...
See the NemoCustomizer API reference for all available fields.

NemoGuardrail

Deploys the NeMo Guardrails service for content filtering.

Purpose

Provides programmable guardrails for LLM applications, applying safety controls, content filtering, and output validation to inference requests.
Key capabilities:
  • Configurable safety rails
  • Input/output filtering
  • NIM endpoint integration
  • ConfigMap or PVC-based configuration
  • Optional PostgreSQL for conversation history
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoGuardrail
metadata:
  name: guardrails
spec:
  image:
    repository: nvcr.io/nvidia/nemo/guardrails
    tag: "24.12"
  nimEndpoint:
    baseURL: http://nim-service.default.svc:8000/v1
  configStore:
    configMap:
      name: guardrails-config
  expose:
    service:
      type: ClusterIP
      port: 8000
See the NemoGuardrail API reference for all available fields.
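The `configStore.configMap` reference in the example points at a ConfigMap holding the guardrails configuration. A minimal sketch, assuming the service reads a `config.yml` key in the format used by the open-source NeMo Guardrails toolkit — the key name, model entry, and rail flows here are illustrative, not a tested setup:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: guardrails-config
data:
  config.yml: |
    # Illustrative NeMo Guardrails configuration; the engine/model values
    # and rail flows are examples -- adapt to your deployment.
    models:
      - type: main
        engine: nim
        model: meta/llama-3-8b-instruct
    rails:
      input:
        flows:
          - self check input
      output:
        flows:
          - self check output
```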

NemoEvaluator

Deploys the NeMo Evaluator service for model evaluation.

Purpose

Provides automated evaluation of model performance using various benchmarks and metrics. Integrates with Argo Workflows for running evaluation jobs.
Key capabilities:
  • Multiple evaluation frameworks (LM Eval Harness, MT-Bench, BFCL, etc.)
  • Argo Workflows integration
  • Vector database support (Milvus)
  • Datastore and Entitystore integration
  • PostgreSQL for results storage
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEvaluator
metadata:
  name: evaluator
spec:
  image:
    repository: nvcr.io/nvidia/nemo/evaluator
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: evaluator
    # ... credentials ...
  argoWorkflows:
    endpoint: https://argo-server.argo.svc:2746
    serviceAccount: evaluator-sa
  vectorDB:
    endpoint: http://milvus.default.svc:19530
  # ... additional config ...
See the NemoEvaluator API reference for all available fields.

NemoDatastore

Deploys the NeMo DataStore service for dataset management.

Purpose

Provides Git-based dataset and artifact storage using Gitea. Stores training datasets and model artifacts, and supports Git LFS for large files.
Key capabilities:
  • Git-based repository management
  • Large file storage (Git LFS) with object storage (S3, MinIO)
  • PostgreSQL backend
  • API access for programmatic operations
  • Integration with Customizer and other services
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoDatastore
metadata:
  name: datastore
spec:
  image:
    repository: nvcr.io/nvidia/nemo/datastore
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: gitea
    # ... credentials ...
  objectStoreConfig:
    endpoint: minio.default.svc:9000
    bucketName: nemo-datastore
    region: us-east-1
    ssl: false
    # ... credentials ...
  pvc:
    create: true
    size: 100Gi
See the NemoDatastore API reference for all available fields.

NemoEntitystore

Deploys the NeMo Entitystore service for entity management.

Purpose

Provides storage and retrieval of entity information. Manages metadata about models, datasets, experiments, and other artifacts in the NeMo ecosystem.
Key capabilities:
  • Entity relationship management
  • RESTful API access
  • PostgreSQL backend
  • Integration with DataStore and Customizer
  • Health monitoring
Example:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEntitystore
metadata:
  name: entitystore
spec:
  image:
    repository: nvcr.io/nvidia/nemo/entitystore
    tag: "24.12"
  databaseConfig:
    host: postgres.default.svc
    port: 5432
    databaseName: entitystore
    credentials:
      secretName: postgres-secret
      user: entitystore
      passwordKey: password
  datastore:
    endpoint: http://datastore.default.svc:3000
  expose:
    service:
      type: ClusterIP
      port: 8000
See the NemoEntitystore API reference for all available fields.

Common fields

All custom resources share common configuration fields:
image:
  repository: nvcr.io/nim/meta/llama-3-8b-instruct
  tag: "1.0.0"
  pullPolicy: IfNotPresent
  pullSecrets:
    - ngc-secret
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    nvidia.com/gpu: 1
nodeSelector:
  nvidia.com/gpu.product: NVIDIA-H100-80GB-HBM3
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
affinity:
  nodeAffinity:
    # ... node affinity rules
expose:
  service:
    type: ClusterIP
    port: 8000
  router:
    ingress:
      ingressClass: nginx
    gateway:
      httpRoutesEnabled: true
scale:
  enabled: true
  hpa:
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

Status fields

All resources report their status with:
  • state - Current state (Pending, Ready, NotReady, Failed)
  • conditions - Detailed condition information
  • availableReplicas - Number of ready replicas
  • Resource-specific fields - Additional status information
status:
  state: Ready
  availableReplicas: 3
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2024-03-01T10:00:00Z"
      reason: DeploymentReady
      message: All pods are ready

Next steps

API reference

Detailed API documentation for each CRD

Examples

Example configurations for each resource type
