NVIDIA NIM Operator

NVIDIA NIM Operator is a Kubernetes Operator designed to facilitate the deployment, management, and scaling of NVIDIA NIM microservices on Kubernetes clusters.

What is NVIDIA NIM?

NVIDIA NIM microservices deliver AI foundation models as accelerated inference microservices that are portable across data centers, workstations, and clouds, speeding up flexible generative AI development, deployment, and time to value.

What does the operator do?

The NIM Operator automates the complete lifecycle of NIM microservices on Kubernetes, providing:
  • Declarative model deployment - Deploy AI models using Kubernetes custom resources
  • Automated model caching - Efficiently cache and manage model artifacts
  • Multi-model orchestration - Run multiple models with intelligent resource management
  • Production-grade scaling - Horizontal autoscaling and multi-node deployments
  • Integrated routing - Built-in Ingress, Gateway API, and service mesh support

Key features

Simple deployment

Deploy AI models with a single Kubernetes manifest. The operator handles container orchestration, storage, and networking automatically.
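For example, a minimal NIMService manifest might look like the following sketch. The model, image repository, and secret names are illustrative placeholders, and the field names should be checked against the NIMService reference for your operator version:

```yaml
# Illustrative NIMService manifest; values are placeholders, not a
# definitive schema -- consult the NIMService reference for details.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret          # image pull secret (assumed name)
  authSecret: ngc-api-secret # NGC API key secret (assumed name)
  storage:
    nimCache:
      name: meta-llama3-8b-instruct  # NIMCache to mount model artifacts from
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Applying this single manifest with kubectl is enough for the operator to create the Deployment, Service, and volume mounts on your behalf.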

Model caching

Intelligent model artifact caching from NGC, HuggingFace, or NeMo DataStore with support for multiple storage backends.
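A NIMCache resource describing an NGC-sourced model cache might be sketched as follows; the engine, storage class, and secret names are assumptions for illustration:

```yaml
# Illustrative NIMCache manifest; field values are placeholders.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
      pullSecret: ngc-secret      # image pull secret (assumed name)
      authSecret: ngc-api-secret  # NGC API key secret (assumed name)
  storage:
    pvc:
      create: true
      storageClass: standard      # any RWX/RWO class available in the cluster
      size: 50Gi
```

A NIMService can then reference this cache by name so that model download happens once and pods start from the warm cache.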

Auto-scaling

Built-in horizontal pod autoscaling based on GPU metrics, custom metrics, or HTTP request rates.
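As a sketch, autoscaling is configured on the NIMService spec itself; the fragment below assumes an HPA-style block (field names are illustrative, and the metric shown is a hypothetical example, not a guaranteed built-in):

```yaml
# Illustrative autoscaling fragment of a NIMService spec (field names assumed).
spec:
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Pods
          pods:
            metric:
              name: gpu_cache_usage_perc  # hypothetical custom metric
            target:
              type: AverageValue
              averageValue: "750m"
```

The metrics list follows the standard Kubernetes autoscaling/v2 format, so any metric exposed through your metrics pipeline can drive scaling.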

Multi-node inference

Support for large model inference across multiple nodes using LeaderWorkerSet with tensor and pipeline parallelism.
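A multi-node deployment might be expressed with a fragment like the one below; the field names here are assumptions sketching the idea that the operator renders the spec into a LeaderWorkerSet spanning several GPU nodes:

```yaml
# Illustrative multi-node fragment (field names assumed); the operator
# materializes this as a LeaderWorkerSet rather than a plain Deployment.
spec:
  multiNode:
    size: 2          # pods (nodes) per model replica
    gpusPerPod: 8
  parallelism:
    tensor: 8        # tensor parallelism within a node
    pipeline: 2      # pipeline parallelism across nodes
```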

Production networking

Flexible service exposure via ClusterIP, NodePort, LoadBalancer, Ingress, or Gateway API with HTTPRoute/GRPCRoute support.
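For instance, exposing a service through an Ingress might look like this sketch (the `expose.ingress` layout and host are illustrative assumptions):

```yaml
# Illustrative service-exposure fragment (field names assumed).
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        rules:
          - host: llama3.example.com   # placeholder hostname
```

Swapping the ingress block for a Gateway API HTTPRoute or GRPCRoute follows the same pattern of enabling the route type and pointing it at the service.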

Platform integration

First-class support for KServe, OpenShift, VMware TKGS, and standard Kubernetes distributions.

Advanced GPU management

Dynamic Resource Allocation (DRA) support for fine-grained GPU control and multi-instance GPU (MIG) configurations.

Observability

Prometheus ServiceMonitor integration for metrics collection with built-in health probes and readiness checks.
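Enabling metrics collection can be sketched as a fragment on the NIMService spec; when enabled, the operator creates a Prometheus ServiceMonitor for the workload (field names below are assumptions):

```yaml
# Illustrative metrics fragment (field names assumed); the operator
# generates the corresponding ServiceMonitor object when enabled.
spec:
  metrics:
    enabled: true
    serviceMonitor:
      interval: 30s
      scrapeTimeout: 10s
```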

Supported resources

The operator manages several custom resource types:
  • NIMService - Deploys and manages NIM inference services
  • NIMCache - Caches model artifacts from various sources
  • NIMPipeline - Orchestrates multi-model inference pipelines
  • NIMBuild - Builds custom NIM containers
  • NemoGuardrail - Manages NeMo Guardrails for safe AI
  • NemoCustomizer - Fine-tunes models with NeMo Customizer
  • NemoEvaluator - Evaluates model performance

Get started

Quick start

Deploy your first NIM microservice in minutes

Installation

Install the operator using Helm or kubectl

NIMService

Learn about deploying inference services

NIMCache

Understand model caching and storage

Architecture

The NIM Operator follows the Kubernetes operator pattern:
  1. Custom Resource Definitions (CRDs) - Define the desired state of NIM resources
  2. Controller - Watches CRDs and reconciles the actual state with desired state
  3. Admission Webhooks - Validates and mutates resources before persistence
  4. Resource Management - Creates and manages Deployments, Services, PVCs, and other Kubernetes objects

Requirements

  • Kubernetes v1.28 or higher
  • NVIDIA GPUs supported by the NIM microservices you plan to deploy
  • NVIDIA GPU Operator (for GPU device plugin and drivers)
  • Storage class for persistent volume claims (for model caching)
The operator requires cert-manager for admission webhook certificate management when the admission controller is enabled (default).

Next steps

  1. Install the operator - Follow the installation guide to deploy the operator in your cluster.
  2. Deploy your first NIM - Use the quick start guide to deploy a NIM microservice.
  3. Explore advanced features - Learn about autoscaling, multi-node deployments, and pipeline orchestration.