NVIDIA NIM Operator

NVIDIA NIM Operator is a Kubernetes Operator designed to facilitate the deployment, management, and scaling of NVIDIA NIM microservices on Kubernetes clusters.

What is NVIDIA NIM?

NVIDIA NIM microservices deliver AI foundation models as accelerated inference microservices that are portable across data centers, workstations, and clouds, speeding up flexible generative AI development, deployment, and time to value.

What does the operator do?

The NIM Operator automates the complete lifecycle of NIM microservices on Kubernetes, providing:
  • Declarative model deployment - Deploy AI models using Kubernetes custom resources
  • Automated model caching - Efficiently cache and manage model artifacts
  • Multi-model orchestration - Run multiple models with intelligent resource management
  • Production-grade scaling - Horizontal autoscaling and multi-node deployments
  • Integrated routing - Built-in Ingress, Gateway API, and service mesh support

Key features

Simple deployment

Deploy AI models with a single Kubernetes manifest. The operator handles container orchestration, storage, and networking automatically.
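For example, a minimal NIMService manifest might look like the following sketch. The model, image repository, and secret names are illustrative placeholders, and the field names should be checked against the NIMService reference for your operator version:

```yaml
# Illustrative NIMService manifest; values are placeholders, not a
# definitive schema -- consult the NIMService reference for details.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret          # image pull secret (assumed name)
  authSecret: ngc-api-secret # NGC API key secret (assumed name)
  storage:
    nimCache:
      name: meta-llama3-8b-instruct  # NIMCache to mount model artifacts from
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Applying this single manifest with kubectl is enough for the operator to create the Deployment, Service, and volume mounts on your behalf.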

Model caching

Intelligent model artifact caching from NGC, HuggingFace, or NeMo DataStore with support for multiple storage backends.
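A NIMCache resource describing an NGC-sourced model cache might be sketched as follows; the engine, storage class, and secret names are assumptions for illustration:

```yaml
# Illustrative NIMCache manifest; field values are placeholders.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
      pullSecret: ngc-secret      # image pull secret (assumed name)
      authSecret: ngc-api-secret  # NGC API key secret (assumed name)
  storage:
    pvc:
      create: true
      storageClass: standard      # any RWX/RWO class available in the cluster
      size: 50Gi
```

A NIMService can then reference this cache by name so that model download happens once and pods start from the warm cache.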

Auto-scaling

Built-in horizontal pod autoscaling based on GPU metrics, custom metrics, or HTTP request rates.
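As a sketch, autoscaling is configured on the NIMService spec itself; the fragment below assumes an HPA-style block (field names are illustrative, and the metric shown is a hypothetical example, not a guaranteed built-in):

```yaml
# Illustrative autoscaling fragment of a NIMService spec (field names assumed).
spec:
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Pods
          pods:
            metric:
              name: gpu_cache_usage_perc  # hypothetical custom metric
            target:
              type: AverageValue
              averageValue: "750m"
```

The metrics list follows the standard Kubernetes autoscaling/v2 format, so any metric exposed through your metrics pipeline can drive scaling.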

Multi-node inference

Support for large model inference across multiple nodes using LeaderWorkerSet with tensor and pipeline parallelism.
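A multi-node deployment might be expressed with a fragment like the one below; the field names here are assumptions sketching the idea that the operator renders the spec into a LeaderWorkerSet spanning several GPU nodes:

```yaml
# Illustrative multi-node fragment (field names assumed); the operator
# materializes this as a LeaderWorkerSet rather than a plain Deployment.
spec:
  multiNode:
    size: 2          # pods (nodes) per model replica
    gpusPerPod: 8
  parallelism:
    tensor: 8        # tensor parallelism within a node
    pipeline: 2      # pipeline parallelism across nodes
```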

Production networking

Flexible service exposure via ClusterIP, NodePort, LoadBalancer, Ingress, or Gateway API with HTTPRoute/GRPCRoute support.
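For instance, exposing a service through an Ingress might look like this sketch (the `expose.ingress` layout and host are illustrative assumptions):

```yaml
# Illustrative service-exposure fragment (field names assumed).
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        rules:
          - host: llama3.example.com   # placeholder hostname
```

Swapping the ingress block for a Gateway API HTTPRoute or GRPCRoute follows the same pattern of enabling the route type and pointing it at the service.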

Platform integration

First-class support for KServe, OpenShift, VMware TKGS, and standard Kubernetes distributions.

Advanced GPU management

Dynamic Resource Allocation (DRA) support for fine-grained GPU control and multi-instance GPU (MIG) configurations.

Observability

Prometheus ServiceMonitor integration for metrics collection with built-in health probes and readiness checks.
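Enabling metrics collection can be sketched as a fragment on the NIMService spec; when enabled, the operator creates a Prometheus ServiceMonitor for the workload (field names below are assumptions):

```yaml
# Illustrative metrics fragment (field names assumed); the operator
# generates the corresponding ServiceMonitor object when enabled.
spec:
  metrics:
    enabled: true
    serviceMonitor:
      interval: 30s
      scrapeTimeout: 10s
```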

Supported resources

The operator manages several custom resource types:
  • NIMService - Deploys and manages NIM inference services
  • NIMCache - Caches model artifacts from various sources
  • NIMPipeline - Orchestrates multi-model inference pipelines
  • NIMBuild - Builds custom NIM containers
  • NemoGuardrail - Manages NeMo Guardrails for safe AI
  • NemoCustomizer - Fine-tunes models with NeMo Customizer
  • NemoEvaluator - Evaluates model performance

Get started

Quick start

Deploy your first NIM microservice in minutes

Installation

Install the operator using Helm or kubectl

NIMService

Learn about deploying inference services

NIMCache

Understand model caching and storage

Architecture

The NIM Operator follows the Kubernetes operator pattern:
  1. Custom Resource Definitions (CRDs) - Define the desired state of NIM resources
  2. Controller - Watches CRDs and reconciles the actual state with desired state
  3. Admission Webhooks - Validates and mutates resources before persistence
  4. Resource Management - Creates and manages Deployments, Services, PVCs, and other Kubernetes objects

Requirements

  • Kubernetes v1.28 or higher
  • NVIDIA GPUs supported by the NIM microservices you plan to deploy
  • NVIDIA GPU Operator (for GPU device plugin and drivers)
  • Storage class for persistent volume claims (for model caching)
The operator requires cert-manager for admission webhook certificate management when the admission controller is enabled (default).

Next steps

  1. Install the operator - Follow the installation guide to deploy the operator in your cluster.
  2. Deploy your first NIM - Use the quick start guide to deploy a NIM microservice.
  3. Explore advanced features - Learn about autoscaling, multi-node deployments, and pipeline orchestration.