NVIDIA NIM Operator
NVIDIA NIM Operator is a Kubernetes Operator designed to facilitate the deployment, management, and scaling of NVIDIA NIM microservices on Kubernetes clusters.

What is NVIDIA NIM?
NVIDIA NIM microservices deliver AI foundation models as accelerated inference microservices that are portable across data center, workstation, and cloud, accelerating flexible generative AI development, deployment, and time to value.

What does the operator do?
The NIM Operator automates the complete lifecycle of NIM microservices on Kubernetes, providing:

- Declarative model deployment - Deploy AI models using Kubernetes custom resources
- Automated model caching - Efficiently cache and manage model artifacts
- Multi-model orchestration - Run multiple models with intelligent resource management
- Production-grade scaling - Horizontal autoscaling and multi-node deployments
- Integrated routing - Built-in Ingress, Gateway API, and service mesh support
Key features
Simple deployment
Deploy AI models with a single Kubernetes manifest. The operator handles container orchestration, storage, and networking automatically.
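As a sketch, a minimal NIMService manifest could look like the following. The image repository, secret names, and cache reference are placeholders, and the field layout reflects the `apps.nvidia.com/v1alpha1` API as commonly documented; check the NIMService API reference for your operator version before relying on exact field names.

```yaml
# Illustrative example: a single-replica NIM inference service.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct  # placeholder model image
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret              # assumed image pull secret
  authSecret: ngc-api-secret    # assumed secret holding the NGC API key
  storage:
    nimCache:
      name: meta-llama3-8b-instruct  # NIMCache created beforehand
      profile: ""
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Applying a manifest like this with `kubectl apply` leaves the operator to create the underlying Deployment, Service, and supporting objects.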
Model caching
Intelligent model artifact caching from NGC, HuggingFace, or NeMo DataStore with support for multiple storage backends.
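For instance, a NIMCache resource caching a model from NGC onto a persistent volume might be sketched as follows (model image, secret names, and sizes are placeholders; field names follow the commonly documented `apps.nvidia.com/v1alpha1` schema and may vary by version):

```yaml
# Illustrative example: cache a model from NGC onto a shared PVC.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0  # placeholder image
      pullSecret: ngc-secret        # assumed image pull secret
      authSecret: ngc-api-secret    # assumed NGC API key secret
  storage:
    pvc:
      create: true
      storageClass: ""              # empty selects the cluster default
      size: 50Gi
      volumeAccessMode: ReadWriteMany
```

A NIMService can then reference this cache by name so replicas start from pre-downloaded artifacts instead of pulling the model on every pod start.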
Auto-scaling
Built-in horizontal pod autoscaling based on GPU metrics, custom metrics, or HTTP request rates.
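A hedged sketch of what enabling autoscaling on a NIMService might look like is shown below; the `scale` block wraps a standard Kubernetes HPA definition, and the exact field names should be confirmed against the NIMService API reference. GPU or HTTP-request metrics would additionally require a custom metrics adapter in the cluster.

```yaml
# Illustrative NIMService spec fragment: HPA-based scaling.
spec:
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Resource          # standard autoscaling/v2 metric definition
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75
```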
Multi-node inference
Support for large model inference across multiple nodes using LeaderWorkerSet with tensor and pipeline parallelism.
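As a rough illustration only (these field names are hypothetical, not authoritative; consult the NIMService API reference for the real multi-node schema), a multi-node deployment declares how many pods form one model replica, and the operator maps that onto a LeaderWorkerSet:

```yaml
# Hypothetical fragment: field names are illustrative.
spec:
  multiNode:
    size: 2          # pods per model replica (LeaderWorkerSet group size)
    gpusPerPod: 8    # GPUs requested by each leader/worker pod
```

Tensor parallelism then typically spans the GPUs within a pod, while pipeline parallelism spans the pods in the group.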
Production networking
Flexible service exposure via ClusterIP, NodePort, LoadBalancer, Ingress, or Gateway API with HTTPRoute/GRPCRoute support.
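For example, exposing a NIMService through an Ingress could be sketched as below. The `ingress.spec` block is assumed to accept a standard `networking.k8s.io/v1` IngressSpec; the hostname, ingress class, and backend names are placeholders.

```yaml
# Illustrative NIMService spec fragment: ClusterIP service plus Ingress.
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:                          # standard IngressSpec fields
        ingressClassName: nginx      # placeholder ingress class
        rules:
          - host: llama.example.com  # placeholder hostname
            http:
              paths:
                - path: /
                  pathType: Prefix
                  backend:
                    service:
                      name: meta-llama3-8b-instruct
                      port:
                        number: 8000
```

Gateway API exposure works analogously, with HTTPRoute or GRPCRoute objects in place of the Ingress rules.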
Platform integration
First-class support for KServe, OpenShift, VMware TKGS, and standard Kubernetes distributions.
Advanced GPU management
Dynamic Resource Allocation (DRA) support for fine-grained GPU control and multi-instance GPU (MIG) configurations.
Observability
Prometheus ServiceMonitor integration for metrics collection with built-in health probes and readiness checks.
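A sketch of enabling metrics collection on a NimService might look like this (field names follow the commonly documented `metrics` block and should be verified against your operator version; the `release` label is a placeholder that must match your Prometheus Operator's ServiceMonitor selector):

```yaml
# Illustrative NIMService spec fragment: Prometheus ServiceMonitor.
spec:
  metrics:
    enabled: true
    serviceMonitor:
      interval: 30s
      additionalLabels:
        release: prometheus   # placeholder; match your Prometheus selector
```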
Supported resources
The operator manages several custom resource types:

- NIMService - Deploys and manages NIM inference services
- NIMCache - Caches model artifacts from various sources
- NIMPipeline - Orchestrates multi-model inference pipelines
- NIMBuild - Builds custom NIM containers
- NemoGuardrail - Manages NeMo Guardrails for safe AI
- NemoCustomizer - Fine-tunes models with NeMo Customizer
- NemoEvaluator - Evaluates model performance
Get started

- Quick start - Deploy your first NIM microservice in minutes
- Installation - Install the operator using Helm or kubectl
- NIMService - Learn about deploying inference services
- NIMCache - Understand model caching and storage
Architecture
The NIM Operator follows the Kubernetes operator pattern:

- Custom Resource Definitions (CRDs) - Define the desired state of NIM resources
- Controller - Watches CRDs and reconciles the actual state with desired state
- Admission Webhooks - Validate and mutate resources before they are persisted
- Resource Management - Creates and manages Deployments, Services, PVCs, and other Kubernetes objects
Requirements
- Kubernetes v1.28 or higher
- NVIDIA GPUs supported by the NIM microservices you plan to deploy
- NVIDIA GPU Operator (for GPU device plugin and drivers)
- Storage class for persistent volume claims (for model caching)
The operator requires cert-manager for admission webhook certificate management when the admission controller is enabled (default).
Next steps

- Install the operator - Follow the installation guide to deploy the operator in your cluster.
- Deploy your first NIM - Use the quick start guide to deploy a NIM microservice.