NIM resources
Resources for deploying and managing NVIDIA Inference Microservices.
NIMService
Deploys a NIM inference service for serving AI models.
Purpose
The primary resource for deploying optimized inference services. Supports both standalone and KServe deployment platforms, with options for single-node or multi-node (tensor/pipeline parallelism) configurations.
- Multiple inference platform support (standalone, KServe)
- Auto-scaling with HorizontalPodAutoscaler
- Multi-node deployments using LeaderWorkerSet
- Model caching via NIMCache integration
- Ingress and Gateway API routing
- Prometheus metrics and ServiceMonitor
See the NIMService API reference for all available fields.
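A minimal NIMService manifest can be sketched as follows. The `apps.nvidia.com/v1alpha1` API group and the field names below are assumptions drawn from common NIM Operator examples, not an authoritative schema; verify each field against the NIMService API reference for your operator version:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"          # illustrative tag
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret        # image pull secret for nvcr.io
  authSecret: ngc-api-secret
  # Mount model files pre-cached by a NIMCache resource
  storage:
    nimCache:
      name: meta-llama3-8b-instruct
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Referencing a NIMCache under `storage` avoids downloading model weights on every pod start; autoscaling and ingress are configured through additional spec fields described in the API reference.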
NIMCache
Manages model caching to persistent storage.
Purpose
Automates downloading and caching AI models from NGC, NeMo DataStore, or HuggingFace to persistent volumes. Models are optimized and profiled for specific GPU configurations.
- Multiple model sources (NGC, DataStore, HuggingFace)
- Profile-based model selection
- GPU-specific optimizations
- Proxy and custom certificate support
- Job-based caching with TTL
See the NIMCache API reference for all available fields.
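A NIMCache that pulls a model from NGC into a PVC might look like the sketch below. The field names are illustrative assumptions based on typical NIM Operator examples; confirm them against the NIMCache API reference:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      # Container image used to download and profile the model
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: standard   # illustrative storage class
      size: 50Gi
      volumeAccessMode: ReadWriteMany
```

`ReadWriteMany` access lets multiple NIMService replicas share one cached copy of the model.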
NIMBuild
Builds optimized TensorRT-LLM engines from cached models.
Purpose
Creates optimized inference engines from model weights cached by NIMCache. Building custom engines significantly improves inference performance by optimizing for specific GPU hardware.
- TensorRT-LLM engine optimization
- GPU-specific compilation
- Profile-based building
- Integration with NIMCache and NIMService
See the NIMBuild API reference for all available fields.
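A NIMBuild referencing an existing NIMCache could be sketched as below. The structure here, in particular the `nimCache` reference, is an assumption; consult the NIMBuild API reference for the actual schema:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMBuild
metadata:
  name: llama3-8b-engine-build
spec:
  # Build an engine from weights already cached by this NIMCache
  nimCache:
    name: meta-llama3-8b-instruct
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"        # illustrative tag
    pullSecrets:
      - ngc-secret
```

The engine is compiled for the GPUs present on the build node, so run the build on the same GPU model that will serve inference.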
NIMPipeline
Orchestrates multiple NIM services as a pipeline.
Purpose
Creates and manages a collection of related NIM services with dependency management. Useful for chaining multiple models or creating complex inference workflows.
- Multiple service orchestration
- Service dependency management
- Conditional service enablement
- Automatic endpoint configuration
See the NIMPipeline API reference for all available fields.
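A pipeline wrapping two services, for example an embedding model and an LLM for a RAG workflow, might be declared as follows. Each `spec` entry embeds a NIMService spec (abbreviated here); names and fields are illustrative assumptions to be checked against the NIMPipeline API reference:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
spec:
  services:
    - name: embedder
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
          tag: "1.0.1"   # illustrative tag
        replicas: 1
    - name: llm
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.3"   # illustrative tag
        replicas: 1
```

Setting `enabled: false` on an entry removes that service without deleting the pipeline definition.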
NeMo microservice resources
Resources for deploying NVIDIA NeMo microservices.
NemoCustomizer
Deploys the NeMo Customizer service for model fine-tuning.
Purpose
Provides a service for customizing and fine-tuning foundation models using techniques like LoRA. Integrates with NeMo DataStore, NeMo Entitystore, and MLflow for managing training jobs and artifacts.
- Model fine-tuning and customization
- Training job orchestration (Volcano, Run:ai)
- MLflow integration for experiment tracking
- Weights & Biases support
- PostgreSQL for metadata storage
See the NemoCustomizer API reference for all available fields.
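A skeleton NemoCustomizer manifest is sketched below. All field names beyond `apiVersion`, `kind`, and `metadata` are illustrative assumptions; the real schema, including training and scheduler configuration, is in the NemoCustomizer API reference:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoCustomizer
metadata:
  name: nemocustomizer-sample
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/customizer
    tag: "25.06"          # illustrative tag
  expose:
    service:
      port: 8000
  # Connection to the metadata database (field names illustrative)
  databaseConfig:
    host: customizer-pg.default.svc.cluster.local
    port: 5432
    databaseName: customizer
    credentials:
      user: ncsuser
      secretName: customizer-pg-secret
```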
NemoGuardrails
Deploys the NeMo Guardrails service for content filtering.
Purpose
Provides programmable guardrails for LLM applications. Apply safety controls, content filtering, and output validation to inference requests.
- Configurable safety rails
- Input/output filtering
- NIM endpoint integration
- ConfigMap or PVC-based configuration
- Optional PostgreSQL for conversation history
See the NemoGuardrails API reference for all available fields.
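A NemoGuardrails deployment that loads its rail definitions from a ConfigMap could look like the sketch below; the `configStore` field name and image tag are assumptions, so verify them against the NemoGuardrails API reference:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoGuardrails
metadata:
  name: nemoguardrails-sample
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/guardrails
    tag: "25.06"          # illustrative tag
  expose:
    service:
      port: 8000
  # Rail definitions mounted from a ConfigMap (field name illustrative)
  configStore:
    configMap:
      name: guardrails-config
```

Applications then send inference requests to the Guardrails service endpoint instead of the NIM endpoint directly, so input and output filtering is applied in-line.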
NemoEvaluator
Deploys the NeMo Evaluator service for model evaluation.
Purpose
Provides automated evaluation of model performance using various benchmarks and metrics. Integrates with Argo Workflows for running evaluation jobs.
- Multiple evaluation frameworks (LM Eval Harness, MT-Bench, BFCL, etc.)
- Argo Workflows integration
- Vector database support (Milvus)
- Datastore and Entitystore integration
- PostgreSQL for results storage
See the NemoEvaluator API reference for all available fields.
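Because Evaluator depends on several other services, its spec is mostly endpoints. The sketch below uses assumed field names and in-cluster DNS addresses purely for illustration; the authoritative schema is in the NemoEvaluator API reference:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEvaluator
metadata:
  name: nemoevaluator-sample
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/evaluator
    tag: "25.06"          # illustrative tag
  expose:
    service:
      port: 8000
  # Endpoints for the services Evaluator depends on (field names illustrative)
  argoWorkflows:
    endpoint: http://argo-workflows-server.argo.svc.cluster.local:2746
  vectorDB:
    endpoint: http://milvus.milvus.svc.cluster.local:19530
  datastore:
    endpoint: http://nemodatastore-sample.default.svc.cluster.local:3000
  entitystore:
    endpoint: http://nemoentitystore-sample.default.svc.cluster.local:8000
```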
NemoDatastore
Deploys the NeMo DataStore service for dataset management.
Purpose
Provides Git-based dataset and artifact storage using Gitea. Stores training datasets, model artifacts, and supports LFS for large files.
- Git-based repository management
- Large file storage (LFS) with object storage (S3, MinIO)
- PostgreSQL backend
- API access for programmatic operations
- Integration with Customizer and other services
See the NemoDatastore API reference for all available fields.
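A NemoDatastore needs both an object store for LFS content and a PostgreSQL backend. The following sketch uses assumed field names and a MinIO endpoint as an example; check the NemoDatastore API reference for the real fields:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoDatastore
metadata:
  name: nemodatastore-sample
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/datastore
    tag: "25.06"          # illustrative tag
  expose:
    service:
      port: 3000
  # Object storage backing Git LFS (field names illustrative)
  objectStore:
    endpoint: http://minio.minio.svc.cluster.local:9000
    bucketName: datastore
    credentials:
      secretName: minio-credentials
  databaseConfig:
    host: datastore-pg.default.svc.cluster.local
    port: 5432
    databaseName: datastore
```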
NemoEntitystore
Deploys the NeMo Entitystore service for entity management.
Purpose
Provides storage and retrieval of entity information. Manages metadata about models, datasets, experiments, and other artifacts in the NeMo ecosystem.
- Entity relationship management
- RESTful API access
- PostgreSQL backend
- Integration with DataStore and Customizer
- Health monitoring
See the NemoEntitystore API reference for all available fields.
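Entitystore has the smallest footprint of the NeMo resources; a skeleton manifest, with field names that are again illustrative assumptions rather than the verified schema, might be:

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEntitystore
metadata:
  name: nemoentitystore-sample
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/entity-store
    tag: "25.06"          # illustrative tag
  expose:
    service:
      port: 8000
  databaseConfig:
    host: entitystore-pg.default.svc.cluster.local
    port: 5432
    databaseName: entitystore
```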
Common fields
All custom resources share common configuration fields:
- Image configuration
- Resource requirements
- Scheduling
- Service exposure
- Autoscaling
Status fields
All resources report their status with:
- state - Current state (Pending, Ready, NotReady, Failed)
- conditions - Detailed condition information
- availableReplicas - Number of ready replicas
- Resource-specific fields - Additional status information
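As a sketch, the status block of a healthy resource follows the standard Kubernetes conditions shape; the exact condition types and messages below are illustrative:

```yaml
status:
  state: Ready
  conditions:
    - type: Ready
      status: "True"
      reason: Ready
      message: deployment is ready
  availableReplicas: 1
```

You can wait on this in scripts with `kubectl wait --for=jsonpath='{.status.state}'=Ready nimservice/<name>`.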
Next steps
- API reference - Detailed API documentation for each CRD
- Examples - Example configurations for each resource type