Overview
The NIMPipeline custom resource enables orchestration of multiple NVIDIA NIM services with automatic dependency management. It simplifies deploying complex AI workflows like RAG (Retrieval-Augmented Generation) pipelines, guardrail systems, and multi-model inference chains.
What is NIMPipeline?
NIMPipeline provides:
- Declarative multi-service deployment
- Automatic service dependency injection via environment variables
- Coordinated lifecycle management across services
- Service-level enable/disable toggles
- Unified status monitoring for all pipeline services
Basic Concept
A NIMPipeline deploys multiple NIMService resources and automatically configures their dependencies. Each service can reference other services in the pipeline, and the operator injects the appropriate endpoint URLs as environment variables.
Basic Example: RAG Pipeline
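As a sketch, a minimal RAG pipeline pairing an embedding service with an LLM could look like the following. The apiVersion, the services list layout, and the dependencies fields are assumptions based on the behavior described above; consult the CRD schema for the exact field names.

```yaml
apiVersion: apps.nvidia.com/v1alpha1   # API group/version assumed
kind: NIMPipeline
metadata:
  name: rag-pipeline
  namespace: nim-service
spec:
  services:
    # Embedding NIM used to vectorize queries and documents
    - name: embedding
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5
          tag: "1.0.0"
        replicas: 1
    # LLM NIM; the operator injects the embedding endpoint below
    - name: llm
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
          tag: "1.0.0"
        replicas: 1
      dependencies:
        - name: embedding               # another service in this pipeline
          envName: EMBEDDING_ENDPOINT   # variable injected into llm pods
```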
Service Configuration
List of NIM services to deploy as part of the pipeline.
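Each entry in the services list pairs a service name with what is effectively an embedded NIMService spec, plus an enable toggle. A hedged sketch of a single entry (field names assumed):

```yaml
services:
  - name: llm          # the name other services use in their dependencies
    enabled: true      # set false to turn the service off without editing the list
    spec:              # mirrors a standalone NIMService spec
      image:
        repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
        tag: "1.0.0"
      resources:
        limits:
          nvidia.com/gpu: 1
```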
Service Dependencies
Define dependencies between services to automatically inject endpoint URLs as environment variables.
Dependency Example
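The sketch below would yield the two injected variables listed next; the dependencies layout (a service name plus the environment variable to inject) is an assumption consistent with that output, and the guard-model images are illustrative.

```yaml
spec:
  services:
    - name: content-safety
      spec:
        image:
          repository: nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-content-safety
          tag: "1.0.0"
    - name: jailbreak-detect
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nemoguard-jailbreak-detect
          tag: "1.0.0"
    - name: llm-service
      spec:
        image:
          repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
          tag: "1.0.0"
      dependencies:
        - name: content-safety
          envName: CONTENT_SAFETY_ENDPOINT
        - name: jailbreak-detect
          envName: JAILBREAK_ENDPOINT
```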
llm-service will have two environment variables injected:
CONTENT_SAFETY_ENDPOINT=http://content-safety.nim-service.svc.cluster.local:8000
JAILBREAK_ENDPOINT=http://jailbreak-detect.nim-service.svc.cluster.local:8000
Advanced Examples
Conditionally Enabled Services
You can selectively enable or disable services in the pipeline by setting each service's enabled field.
Multi-Port Service Dependencies
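When a service exposes more than one port (for example HTTP and gRPC), a dependency can select which port's URL to inject. The sketch below also carries the enabled toggle from the previous subsection; the expose, ports, and port fields are assumptions.

```yaml
services:
  - name: reranking
    enabled: true
    spec:
      image:
        repository: nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3
        tag: "1.0.0"
      expose:
        service:
          ports:
            - name: http
              port: 8000
            - name: grpc
              port: 50051
  - name: llm
    enabled: true
    spec:
      image:
        repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
        tag: "1.0.0"
    dependencies:
      - name: reranking
        envName: RERANKING_ENDPOINT
        port: grpc        # pick the named port for the injected URL (field assumed)
```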
Custom Dependency Endpoints
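When the generated cluster-local URL is not what the consuming service expects (for example, it needs a path suffix), the injected value can be overridden. A minimal sketch; the endpoint override field is an assumption:

```yaml
dependencies:
  - name: embedding
    envName: EMBEDDING_URL
    # Replace the auto-generated URL entirely (field name assumed)
    endpoint: "http://embedding.nim-service.svc.cluster.local:8000/v1"
```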
Pipeline Status
The NIMPipeline status provides aggregate information about all services.
Status States
- NotReady: One or more services are not ready
- Ready: All enabled services are ready and operational
- Failed: One or more services have failed
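Rolled up, the status block might look like this sketch (field names assumed):

```yaml
status:
  state: Ready            # NotReady | Ready | Failed
  services:
    - name: embedding
      state: Ready
    - name: llm
      state: Ready
```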
Common Pipeline Patterns
RAG (Retrieval-Augmented Generation)
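A typical RAG chain wires embedding and reranking services into the LLM; a compact sketch of the services and dependencies (field names assumed):

```yaml
services:
  - name: embedding     # vectorizes queries and documents
    spec:
      image: {repository: nvcr.io/nim/nvidia/nv-embedqa-e5-v5, tag: "1.0.0"}
  - name: reranking     # re-orders retrieved chunks by relevance
    spec:
      image: {repository: nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3, tag: "1.0.0"}
  - name: llm           # generates the final answer
    spec:
      image: {repository: nvcr.io/nim/meta/llama-3.1-8b-instruct, tag: "1.0.0"}
    dependencies:
      - {name: embedding, envName: EMBEDDING_ENDPOINT}
      - {name: reranking, envName: RERANKING_ENDPOINT}
```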
Guardrail Pipeline
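A guardrail pipeline deploys safety checkers alongside the LLM so it can screen prompts and responses; a compact sketch with illustrative guard-model images (field names assumed):

```yaml
services:
  - name: content-safety    # flags unsafe prompts and responses
    spec:
      image: {repository: nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-content-safety, tag: "1.0.0"}
  - name: jailbreak-detect  # flags prompt-injection attempts
    spec:
      image: {repository: nvcr.io/nim/nvidia/nemoguard-jailbreak-detect, tag: "1.0.0"}
  - name: llm-service
    spec:
      image: {repository: nvcr.io/nim/meta/llama-3.1-8b-instruct, tag: "1.0.0"}
    dependencies:
      - {name: content-safety, envName: CONTENT_SAFETY_ENDPOINT}
      - {name: jailbreak-detect, envName: JAILBREAK_ENDPOINT}
```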
Complete Working Example
A production-ready RAG pipeline combines the services shown earlier with NIMCache resources, health probes, and per-service resource limits.
Deployment Workflow
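A typical rollout, sketched with illustrative manifest names (the Ready condition and the plural resource names are assumptions):

```shell
# 1. Cache models first so services start quickly
kubectl apply -f nimcache.yaml
kubectl wait --for=condition=Ready nimcaches --all -n nim-service --timeout=30m

# 2. Deploy the pipeline once the caches are ready
kubectl apply -f nimpipeline.yaml

# 3. Watch the aggregate status converge to Ready
kubectl get nimpipelines -n nim-service -w
```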
Best Practices
Cache Models First
Always create and verify NIMCache resources before deploying the pipeline to ensure fast service startup.
Use Descriptive Names
Give services clear, descriptive names that reflect their purpose in the pipeline (e.g., llm, embedding, reranking).
Configure Health Checks
Customize readiness and startup probes for each service based on model size and initialization time.
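Large models can take many minutes to load, so the startup probe usually needs the most headroom. A sketch for one service entry, assuming the probe fields mirror standard Kubernetes probes and that the NIM health endpoint is served at /v1/health/ready:

```yaml
- name: llm
  spec:
    startupProbe:
      httpGet:
        path: /v1/health/ready
        port: 8000
      periodSeconds: 10
      failureThreshold: 180   # tolerate up to ~30 minutes of model loading
    readinessProbe:
      httpGet:
        path: /v1/health/ready
        port: 8000
      periodSeconds: 10
```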
Plan Resource Allocation
Allocate GPU and memory resources appropriately for each service. Embedding and reranking models typically require fewer resources than LLMs.
Enable Monitoring
Configure metrics and ServiceMonitor for all services to track performance and identify bottlenecks.
Use Selective Enablement
Use the enabled field to quickly enable/disable services during development and testing.
Troubleshooting
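The checks in the subsections below might look like the following; resource, pod, and service names are illustrative:

```shell
# Pipeline not ready: inspect the pipeline and its services
kubectl describe nimpipeline rag-pipeline -n nim-service
kubectl get nimservices -n nim-service

# Dependency injection: confirm the variables reached the pod
kubectl exec -n nim-service deploy/llm -- env | grep ENDPOINT

# Service communication: test DNS resolution from inside a pod
# (getent is used because nslookup may be absent from the image)
kubectl exec -n nim-service deploy/llm -- getent hosts embedding.nim-service.svc.cluster.local
```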
Pipeline Not Ready
Check individual service statuses:
Dependency Injection Not Working
Verify the environment variables in the pod:
Service Communication Errors
Check service DNS resolution:
Related Resources
- NIMService Resource - Individual NIM service configuration
- NIMCache Resource - Model caching for pipeline services