
Overview

NIMPipeline orchestrates multiple NIMService deployments as a unified pipeline. It manages service dependencies and ensures the proper startup order for complex AI inference workflows.

API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NIMPipeline

Spec Fields

Services Configuration

spec.services
[]NIMServicePipelineSpec
List of NIM services to deploy as part of the pipeline
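Each entry pairs a service name and an enabled flag with an embedded NIMService spec, and may declare dependencies on other services in the same pipeline. A minimal sketch of one entry, inferred from the examples below (values are illustrative):

```yaml
spec:
  services:
    - name: my-service      # unique within the pipeline
      enabled: true         # toggle deployment without removing the entry
      spec: {}              # standard NIMService spec (image, storage, expose, ...)
      dependencies: []      # optional references to other pipeline services
```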

Status Fields

status.conditions
[]Condition
Standard Kubernetes conditions for the NIMPipeline. Possible condition types:
  • NIM_PIPELINE_READY: Pipeline is ready with all services running
  • NIM_PIPELINE_FAILED: One or more services in the pipeline have failed
status.states
map[string]string
Map of individual service states, keyed by service name. Each value indicates the current state of that service. Possible states for each service:
  • Pending: Service is being created
  • NotReady: Service exists but is not ready
  • Ready: Service is ready and available
  • Failed: Service has failed
status.state
string
Overall state of the pipeline. Values:
  • NotReady: One or more services are not ready
  • Ready: All enabled services are ready
  • Failed: One or more services have failed
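Taken together, the status fields for a partially ready pipeline might look like the following. This is a hedged illustration assembled from the field descriptions above, not literal controller output:

```yaml
status:
  state: NotReady                 # overall pipeline state
  states:                         # per-service states, keyed by service name
    embedding-service: Ready
    rerank-service: NotReady
  conditions:
    - type: NIM_PIPELINE_READY
      status: "False"
```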

Example

Simple Pipeline with Dependencies

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
  namespace: nim-service
spec:
  services:
    # Embedding service - no dependencies
    - name: embedding-service
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embed-qa
          tag: "1.0.0"
          pullPolicy: IfNotPresent
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 10Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 8Gi
        expose:
          service:
            type: ClusterIP
            port: 8000
        replicas: 1
    
    # Reranking service - no dependencies
    - name: rerank-service
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3
          tag: "1.0.0"
          pullPolicy: IfNotPresent
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 20Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 16Gi
        expose:
          service:
            type: ClusterIP
            port: 8001
        replicas: 1
    
    # LLM service - depends on embedding and reranking
    - name: llm-service
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.0"
          pullPolicy: IfNotPresent
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 50Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 32Gi
        expose:
          service:
            type: LoadBalancer
            port: 8002
        replicas: 2
        # Add embedding and reranking endpoints as env vars
        env:
          - name: EMBEDDING_ENDPOINT
            value: "http://embedding-service:8000"
          - name: RERANK_ENDPOINT
            value: "http://rerank-service:8001"
      # Service dependencies
      dependencies:
        - name: embedding-service
          port: 8000
          envName: EMBEDDING_SERVICE_URL
        - name: rerank-service
          port: 8001
          envName: RERANK_SERVICE_URL
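Based on the `envName` fields above, the operator presumably injects each dependency's endpoint into the dependent service's containers as an environment variable. The resulting environment for `llm-service` would then look roughly like this (the URL format is an assumption, not confirmed by this page):

```yaml
env:
  - name: EMBEDDING_SERVICE_URL         # from dependencies[0].envName
    value: "http://embedding-service:8000"
  - name: RERANK_SERVICE_URL            # from dependencies[1].envName
    value: "http://rerank-service:8001"
```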

Multi-Stage Processing Pipeline

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: multimodal-pipeline
  namespace: nim-service
spec:
  services:
    # Stage 1: Vision encoder
    - name: vision-encoder
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/clip-vit-large
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 15Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 16Gi
        expose:
          service:
            type: ClusterIP
            port: 9000
        replicas: 1
    
    # Stage 2: Feature processor (depends on vision encoder)
    - name: feature-processor
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/custom/feature-processor
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 10Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 8Gi
        expose:
          service:
            type: ClusterIP
            port: 9001
        replicas: 1
      dependencies:
        - name: vision-encoder
          port: 9000
          envName: VISION_ENCODER_URL
    
    # Stage 3: Multimodal LLM (depends on feature processor)
    - name: multimodal-llm
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-vision
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 80Gi
        resources:
          requests:
            nvidia.com/gpu: "2"
            memory: 64Gi
        expose:
          service:
            type: LoadBalancer
            port: 8000
          router:
            hostDomainName: example.com
            ingress:
              ingressClass: nginx
        scale:
          enabled: true
          hpa:
            minReplicas: 1
            maxReplicas: 5
            metrics:
              - type: Resource
                resource:
                  name: cpu
                  target:
                    type: Utilization
                    averageUtilization: 70
      dependencies:
        - name: feature-processor
          port: 9001
          envName: FEATURE_PROCESSOR_URL

Conditional Service Enablement

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: optional-services-pipeline
  namespace: nim-service
spec:
  services:
    # Core service - always enabled
    - name: core-llm
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 50Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 32Gi
        expose:
          service:
            type: ClusterIP
            port: 8000
        replicas: 1
    
    # Optional guardrails service
    - name: guardrails
      enabled: false  # Can be toggled without affecting core service
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nemo-guardrails
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 5Gi
        resources:
          requests:
            memory: 4Gi
        expose:
          service:
            type: ClusterIP
            port: 8001
        replicas: 1
      dependencies:
        - name: core-llm
          port: 8000
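To turn the guardrails service on later, edit the pipeline and flip its `enabled` flag; the other entries stay untouched, so `core-llm` keeps running. A sketch of the change (e.g. via `kubectl edit nimpipeline optional-services-pipeline -n nim-service`):

```yaml
spec:
  services:
    - name: guardrails
      enabled: true   # was false; only this service is (re)deployed
```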
