
Overview

NIMPipeline orchestrates multiple NIMService deployments as a unified pipeline. It manages service dependencies and ensures the proper startup order for complex AI inference workflows.

API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NIMPipeline

Spec Fields

Services Configuration

spec.services
[]NIMServicePipelineSpec
List of NIM services to deploy as part of the pipeline
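Each entry pairs a service name and an enabled flag with an embedded NIMService spec, and may declare dependencies on other services in the same pipeline. A minimal sketch of one entry, inferred from the examples below (values are illustrative):

```yaml
spec:
  services:
    - name: my-service      # unique within the pipeline
      enabled: true         # toggle deployment without removing the entry
      spec: {}              # standard NIMService spec (image, storage, expose, ...)
      dependencies: []      # optional references to other pipeline services
```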

Status Fields

status.conditions
[]Condition
Standard Kubernetes conditions for the NIMPipeline. Possible condition types:
  • NIM_PIPELINE_READY: Pipeline is ready with all services running
  • NIM_PIPELINE_FAILED: One or more services in the pipeline have failed
status.states
map[string]string
Map of individual service states, keyed by service name. Each value indicates the current state of that service. Possible states for each service:
  • Pending: Service is being created
  • NotReady: Service exists but is not ready
  • Ready: Service is ready and available
  • Failed: Service has failed
status.state
string
Overall state of the pipeline. Values:
  • NotReady: One or more services are not ready
  • Ready: All enabled services are ready
  • Failed: One or more services have failed
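Taken together, the status fields for a partially ready pipeline might look like the following. This is a hedged illustration assembled from the field descriptions above, not literal controller output:

```yaml
status:
  state: NotReady                 # overall pipeline state
  states:                         # per-service states, keyed by service name
    embedding-service: Ready
    rerank-service: NotReady
  conditions:
    - type: NIM_PIPELINE_READY
      status: "False"
```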

Example

Simple Pipeline with Dependencies

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: rag-pipeline
  namespace: nim-service
spec:
  services:
    # Embedding service - no dependencies
    - name: embedding-service
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-embed-qa
          tag: "1.0.0"
          pullPolicy: IfNotPresent
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 10Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 8Gi
        expose:
          service:
            type: ClusterIP
            port: 8000
        replicas: 1
    
    # Reranking service - no dependencies
    - name: rerank-service
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3
          tag: "1.0.0"
          pullPolicy: IfNotPresent
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 20Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 16Gi
        expose:
          service:
            type: ClusterIP
            port: 8001
        replicas: 1
    
    # LLM service - depends on embedding and reranking
    - name: llm-service
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.0"
          pullPolicy: IfNotPresent
          pullSecrets:
            - ngc-secret
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 50Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 32Gi
        expose:
          service:
            type: LoadBalancer
            port: 8002
        replicas: 2
        # Add embedding and reranking endpoints as env vars
        env:
          - name: EMBEDDING_ENDPOINT
            value: "http://embedding-service:8000"
          - name: RERANK_ENDPOINT
            value: "http://rerank-service:8001"
      # Service dependencies
      dependencies:
        - name: embedding-service
          port: 8000
          envName: EMBEDDING_SERVICE_URL
        - name: rerank-service
          port: 8001
          envName: RERANK_SERVICE_URL
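Based on the `envName` fields above, the operator presumably injects each dependency's endpoint into the dependent service's containers as an environment variable. The resulting environment for `llm-service` would then look roughly like this (the URL format is an assumption, not confirmed by this page):

```yaml
env:
  - name: EMBEDDING_SERVICE_URL         # from dependencies[0].envName
    value: "http://embedding-service:8000"
  - name: RERANK_SERVICE_URL            # from dependencies[1].envName
    value: "http://rerank-service:8001"
```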

Multi-Stage Processing Pipeline

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: multimodal-pipeline
  namespace: nim-service
spec:
  services:
    # Stage 1: Vision encoder
    - name: vision-encoder
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/clip-vit-large
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 15Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 16Gi
        expose:
          service:
            type: ClusterIP
            port: 9000
        replicas: 1
    
    # Stage 2: Feature processor (depends on vision encoder)
    - name: feature-processor
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/custom/feature-processor
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 10Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 8Gi
        expose:
          service:
            type: ClusterIP
            port: 9001
        replicas: 1
      dependencies:
        - name: vision-encoder
          port: 9000
          envName: VISION_ENCODER_URL
    
    # Stage 3: Multimodal LLM (depends on feature processor)
    - name: multimodal-llm
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-vision
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 80Gi
        resources:
          requests:
            nvidia.com/gpu: "2"
            memory: 64Gi
        expose:
          service:
            type: LoadBalancer
            port: 8000
          router:
            hostDomainName: example.com
            ingress:
              ingressClass: nginx
        scale:
          enabled: true
          hpa:
            minReplicas: 1
            maxReplicas: 5
            metrics:
              - type: Resource
                resource:
                  name: cpu
                  target:
                    type: Utilization
                    averageUtilization: 70
      dependencies:
        - name: feature-processor
          port: 9001
          envName: FEATURE_PROCESSOR_URL

Conditional Service Enablement

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: optional-services-pipeline
  namespace: nim-service
spec:
  services:
    # Core service - always enabled
    - name: core-llm
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama3-8b-instruct
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 50Gi
        resources:
          requests:
            nvidia.com/gpu: "1"
            memory: 32Gi
        expose:
          service:
            type: ClusterIP
            port: 8000
        replicas: 1
    
    # Optional guardrails service
    - name: guardrails
      enabled: false  # Can be toggled without affecting core service
      spec:
        image:
          repository: nvcr.io/nim/nvidia/nemo-guardrails
          tag: "1.0.0"
        authSecret: ngc-api-key
        storage:
          pvc:
            create: true
            size: 5Gi
        resources:
          requests:
            memory: 4Gi
        expose:
          service:
            type: ClusterIP
            port: 8001
        replicas: 1
      dependencies:
        - name: core-llm
          port: 8000
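To turn the guardrails service on later, edit the pipeline and flip its `enabled` flag; the other entries stay untouched, so `core-llm` keeps running. A sketch of the change (e.g. via `kubectl edit nimpipeline optional-services-pipeline -n nim-service`):

```yaml
spec:
  services:
    - name: guardrails
      enabled: true   # was false; only this service is (re)deployed
```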
