Overview

The NVIDIA NIM Operator provides flexible networking options for exposing your NIM services: ClusterIP for internal access, LoadBalancer for external traffic, or Ingress/Gateway API for advanced routing.

Service Configuration

Service Types

Configure the Kubernetes Service type for your NIMService:
Internal cluster access only (default):
expose:
  service:
    type: ClusterIP
    port: 8000
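For external traffic, the overview mentions the LoadBalancer type. A minimal sketch of that configuration (the annotation shown is a cloud-specific example for AWS and may not apply to your provider):

```yaml
expose:
  service:
    type: LoadBalancer
    port: 8000
    annotations:
      # Example cloud-specific annotation; adjust for your provider
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
```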

Service Ports

NIM services support multiple ports for different protocols:
expose:
  service:
    type: ClusterIP
    port: 8000          # HTTP API port (default: 8000)
    grpcPort: 8001      # gRPC port (optional)
    metricsPort: 8002   # Metrics port (optional)
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8002"
expose.service.port (integer, default: 8000): Main HTTP API serving port (1-65535)
expose.service.grpcPort (integer): gRPC serving port for Triton-based NIMs (1-65535)
expose.service.metricsPort (integer): Separate metrics endpoint port for Triton Inference Server (1-65535)
expose.service.name (string): Override the default service name
expose.service.annotations (object): Custom annotations for the Service resource
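For example, the service name can be overridden (the name below is hypothetical; by default the Service presumably takes the NIMService name):

```yaml
expose:
  service:
    name: llama-custom-svc   # hypothetical override of the default service name
    type: ClusterIP
    port: 8000
```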

Named Ports

The operator creates named ports for service discovery:
  • api: HTTP API port (default: 8000)
  • grpc: gRPC port (if configured)
  • metrics: Metrics port (if configured)

Ingress Configuration

Use the router field for modern Ingress configuration:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: example.com
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
      ingress:
        ingressClass: nginx
        tlsSecretName: llama-tls-cert
This creates an Ingress with hostname: llama-3-8b.nim-service.example.com
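The hostname rule above can be sketched in a few lines of Python (illustrative only, not operator code; the regex is the documented hostDomainName pattern):

```python
import re

# Documented validation pattern for router.hostDomainName
HOST_PATTERN = re.compile(
    r"^(([a-z0-9][a-z0-9\-]*[a-z0-9])|[a-z0-9]+\.)*([a-z]+|xn\-\-[a-z0-9]+)\.?$"
)

def nim_hostname(service_name: str, namespace: str, host_domain: str) -> str:
    # Mirrors the documented rule: <service-name>.<namespace>.<hostDomainName>
    return f"{service_name}.{namespace}.{host_domain}"

host = nim_hostname("llama-3-8b", "nim-service", "example.com")
print(host)  # llama-3-8b.nim-service.example.com
print(bool(HOST_PATTERN.match("example.com")))  # True: valid base domain
```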
router.hostDomainName (string, required): Base domain name. The full hostname is constructed as <service-name>.<namespace>.<hostDomainName>. Pattern: ^(([a-z0-9][a-z0-9\-]*[a-z0-9])|[a-z0-9]+\.)*([a-z]+|xn\-\-[a-z0-9]+)\.?$
router.ingress.ingressClass (string, required): Ingress class to use (e.g., nginx, traefik, istio)
router.ingress.tlsSecretName (string): Name of the TLS secret for HTTPS
router.annotations (object): Annotations for the Ingress resource

Legacy Ingress Configuration

The expose.ingress field is deprecated. Use expose.router.ingress instead.
expose:
  ingress:
    enabled: true
    spec:
      ingressClassName: nginx
      rules:
      - host: llama.example.com
        http:
          paths:
          - path: /v1/chat/completions
            pathType: Prefix
            backend:
              service:
                name: llama-3-8b
                port:
                  number: 8000

Ingress Examples

A production-style configuration combining TLS termination, cert-manager, and common NGINX annotations:
expose:
  router:
    hostDomainName: ai.company.com
    ingress:
      ingressClass: nginx
      tlsSecretName: nim-tls
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/proxy-body-size: "100m"
      cert-manager.io/cluster-issuer: letsencrypt-prod

Gateway API Configuration

HTTP Routes

Use Kubernetes Gateway API for advanced routing:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: ai.company.com
      gateway:
        namespace: gateway-system
        name: ai-gateway
        httpRoutesEnabled: true
        grpcRoutesEnabled: false
      annotations:
        custom.annotation/team: ml-platform
This creates an HTTPRoute with:
  • Hostname: llama-3-70b.nim-service.ai.company.com
  • Parent Gateway: ai-gateway in gateway-system namespace
  • Path match: / (prefix)
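Based on the bullets above, the generated route would look roughly like the following standard Gateway API HTTPRoute (a sketch of the expected shape, not the operator's exact output):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llama-3-70b        # assumed to follow the NIMService name
  namespace: nim-service
spec:
  parentRefs:
  - name: ai-gateway
    namespace: gateway-system
  hostnames:
  - llama-3-70b.nim-service.ai.company.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: llama-3-70b
      port: 8000
```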
router.gateway.namespace (string, required): Namespace of the Gateway resource
router.gateway.name (string, required): Name of the Gateway resource
router.gateway.httpRoutesEnabled (boolean, default: true): Enable HTTPRoute creation
router.gateway.grpcRoutesEnabled (boolean, default: false): Enable GRPCRoute creation (NIMPipeline only)
router.gateway.backendRef (object): Custom backend reference to override default service backend

gRPC Routes

For Triton-based NIMs with gRPC support:
expose:
  service:
    type: ClusterIP
    port: 8000
    grpcPort: 8001
  router:
    hostDomainName: ai.company.com
    gateway:
      namespace: gateway-system
      name: ai-gateway
      httpRoutesEnabled: true
      grpcRoutesEnabled: true
gRPC routes require grpcPort to be configured in the service.
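With both route types enabled, a GRPCRoute is created alongside the HTTPRoute. A rough sketch of what such a Gateway API resource looks like (names are taken from the example above; the exact resource the operator emits may differ):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: triton-nim         # hypothetical name
  namespace: nim-service
spec:
  parentRefs:
  - name: ai-gateway
    namespace: gateway-system
  hostnames:
  - triton-nim.nim-service.ai.company.com
  rules:
  - backendRefs:
    - name: triton-nim
      port: 8001           # routes to the configured grpcPort
```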

Custom Backend Reference

Route to a different backend (e.g., for canary deployments):
router:
  gateway:
    namespace: gateway-system
    name: ai-gateway
    backendRef:
      group: ""
      kind: Service
      name: llama-canary
      namespace: nim-service
      port: 8000
      weight: 100

Endpoint Picker Plugin (EPP) Configuration

EPP is supported only with the standalone inference platform on NIMService.
Configure endpoint picker for intelligent routing:
expose:
  router:
    hostDomainName: ai.company.com
    gateway:
      namespace: gateway-system
      name: ai-gateway
    eppConfig:
      containerSpec:
        name: epp
        image:
          repository: nvcr.io/nvidia/cloud-native/endpoint-picker
          tag: "0.1.0"
          pullPolicy: IfNotPresent
        env:
        - name: LOG_LEVEL
          value: debug
      ports:
      - name: http
        containerPort: 8080
        protocol: TCP
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      config:
        apiVersion: config.apix.x-k8s.io/v1alpha1
        kind: EndpointPickerConfig
        routingStrategy: leastLoaded
        backends:
        - weight: 100
eppConfig.containerSpec (object, required): Container specification for the EPP sidecar
eppConfig.config (object): Inline EndpointPickerConfig. Mutually exclusive with configMapRef.
eppConfig.configMapRef (object): Reference to a ConfigMap containing the EndpointPickerConfig. Mutually exclusive with config.
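As an alternative to inline config, the EPP configuration can be referenced from a ConfigMap. A sketch, assuming configMapRef takes a name field (the ConfigMap name is hypothetical):

```yaml
eppConfig:
  containerSpec:
    name: epp
    image:
      repository: nvcr.io/nvidia/cloud-native/endpoint-picker
      tag: "0.1.0"
  configMapRef:
    name: epp-config   # assumed field; the ConfigMap must hold an EndpointPickerConfig
```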

Router Validation

Ingress and Gateway configurations are mutually exclusive:
# This is INVALID
router:
  ingress:
    ingressClass: nginx
  gateway:  # ERROR: Cannot use both
    name: ai-gateway

Complete Networking Examples

Production with HTTPS

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b-prod
  namespace: production
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
      annotations:
        cloud.google.com/neg: '{"ingress": true}'
    router:
      hostDomainName: api.company.com
      ingress:
        ingressClass: nginx
        tlsSecretName: production-tls
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: "100m"
        nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
Endpoint: https://llama-3-70b-prod.production.api.company.com
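A client would then reach the service over HTTPS at this hostname. A minimal Python sketch of building an OpenAI-style request against the /v1/chat/completions path shown earlier (request construction only, nothing is sent; the model id is hypothetical):

```python
import json

BASE_URL = "https://llama-3-70b-prod.production.api.company.com"

def build_chat_request(prompt: str):
    # NIM serves an OpenAI-compatible API; chat requests go to /v1/chat/completions
    url = f"{BASE_URL}/v1/chat/completions"
    body = json.dumps({
        "model": "meta/llama-3-70b-instruct",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = build_chat_request("Hello")
print(url)  # https://llama-3-70b-prod.production.api.company.com/v1/chat/completions
```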

Multi-Protocol Service

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: triton-nim
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
      grpcPort: 8001
      metricsPort: 8002
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8002"
        prometheus.io/path: "/metrics"
    router:
      hostDomainName: ai.company.com
      gateway:
        namespace: gateway-system
        name: shared-gateway
        httpRoutesEnabled: true
        grpcRoutesEnabled: true

Gateway API with Load Balancing

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-multi-replica
  namespace: nim-service
spec:
  replicas: 3
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: ai.example.com
      gateway:
        namespace: gateway-system
        name: ai-gateway
        httpRoutesEnabled: true
      annotations:
        gateway.networking.k8s.io/load-balancer: round-robin

Best Practices

Networking Recommendations
  1. Use ClusterIP for internal services - Only expose via LoadBalancer when necessary
  2. Enable TLS in production - Use cert-manager for automatic certificate management
  3. Set appropriate timeouts - NIMs may have long response times for large generations
  4. Use Gateway API for advanced routing - HTTPRoute provides more flexibility than Ingress
  5. Monitor endpoint health - Configure readiness probes for intelligent routing

Timeout Configuration

LLM inference can take time. Configure appropriate timeouts:
router:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"

Rate Limiting

Protect your NIM service with rate limiting:
router:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"

CORS Configuration

Enable CORS for web applications:
router:
  annotations:
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.company.com"

Troubleshooting

Service Not Accessible

Check service endpoints:
kubectl get svc <nimservice-name> -n <namespace>
kubectl get endpoints <nimservice-name> -n <namespace>

Ingress Not Working

Verify Ingress resource:
kubectl get ingress <nimservice-name> -n <namespace>
kubectl describe ingress <nimservice-name> -n <namespace>
Check Ingress controller logs:
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller

HTTPRoute Not Created

Verify Gateway exists:
kubectl get gateway -n <gateway-namespace>
kubectl get httproute -n <namespace>

TLS Certificate Issues

Check cert-manager status:
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>
