Overview

The NVIDIA NIM Operator provides flexible networking options for exposing your NIM services: ClusterIP for internal access, LoadBalancer for external traffic, or Ingress/Gateway API for advanced routing.

Service Configuration

Service Types

Configure the Kubernetes Service type for your NIMService:
Internal cluster access only (default):
expose:
  service:
    type: ClusterIP
    port: 8000
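For external traffic, the overview mentions the LoadBalancer type. A minimal sketch of that configuration (the annotation shown is a cloud-specific example for AWS and may not apply to your provider):

```yaml
expose:
  service:
    type: LoadBalancer
    port: 8000
    annotations:
      # Example cloud-specific annotation; adjust for your provider
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
```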

Service Ports

NIM services support multiple ports for different protocols:
expose:
  service:
    type: ClusterIP
    port: 8000          # HTTP API port (default: 8000)
    grpcPort: 8001      # gRPC port (optional)
    metricsPort: 8002   # Metrics port (optional)
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8002"
expose.service.port (integer, default: 8000): Main HTTP API serving port (1-65535)
expose.service.grpcPort (integer): gRPC serving port for Triton-based NIMs (1-65535)
expose.service.metricsPort (integer): Separate metrics endpoint port for Triton Inference Server (1-65535)
expose.service.name (string): Override the default service name
expose.service.annotations (object): Custom annotations for the Service resource
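For example, the service name can be overridden (the name below is hypothetical; by default the Service presumably takes the NIMService name):

```yaml
expose:
  service:
    name: llama-custom-svc   # hypothetical override of the default service name
    type: ClusterIP
    port: 8000
```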

Named Ports

The operator creates named ports for service discovery:
  • api: HTTP API port (default: 8000)
  • grpc: gRPC port (if configured)
  • metrics: Metrics port (if configured)

Ingress Configuration

Use the router field for modern Ingress configuration:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: example.com
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
      ingress:
        ingressClass: nginx
        tlsSecretName: llama-tls-cert
This creates an Ingress with hostname: llama-3-8b.nim-service.example.com
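The hostname rule above can be sketched in a few lines of Python (illustrative only, not operator code; the regex is the documented hostDomainName pattern):

```python
import re

# Documented validation pattern for router.hostDomainName
HOST_PATTERN = re.compile(
    r"^(([a-z0-9][a-z0-9\-]*[a-z0-9])|[a-z0-9]+\.)*([a-z]+|xn\-\-[a-z0-9]+)\.?$"
)

def nim_hostname(service_name: str, namespace: str, host_domain: str) -> str:
    # Mirrors the documented rule: <service-name>.<namespace>.<hostDomainName>
    return f"{service_name}.{namespace}.{host_domain}"

host = nim_hostname("llama-3-8b", "nim-service", "example.com")
print(host)  # llama-3-8b.nim-service.example.com
print(bool(HOST_PATTERN.match("example.com")))  # True: valid base domain
```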
router.hostDomainName (string, required): Base domain name. The full hostname is constructed as <service-name>.<namespace>.<hostDomainName>. Pattern: ^(([a-z0-9][a-z0-9\-]*[a-z0-9])|[a-z0-9]+\.)*([a-z]+|xn\-\-[a-z0-9]+)\.?$
router.ingress.ingressClass (string, required): Ingress class to use (e.g., nginx, traefik, istio)
router.ingress.tlsSecretName (string): Name of the TLS secret for HTTPS
router.annotations (object): Annotations for the Ingress resource

Legacy Ingress Configuration

The expose.ingress field is deprecated. Use expose.router.ingress instead.
expose:
  ingress:
    enabled: true
    spec:
      ingressClassName: nginx
      rules:
      - host: llama.example.com
        http:
          paths:
          - path: /v1/chat/completions
            pathType: Prefix
            backend:
              service:
                name: llama-3-8b
                port:
                  number: 8000

Ingress Examples

A production-style configuration combining TLS termination, cert-manager, and common NGINX annotations:
expose:
  router:
    hostDomainName: ai.company.com
    ingress:
      ingressClass: nginx
      tlsSecretName: nim-tls
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/proxy-body-size: "100m"
      cert-manager.io/cluster-issuer: letsencrypt-prod

Gateway API Configuration

HTTP Routes

Use Kubernetes Gateway API for advanced routing:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: ai.company.com
      gateway:
        namespace: gateway-system
        name: ai-gateway
        httpRoutesEnabled: true
        grpcRoutesEnabled: false
      annotations:
        custom.annotation/team: ml-platform
This creates an HTTPRoute with:
  • Hostname: llama-3-70b.nim-service.ai.company.com
  • Parent Gateway: ai-gateway in gateway-system namespace
  • Path match: / (prefix)
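Based on the bullets above, the generated route would look roughly like the following standard Gateway API HTTPRoute (a sketch of the expected shape, not the operator's exact output):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llama-3-70b        # assumed to follow the NIMService name
  namespace: nim-service
spec:
  parentRefs:
  - name: ai-gateway
    namespace: gateway-system
  hostnames:
  - llama-3-70b.nim-service.ai.company.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: llama-3-70b
      port: 8000
```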
router.gateway.namespace (string, required): Namespace of the Gateway resource
router.gateway.name (string, required): Name of the Gateway resource
router.gateway.httpRoutesEnabled (boolean, default: true): Enable HTTPRoute creation
router.gateway.grpcRoutesEnabled (boolean, default: false): Enable GRPCRoute creation (NIMPipeline only)
router.gateway.backendRef (object): Custom backend reference to override default service backend

gRPC Routes

For Triton-based NIMs with gRPC support:
expose:
  service:
    type: ClusterIP
    port: 8000
    grpcPort: 8001
  router:
    hostDomainName: ai.company.com
    gateway:
      namespace: gateway-system
      name: ai-gateway
      httpRoutesEnabled: true
      grpcRoutesEnabled: true
gRPC routes require grpcPort to be configured in the service.
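With both route types enabled, a GRPCRoute is created alongside the HTTPRoute. A rough sketch of what such a Gateway API resource looks like (names are taken from the example above; the exact resource the operator emits may differ):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: triton-nim         # hypothetical name
  namespace: nim-service
spec:
  parentRefs:
  - name: ai-gateway
    namespace: gateway-system
  hostnames:
  - triton-nim.nim-service.ai.company.com
  rules:
  - backendRefs:
    - name: triton-nim
      port: 8001           # routes to the configured grpcPort
```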

Custom Backend Reference

Route to a different backend (e.g., for canary deployments):
router:
  gateway:
    namespace: gateway-system
    name: ai-gateway
    backendRef:
      group: ""
      kind: Service
      name: llama-canary
      namespace: nim-service
      port: 8000
      weight: 100

Endpoint Picker Plugin (EPP) Configuration

EPP is supported only with the standalone inference platform on NIMService.
Configure endpoint picker for intelligent routing:
expose:
  router:
    hostDomainName: ai.company.com
    gateway:
      namespace: gateway-system
      name: ai-gateway
    eppConfig:
      containerSpec:
        name: epp
        image:
          repository: nvcr.io/nvidia/cloud-native/endpoint-picker
          tag: "0.1.0"
          pullPolicy: IfNotPresent
        env:
        - name: LOG_LEVEL
          value: debug
      ports:
      - name: http
        containerPort: 8080
        protocol: TCP
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      config:
        apiVersion: config.apix.x-k8s.io/v1alpha1
        kind: EndpointPickerConfig
        routingStrategy: leastLoaded
        backends:
        - weight: 100
eppConfig.containerSpec (object, required): Container specification for the EPP sidecar
eppConfig.config (object): Inline EndpointPickerConfig. Mutually exclusive with configMapRef.
eppConfig.configMapRef (object): Reference to a ConfigMap containing the EndpointPickerConfig. Mutually exclusive with config.
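As an alternative to inline config, the EPP configuration can be referenced from a ConfigMap. A sketch, assuming configMapRef takes a name field (the ConfigMap name is hypothetical):

```yaml
eppConfig:
  containerSpec:
    name: epp
    image:
      repository: nvcr.io/nvidia/cloud-native/endpoint-picker
      tag: "0.1.0"
  configMapRef:
    name: epp-config   # assumed field; the ConfigMap must hold an EndpointPickerConfig
```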

Router Validation

Ingress and Gateway configurations are mutually exclusive:
# This is INVALID
router:
  ingress:
    ingressClass: nginx
  gateway:  # ERROR: Cannot use both
    name: ai-gateway

Complete Networking Examples

Production with HTTPS

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b-prod
  namespace: production
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
      annotations:
        cloud.google.com/neg: '{"ingress": true}'
    router:
      hostDomainName: api.company.com
      ingress:
        ingressClass: nginx
        tlsSecretName: production-tls
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: "100m"
        nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
Endpoint: https://llama-3-70b-prod.production.api.company.com
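A client would then reach the service over HTTPS at this hostname. A minimal Python sketch of building an OpenAI-style request against the /v1/chat/completions path shown earlier (request construction only, nothing is sent; the model id is hypothetical):

```python
import json

BASE_URL = "https://llama-3-70b-prod.production.api.company.com"

def build_chat_request(prompt: str):
    # NIM serves an OpenAI-compatible API; chat requests go to /v1/chat/completions
    url = f"{BASE_URL}/v1/chat/completions"
    body = json.dumps({
        "model": "meta/llama-3-70b-instruct",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = build_chat_request("Hello")
print(url)  # https://llama-3-70b-prod.production.api.company.com/v1/chat/completions
```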

Multi-Protocol Service

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: triton-nim
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
      grpcPort: 8001
      metricsPort: 8002
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8002"
        prometheus.io/path: "/metrics"
    router:
      hostDomainName: ai.company.com
      gateway:
        namespace: gateway-system
        name: shared-gateway
        httpRoutesEnabled: true
        grpcRoutesEnabled: true

Gateway API with Load Balancing

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-multi-replica
  namespace: nim-service
spec:
  replicas: 3
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: ai.example.com
      gateway:
        namespace: gateway-system
        name: ai-gateway
        httpRoutesEnabled: true
      annotations:
        gateway.networking.k8s.io/load-balancer: round-robin

Best Practices

Networking Recommendations
  1. Use ClusterIP for internal services - Only expose via LoadBalancer when necessary
  2. Enable TLS in production - Use cert-manager for automatic certificate management
  3. Set appropriate timeouts - NIMs may have long response times for large generations
  4. Use Gateway API for advanced routing - HTTPRoute provides more flexibility than Ingress
  5. Monitor endpoint health - Configure readiness probes for intelligent routing

Timeout Configuration

LLM inference can take time. Configure appropriate timeouts:
router:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"

Rate Limiting

Protect your NIM service with rate limiting:
router:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"

CORS Configuration

Enable CORS for web applications:
router:
  annotations:
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.company.com"

Troubleshooting

Service Not Accessible

Check service endpoints:
kubectl get svc <nimservice-name> -n <namespace>
kubectl get endpoints <nimservice-name> -n <namespace>

Ingress Not Working

Verify Ingress resource:
kubectl get ingress <nimservice-name> -n <namespace>
kubectl describe ingress <nimservice-name> -n <namespace>
Check Ingress controller logs:
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller

HTTPRoute Not Created

Verify Gateway exists:
kubectl get gateway -n <gateway-namespace>
kubectl get httproute -n <namespace>

TLS Certificate Issues

Check cert-manager status:
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>
