Overview
The NVIDIA NIM Operator provides flexible networking options to expose your NIM services. Choose ClusterIP for internal access or LoadBalancer for external traffic, and layer on Ingress or the Gateway API for advanced routing.
Service Configuration
Service Types
Configure the Kubernetes Service type for your NIMService:
ClusterIP
LoadBalancer
NodePort
Internal cluster access only (default):

```yaml
expose:
  service:
    type: ClusterIP
    port: 8000
```
External access via cloud load balancer:

```yaml
expose:
  service:
    type: LoadBalancer
    port: 8000
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
```
Access via node IP and static port:

```yaml
expose:
  service:
    type: NodePort
    port: 8000
```
Service Ports
NIM services support multiple ports for different protocols:
```yaml
expose:
  service:
    type: ClusterIP
    port: 8000          # HTTP API port (default: 8000)
    grpcPort: 8001      # gRPC port (optional)
    metricsPort: 8002   # Metrics port (optional)
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8002"
```
expose.service.port
Main HTTP API serving port (1-65535)
expose.service.grpcPort
gRPC serving port for Triton-based NIMs (1-65535)
expose.service.metricsPort
Separate metrics endpoint port for Triton Inference Server (1-65535)
expose.service.name
Override the default service name
expose.service.annotations
Custom annotations for the Service resource
Named Ports
The operator creates named ports for service discovery:
- api: HTTP API port (default: 8000)
- grpc: gRPC port (if configured)
- metrics: Metrics port (if configured)
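Putting these together, a Service generated for a multi-port NIMService might look like the following sketch. The resource name, namespace, and label selector here are assumptions for illustration; the operator's actual labels may differ.

```yaml
# Illustrative sketch of the generated Service; selector labels are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: llama-3-8b
  namespace: nim-service
spec:
  type: ClusterIP
  selector:
    app: llama-3-8b
  ports:
    - name: api        # HTTP API
      port: 8000
      targetPort: 8000
      protocol: TCP
    - name: grpc       # gRPC (if configured)
      port: 8001
      targetPort: 8001
      protocol: TCP
    - name: metrics    # metrics (if configured)
      port: 8002
      targetPort: 8002
      protocol: TCP
```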
Ingress Configuration
Router-Based Ingress (Recommended)
Use the router field for modern Ingress configuration:
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: example.com
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
      ingress:
        ingressClass: nginx
        tlsSecretName: llama-tls-cert
```
This creates an Ingress with hostname: llama-3-8b.nim-service.example.com
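As a rough sketch, the generated Ingress could resemble the following. The resource name, default path, and TLS layout are assumptions based on the example above, not a guaranteed operator output.

```yaml
# Hedged sketch of the Ingress the operator might generate from the router example.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llama-3-8b
  namespace: nim-service
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llama-3-8b.nim-service.example.com
      secretName: llama-tls-cert
  rules:
    - host: llama-3-8b.nim-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llama-3-8b
                port:
                  number: 8000
```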
router.hostDomainName
Base domain name. The full hostname is constructed as <service-name>.<namespace>.<hostDomainName>.
Pattern: ^(([a-z0-9][a-z0-9\-]*[a-z0-9])|[a-z0-9]+\.)*([a-z]+|xn\-\-[a-z0-9]+)\.?$
router.ingress.ingressClass
Ingress class to use (e.g., nginx, traefik, istio)
router.ingress.tlsSecretName
Name of the TLS secret for HTTPS
router.annotations
Annotations for the Ingress resource
Legacy Ingress Configuration
The expose.ingress field is deprecated. Use expose.router.ingress instead.
```yaml
expose:
  ingress:
    enabled: true
    spec:
      ingressClassName: nginx
      rules:
        - host: llama.example.com
          http:
            paths:
              - path: /v1/chat/completions
                pathType: Prefix
                backend:
                  service:
                    name: llama-3-8b
                    port:
                      number: 8000
```
Ingress Examples
NGINX Ingress
```yaml
expose:
  router:
    hostDomainName: ai.company.com
    ingress:
      ingressClass: nginx
      tlsSecretName: nim-tls
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/proxy-body-size: "100m"
      cert-manager.io/cluster-issuer: letsencrypt-prod
```
Gateway API Configuration
HTTP Routes
Use Kubernetes Gateway API for advanced routing:
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: ai.company.com
      gateway:
        namespace: gateway-system
        name: ai-gateway
        httpRoutesEnabled: true
        grpcRoutesEnabled: false
      annotations:
        custom.annotation/team: ml-platform
```
This creates an HTTPRoute with:
- Hostname: llama-3-70b.nim-service.ai.company.com
- Parent Gateway: ai-gateway in gateway-system namespace
- Path match: / (prefix)
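A hedged sketch of such an HTTPRoute is shown below; the resource name and rule layout are assumptions derived from the description above, not a verbatim operator output.

```yaml
# Illustrative HTTPRoute sketch; exact metadata is an assumption.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llama-3-70b
  namespace: nim-service
spec:
  parentRefs:
    - name: ai-gateway
      namespace: gateway-system
  hostnames:
    - llama-3-70b.nim-service.ai.company.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: llama-3-70b
          port: 8000
```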
router.gateway.namespace
Namespace of the Gateway resource
router.gateway.name
Name of the Gateway resource
router.gateway.httpRoutesEnabled
Enable HTTPRoute creation
router.gateway.grpcRoutesEnabled
Enable GRPCRoute creation (NIMPipeline only)
router.gateway.backendRef
Custom backend reference to override the default service backend
gRPC Routes
For Triton-based NIMs with gRPC support:
```yaml
expose:
  service:
    type: ClusterIP
    port: 8000
    grpcPort: 8001
  router:
    hostDomainName: ai.company.com
    gateway:
      namespace: gateway-system
      name: ai-gateway
      httpRoutesEnabled: true
      grpcRoutesEnabled: true
```
gRPC routes require grpcPort to be configured in the service.
Custom Backend Reference
Route to a different backend (e.g., for canary deployments):
```yaml
router:
  gateway:
    namespace: gateway-system
    name: ai-gateway
    backendRef:
      group: ""
      kind: Service
      name: llama-canary
      namespace: nim-service
      port: 8000
      weight: 100
```
Endpoint Picker Plugin (EPP) Configuration
EPP is only supported for the standalone inference platform on NIMService.
Configure endpoint picker for intelligent routing:
```yaml
expose:
  router:
    hostDomainName: ai.company.com
    gateway:
      namespace: gateway-system
      name: ai-gateway
      eppConfig:
        containerSpec:
          name: epp
          image:
            repository: nvcr.io/nvidia/cloud-native/endpoint-picker
            tag: "0.1.0"
            pullPolicy: IfNotPresent
          env:
            - name: LOG_LEVEL
              value: debug
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
        config:
          apiVersion: config.apix.x-k8s.io/v1alpha1
          kind: EndpointPickerConfig
          routingStrategy: leastLoaded
          backends:
            - weight: 100
```
eppConfig.containerSpec
Container specification for the EPP sidecar
eppConfig.config
Inline EndpointPickerConfig. Mutually exclusive with configMapRef.
eppConfig.configMapRef
Reference to a ConfigMap containing the EndpointPickerConfig. Mutually exclusive with config.
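For comparison, a ConfigMap-based variant might look like the following sketch. The shape of the configMapRef field (a name key) and the ConfigMap name epp-config are assumptions based on the descriptions above.

```yaml
# Sketch: referencing an EndpointPickerConfig stored in a ConfigMap
# instead of inlining it. Field shape is an assumption.
expose:
  router:
    gateway:
      namespace: gateway-system
      name: ai-gateway
      eppConfig:
        configMapRef:
          name: epp-config   # ConfigMap containing the EndpointPickerConfig
```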
Router Validation
Ingress and Gateway configurations are mutually exclusive:

```yaml
# This is INVALID
router:
  ingress:
    ingressClass: nginx
  gateway:          # ERROR: Cannot use both
    name: ai-gateway
```
Complete Networking Examples
Production with HTTPS
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b-prod
  namespace: production
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
      annotations:
        cloud.google.com/neg: '{"ingress": true}'
    router:
      hostDomainName: api.company.com
      ingress:
        ingressClass: nginx
        tlsSecretName: production-tls
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/proxy-body-size: "100m"
        nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
```
Endpoint: https://llama-3-70b-prod.production.api.company.com
Multi-Protocol Service
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: triton-nim
  namespace: nim-service
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
      grpcPort: 8001
      metricsPort: 8002
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8002"
        prometheus.io/path: "/metrics"
    router:
      hostDomainName: ai.company.com
      gateway:
        namespace: gateway-system
        name: shared-gateway
        httpRoutesEnabled: true
        grpcRoutesEnabled: true
```
Gateway API with Load Balancing
```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-multi-replica
  namespace: nim-service
spec:
  replicas: 3
  expose:
    service:
      type: ClusterIP
      port: 8000
    router:
      hostDomainName: ai.example.com
      gateway:
        namespace: gateway-system
        name: ai-gateway
        httpRoutesEnabled: true
      annotations:
        gateway.networking.k8s.io/load-balancer: round-robin
```
Best Practices
Networking Recommendations
- Use ClusterIP for internal services - Only expose via LoadBalancer when necessary
- Enable TLS in production - Use cert-manager for automatic certificate management
- Set appropriate timeouts - NIMs may have long response times for large generations
- Use Gateway API for advanced routing - HTTPRoute provides more flexibility than Ingress
- Monitor endpoint health - Configure readiness probes for intelligent routing
Timeout Configuration
LLM inference can take time. Configure appropriate timeouts:
```yaml
router:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
```
Rate Limiting
Protect your NIM service with rate limiting:
```yaml
router:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"
```
CORS Configuration
Enable CORS for web applications:
```yaml
router:
  annotations:
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.company.com"
```
Troubleshooting
Service Not Accessible
Check service endpoints:
```shell
kubectl get svc <nimservice-name> -n <namespace>
kubectl get endpoints <nimservice-name> -n <namespace>
```
Ingress Not Working
Verify Ingress resource:
```shell
kubectl get ingress <nimservice-name> -n <namespace>
kubectl describe ingress <nimservice-name> -n <namespace>
```
Check Ingress controller logs:
```shell
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller
```
HTTPRoute Not Created
Verify Gateway exists:
```shell
kubectl get gateway -n <gateway-namespace>
kubectl get httproute -n <namespace>
```
TLS Certificate Issues
Check cert-manager status:
```shell
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>
```