Overview
The NVIDIA NIM Operator provides comprehensive security features including RBAC, secret management for API keys, security contexts for non-root execution, and certificate management for secure communications.
Secret Management
NGC API Key Secret
NIM containers require an NGC API key to download models and authenticate with NVIDIA services.
Create NGC Secret
Create a Kubernetes secret containing your NGC API key:
kubectl create secret generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<your-ngc-api-key> \
  -n nim-service
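To make explicit what the command above stores, here is a minimal Python sketch that builds an equivalent Secret manifest. The helper name build_generic_secret and the value "example-key" are illustrative assumptions, not part of the operator or of kubectl. The one fact it relies on is that Kubernetes keeps Secret values base64-encoded under the data field.

```python
import base64
import json


def build_generic_secret(name: str, namespace: str, literals: dict) -> dict:
    """Build a Secret manifest similar to `kubectl create secret generic --from-literal`.

    Hypothetical helper for illustration only.
    """
    # Kubernetes stores Secret values base64-encoded under `data`.
    data = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in literals.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


if __name__ == "__main__":
    # "example-key" is a placeholder, not a real NGC API key.
    manifest = build_generic_secret(
        "ngc-api-secret", "nim-service", {"NGC_API_KEY": "example-key"}
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be piped to kubectl apply -f -. Base64 is an encoding, not encryption, so the manifest should be treated as sensitive.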
Reference in NIMService
Reference the secret in your NIMService:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  authSecret: ngc-api-secret
The authSecret field is the name of the Kubernetes secret containing the NGC API key. The secret must contain a key named NGC_API_KEY.
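The requirement that the secret contain a key named NGC_API_KEY can be checked before deploying. The sketch below is one possible check, assuming the Secret has been fetched as JSON (for example with kubectl get secret ngc-api-secret -n nim-service -o json) and parsed into a dictionary. The function name missing_secret_keys is hypothetical.

```python
import base64


def missing_secret_keys(secret: dict, required=("NGC_API_KEY",)) -> list:
    """Return the required keys that are absent or empty in a Secret object."""
    data = secret.get("data") or {}
    missing = []
    for key in required:
        encoded = data.get(key, "")
        # Values under `data` are base64-encoded; an empty decoded value
        # is treated the same as a missing key.
        if not base64.b64decode(encoded).strip():
            missing.append(key)
    return missing
```

An empty list means the secret has the key the operator expects. A secret created with a different key name, such as API_KEY, would be reported as missing NGC_API_KEY.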
Custom Secret Key Name
The operator expects the secret key to be NGC_API_KEY by default. The API key is automatically injected as an environment variable:
env:
  - name: NGC_API_KEY
    valueFrom:
      secretKeyRef:
        name: ngc-api-secret
        key: NGC_API_KEY
Hugging Face Token
For models from Hugging Face, add HF_TOKEN to your existing secret or create a separate secret:
# Create a separate secret for the Hugging Face token
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n nim-service
Reference in NIMService:
spec:
  env:
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          name: hf-token-secret
          key: HF_TOKEN
Image Pull Secrets
For private container registries:
Create Docker Registry Secret
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your-ngc-api-key> \
  -n nim-service
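A docker-registry secret is a Secret of type kubernetes.io/dockerconfigjson whose single .dockerconfigjson entry holds a base64-encoded Docker config file. The sketch below shows that structure. The helper name build_registry_secret is an illustrative assumption, and the password is a placeholder.

```python
import base64
import json


def build_registry_secret(name, namespace, server, username, password) -> dict:
    """Build a dockerconfigjson Secret manifest (hypothetical helper)."""
    # The `auth` field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    docker_config = {
        "auths": {
            server: {"username": username, "password": password, "auth": auth}
        }
    }
    # The whole Docker config document is base64-encoded once more for `data`.
    encoded = base64.b64encode(json.dumps(docker_config).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


if __name__ == "__main__":
    # "example-key" stands in for a real NGC API key.
    secret = build_registry_secret(
        "ngc-secret", "nim-service", "nvcr.io", "$oauthtoken", "example-key"
    )
    print(json.dumps(secret, indent=2))
```

The username is the literal string $oauthtoken, which is why the kubectl command wraps it in single quotes: without them the shell would expand it as a variable.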
Reference in Image Spec
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3-8b
    tag: "1.0.0"
    pullSecrets:
      - ngc-secret
Multi-Secret Configuration
Combine multiple secrets for comprehensive authentication:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
spec:
  authSecret: ngc-api-secret
  image:
    pullSecrets:
      - ngc-secret
      - docker-registry-secret
  env:
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          name: hf-token-secret
          key: HF_TOKEN
    - name: CUSTOM_API_KEY
      valueFrom:
        secretKeyRef:
          name: custom-secret
          key: API_KEY
RBAC Configuration
Automatic ServiceAccount Creation
The operator automatically creates a ServiceAccount with required RBAC permissions:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  # ServiceAccount is created automatically with name: llama-3-8b
The operator creates:
- ServiceAccount: <nimservice-name>
- Role: <nimservice-name> (with SCC permissions on OpenShift)
- RoleBinding: <nimservice-name>
OpenShift Security Context Constraints (SCC)
The operator automatically configures appropriate SCC based on your configuration:
Non-Root (Default)
For standard deployments:
# Automatically uses 'nonroot' SCC
spec:
  userID: 1000
  groupID: 2000
Created Role:
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["nonroot"]
    verbs: ["use"]
HostPath
For HostPath storage:
# Automatically uses 'hostmount-anyuid' SCC
spec:
  storage:
    hostPath: /mnt/nim-cache
Created Role:
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["hostmount-anyuid"]
    verbs: ["use"]
Proxy with Certificates
For proxy with custom certificates:
# Automatically uses 'anyuid' SCC
spec:
  proxy:
    httpsProxy: https://proxy.company.com:8080
    certConfigMap: proxy-ca-cert
Created Role:
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["anyuid"]
    verbs: ["use"]
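The mapping from configuration to SCC described above can be summarized as a small decision function. This is only a sketch of the documented behavior, not the operator's code. The function names are hypothetical, and the precedence applied when both hostPath storage and a proxy certificate ConfigMap are set is an assumption that this section does not confirm.

```python
def select_scc(spec: dict) -> str:
    """Pick the OpenShift SCC name for a NIMService spec (illustrative only)."""
    storage = spec.get("storage") or {}
    proxy = spec.get("proxy") or {}
    # ASSUMPTION: hostPath is checked first; the real precedence may differ.
    if storage.get("hostPath"):
        return "hostmount-anyuid"
    if proxy.get("certConfigMap"):
        return "anyuid"
    # Default case: non-root execution.
    return "nonroot"


def scc_role_rules(scc_name: str) -> list:
    """Return the Role rules granting `use` on a single named SCC."""
    return [
        {
            "apiGroups": ["security.openshift.io"],
            "resources": ["securitycontextconstraints"],
            "resourceNames": [scc_name],
            "verbs": ["use"],
        }
    ]


if __name__ == "__main__":
    spec = {"storage": {"hostPath": "/mnt/nim-cache"}}
    print(scc_role_rules(select_scc(spec)))
```

Restricting the rule with resourceNames keeps the grant scoped to one SCC instead of allowing the ServiceAccount to use any of them.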
Custom RBAC Policies
For additional permissions, create a custom Role and RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nim-custom-permissions
  namespace: nim-service
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nim-custom-permissions
  namespace: nim-service
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nim-custom-permissions
subjects:
  - kind: ServiceAccount
    name: llama-3-8b
    namespace: nim-service
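When writing a custom Role it helps to reason about exactly which requests its rules admit. The following sketch evaluates a request against a list of rules using a simplified model: it matches only on API group, resource, and verb, treats "*" as a wildcard, and ignores resourceNames and non-resource URLs. The function name role_allows is hypothetical, and kubectl auth can-i remains the authoritative check against a real cluster.

```python
def role_allows(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule permits the request (simplified RBAC model)."""

    def matches(allowed: list, value: str) -> bool:
        # "*" is the RBAC wildcard for groups, resources, and verbs.
        return "*" in allowed or value in allowed

    return any(
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
        for rule in rules
    )


# The rules from the nim-custom-permissions Role above.
RULES = [
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list"]},
]

if __name__ == "__main__":
    print(role_allows(RULES, "", "configmaps", "watch"))
    print(role_allows(RULES, "apps", "deployments", "watch"))
```

RBAC is purely additive, so a request that no rule matches is denied. Under this Role, watching deployments or reading secrets is not permitted.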
Security Context
User and Group IDs
NIM containers run as non-root by default:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  userID: 1000   # Default: 1000
  groupID: 2000  # Default: 2000
The userID field is the user ID the NIM container runs as. It must be non-zero for security.
The groupID field is the group ID the NIM container runs as.
This creates a PodSecurityContext:
securityContext:
  runAsUser: 1000
  runAsGroup: 2000
  runAsNonRoot: true
  fsGroup: 2000
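The relationship between userID/groupID and the resulting PodSecurityContext can be expressed as a short function. This is a sketch that mirrors the mapping shown above, with the documented defaults. The function name is hypothetical, and rejecting user ID 0 with an exception is an assumption about how the non-zero requirement might be enforced.

```python
def pod_security_context(user_id: int = 1000, group_id: int = 2000) -> dict:
    """Derive a PodSecurityContext from userID and groupID (illustrative only)."""
    if user_id == 0:
        # UID 0 is root, which contradicts runAsNonRoot: true.
        raise ValueError("userID must be non-zero; NIM containers run as non-root")
    return {
        "runAsUser": user_id,
        "runAsGroup": group_id,
        "runAsNonRoot": True,
        # fsGroup follows groupID so mounted volumes are group-accessible.
        "fsGroup": group_id,
    }


if __name__ == "__main__":
    print(pod_security_context())
```

Setting runAsNonRoot alongside an explicit runAsUser means the kubelet refuses to start a container if it would run as UID 0.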
Runtime Class
Specify a RuntimeClass for additional security or GPU support:
spec:
  runtimeClassName: nvidia
Common runtime classes:
- nvidia: NVIDIA Container Runtime
- kata: Kata Containers for VM-level isolation
- gvisor: gVisor for sandboxed execution
Certificate Management
TLS Certificates for Ingress
Use cert-manager for automatic certificate provisioning:
Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
Create ClusterIssuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your-email-address>
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
Configure NIMService
spec:
  expose:
    router:
      hostDomainName: ai.company.com
      ingress:
        ingressClass: nginx
        tlsSecretName: nim-tls-cert
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
Cert-manager will automatically:
- Create a Certificate resource
- Request a TLS certificate from Let’s Encrypt
- Store the certificate in the secret nim-tls-cert
- Renew the certificate before expiration
Custom CA Certificates
For enterprise environments with custom CAs:
Create ConfigMap with CA Certificate
kubectl create configmap custom-ca-cert \
  --from-file=ca.crt=/path/to/ca-certificate.crt \
  -n nim-service
Configure Proxy with Certificate
spec:
  proxy:
    httpsProxy: https://proxy.company.com:8080
    httpProxy: http://proxy.company.com:8080
    noProxy: localhost,127.0.0.1,.svc,.cluster.local
    certConfigMap: custom-ca-cert
The operator will:
- Mount the ConfigMap as a volume
- Add init container to update CA certificates
- Set appropriate environment variables
Manual TLS Secret
Create TLS secret manually:
kubectl create secret tls nim-tls-cert \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key \
  -n nim-service
Reference in router configuration:
router:
  ingress:
    tlsSecretName: nim-tls-cert
Proxy Configuration
Configure HTTP/HTTPS proxy for outbound connections:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  proxy:
    httpProxy: http://proxy.company.com:8080
    httpsProxy: https://proxy.company.com:8080
    noProxy: localhost,127.0.0.1,.svc,.cluster.local,10.0.0.0/8
    certConfigMap: proxy-ca-cert
The noProxy field is a comma-separated list of hosts that bypass the proxy.
The certConfigMap field is the name of the ConfigMap containing the custom CA certificate.
This automatically sets environment variables:
HTTP_PROXY, http_proxy
HTTPS_PROXY, https_proxy
NO_PROXY, no_proxy
NIM_SDK_USE_NATIVE_TLS=1
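The environment variables listed above can be pictured as a direct expansion of the proxy block. The sketch below produces the container env entries in that order. It is not the operator's implementation: the function name proxy_env is hypothetical, and skipping fields that are left empty is an assumption.

```python
def proxy_env(http_proxy: str = "", https_proxy: str = "", no_proxy: str = "") -> list:
    """Expand proxy settings into container env entries (illustrative only)."""
    env = []
    pairs = (
        ("HTTP_PROXY", http_proxy),
        ("HTTPS_PROXY", https_proxy),
        ("NO_PROXY", no_proxy),
    )
    for name, value in pairs:
        if not value:
            # ASSUMPTION: unset fields produce no variables.
            continue
        # Both spellings are set because tools differ in which one they read.
        env.append({"name": name, "value": value})
        env.append({"name": name.lower(), "value": value})
    # Listed in this section as set alongside the proxy variables.
    env.append({"name": "NIM_SDK_USE_NATIVE_TLS", "value": "1"})
    return env


if __name__ == "__main__":
    for entry in proxy_env(
        http_proxy="http://proxy.company.com:8080",
        https_proxy="https://proxy.company.com:8080",
        no_proxy="localhost,127.0.0.1,.svc,.cluster.local,10.0.0.0/8",
    ):
        print(f'{entry["name"]}={entry["value"]}')
```

Comparing this list with the output of env inside a running pod is a quick way to confirm that the proxy block was applied.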
Security Best Practices
Security Recommendations
- Never commit secrets to Git - Use external secret managers (Vault, AWS Secrets Manager)
- Rotate API keys regularly - Update secrets periodically
- Use RBAC least privilege - Only grant necessary permissions
- Enable network policies - Restrict pod-to-pod communication
- Run as non-root - Always specify userID/groupID
- Use TLS everywhere - Enable HTTPS for all ingress
- Scan images regularly - Use Trivy or similar tools
Secret Rotation
Rotate NGC API key without downtime:
# Update secret (same namespace as the NIMService)
kubectl create secret generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<new-key> \
  -n nim-service \
  --dry-run=client -o yaml | kubectl apply -n nim-service -f -
# Rolling restart so pods pick up the new key
kubectl rollout restart deployment <nimservice-name> -n nim-service
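The restart step matters because environment variables sourced from a secretKeyRef are resolved when a container starts, so running pods keep the old key until they are replaced. kubectl rollout restart triggers that replacement by stamping the pod template with a kubectl.kubernetes.io/restartedAt annotation. The sketch below builds an equivalent merge patch. The function name restart_patch is hypothetical, and the exact timestamp layout kubectl writes may differ from the UTC form used here.

```python
import json
from datetime import datetime, timezone


def restart_patch(now=None) -> dict:
    """Build a merge patch that changes the pod template and so forces a rollout."""
    now = now or datetime.now(timezone.utc)
    # RFC 3339 timestamp in UTC; any change to this value triggers new pods.
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"kubectl.kubernetes.io/restartedAt": stamp}
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(restart_patch()))
```

Because the Deployment replaces pods gradually under its rolling update strategy, existing pods keep serving with the old key until new pods are ready, which is what keeps the rotation free of downtime as long as the old key stays valid during the rollout.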
Network Policies
Restrict network access to NIM pods:
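apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nim-network-policy
  namespace: nim-service
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: llama-3-8b
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label that Kubernetes sets automatically on every namespace
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8000
  egress:
    # DNS lookups to in-cluster resolvers
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
    # HTTPS to any destination, including external endpoints such as NGC;
    # a namespaceSelector matches only in-cluster pods
    - ports:
        - protocol: TCP
          port: 443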
Pod Security Standards
Enforce Pod Security Standards:
apiVersion: v1
kind: Namespace
metadata:
  name: nim-service
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
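Pod Security Standards are applied through three namespace labels, one per mode (enforce, audit, warn), each set to one of the levels privileged, baseline, or restricted. The sketch below generates those labels and validates the level names. The function name pss_labels is hypothetical.

```python
from typing import Optional

PSS_LEVELS = ("privileged", "baseline", "restricted")


def pss_labels(enforce: str, audit: Optional[str] = None, warn: Optional[str] = None) -> dict:
    """Build Pod Security Standards namespace labels (illustrative helper)."""
    # audit and warn fall back to the enforce level when not given.
    levels = {"enforce": enforce, "audit": audit or enforce, "warn": warn or enforce}
    for mode, level in levels.items():
        if level not in PSS_LEVELS:
            raise ValueError(f"unknown level {level!r} for mode {mode!r}")
    return {
        f"pod-security.kubernetes.io/{mode}": level for mode, level in levels.items()
    }


if __name__ == "__main__":
    # Matches the Namespace manifest above.
    print(pss_labels("restricted"))
    # A staged rollout: enforce baseline while surfacing restricted violations.
    print(pss_labels("baseline", audit="restricted", warn="restricted"))
```

Allowing the modes to differ supports a gradual migration: violations of the stricter level show up as warnings and audit events before they start blocking pods.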
External Secrets Integration
AWS Secrets Manager
Use External Secrets Operator with AWS:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
  namespace: nim-service
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ngc-api-secret
  namespace: nim-service
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: ngc-api-secret
    creationPolicy: Owner
  data:
    - secretKey: NGC_API_KEY
      remoteRef:
        key: nim/ngc-api-key
HashiCorp Vault
Integrate with Vault:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: nim-service
spec:
  provider:
    vault:
      server: https://vault.company.com
      path: secret
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: nim-service
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ngc-api-secret
  # Must match the namespace of the SecretStore it references
  namespace: nim-service
spec:
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: ngc-api-secret
  data:
    - secretKey: NGC_API_KEY
      remoteRef:
        key: nim/credentials
        property: ngc_api_key
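In the ExternalSecret above, each data entry maps one field of a remote secret to one key of the target Kubernetes secret: remoteRef.key selects the remote secret, remoteRef.property selects a field inside it, and secretKey names the resulting key. The sketch below models that mapping against an in-memory dictionary. The function name and the fake backend contents are hypothetical, and it is a conceptual model rather than External Secrets Operator code.

```python
def render_target_secret(data_entries: list, backend: dict) -> dict:
    """Resolve ExternalSecret data entries against an in-memory backend."""
    result = {}
    for entry in data_entries:
        ref = entry["remoteRef"]
        # `key` picks the remote secret; `property` picks a field within it.
        remote = backend[ref["key"]]
        result[entry["secretKey"]] = remote[ref["property"]]
    return result


if __name__ == "__main__":
    # Stand-in for Vault contents; the value is a placeholder.
    fake_vault = {"nim/credentials": {"ngc_api_key": "example-key"}}
    entries = [
        {
            "secretKey": "NGC_API_KEY",
            "remoteRef": {"key": "nim/credentials", "property": "ngc_api_key"},
        }
    ]
    print(render_target_secret(entries, fake_vault))
```

The target key must come out as NGC_API_KEY because that is the key name the NIMService authSecret expects, regardless of how the field is named in the external store.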
Troubleshooting
Secret Not Found
Error: secret "ngc-api-secret" not found
Solution: Create the secret in the same namespace:
kubectl create secret generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<your-key> \
  -n <namespace>
RBAC Permission Denied
Error: pods is forbidden: User "system:serviceaccount:nim-service:default" cannot get resource "pods"
Solution: Ensure ServiceAccount has proper Role/RoleBinding or create custom RBAC.
TLS Certificate Issues
Check certificate status:
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>
kubectl logs -n cert-manager deploy/cert-manager
Proxy Connection Failures
Verify proxy configuration:
kubectl exec -it <pod-name> -n <namespace> -- env | grep -i proxy
kubectl exec -it <pod-name> -n <namespace> -- curl -v https://api.ngc.nvidia.com