
Overview

The NVIDIA NIM Operator provides comprehensive security features including RBAC, secret management for API keys, security contexts for non-root execution, and certificate management for secure communications.

Secret Management

NGC API Key Secret

NIM containers require an NGC API key to download models and authenticate with NVIDIA services.
Step 1: Create NGC Secret

Create a Kubernetes secret containing your NGC API key:
kubectl create secret generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<your-ngc-api-key> \
  -n nim-service
Step 2: Reference in NIMService

Reference the secret in your NIMService:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  authSecret: ngc-api-secret
authSecret (string, required): Name of the Kubernetes secret containing the NGC API key. The secret must contain a key named NGC_API_KEY.
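Because the secret must carry a key with that exact name, it can help to confirm the key exists before applying a NIMService. The following is a minimal sketch, assuming the secret and namespace names used in this section; secret_has_key is a hypothetical helper, not something the operator provides, and it does not handle key names containing dots.

```shell
# Hypothetical helper: succeeds when the named secret holds a non-empty
# value under the given data key.
# Usage: secret_has_key <secret-name> <namespace> <key>
secret_has_key() {
  value=$(kubectl get secret "$1" -n "$2" -o "jsonpath={.data.$3}") || return 1
  # jsonpath prints an empty string when the key is absent.
  [ -n "$value" ]
}

# Example (requires cluster access):
# secret_has_key ngc-api-secret nim-service NGC_API_KEY && echo "key present"
```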

Custom Secret Key Name

By default, the operator expects the secret to store the API key under NGC_API_KEY, and it injects that value into the NIM container as an environment variable of the same name:
env:
- name: NGC_API_KEY
  valueFrom:
    secretKeyRef:
      name: ngc-api-secret
      key: NGC_API_KEY

Hugging Face Token

For models from Hugging Face, add the HF_TOKEN to your existing secret or create a separate secret:
# Create a separate secret for the Hugging Face token
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n nim-service
Reference in NIMService:
spec:
  env:
  - name: HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: hf-token-secret
        key: HF_TOKEN

Image Pull Secrets

For private container registries:
Step 1: Create Docker Registry Secret

kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your-ngc-api-key> \
  -n nim-service
Step 2: Reference in Image Spec

spec:
  image:
    repository: nvcr.io/nim/meta/llama-3-8b
    tag: "1.0.0"
    pullSecrets:
    - ngc-secret
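A pull secret created for the wrong registry host is a common cause of image pull failures, so it may be useful to decode the secret and check which registry it targets. This is a rough sketch; pull_secret_has_registry is a hypothetical helper, and it only looks for the quoted host name anywhere in the decoded Docker config.

```shell
# Hypothetical helper: succeeds when the docker-registry secret's decoded
# config mentions the given registry host.
# Usage: pull_secret_has_registry <secret-name> <namespace> <registry-host>
pull_secret_has_registry() {
  kubectl get secret "$1" -n "$2" -o 'jsonpath={.data.\.dockerconfigjson}' \
    | base64 -d \
    | grep -q -F "\"$3\""
}

# Example (requires cluster access):
# pull_secret_has_registry ngc-secret nim-service nvcr.io && echo "registry matches"
```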

Multi-Secret Configuration

Combine multiple secrets for comprehensive authentication:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-70b
spec:
  authSecret: ngc-api-secret
  image:
    pullSecrets:
    - ngc-secret
    - docker-registry-secret
  env:
  - name: HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: hf-token-secret
        key: HF_TOKEN
  - name: CUSTOM_API_KEY
    valueFrom:
      secretKeyRef:
        name: custom-secret
        key: API_KEY

RBAC Configuration

Automatic ServiceAccount Creation

The operator automatically creates a ServiceAccount with required RBAC permissions:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  # ServiceAccount is created automatically with name: llama-3-8b
The operator creates:
  • ServiceAccount: <nimservice-name>
  • Role: <nimservice-name> (with SCC permissions on OpenShift)
  • RoleBinding: <nimservice-name>

OpenShift Security Context Constraints (SCC)

The operator automatically configures appropriate SCC based on your configuration:
For standard deployments:
# Automatically uses 'nonroot' SCC
spec:
  userID: 1000
  groupID: 2000
Created Role:
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["nonroot"]
  verbs: ["use"]

Custom RBAC Policies

For additional permissions, create a custom Role and RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nim-custom-permissions
  namespace: nim-service
rules:
- apiGroups: [""]  
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nim-custom-permissions
  namespace: nim-service
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nim-custom-permissions
subjects:
- kind: ServiceAccount
  name: llama-3-8b
  namespace: nim-service
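After applying a custom Role and RoleBinding, kubectl auth can-i with impersonation offers one way to confirm that the ServiceAccount actually received the permission. A minimal sketch, assuming the names from the manifest above; sa_user and sa_can_i are hypothetical helpers, and impersonation requires that your own user is allowed to impersonate ServiceAccounts.

```shell
# Hypothetical helper: builds the user name Kubernetes assigns to a ServiceAccount.
# Usage: sa_user <namespace> <serviceaccount>
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: asks the API server whether the ServiceAccount may
# perform a verb on a resource in its namespace.
# Usage: sa_can_i <namespace> <serviceaccount> <verb> <resource>
sa_can_i() {
  kubectl auth can-i "$3" "$4" -n "$1" --as="$(sa_user "$1" "$2")"
}

# Example (requires cluster access):
# sa_can_i nim-service llama-3-8b list configmaps
```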

Security Context

User and Group IDs

NIM containers run as non-root by default:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  userID: 1000    # Default: 1000
  groupID: 2000   # Default: 2000
userID (integer, default: 1000): User ID to run the NIM container. Must be non-zero for security.
groupID (integer, default: 2000): Group ID to run the NIM container.
This creates a PodSecurityContext:
securityContext:
  runAsUser: 1000
  runAsGroup: 2000
  runAsNonRoot: true
  fsGroup: 2000
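To confirm that a running pod picked up these settings, one option is to read the effective UID from inside the container. This sketch assumes the container image ships the id utility, which is not guaranteed; runs_as_non_root is a hypothetical helper.

```shell
# Hypothetical helper: succeeds when the container's effective UID is non-zero.
# Usage: runs_as_non_root <pod-name> <namespace>
runs_as_non_root() {
  uid=$(kubectl exec "$1" -n "$2" -- id -u) || return 1
  [ "$uid" -ne 0 ]
}

# Example (requires cluster access):
# runs_as_non_root <pod-name> nim-service && echo "running as non-root"
```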

Runtime Class

Specify a RuntimeClass for additional security or GPU support:
spec:
  runtimeClassName: nvidia
Common runtime classes:
  • nvidia: NVIDIA Container Runtime
  • kata: Kata Containers for VM-level isolation
  • gvisor: gVisor for sandboxed execution

Certificate Management

TLS Certificates for Ingress

Use cert-manager for automatic certificate provisioning:
Step 1: Install cert-manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
Step 2: Create ClusterIssuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your-email-address>
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
Step 3: Configure NIMService

spec:
  expose:
    router:
      hostDomainName: ai.company.com
      ingress:
        ingressClass: nginx
        tlsSecretName: nim-tls-cert
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
Cert-manager will automatically:
  1. Create a Certificate resource
  2. Request TLS certificate from Let’s Encrypt
  3. Store certificate in secret nim-tls-cert
  4. Renew certificate before expiration

Custom CA Certificates

For enterprise environments with custom CAs:
Step 1: Create ConfigMap with CA Certificate

kubectl create configmap custom-ca-cert \
  --from-file=ca.crt=/path/to/ca-certificate.crt \
  -n nim-service
Step 2: Configure Proxy with Certificate

spec:
  proxy:
    httpsProxy: https://proxy.company.com:8080
    httpProxy: http://proxy.company.com:8080
    noProxy: localhost,127.0.0.1,.svc,.cluster.local
    certConfigMap: custom-ca-cert
The operator will:
  1. Mount the ConfigMap as a volume
  2. Add init container to update CA certificates
  3. Set appropriate environment variables

Manual TLS Secret

Create TLS secret manually:
kubectl create secret tls nim-tls-cert \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key \
  -n nim-service
Reference in router configuration:
router:
  ingress:
    tlsSecretName: nim-tls-cert

Proxy Configuration

Configure HTTP/HTTPS proxy for outbound connections:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b
spec:
  proxy:
    httpProxy: http://proxy.company.com:8080
    httpsProxy: https://proxy.company.com:8080
    noProxy: localhost,127.0.0.1,.svc,.cluster.local,10.0.0.0/8
    certConfigMap: proxy-ca-cert
proxy.httpProxy (string): HTTP proxy URL
proxy.httpsProxy (string): HTTPS proxy URL
proxy.noProxy (string): Comma-separated list of hosts to bypass proxy
proxy.certConfigMap (string): Name of ConfigMap containing custom CA certificate
This automatically sets environment variables:
  • HTTP_PROXY, http_proxy
  • HTTPS_PROXY, https_proxy
  • NO_PROXY, no_proxy
  • NIM_SDK_USE_NATIVE_TLS=1
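Since each proxy setting is exported in both upper and lower case, a quick check of a pod's environment can reveal a variable that did not get set. The sketch below covers only the six proxy variables listed above; missing_proxy_vars is a hypothetical helper that reads env output on standard input.

```shell
# Hypothetical helper: reads `env` output on stdin and prints the name of
# each expected proxy variable that is absent.
missing_proxy_vars() {
  env_dump=$(cat)
  for name in HTTP_PROXY http_proxy HTTPS_PROXY https_proxy NO_PROXY no_proxy; do
    # grep is case-sensitive, so upper- and lower-case names are checked separately.
    printf '%s\n' "$env_dump" | grep -q "^${name}=" || printf '%s\n' "$name"
  done
}

# Example (requires cluster access):
# kubectl exec <pod-name> -n <namespace> -- env | missing_proxy_vars
```

No output means all six variables are present in the container.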

Security Best Practices

Security Recommendations
  1. Never commit secrets to Git - Use external secret managers (Vault, AWS Secrets Manager)
  2. Rotate API keys regularly - Update secrets periodically
  3. Use RBAC least privilege - Only grant necessary permissions
  4. Enable network policies - Restrict pod-to-pod communication
  5. Run as non-root - Always specify userID/groupID
  6. Use TLS everywhere - Enable HTTPS for all ingress
  7. Scan images regularly - Use Trivy or similar tools

Secret Rotation

Rotate the NGC API key without downtime:
# Update the secret in place (the namespace must match the NIMService)
kubectl create secret generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<new-key> \
  -n nim-service \
  --dry-run=client -o yaml | kubectl apply -f -

# Rolling restart
kubectl rollout restart deployment <nimservice-name> -n nim-service
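The rotation steps can also be wrapped in a single function that waits for the restart to finish, so a failed rollout is noticed right away. This is a sketch under two assumptions: the Deployment shares the NIMService name, and passing the key as an argument is acceptable in your environment (it is visible in the process list). rotate_ngc_key is a hypothetical helper.

```shell
# Hypothetical helper: updates the NGC key in place, restarts the workload,
# and waits for the rollout to complete.
# Usage: rotate_ngc_key <new-key> <deployment-name> <namespace>
rotate_ngc_key() {
  kubectl create secret generic ngc-api-secret \
    --from-literal=NGC_API_KEY="$1" \
    -n "$3" --dry-run=client -o yaml \
    | kubectl apply -f - || return 1
  kubectl rollout restart deployment "$2" -n "$3" || return 1
  # Blocks until the new pods are ready or the timeout expires.
  kubectl rollout status deployment "$2" -n "$3" --timeout=10m
}

# Example (requires cluster access):
# rotate_ngc_key "<new-key>" llama-3-8b nim-service
```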

Network Policies

Restrict network access to NIM pods:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nim-network-policy
  namespace: nim-service
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: llama-3-8b
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
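  - to:
    # Matches pods in any namespace only; traffic to endpoints outside the cluster needs an additional ipBlock rule.
    - namespaceSelector: {}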
    ports:
    - protocol: TCP
      port: 443  # HTTPS
    - protocol: TCP
      port: 53   # DNS
    - protocol: UDP
      port: 53

Pod Security Standards

Enforce Pod Security Standards:
apiVersion: v1
kind: Namespace
metadata:
  name: nim-service
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
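Before enforcing the restricted level on a namespace that already runs workloads, a server-side dry run of the label change can surface warnings about pods that would violate it. A minimal sketch; preview_pod_security is a hypothetical helper, and the dry run leaves the namespace unchanged.

```shell
# Hypothetical helper: previews a Pod Security enforce label without applying it.
# The API server returns warnings for existing pods that would violate the level.
# Usage: preview_pod_security <namespace> <level>
preview_pod_security() {
  kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2"
}

# Example (requires cluster access):
# preview_pod_security nim-service restricted
```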

External Secrets Integration

AWS Secrets Manager

Use External Secrets Operator with AWS:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
  namespace: nim-service
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ngc-api-secret
  namespace: nim-service
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: ngc-api-secret
    creationPolicy: Owner
  data:
  - secretKey: NGC_API_KEY
    remoteRef:
      key: nim/ngc-api-key

HashiCorp Vault

Integrate with Vault:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: nim-service
spec:
  provider:
    vault:
      server: https://vault.company.com
      path: secret
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: nim-service
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ngc-api-secret
  namespace: nim-service
spec:
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: ngc-api-secret
  data:
  - secretKey: NGC_API_KEY
    remoteRef:
      key: nim/credentials
      property: ngc_api_key

Troubleshooting

Secret Not Found

Error: secret "ngc-api-secret" not found
Solution: Create the secret in the same namespace:
kubectl create secret generic ngc-api-secret \
  --from-literal=NGC_API_KEY=<your-key> \
  -n <namespace>

RBAC Permission Denied

Error: pods is forbidden: User "system:serviceaccount:nim-service:default" cannot get resource "pods"
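Solution: The error names the ServiceAccount that was denied (here, default). Ensure the pod runs under the intended ServiceAccount and that a Role and RoleBinding grant it the required permissions, or create custom RBAC as described under Custom RBAC Policies.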

TLS Certificate Issues

Check certificate status:
kubectl get certificate -n <namespace>
kubectl describe certificate <cert-name> -n <namespace>
kubectl logs -n cert-manager deploy/cert-manager

Proxy Connection Failures

Verify proxy configuration:
kubectl exec -it <pod-name> -n <namespace> -- env | grep -i proxy
kubectl exec -it <pod-name> -n <namespace> -- curl -v https://api.ngc.nvidia.com
