NIMBuild resource

The NIMBuild resource allows you to build optimized TensorRT-LLM (TRT-LLM) engines from model weights that have been cached using NIMCache. Building custom engines can significantly improve inference performance by optimizing for your specific GPU hardware and deployment configuration.

Overview

NIMBuild creates a Kubernetes Job that:
  1. References a NIMCache resource containing model weights
  2. Builds optimized TensorRT-LLM engines for the specified profile
  3. Stores the built engine alongside the original model weights
  4. Makes the optimized engine available for NIMService deployment
NIMBuild requires that the NIMCache resource is in a Ready state with buildable profiles available.

Basic example

Here’s a basic NIMBuild configuration that builds an optimized engine from a cached model:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMBuild
metadata:
  name: llama-3-8b-engine
  namespace: default
spec:
  nimCache:
    name: llama-3-8b-cache
    profile: 8b-tp1-pp1-h100-fp16
  modelName: llama-3-8b-optimized
  image:
    repository: nvcr.io/nvidia/nim-llm
    tag: "1.2.0"
    pullSecrets:
      - ngc-secret
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: 64Gi
    requests:
      nvidia.com/gpu: 1
      memory: 32Gi
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-80GB-HBM3
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule

When to use NIMBuild

Use NIMBuild when you need:
  • Maximum inference performance - Build engines optimized for your specific GPU hardware
  • Custom model configurations - Fine-tune tensor parallelism and other engine parameters
  • Reduced startup time - Pre-built engines eliminate engine compilation overhead at service startup
  • Production deployments - Consistent performance with optimized engines
Building TensorRT-LLM engines is a resource-intensive operation that can take 30 minutes to several hours depending on model size and GPU availability. Plan accordingly.

Configuration

NIMCache reference

The nimCache field references the NIMCache resource containing the source model weights:
nimCache.name (string, required)
Name of the NIMCache resource containing the model weights.
nimCache.profile (string)
Specific profile to build from the NIMCache. If omitted and only one buildable profile exists, that profile is used automatically. If multiple buildable profiles exist, you must specify which one to build.

Model name

modelName (string)
Name for the built engine model. If not specified, defaults to the NIMBuild resource name. This name is used in the manifest and can be referenced by NIMService.

Image configuration

image (object, required)
Container image used for building the TRT-LLM engine.

Resource requirements

resources (object)
Resource requests and limits for the build job.
Resource recommendations by model size:
  • Small models (under 7B): 1 GPU, 32Gi memory
  • Medium models (7B-70B): 1-2 GPUs, 64Gi memory
  • Large models (70B+): 2-8 GPUs, 128Gi memory
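Applying this guidance, a build for a 70B-class model might be sized as follows. The exact figures are illustrative, not prescribed values; tune them to your model and hardware:

```yaml
# Illustrative sizing for a 70B-class model build; adjust to your cluster.
resources:
  requests:
    nvidia.com/gpu: 4
    memory: 64Gi
  limits:
    nvidia.com/gpu: 4
    memory: 128Gi
```

Setting requests below limits lets the scheduler place the job sooner while still allowing the build to use the full limit if available.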

Scheduling

nodeSelector (object)
Node selector labels to schedule the build job on specific nodes. Use this to target nodes with specific GPU types.
tolerations (array)
Tolerations that allow the build job to run on tainted nodes.

Additional configuration

env (array)
Additional environment variables for the build container.
labels (object)
Additional labels to apply to the build job.
annotations (object)
Additional annotations to apply to the build job.
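A sketch of how these fields fit together in a NIMBuild spec. The environment variable, label, and annotation values are placeholders, not required settings:

```yaml
spec:
  env:
    - name: LOG_LEVEL          # placeholder; use variables your build image supports
      value: "INFO"
  labels:
    team: inference-platform   # illustrative label
  annotations:
    example.com/build-owner: ml-infra   # illustrative annotation
```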

Status monitoring

Monitor the NIMBuild status to track the build progress:
kubectl get nimbuild llama-3-8b-engine -o jsonpath='{.status.state}'

Status states

  • Pending - Waiting for NIMCache to be ready or for resources
  • Started - Build job has been created
  • InProgress - Engine build is in progress
  • Ready - Engine build completed successfully
  • Failed - Build failed (check pod logs for details)
  • NotReady - Build job not yet ready

Checking build progress

View detailed status:
kubectl describe nimbuild llama-3-8b-engine
Check build pod logs:
kubectl logs -l nimbuild=llama-3-8b-engine -f

Using built engines with NIMService

Once the NIMBuild is Ready, reference it in your NIMService:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b-service
spec:
  image:
    repository: nvcr.io/nvidia/nim-llm
    tag: "1.2.0"
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: llama-3-8b-optimized  # Use the built engine
  resources:
    limits:
      nvidia.com/gpu: 1

Complete example

Here’s a complete example showing how NIMCache, NIMBuild, and NIMService work together, starting with the NIMCache that caches a buildable profile:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: llama-3-8b-cache
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nvidia/nim-llm:1.2.0
      pullSecret: ngc-secret
      model:
        profiles:
          - llama-3-8b-base
        precision: fp16
        tensorParallelism: 1
        pipelineParallelism: 1
        gpus:
          - product: NVIDIA-H100-80GB-HBM3
        buildable: true
  storage:
    pvc:
      create: true
      storageClass: local-path
      size: 100Gi
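The NIMBuild and NIMService then tie into this cache. This sketch reuses the resources from the earlier examples, with the NIMBuild pointed at the llama-3-8b-base profile cached above and the NIMService consuming the built engine by its modelName:

```yaml
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMBuild
metadata:
  name: llama-3-8b-engine
spec:
  nimCache:
    name: llama-3-8b-cache
    profile: llama-3-8b-base       # the buildable profile cached above
  modelName: llama-3-8b-optimized
  image:
    repository: nvcr.io/nvidia/nim-llm
    tag: "1.2.0"
    pullSecrets:
      - ngc-secret
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: 64Gi
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-8b-service
spec:
  image:
    repository: nvcr.io/nvidia/nim-llm
    tag: "1.2.0"
  storage:
    nimCache:
      name: llama-3-8b-cache
      profile: llama-3-8b-optimized   # the built engine, referenced by modelName
  resources:
    limits:
      nvidia.com/gpu: 1
```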

Troubleshooting

Build fails with “NIMCache not found”

Ensure the NIMCache resource exists and is in the same namespace:
kubectl get nimcache -n <namespace>

Build fails with “Multiple buildable profiles found”

Specify the profile field in the nimCache reference to select which profile to build.
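For example, pin the build to one of the cache's buildable profiles (the profile name here is illustrative):

```yaml
spec:
  nimCache:
    name: llama-3-8b-cache
    profile: 8b-tp1-pp1-h100-fp16   # pick one of the cache's buildable profiles
```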

Build pod stays in Pending

Check for resource constraints:
kubectl describe pod -l nimbuild=<name>
Common causes:
  • Insufficient GPU nodes
  • Resource requests too high
  • Node selector doesn’t match any nodes

Build fails during execution

Check the build pod logs:
kubectl logs -l nimbuild=<name>
Common issues:
  • Insufficient memory (increase memory limits)
  • Invalid model configuration
  • GPU incompatibility

Best practices

  1. Use specific GPU selectors - Target specific GPU types with nodeSelector for consistent builds
  2. Allocate sufficient resources - Building large models requires significant memory and GPU resources
  3. Monitor build time - Track build duration to optimize resource allocation
  4. Store built engines - Use persistent storage to avoid rebuilding engines
  5. Test before production - Validate built engines with test workloads before deploying to production
