NIMBuild resource
The NIMBuild resource allows you to build optimized TensorRT-LLM (TRT-LLM) engines from model weights that have been cached using NIMCache. Building custom engines can significantly improve inference performance by optimizing for your specific GPU hardware and deployment configuration.

Overview
NIMBuild creates a Kubernetes Job that:

- References a NIMCache resource containing model weights
- Builds optimized TensorRT-LLM engines for the specified profile
- Stores the built engine alongside the original model weights
- Makes the optimized engine available for NIMService deployment
NIMBuild requires that the NIMCache resource is in a `Ready` state with buildable profiles available.

Basic example
Here’s a basic NIMBuild configuration that builds an optimized engine from a cached model.

When to use NIMBuild

Use NIMBuild when you need:

- Maximum inference performance: build engines optimized for your specific GPU hardware
- Custom model configurations: fine-tune tensor parallelism and other engine parameters
- Reduced latency: pre-built engines eliminate runtime compilation overhead
- Production deployments: consistent performance with optimized engines
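Following the basic example described above, a minimal NIMBuild manifest might look like this. The API group/version and field layout are assumptions based on the configuration reference in this page; verify them against your NIM Operator release:

```yaml
apiVersion: apps.nvidia.com/v1alpha1   # group/version assumed; check your operator release
kind: NIMBuild
metadata:
  name: meta-llama3-8b-instruct-build
  namespace: nim-service
spec:
  # NIMCache holding the downloaded model weights
  nimCache:
    name: meta-llama3-8b-instruct
    # profile can be omitted when only one buildable profile exists
    profile: <buildable-profile-id>
  # Container image used to run the TRT-LLM engine build
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"
  resources:
    limits:
      nvidia.com/gpu: 1
```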
Configuration
NIMCache reference
The `nimCache` field references the NIMCache resource containing the source model weights:

- `name`: the NIMCache resource containing the model weights
- `profile`: the specific profile to build from the NIMCache. If omitted and only one buildable profile exists, it is used automatically; if multiple buildable profiles exist, you must specify which one to build.
Model name
Name for the built engine model. If not specified, defaults to the NIMBuild resource name. This name is used in the manifest and can be referenced by NIMService.
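For example, a `nimCache` reference that selects one of several buildable profiles and overrides the built model's name (the `modelName` field name is an assumption; confirm it in the API reference):

```yaml
spec:
  nimCache:
    name: meta-llama3-8b-instruct     # NIMCache with the source weights
    profile: <buildable-profile-id>   # required when multiple buildable profiles exist
  modelName: llama3-8b-h100-tp1       # assumed field name; defaults to the NIMBuild name
```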
Image configuration
The `image` field specifies the container image used for building the TRT-LLM engine.
Resource requirements
The `resources` field sets resource requests and limits for the build job.
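A sketch combining the image and resource settings. The `image` subfields shown here (repository, tag, pull policy, pull secrets) mirror the convention used by other NIM Operator resources, which is an assumption for NIMBuild:

```yaml
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  resources:
    requests:
      nvidia.com/gpu: 1
      memory: 32Gi
    limits:
      nvidia.com/gpu: 1
      memory: 64Gi
```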
Scheduling
Use `nodeSelector` labels to schedule the build job on specific nodes, for example to target nodes with specific GPU types.

Use `tolerations` to allow the build job to run on tainted nodes.
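For example, to pin the build to H100 nodes and tolerate a GPU taint (the node label shown is the one published by NVIDIA GPU Feature Discovery and is illustrative; match it to your cluster):

```yaml
spec:
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-80GB-HBM3   # example label; match your nodes
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```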
Additional configuration
- `env`: additional environment variables for the build container
- `labels`: additional labels to apply to the build job
- `annotations`: additional annotations to apply to the build job
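A sketch of these pass-through settings (the environment variable shown is hypothetical, for illustration only):

```yaml
spec:
  env:
    - name: BUILD_LOG_LEVEL        # hypothetical variable, for illustration only
      value: "INFO"
  labels:
    team: inference-platform
  annotations:
    example.com/build-owner: ml-infra
```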
Status monitoring
Monitor the NIMBuild status to track the build progress.

Status states

- `Pending`: waiting for the NIMCache to be ready or for resources
- `Started`: the build job has been created
- `InProgress`: the engine build is in progress
- `Ready`: the engine build completed successfully
- `Failed`: the build failed (check the pod logs for details)
- `NotReady`: the build job is not yet ready
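For example, you can poll the state with `kubectl`. The resource names are placeholders, and the `.status.state` field path is an assumption; adjust both to your operator's CRD:

```shell
# List NIMBuild resources and their current state
kubectl get nimbuild -n nim-service

# Read just the state field (field path is an assumption)
kubectl get nimbuild meta-llama3-8b-instruct-build -n nim-service \
  -o jsonpath='{.status.state}'
```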
Checking build progress
View the detailed status of a build, including conditions and recent events, with `kubectl describe`.

Using built engines with NIMService
Once the NIMBuild is `Ready`, reference it in your NIMService:
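How a NIMService picks up the built engine varies by operator version; the sketch below assumes the service mounts the same NIMCache, where the built engine is stored alongside the original weights (verify against the NIMService API reference):

```yaml
apiVersion: apps.nvidia.com/v1alpha1   # group/version assumed
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"
  storage:
    nimCache:
      name: meta-llama3-8b-instruct   # same cache the NIMBuild wrote the engine into
  replicas: 1
```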
Complete example
Here’s a complete example showing NIMCache, NIMBuild, and NIMService working together:

- NIMCache
- NIMBuild
- NIMService
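A combined sketch of the three resources. All schemas here are hedged as above: the API group/version, the NIMCache source fields, and the NIMBuild/NIMService layouts should be verified against your operator's API reference:

```yaml
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
  namespace: nim-service
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: 100Gi
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMBuild
metadata:
  name: meta-llama3-8b-instruct-build
  namespace: nim-service
spec:
  nimCache:
    name: meta-llama3-8b-instruct
    profile: <buildable-profile-id>
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"
  storage:
    nimCache:
      name: meta-llama3-8b-instruct
  replicas: 1
```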
Troubleshooting
Build fails with “NIMCache not found”
Ensure the NIMCache resource exists, is in the same namespace as the NIMBuild, and is in a `Ready` state.

Build fails with “Multiple buildable profiles found”
Specify the `profile` field in the `nimCache` reference to select which profile to build.
Build pod stays in Pending
Check for resource constraints:

- Insufficient GPU nodes
- Resource requests too high
- Node selector doesn’t match any nodes
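To see which of these is the cause, inspect the pending pod's events (pod name and namespace are placeholders):

```shell
kubectl get pods -n nim-service
# The Events section at the bottom of the output shows scheduling failures
kubectl describe pod <nimbuild-pod-name> -n nim-service
```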
Build fails during execution
Check the build pod logs. Common causes include:

- Insufficient memory (increase memory limits)
- Invalid model configuration
- GPU incompatibility
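Assuming the build Job is named after the NIMBuild resource (an assumption; list the Jobs in the namespace to confirm), fetch the logs with:

```shell
kubectl get jobs -n nim-service
kubectl logs job/<nimbuild-job-name> -n nim-service
```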
Best practices
- Use specific GPU selectors: target specific GPU types with `nodeSelector` for consistent builds
- Allocate sufficient resources: building large models requires significant memory and GPU resources
- Monitor build time: track build duration to optimize resource allocation
- Store built engines: use persistent storage to avoid rebuilding engines
- Test before production: validate built engines with test workloads before deploying to production
Related resources
- NIMCache Resource: cache model weights from various sources
- NIMService Resource: deploy NIM services using built engines
- NIMBuild API Reference: complete API specification