The trtllm-refit command updates the weights in a pre-built TensorRT engine with new weights from a checkpoint, without rebuilding the engine from scratch.
Overview
Refitting allows you to:
- Update a TensorRT engine with fine-tuned weights
- Quickly iterate on model weights without engine rebuilding
- Apply LoRA or other parameter-efficient fine-tuning results
Command Syntax
Arguments
Path to the directory containing the original TensorRT engines (rank*.engine files)
Path to the checkpoint directory with new weights to refit into the engine
Output directory for the refitted engines
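The three arguments above combine into a single invocation. The following is a minimal sketch in Python, assuming the flags are named `--engine_dir`, `--checkpoint_dir`, and `--output_dir`. The flag names do not appear in this section, so confirm them with `trtllm-refit --help` before relying on them. The helper `build_refit_command` and the example paths are hypothetical.

```python
import shlex


def build_refit_command(engine_dir, checkpoint_dir, output_dir):
    """Assemble the argv list for a trtllm-refit call.

    The flag names below are assumptions; verify with `trtllm-refit --help`.
    """
    return [
        "trtllm-refit",
        "--engine_dir", str(engine_dir),          # original rank*.engine files
        "--checkpoint_dir", str(checkpoint_dir),  # checkpoint with new weights
        "--output_dir", str(output_dir),          # where refitted engines go
    ]


cmd = build_refit_command("./engines/base", "./ckpt/finetuned", "./engines/refit")
print(shlex.join(cmd))
# To actually run the command: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.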
How Refitting Works
- Load Engine: Deserialize the existing TensorRT engine
- Load Checkpoint: Load new weights from the checkpoint directory
- Preprocess Weights: Apply any necessary weight transformations (fusion, quantization, etc.)
- Refit: Update refittable weights in the engine
- Save: Serialize the refitted engine to the output directory
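The five steps above can be illustrated with a toy model. This sketch does not use TensorRT at all: JSON files stand in for the serialized engine and the checkpoint, and the `refit` function, the file names, and the weight names (`attn.qkv`, `rope.table`) are hypothetical. It only shows the order of operations and the fact that non-refittable entries are left untouched.

```python
import json
import tempfile
from pathlib import Path


def refit(engine, checkpoint, preprocess=lambda name, values: values):
    """Toy refit: return a copy of `engine` with refittable weights replaced.

    engine:     name -> {"values": list, "refittable": bool}
    checkpoint: name -> list of new values
    preprocess: hook standing in for weight transformations (identity here)
    """
    refitted = {}
    for name, entry in engine.items():
        if not entry["refittable"]:
            refitted[name] = dict(entry)  # baked into the engine, left as-is
            continue
        if name not in checkpoint:
            raise KeyError(f"missing refittable weight: {name}")
        new_values = preprocess(name, checkpoint[name])
        if len(new_values) != len(entry["values"]):
            raise ValueError(f"shape mismatch for {name}")
        refitted[name] = {"values": list(new_values), "refittable": True}
    return refitted


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Stand-ins for a serialized engine and a checkpoint directory.
    (tmp / "engine.json").write_text(json.dumps({
        "attn.qkv": {"values": [0.0, 0.0], "refittable": True},
        "rope.table": {"values": [1.0], "refittable": False},
    }))
    (tmp / "ckpt.json").write_text(json.dumps({"attn.qkv": [0.5, -0.5]}))

    engine = json.loads((tmp / "engine.json").read_text())    # 1. load engine
    checkpoint = json.loads((tmp / "ckpt.json").read_text())  # 2. load checkpoint
    refitted = refit(engine, checkpoint)                      # 3-4. preprocess, refit
    (tmp / "refit.json").write_text(json.dumps(refitted))     # 5. save
    saved = json.loads((tmp / "refit.json").read_text())

print(saved["attn.qkv"]["values"])
```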
Refittable Weights
Only certain weights can be refitted:
- Attention weights (QKV, output projection)
- MLP weights (gate, up, down projections)
- Layer normalization parameters
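As a rough illustration of the three categories above, the sketch below sorts tensor names into them by pattern. The name patterns and the example tensor names are hypothetical; real checkpoint tensor names depend on the model, so treat this as a way to think about the grouping rather than as the tool's actual matching logic.

```python
import re

# Hypothetical name patterns, one per category listed above.
REFITTABLE_PATTERNS = {
    "attention": re.compile(r"\.attention\.(qkv|out_proj)\."),
    "mlp": re.compile(r"\.mlp\.(gate|up|down)\."),
    "layernorm": re.compile(r"layernorm\."),
}


def classify(name):
    """Return the category a tensor name falls into, or None if it matches none."""
    for category, pattern in REFITTABLE_PATTERNS.items():
        if pattern.search(name):
            return category
    return None


names = [
    "layers.0.attention.qkv.weight",
    "layers.0.mlp.down.weight",
    "layers.0.input_layernorm.weight",
    "layers.0.some_other_buffer",
]
for name in names:
    print(name, "->", classify(name))
```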
Examples
Refit Engine with Fine-Tuned Weights
Update with LoRA Weights
Performance Considerations
Refitting is much faster than rebuilding: it typically takes seconds to minutes, while building a new engine can take 10-30 minutes depending on model size.
Troubleshooting
Missing Refittable Weights
Error:
Missing refittable weights from checkpoint
Solution: Ensure the checkpoint contains all weights present in the original engine. The checkpoint structure must match the engine.
Weight Shape Mismatch
Error:
Failed to refit weight: shape mismatch
Solution: The checkpoint weights must have the same shapes as the original engine. Structural changes require rebuilding with trtllm-build.
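Missing weights and shape mismatches can both be caught with a simple comparison before running the refit. The sketch below assumes you already have two mappings from weight name to shape, one describing what the engine expects and one describing the checkpoint. How those mappings are obtained is left out, and the function name, weight names, and shapes are hypothetical.

```python
def check_refit_compat(engine_shapes, checkpoint_shapes):
    """Compare expected weight shapes with those found in a checkpoint.

    Both arguments map weight name -> shape tuple.
    Returns (missing, mismatched), where `missing` lists names absent from the
    checkpoint and `mismatched` lists (name, expected_shape, found_shape).
    """
    missing = sorted(n for n in engine_shapes if n not in checkpoint_shapes)
    mismatched = sorted(
        (n, tuple(engine_shapes[n]), tuple(checkpoint_shapes[n]))
        for n in engine_shapes
        if n in checkpoint_shapes
        and tuple(engine_shapes[n]) != tuple(checkpoint_shapes[n])
    )
    return missing, mismatched


engine_shapes = {
    "attn.qkv": (4096, 12288),
    "mlp.gate": (4096, 11008),
    "ln.weight": (4096,),
}
checkpoint_shapes = {
    "attn.qkv": (4096, 12288),
    "mlp.gate": (4096, 14336),  # different hidden size: needs a rebuild
}

missing, mismatched = check_refit_compat(engine_shapes, checkpoint_shapes)
print("missing:", missing)
print("mismatched:", mismatched)
```

An empty `missing` list and an empty `mismatched` list mean the checkpoint is structurally compatible under this check; anything in `mismatched` indicates a structural change that refitting cannot absorb.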
Uninitialized Weight
Error:
Failed because weight is not initialized in model
Solution: Verify the checkpoint is complete and all weight files are present.
Workflow Example
Complete workflow for fine-tuning and refitting:
Related Commands
- trtllm-build: Build TensorRT engines from checkpoints
- trtllm-prune: Prune checkpoint weights