The trtllm-refit command updates the weights in a pre-built TensorRT engine with new weights from a checkpoint, without rebuilding the engine from scratch.

Overview

Refitting allows you to:
  • Update a TensorRT engine with fine-tuned weights
  • Quickly iterate on model weights without engine rebuilding
  • Apply LoRA or other parameter-efficient fine-tuning results
This is significantly faster than rebuilding the entire engine, as it only updates the weight values while preserving the engine structure.

Command Syntax

trtllm-refit \
  --engine_dir <engine_directory> \
  --checkpoint_dir <checkpoint_directory> \
  --refit_engine_dir <output_directory>

Arguments

--engine_dir (string, required)
Path to the directory containing the original TensorRT engines (rank*.engine files).

--checkpoint_dir (string, required)
Path to the checkpoint directory with new weights to refit into the engine.

--refit_engine_dir (string, required)
Output directory for the refitted engines.

How Refitting Works

  1. Load Engine: Deserialize the existing TensorRT engine
  2. Load Checkpoint: Load new weights from the checkpoint directory
  3. Preprocess Weights: Apply any necessary weight transformations (fusion, quantization, etc.)
  4. Refit: Update refittable weights in the engine
  5. Save: Serialize the refitted engine to the output directory
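
The five steps above can be sketched in Python. This is an illustrative model only: the dicts stand in for the deserialized engine and checkpoint, and the helper name is an assumption, not trtllm-refit's actual implementation (the real command works through TensorRT's refit API).

```python
def refit_engine(engine_weights, checkpoint_weights, preprocess=None):
    """Sketch of the refit flow (steps 1-5 above) on plain dicts.

    Hypothetical helper for illustration; the real tool uses TensorRT's
    refit API, but the control flow is the same.
    """
    refitted = dict(engine_weights)                  # 1. "deserialize" the engine
    for name, weight in checkpoint_weights.items():  # 2. load checkpoint weights
        if preprocess is not None:
            weight = preprocess(name, weight)        # 3. fusion, quantization, ...
        if name not in refitted:
            continue                                 # engine doesn't expose this weight
        if len(refitted[name]) != len(weight):
            raise ValueError(f"shape mismatch for {name}")
        refitted[name] = weight                      # 4. update the refittable weight
    return refitted                                  # 5. caller serializes to disk
```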

Refittable Weights

Only certain weights can be refitted:
  • Attention weights (QKV, output projection)
  • MLP weights (gate, up, down projections)
  • Layer normalization parameters
Structural weights (embeddings, position encodings) are typically not refittable and require engine rebuilding if changed.
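
As a rough mental model, the distinction can be sketched as a name-based classifier. The patterns below are hypothetical, chosen only to mirror the categories above; real refittability is a per-weight property recorded in the engine, not a naming convention.

```python
# Hypothetical name patterns for illustration only.
STRUCTURAL_PATTERNS = ("embedding", "position")
REFITTABLE_PATTERNS = ("qkv", "dense", "proj", "gate", "fc", "norm")

def is_refittable(weight_name: str) -> bool:
    """Guess whether a weight can be refitted, based on its name."""
    name = weight_name.lower()
    if any(p in name for p in STRUCTURAL_PATTERNS):
        return False  # structural: changing it requires an engine rebuild
    return any(p in name for p in REFITTABLE_PATTERNS)
```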

Examples

Refit Engine with Fine-Tuned Weights

# Original engine
trtllm-build \
  --checkpoint_dir ./llama-7b-base \
  --output_dir ./engines/base

# Fine-tune model (creates new checkpoint)
# ... training process ...

# Refit engine with fine-tuned weights
trtllm-refit \
  --engine_dir ./engines/base \
  --checkpoint_dir ./llama-7b-finetuned \
  --refit_engine_dir ./engines/finetuned

Update with LoRA Weights

# Apply LoRA adapter weights to existing engine
trtllm-refit \
  --engine_dir ./engines/base-model \
  --checkpoint_dir ./lora-checkpoint \
  --refit_engine_dir ./engines/lora-adapted

Performance Considerations

Refitting is much faster than rebuilding: Refitting typically takes seconds to minutes, while building a new engine can take 10-30 minutes depending on model size.
Not all weights are refittable: If your checkpoint modifies non-refittable weights (e.g., vocabulary size, hidden dimensions), you must rebuild the engine with trtllm-build.

Troubleshooting

Error: Missing refittable weights from checkpoint
Solution: Ensure the checkpoint contains all weights present in the original engine. The checkpoint structure must match the engine.

Error: Failed to refit weight: shape mismatch
Solution: The checkpoint weights must have the same shapes as the original engine. Structural changes require rebuilding with trtllm-build.

Error: Failed because weight is not initialized in model
Solution: Verify the checkpoint is complete and all weight files are present.
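
The first two errors above can be caught before running the refit by comparing weight names and shapes. This pre-flight check is a sketch under the assumption that you can dump both sets of shapes to dicts; it is not part of the trtllm-refit CLI itself.

```python
def validate_refit(engine_shapes, checkpoint_shapes):
    """Pre-flight check: return a list of problems that would make a refit fail.

    Both arguments map weight names to shape tuples. Illustrative helper,
    not part of trtllm-refit.
    """
    problems = []
    for name, shape in engine_shapes.items():
        if name not in checkpoint_shapes:
            problems.append(f"missing from checkpoint: {name}")
        elif checkpoint_shapes[name] != shape:
            problems.append(f"shape mismatch for {name}: "
                            f"engine {shape} vs checkpoint {checkpoint_shapes[name]}")
    return problems
```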

Workflow Example

Complete workflow for fine-tuning and refitting:
# 1. Build initial engine
trtllm-build \
  --checkpoint_dir ./base-model \
  --output_dir ./engines/v1

# 2. Deploy and serve
trtllm-serve ./engines/v1

# 3. Fine-tune model (creates new checkpoint)
python finetune.py --output ./finetuned-checkpoint

# 4. Refit engine with fine-tuned weights  
trtllm-refit \
  --engine_dir ./engines/v1 \
  --checkpoint_dir ./finetuned-checkpoint \
  --refit_engine_dir ./engines/v2

# 5. Deploy updated engine
trtllm-serve ./engines/v2
