The trtllm-prune command removes weights from TensorRT-LLM checkpoints. It is useful for reducing checkpoint size during testing or for creating lightweight checkpoint skeletons.
Overview
Pruning creates a checkpoint with empty weight tensors while preserving the checkpoint structure and metadata. This is useful for:
- Testing checkpoint loading without full model weights
- Reducing storage space for checkpoint distribution
- Creating lightweight checkpoint templates
Command Syntax
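A plausible synopsis, sketched from the arguments described below; the flag names (--checkpoint_dir, --output_dir) are assumptions, not confirmed by this page:

```shell
# Hypothetical flag names; consult trtllm-prune --help for the actual interface.
trtllm-prune --checkpoint_dir <input_checkpoint_dir> \
             --output_dir <pruned_output_dir> \
             [--prune_all]
```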
Required Arguments
- Path to the input TensorRT-LLM checkpoint directory containing config.json and rank*.safetensors files
- Output directory for the pruned checkpoint
Optional Arguments
--prune_all: Remove all weights in the checkpoint. If not set, only prunable weights (attention QKV, attention projection, and MLP weights) are removed.
Prunable Weights
By default, the following weight types are pruned:
- attention.qkv.weight - Query, Key, Value projection weights
- attention.proj.weight - Attention output projection
- mlp.fc.weight - MLP fully connected layer
- mlp.proj.weight - MLP projection layer
- mlp.gate.weight - MLP gate weights (for gated activations)
All other weights are preserved unless --prune_all is specified.
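The default selection amounts to a suffix match against the weight names above. The helper below is an illustrative sketch of that rule, not the tool's actual implementation:

```python
# Illustrative sketch of default prunable-weight selection (not the real code).
PRUNABLE_SUFFIXES = (
    "attention.qkv.weight",
    "attention.proj.weight",
    "mlp.fc.weight",
    "mlp.proj.weight",
    "mlp.gate.weight",
)

def is_prunable(name: str, prune_all: bool = False) -> bool:
    """Return True if a checkpoint tensor would be pruned."""
    return prune_all or name.endswith(PRUNABLE_SUFFIXES)

print(is_prunable("transformer.layers.0.attention.qkv.weight"))  # True
print(is_prunable("transformer.vocab_embedding.weight"))         # False
```

With prune_all=True every tensor matches, mirroring the --prune_all flag.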
Examples
Prune Specific Weights
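A sketch of the default invocation, which removes only the prunable weight types listed above; the flag names here are assumptions:

```shell
# Hypothetical flags; prunes only attention/MLP weights by default.
trtllm-prune --checkpoint_dir ./llama_checkpoint \
             --output_dir ./llama_checkpoint_pruned
```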
Prune All Weights
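With --prune_all, every weight tensor is emptied; again, the directory flag names are assumed:

```shell
# Hypothetical flags; --prune_all empties every weight tensor.
trtllm-prune --checkpoint_dir ./llama_checkpoint \
             --output_dir ./llama_checkpoint_empty \
             --prune_all
```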
Output Structure
The pruned checkpoint maintains the same directory structure as the input. The config.json file is updated with "is_pruned": true to indicate the checkpoint has been pruned.
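A loader can inspect the documented "is_pruned" flag before expecting real weight data. The sketch below builds a toy config.json in a temporary directory to show the check; the extra config fields are made up for illustration:

```python
import json
import os
import tempfile

# Simulate a pruned checkpoint's config.json (a real one comes from trtllm-prune).
ckpt_dir = tempfile.mkdtemp()
config = {"architecture": "LlamaForCausalLM", "is_pruned": True}  # toy config
with open(os.path.join(ckpt_dir, "config.json"), "w") as f:
    json.dump(config, f)

# Check the flag before attempting to read weight tensors.
with open(os.path.join(ckpt_dir, "config.json")) as f:
    loaded = json.load(f)
if loaded.get("is_pruned", False):
    print("checkpoint is pruned; weight tensors are empty placeholders")
```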
Use Cases
Testing Checkpoint Loading
Create lightweight checkpoints to test model loading logic without requiring full model weights:
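One way to exercise loading logic against such a skeleton is to treat tensors as shape/dtype metadata with no data. This stdlib-only mock imitates that idea; it does not use the actual safetensors format:

```python
# Stdlib-only mock of a pruned checkpoint: tensors keep shape/dtype metadata
# but carry no data, mirroring the "empty weight tensors" idea.
checkpoint = {
    "transformer.layers.0.attention.qkv.weight": {
        "shape": [12288, 4096], "dtype": "float16", "data": b""},
    "transformer.layers.0.mlp.fc.weight": {
        "shape": [11008, 4096], "dtype": "float16", "data": b""},
}

def validate_structure(ckpt, expected_names):
    """Check that all expected tensors exist, ignoring their (empty) data."""
    missing = [n for n in expected_names if n not in ckpt]
    if missing:
        raise KeyError(f"missing tensors: {missing}")
    return True

print(validate_structure(checkpoint,
                         ["transformer.layers.0.attention.qkv.weight"]))
```

Loading logic validated this way never touches weight values, so it works identically on pruned and full checkpoints.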
Checkpoint Distribution
Distribute checkpoint structure without weights for testing or validation purposes.
Storage Optimization
Reduce storage requirements for intermediate checkpoints during development.
Related Commands
- trtllm-build: Build TensorRT engines from checkpoints
- trtllm-refit: Update engine weights from checkpoints