Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can increase multi-GPU training throughput by 20% and reduce memory usage by 60%, enabling up to a 4x increase in context length. Liger Kernel provides Hugging Face-compatible replacements for RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy. It works out of the box with FlashAttention, PyTorch FSDP, and Microsoft DeepSpeed. With the memory reduction from Liger Kernel, you can potentially disable cpu_offloading or gradient checkpointing to further boost performance.

Installation

pip install liger-kernel

Supported trainers

Liger Kernel is supported in the following TRL trainers:

- SFT: Supervised Fine-Tuning
- DPO: Direct Preference Optimization
- GRPO: Group Relative Policy Optimization
- KTO: Kahneman-Tversky Optimization
- GKD: Generalized Knowledge Distillation

Usage

Set use_liger_kernel=True in your trainer config. No other changes are needed.
from trl import SFTConfig

training_args = SFTConfig(..., use_liger_kernel=True)
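The flag is the same across the supported trainers, so enabling it in another trainer looks identical. As a sketch, for DPO (the other config classes listed above follow the same pattern):

```python
from trl import DPOConfig

# Same flag, different trainer config; GRPOConfig, KTOConfig, and
# GKDConfig accept it the same way.
training_args = DPOConfig(..., use_liger_kernel=True)
```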

Performance benefits

| Metric | Improvement |
| --- | --- |
| Training throughput | +20% on multi-GPU setups |
| GPU memory usage | −60% |
| Achievable context length | Up to 4x longer |
The memory reduction comes from fused kernel implementations that avoid materializing large intermediate tensors. For example, FusedLinearCrossEntropy fuses the final linear projection with the cross-entropy loss, which removes the need to store the full vocabulary-sized logit tensor.
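For a sense of scale, a quick back-of-the-envelope calculation shows how large that logits tensor can be. The shapes below are illustrative assumptions (a Llama-3-sized vocabulary, bf16 activations), not numbers from this page:

```python
# Size of the full logits tensor that FusedLinearCrossEntropy avoids
# materializing. All shapes are illustrative assumptions.
batch_size = 8
seq_len = 4096
vocab_size = 128256   # e.g. a Llama-3-sized vocabulary (assumption)
bytes_per_elem = 2    # bf16

logits_bytes = batch_size * seq_len * vocab_size * bytes_per_elem
print(f"Full logits tensor: {logits_bytes / 1024**3:.1f} GiB")
# → Full logits tensor: 7.8 GiB
```

At these shapes the logits alone would occupy several gigabytes per step, which is why fusing the projection into the loss frees so much memory.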
Because Liger Kernel reduces memory usage significantly, you may be able to turn off gradient checkpointing or CPU offloading after enabling it, which can recover additional training throughput.
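A minimal config fragment sketching this, assuming your setup has enough memory headroom once Liger Kernel is enabled (verify on your own hardware before relying on it):

```python
from trl import SFTConfig

training_args = SFTConfig(
    ...,
    use_liger_kernel=True,
    gradient_checkpointing=False,  # try disabling once memory allows,
                                   # to skip activation recomputation
)
```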

Additional resources

Liger Kernel repository

Source code, benchmarks, and detailed documentation.
