RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy. It works out of the box with FlashAttention, PyTorch FSDP, and Microsoft DeepSpeed.
With the memory reduction from Liger Kernel, you may be able to disable CPU offloading or gradient checkpointing, further boosting throughput.
Installation
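The kernels are distributed as the `liger-kernel` package on PyPI; a typical install (assuming a recent PyTorch with Triton support is already present) looks like:

```shell
pip install liger-kernel
```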
Supported trainers
Liger Kernel is supported in the following TRL trainers:

- SFT (Supervised Fine-Tuning)
- DPO (Direct Preference Optimization)
- GRPO (Group Relative Policy Optimization)
- KTO (Kahneman-Tversky Optimization)
- GKD (Generalized Knowledge Distillation)
Usage
Set `use_liger_kernel=True` in your trainer config. No other changes are needed.
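As a minimal sketch for the SFT case (the `output_dir` value is illustrative; other trainers take the same flag in their respective config classes):

```python
from trl import SFTConfig

# Enabling Liger Kernel is a single flag on the trainer config.
training_args = SFTConfig(
    output_dir="my-model",      # illustrative path
    use_liger_kernel=True,      # swap in Liger's fused Triton kernels
)
```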
Performance benefits
| Metric | Improvement |
|---|---|
| Training throughput | +20% on multi-GPU setups |
| GPU memory usage | −60% |
| Achievable context length | Up to 4x longer |
FusedLinearCrossEntropy fuses the final linear projection with the cross-entropy loss, which removes the need to store the full vocabulary-sized logit tensor.
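To see why this matters, a rough back-of-the-envelope calculation (the shapes below are hypothetical and depend on your model and batch) shows how large the materialized logit tensor would otherwise be:

```python
# Hypothetical training shapes; actual values depend on your setup.
batch_size = 8
seq_len = 4096
vocab_size = 128_256
bytes_per_elem = 2  # bf16

# Size of the full [batch, seq, vocab] logit tensor if materialized.
logits_bytes = batch_size * seq_len * vocab_size * bytes_per_elem
print(f"{logits_bytes / 1e9:.1f} GB")  # about 8.4 GB for these shapes
```

Fusing the projection with the loss computes the cross-entropy in chunks, so this tensor never has to live in GPU memory all at once.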
Additional resources
Liger Kernel repository
Source code, benchmarks, and detailed documentation.