## vLLM for fast generation in online methods
Online methods such as GRPO or Online DPO require the model to generate completions, which is often the slowest step in training. vLLM speeds up generation significantly through PagedAttention and other optimizations. Install vLLM with `pip install vllm`.
## Optimized attention implementations

TRL supports optimized attention backends that speed up training while reducing memory usage. There are two ways to set them up:

- Kernels from Hub (recommended)
- Manual build
Use pre-optimized attention kernels from the Hub without manual compilation. Other available kernels include `kernels-community/vllm-flash-attn3` and `kernels-community/paged-attention`. For more details, see the Kernels Hub Integration guide.
## Liger Kernel

Liger Kernel is a collection of Triton kernels designed for LLM training. It can increase multi-GPU training throughput by 20% and reduce memory usage by 60%. It is supported in the following trainers:

- SFT
- DPO
- GRPO
- KTO
- GKD