Quick Start
Fine-tune your first model in minutes with SFT, DPO, or GRPO
Installation
Install TRL with pip and set up your environment
Trainers
Explore SFT, GRPO, DPO, Reward, RLOO, and more
API Reference
Full API docs for trainers, configs, utilities, and callbacks
Training algorithms
TRL covers the full post-training pipeline — from initial fine-tuning to reward modeling and RL-based alignment.

SFT Trainer
Supervised fine-tuning with packing, chat templates, and LoRA support
GRPO Trainer
Group Relative Policy Optimization — the algorithm behind DeepSeek-R1
DPO Trainer
Direct Preference Optimization for human preference alignment
Reward Trainer
Train reward models for RLHF pipelines
Get started
Key features
Efficient scaling
Leverage Accelerate for DDP, DeepSpeed ZeRO, and FSDP across single GPU to multi-node clusters
Memory-efficient training
Full PEFT/LoRA/QLoRA integration and quantization support for training on consumer hardware
vLLM acceleration
Fast online generation with co-located vLLM for RL-based training methods
Command Line Interface
Fine-tune without writing code using the trl CLI