TRL is a full-stack Python library for post-training transformer language models. Built on top of the Hugging Face ecosystem, it provides state-of-the-art algorithms for supervised fine-tuning, preference optimization, and reinforcement learning from human feedback (RLHF) — scaling from a single GPU to multi-node clusters.

Quick Start

Fine-tune your first model in minutes with SFT, DPO, or GRPO

Installation

Install TRL with pip and set up your environment

Trainers

Explore SFT, GRPO, DPO, Reward, RLOO, and more

API Reference

Full API docs for trainers, configs, utilities, and callbacks

Training algorithms

TRL covers the full post-training pipeline — from initial fine-tuning to reward modeling and RL-based alignment.

SFT Trainer

Supervised fine-tuning with packing, chat templates, and LoRA support
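A minimal SFT sketch combining packing and LoRA, along the lines of the quick start below. The dataset and hyperparameters are illustrative, not recommendations; check your installed TRL version for the exact `SFTConfig` options.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    # packing concatenates short examples into full-length sequences
    args=SFTConfig(output_dir="Qwen2.5-0.5B-SFT", packing=True),
    # train lightweight LoRA adapters instead of all model weights
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```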

GRPO Trainer

Group Relative Policy Optimization — the algorithm behind DeepSeek-R1
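GRPO optimizes against one or more reward functions you supply; any callable that maps completions to per-sample floats works. A toy sketch (the length-based reward and dataset are illustrative):

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: prefer completions close to 100 characters long.
def reward_len(completions, **kwargs):
    return [-abs(100 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="Qwen2.5-GRPO"),
    train_dataset=dataset,
)
trainer.train()
```

In practice the reward function is where the method's behavior lives: verifiable rewards (unit tests, exact-match answers) are what made GRPO effective for reasoning models like DeepSeek-R1.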

DPO Trainer

Direct Preference Optimization for human preference alignment
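DPO trains directly on preference pairs, with no separate reward model. A minimal sketch assuming a dataset with `prompt`/`chosen`/`rejected` columns (the dataset name and `beta` value are illustrative):

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Preference dataset: each row has a prompt plus a chosen
# and a rejected response.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    # beta controls how far the policy may drift from the reference model
    args=DPOConfig(output_dir="Qwen2.5-DPO", beta=0.1),
    train_dataset=dataset,
)
trainer.train()
```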

Reward Trainer

Train reward models for RLHF pipelines
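Reward modeling uses the same preference-pair format as DPO: the trainer fits a scalar head so chosen responses score higher than rejected ones. A hedged sketch (dataset name is illustrative):

```python
from datasets import load_dataset
from trl import RewardConfig, RewardTrainer

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=RewardConfig(output_dir="Qwen2.5-Reward"),
    train_dataset=dataset,
)
trainer.train()
```

The resulting model can then serve as the reward signal for online RL trainers such as RLOO or GRPO.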

Get started

1. Install TRL

pip install trl
2. Load a dataset

from datasets import load_dataset
dataset = load_dataset("trl-lib/Capybara", split="train")
3. Train your model

from trl import SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
)
trainer.train()
4. Scale up

Use the CLI, DeepSpeed, or vLLM integrations to scale to multi-GPU and multi-node setups.

Key features

Efficient scaling

Leverage Accelerate for DDP, DeepSpeed ZeRO, and FSDP, from a single GPU to multi-node clusters

Memory-efficient training

Full PEFT/LoRA/QLoRA integration and quantization support for training on consumer hardware
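A QLoRA sketch: load the base model in 4-bit via `bitsandbytes`, then train LoRA adapters on top, so fine-tuning fits on consumer GPUs. The model choice and hyperparameters are illustrative:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Quantize the frozen base weights to 4-bit (QLoRA).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

trainer = SFTTrainer(
    model=model,
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    args=SFTConfig(output_dir="Qwen2.5-QLoRA"),
    # only the small LoRA adapter weights are trained
    peft_config=LoraConfig(r=16, lora_alpha=32),
)
trainer.train()
```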

vLLM acceleration

Fast online generation with co-located vLLM for RL-based training methods
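Online RL trainers spend most of their time generating rollouts; co-locating vLLM with training speeds this up. A config fragment sketch (flag names follow recent TRL releases; check your installed version):

```python
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="grpo-vllm",
    use_vllm=True,         # generate rollouts with vLLM instead of model.generate
    vllm_mode="colocate",  # run vLLM on the same GPUs as training
)
```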

Command Line Interface

Fine-tune without writing code using the trl CLI
