Quick Start
Fine-tune your first model in minutes with SFT, DPO, or GRPO
Installation
Install TRL with pip and set up your environment
Trainers
Explore SFT, GRPO, DPO, Reward, RLOO, and more
API Reference
Full API docs for trainers, configs, utilities, and callbacks
Training algorithms
TRL covers the full post-training pipeline — from initial fine-tuning to reward modeling and RL-based alignment.

SFT Trainer
Supervised fine-tuning with packing, chat templates, and LoRA support
GRPO Trainer
Group Relative Policy Optimization — the algorithm behind DeepSeek-R1
DPO Trainer
Direct Preference Optimization for human preference alignment
Reward Trainer
Train reward models for RLHF pipelines
Get started
Key features
Efficient scaling
Leverage Accelerate for DDP, DeepSpeed ZeRO, and FSDP across single GPU to multi-node clusters
Memory-efficient training
Full PEFT/LoRA/QLoRA integration and quantization support for training on consumer hardware
vLLM acceleration
Fast online generation with co-located vLLM for RL-based training methods
Command Line Interface
Fine-tune without writing code using the trl CLI