The trl.scripts module provides the building blocks for writing training scripts: a YAML-aware argument parser (TrlParser), common argument dataclasses (ScriptArguments, ModelConfig), a dataset mixture loader (get_dataset / DatasetMixtureConfig), and a logging initializer (init_zero_verbose).
TrlParser
TrlParser extends transformers.HfArgumentParser with support for YAML configuration files and environment variable injection. Pass --config path/to/config.yaml on the command line to load defaults from a file; command-line arguments always override config file values.
The env key in the YAML file can set environment variables before the rest of the config is applied.
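The precedence rules above (env section first, then config values as defaults, then command-line overrides) can be sketched with the standard library alone. This is a minimal illustration, not TrlParser's implementation: the helper parse_with_config, the argument names learning_rate and dataset_name, and the variable EXAMPLE_TOKEN are all hypothetical, and a plain dict stands in for the result of yaml.safe_load.

```python
import argparse
import os


def parse_with_config(config: dict, argv: list[str]) -> argparse.Namespace:
    """Apply a config dict as defaults, then let CLI tokens override them."""
    config = dict(config)  # copy so the caller's dict is not mutated

    # The "env" section is exported before anything else is applied.
    for key, value in config.pop("env", {}).items():
        os.environ[key] = str(value)

    # Hypothetical arguments, chosen only for this illustration.
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=1e-5)
    parser.add_argument("--dataset_name", type=str, default=None)

    # Remaining config keys become parser defaults ...
    parser.set_defaults(**config)
    # ... and explicit command-line tokens still win over them.
    return parser.parse_args(argv)


# Stand-in for the dict that yaml.safe_load would return for a config file.
config = {
    "env": {"EXAMPLE_TOKEN": "abc"},
    "learning_rate": 2e-5,
    "dataset_name": "my/dataset",
}
args = parse_with_config(config, ["--learning_rate", "5e-5"])
```

Here learning_rate comes from the command line, dataset_name comes from the config, and EXAMPLE_TOKEN is set in the process environment.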
Signature
Parameters
One or more dataclass types to parse arguments into. None of the dataclasses may have a field named "config" (reserved for the config file path).
Methods
parse_args_and_config
Parses command-line arguments and an optional YAML config file.
The config file (specified via --config) is loaded with yaml.safe_load. Its env section (if present) sets environment variables. All other keys are used as argument defaults. Raises ValueError for unknown config keys when fail_with_unknown_args=True.
set_defaults_with_config
Overrides argument defaults with values from keyword arguments (typically from a YAML config). Marks overridden arguments as no longer required.
Returns a list of string tokens for keys not recognized by the parser.
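The behavior described for set_defaults_with_config can be approximated on a plain argparse parser. The sketch below is an assumption-laden illustration rather than the library's code: apply_config_defaults is a hypothetical helper, it relies on argparse's private _actions list, and the "--key value" shape of the returned tokens is assumed.

```python
import argparse


def apply_config_defaults(parser: argparse.ArgumentParser, **config) -> list[str]:
    """Hypothetical helper: set defaults from config, return unknown keys as tokens."""
    # parser._actions is a private argparse attribute, used here for illustration only.
    for action in parser._actions:
        if action.dest in config:
            action.default = config.pop(action.dest)
            action.required = False  # a value now exists, so the flag becomes optional
    # Keys the parser does not know come back as CLI-style string tokens (assumed format).
    return [token for key, value in config.items() for token in (f"--{key}", str(value))]


parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)  # hypothetical required argument

leftover = apply_config_defaults(parser, model_name="my-model", typo_key=3)
args = parser.parse_args([])  # succeeds because model_name is no longer required
```

A caller can inspect leftover to decide whether to raise on unknown keys or pass them along.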
Example
ScriptArguments
A dataclass holding dataset-related arguments common to all TRL training scripts. Designed to be used with TrlParser.
Signature
Fields
Path or name of the dataset to load via datasets.load_dataset. Ignored when DatasetMixtureConfig.datasets is provided.
Dataset configuration name, corresponding to the name argument of datasets.load_dataset. Ignored when a mixture config is used.
Dataset split to use for training.
Dataset split to use for evaluation.
When True, loads the dataset in streaming mode.
Debug flag for distributed training. Fixes DDP issues with LM bias/mask buffers.
Example
ModelConfig
A dataclass holding model loading and PEFT configuration, designed for use with TrlParser.
Signature
Key fields
Model loading
HuggingFace Hub identifier or local path of the model checkpoint.
Branch name, tag, or commit hash to load.
Load dtype override. One of "auto", "bfloat16", "float16", "float32".
Allow execution of custom model code from the Hub. Only enable for repositories you trust.
Attention kernel to use (e.g., "flash_attention_2").
PEFT / LoRA
Enable PEFT/LoRA fine-tuning.
LoRA rank.
LoRA scaling factor.
LoRA dropout probability.
Module names to apply LoRA to.
PEFT task type. Use "SEQ_CLS" for reward modeling.
Use Rank-Stabilized LoRA (scales adapter by lora_alpha/√r instead of lora_alpha/r).
Enable Weight-Decomposed Low-Rank Adaptation (DoRA).
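To make the difference between the two scaling rules concrete, here is a small arithmetic sketch. The function lora_scaling and its parameter names are illustrative only; they compute the two formulas quoted above and do not call into PEFT or TRL.

```python
import math


def lora_scaling(lora_alpha: float, r: int, rank_stabilized: bool = False) -> float:
    """Return the adapter scaling factor for standard or rank-stabilized LoRA."""
    if rank_stabilized:
        return lora_alpha / math.sqrt(r)  # rank-stabilized: alpha / sqrt(r)
    return lora_alpha / r  # standard LoRA: alpha / r


standard = lora_scaling(lora_alpha=32, r=16)                          # 32 / 16
stabilized = lora_scaling(lora_alpha=32, r=16, rank_stabilized=True)  # 32 / sqrt(16)
```

With the same lora_alpha, the rank-stabilized factor shrinks more slowly as the rank grows, which is why the two settings are not interchangeable at high ranks.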
Quantization
Load in 8-bit precision (requires LoRA).
Load in 4-bit precision (requires LoRA).
4-bit quantization type: "fp4" or "nf4".
Enable nested quantization (double quantization).
Example
DatasetMixtureConfig
Configuration dataclass for loading and combining multiple datasets into a single training mixture. Each dataset in the mixture is described by a DatasetConfig entry.
Signature
Fields
List of individual dataset configurations. Each entry specifies a path, optional name, data_dir, data_files, split, and columns.
Load all datasets in streaming mode.
If provided, the combined dataset is split into train and test subsets using this fraction as the test size.
YAML usage
get_dataset
Loads and concatenates a mixture of datasets described by a DatasetMixtureConfig. Returns a DatasetDict with a "train" key (and optionally a "test" key when test_split_size is set).
Signature
Parameters
Configuration specifying datasets, streaming, and optional test split.
Returns
datasets.DatasetDict — Combined dataset. Always contains a "train" split; also contains a "test" split if mixture_config.test_split_size is not None.
Example
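The shape of the return value can be pictured without the library. The following is a rough, standard-library sketch of "concatenate, then optionally split", not a call into get_dataset: combine_and_split is a hypothetical helper, plain lists of dicts stand in for datasets, and the shuffling and rounding choices are assumptions that may differ from what the datasets library does.

```python
import random
from typing import Optional


def combine_and_split(parts: list, test_split_size: Optional[float] = None, seed: int = 0) -> dict:
    """Concatenate row lists, then optionally carve off a test fraction."""
    rows = [row for part in parts for row in part]
    if test_split_size is None:
        return {"train": rows}  # no split requested: only a "train" key

    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # assumed: shuffle before splitting
    n_test = round(len(shuffled) * test_split_size)
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}


# Two toy "datasets" with 6 and 4 rows.
parts = [
    [{"text": f"a{i}"} for i in range(6)],
    [{"text": f"b{i}"} for i in range(4)],
]
only_train = combine_and_split(parts)
with_test = combine_and_split(parts, test_split_size=0.2)
```

The first call yields only a "train" split with all 10 rows; the second yields 8 training rows and 2 test rows.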
init_zero_verbose
Configures Python's logging and warnings for minimal, clean output, suitable for the top of CLI training scripts. Uses RichHandler when the rich package is available, falling back to a standard StreamHandler.
Signature
Sets the logging level to ERROR and redirects warnings.showwarning to the logging system.