Overview
Supervised Fine-Tuning (SFT) is the simplest and most commonly used method to adapt a language model to a target dataset. The model is trained in a fully supervised fashion on pairs of input and output sequences, with the goal of minimizing the negative log-likelihood (NLL) of the target sequence conditioned on the input.

SFTTrainer supports both language modeling and prompt-completion datasets, and works with standard or conversational dataset formats. When given a conversational dataset, the trainer automatically applies the model’s chat template.
Quick start
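A minimal run might look like the following sketch; the model and dataset names are example values, substitute your own:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Example dataset and model; any compatible pair works.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # a model name or path, loaded via from_pretrained
    train_dataset=dataset,
    args=SFTConfig(output_dir="Qwen2.5-0.5B-SFT"),
)
trainer.train()
```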
Dataset format
SFTTrainer accepts four dataset formats:
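Concretely, the four formats look like this (toy examples):

```python
# 1. Standard language modeling: a single text field.
standard_lm = {"text": "The sky is blue."}

# 2. Conversational language modeling: a list of chat messages.
conversational_lm = {
    "messages": [
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "It is blue."},
    ]
}

# 3. Standard prompt-completion: separate prompt and completion strings.
standard_pc = {"prompt": "The sky is", "completion": " blue."}

# 4. Conversational prompt-completion: prompt and completion as message lists.
conversational_pc = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "completion": [{"role": "assistant", "content": "It is blue."}],
}
```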
For prompt-completion datasets, loss is computed only on the completion tokens by default. For language modeling datasets, loss is computed on the full sequence.
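Completion-only loss can be pictured in pure Python: prompt positions receive the label `-100` (the ignore index used by PyTorch’s cross-entropy), so only completion tokens contribute to the loss. A simplified sketch with made-up token ids, not the trainer’s actual implementation:

```python
IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking out the prompt part."""
    labels = list(input_ids)
    for i in range(prompt_len):
        labels[i] = IGNORE_INDEX
    return labels

# Toy example: 3 prompt tokens followed by 2 completion tokens.
ids = [101, 102, 103, 201, 202]
print(mask_prompt_labels(ids, prompt_len=3))  # [-100, -100, -100, 201, 202]
```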
Key configuration parameters
Data preprocessing
- `max_length`: Maximum length of the tokenized sequence. Sequences longer than this are truncated. Set to `None` to disable truncation (recommended for VLMs).
- `packing`: Whether to pack multiple short sequences into fixed-length blocks, improving GPU utilization and reducing padding waste. Uses `max_length` to define the block size.
- `packing_strategy`: Strategy for packing sequences: `"bfd"` (best-fit decreasing, truncates overflow), `"bfd_split"` (best-fit decreasing, splits overflow sequences), or `"wrapped"` (aggressive, cuts mid-sequence).
- `dataset_text_field`: Name of the column containing text data for language modeling datasets.
- Truncation side: which part of the sequence to keep when it exceeds `max_length`. Options: `"keep_start"` or `"keep_end"`.
Loss computation
- `completion_only_loss`: Whether to compute loss only on the completion part. When `None`, defaults to `True` for prompt-completion datasets and `False` for language modeling datasets.
- `assistant_only_loss`: Whether to compute loss only on assistant responses in conversational datasets. Requires a chat template that supports the `{% generation %}` and `{% endgeneration %}` keywords.
- `loss_type`: Type of loss to use: `"nll"` (standard negative log-likelihood) or `"dft"` (Dynamic Fine-Tuning, which rectifies the reward signal to improve generalization).
Model initialization
- `model_init_kwargs`: Keyword arguments forwarded to `AutoModelForCausalLM.from_pretrained` when the `model` argument is a string. Useful for setting `dtype`, `device_map`, or `output_router_logits` for MoE models.
- `chat_template_path`: Path to a tokenizer or a Jinja template file to set as the model’s chat template. Useful when fine-tuning base models that do not have a chat template.
- `eos_token`: Token used to indicate the end of a sequence. Required when the chat template uses a different EOS token than the tokenizer’s default.
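As an example, forwarding loading options through `model_init_kwargs` might look like this sketch (the output directory is a placeholder):

```python
import torch
from trl import SFTConfig

# Loading options are forwarded to from_pretrained when `model` is a string.
args = SFTConfig(
    output_dir="my-sft-model",  # placeholder path
    model_init_kwargs={
        "dtype": torch.bfloat16,
        "device_map": "auto",
    },
)
```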
Memory optimization
- `padding_free`: Perform forward passes without padding by flattening all sequences into a single continuous sequence. Requires FlashAttention 2 or 3. Automatically enabled when `packing="bfd"`.
- `activation_offloading`: Offload activations to the CPU to reduce GPU memory usage.

SFTConfig also overrides some TrainingArguments defaults: `logging_steps=10`, `gradient_checkpointing=True`, `bf16=True`, and `learning_rate=2e-5`.

Instruction tuning
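A sketch of an instruction-tuning run on a base model, using a conversational dataset and borrowing a chat template from another tokenizer; the model, dataset, and template source below are example values:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A conversational dataset (a "messages" column).
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # a base model without a chat template
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="Qwen2.5-0.5B-Instruct",
        chat_template_path="Qwen/Qwen3-0.6B",  # tokenizer whose chat template to reuse
    ),
)
trainer.train()
```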
To turn a base model into an instruction-following model, provide a chat template and a conversational dataset.

Dataset packing
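The best-fit-decreasing idea behind `packing_strategy="bfd"` can be illustrated in pure Python (a simplified sketch, not the trainer’s actual implementation):

```python
def pack_bfd(lengths, block_size):
    """Greedy best-fit decreasing: place each sequence (longest first) into
    the fullest block that still has room; open a new block otherwise."""
    blocks = []  # each block is a list of sequence lengths
    for length in sorted(lengths, reverse=True):
        length = min(length, block_size)  # "bfd" truncates overflow
        # Best fit: the block with the least remaining room that still fits.
        best = None
        for block in blocks:
            room = block_size - sum(block)
            if length <= room and (best is None or room < block_size - sum(best)):
                best = block
        if best is None:
            blocks.append([length])
        else:
            best.append(length)
    return blocks

print(pack_bfd([6, 3, 3, 2], block_size=8))  # [[6, 2], [3, 3]]
```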
Packing is a technique to increase training efficiency by grouping multiple short examples into a single fixed-length block, reducing wasted padding tokens.

Training with PEFT/LoRA
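A minimal LoRA sketch, assuming the `peft` package is installed; the rank, alpha, and target modules below are illustrative choices, and the model and dataset names are placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # example model
    train_dataset=dataset,
    args=SFTConfig(output_dir="Qwen2.5-0.5B-SFT-LoRA"),
    # Only the LoRA adapter weights are trained; the base model stays frozen.
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```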
Use the PEFT library to train only a small set of adapter parameters instead of the full model.

Training Vision-Language Models
SFTTrainer supports VLMs. Provide a dataset with an `image` column (a single image) or an `images` column (a list of images):
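A sketch with an example vision-language model and image dataset; both names are placeholders for any compatible pair:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Example conversational vision dataset with an "images" column.
dataset = load_dataset("trl-lib/llava-instruct-mix", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # example VLM
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="Qwen2.5-VL-3B-SFT",
        max_length=None,  # disable truncation, as recommended for VLMs
    ),
)
trainer.train()
```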
Logged metrics
| Metric | Description |
|---|---|
| `loss` | Average cross-entropy loss over non-masked tokens |
| `entropy` | Average entropy of the model’s predicted token distribution |
| `mean_token_accuracy` | Proportion of tokens where the top-1 prediction matches the ground truth |
| `learning_rate` | Current learning rate |
| `grad_norm` | L2 norm of the gradients, before clipping |
| `num_tokens` | Total number of tokens processed so far |