TRL supports PEFT (Parameter-Efficient Fine-Tuning) methods for memory-efficient model training. PEFT enables fine-tuning large language models by training only a small number of additional parameters while keeping the base model frozen, significantly reducing computational costs and memory requirements.
Installation
PEFT support requires the peft library:
pip install peft
For QLoRA support (4-bit and 8-bit quantization), also install bitsandbytes:
pip install bitsandbytes
Quick start
All TRL trainers support PEFT through the peft_config argument. The simplest way to enable PEFT is via the CLI with the --use_peft flag:
python trl/scripts/sft.py \
--model_name_or_path Qwen/Qwen2-0.5B \
--dataset_name trl-lib/Capybara \
--use_peft \
--lora_r 32 \
--lora_alpha 16 \
--output_dir Qwen2-0.5B-SFT-LoRA
Alternatively, pass a PEFT config directly in Python:
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = SFTConfig(
    learning_rate=2.0e-4,  # ~10x the full fine-tuning rate for LoRA
)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
1. Use the --use_peft flag (CLI)
Use the --use_peft flag with TRL scripts. Best for quick experiments and standard LoRA configurations.
python trl/scripts/sft.py \
--model_name_or_path Qwen/Qwen2-0.5B \
--dataset_name trl-lib/Capybara \
--use_peft \
--lora_r 32 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--output_dir Qwen2-0.5B-SFT-LoRA
Available CLI flags:
--use_peft — enable PEFT
--lora_r — LoRA rank (default: 16)
--lora_alpha — LoRA alpha (default: 32)
--lora_dropout — LoRA dropout (default: 0.05)
--lora_target_modules — target modules (space-separated)
--lora_modules_to_save — additional modules to train
--use_rslora — enable Rank-Stabilized LoRA
--use_dora — enable Weight-Decomposed LoRA (DoRA)
--load_in_4bit — enable 4-bit quantization (QLoRA)
--load_in_8bit — enable 8-bit quantization
2. Pass peft_config to trainer (recommended)
Pass a PEFT configuration directly to the trainer for full control over PEFT methods, including LoRA, Prompt Tuning, and others.
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
3. Apply PEFT to the model directly (advanced)
Apply PEFT to the model before passing it to the trainer. Useful for custom architectures or complex setups.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)

# No peft_config argument needed: the model is already wrapped
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
Using ModelConfig and get_peft_config
For script-based workflows, TRL provides ModelConfig and the helper functions get_peft_config and get_quantization_config to build PEFT and quantization configs directly from CLI arguments.
from transformers import AutoModelForCausalLM
from trl import ModelConfig, get_peft_config, get_quantization_config, get_kbit_device_map
model_args = ModelConfig(
    model_name_or_path="Qwen/Qwen2-0.5B",
    use_peft=True,
    lora_r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    lora_target_modules=["q_proj", "v_proj"],
    load_in_4bit=True,
)
quantization_config = get_quantization_config(model_args)
model = AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    quantization_config=quantization_config,
    device_map=get_kbit_device_map(),  # returns {"": local_process_index} on GPU
)
peft_config = get_peft_config(model_args)  # Returns a LoraConfig or None
get_peft_config reads the following ModelConfig fields:
| ModelConfig field | Default | Description |
|---|---|---|
| use_peft | False | Enable PEFT. Returns None when False. |
| lora_r | 16 | LoRA rank. |
| lora_alpha | 32 | LoRA scaling factor. |
| lora_dropout | 0.05 | Dropout probability for LoRA layers. |
| lora_target_modules | None | Modules to apply LoRA to. |
| lora_task_type | "CAUSAL_LM" | Task type ("SEQ_CLS" for reward modeling). |
| use_rslora | False | Use Rank-Stabilized LoRA. |
| use_dora | False | Use Weight-Decomposed LoRA (DoRA). |
| lora_modules_to_save | None | Additional modules to fully train. |
get_kbit_device_map
get_kbit_device_map() returns a device map appropriate for k-bit (4-bit or 8-bit) quantized models in multi-GPU environments. Returns {"": local_process_index} when a GPU is available, or None when running on CPU.
from transformers import AutoModelForCausalLM
from trl import get_kbit_device_map
device_map = get_kbit_device_map()
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    load_in_4bit=True,
    device_map=device_map,
)
Always pass device_map=get_kbit_device_map() when using 4-bit or 8-bit quantization in multi-GPU setups. This ensures each process loads the quantized model onto its own GPU.
PEFT with different trainers
SFT
python trl/scripts/sft.py \
--model_name_or_path Qwen/Qwen2-0.5B \
--dataset_name trl-lib/Capybara \
--learning_rate 2.0e-4 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--use_peft \
--lora_r 32 \
--lora_alpha 16 \
--output_dir Qwen2-0.5B-SFT-LoRA
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)
training_args = SFTConfig(learning_rate=2.0e-4)
trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
DPO
When using PEFT with DPO, you do not need to provide a separate ref_model. The trainer automatically uses the frozen base model as the reference.
python trl/scripts/dpo.py \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--dataset_name trl-lib/ultrafeedback_binarized \
--learning_rate 5.0e-6 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--use_peft \
--lora_r 32 \
--lora_alpha 16 \
--output_dir Qwen2-0.5B-DPO-LoRA
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = DPOConfig(learning_rate=5.0e-6)
trainer = DPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
GRPO
python trl/scripts/grpo.py \
--model_name_or_path Qwen/Qwen2-0.5B \
--dataset_name trl-lib/math-reasoning \
--learning_rate 1.0e-5 \
--per_device_train_batch_size 2 \
--use_peft \
--lora_r 32 \
--lora_alpha 16 \
--output_dir Qwen2-0.5B-GRPO-LoRA
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = GRPOConfig(learning_rate=1.0e-5)

# GRPOTrainer also requires a reward signal (reward_funcs); a toy example:
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
Learning rate considerations
When using LoRA or other PEFT methods, use a higher learning rate (approximately 10x) compared to full fine-tuning. PEFT methods train only a small fraction of parameters, requiring a larger learning rate to achieve comparable updates.
| Trainer | Full fine-tuning | With LoRA (~10x) |
|---|---|---|
| SFT | 2.0e-5 | 2.0e-4 |
| DPO | 5.0e-7 | 5.0e-6 |
| GRPO | 1.0e-6 | 1.0e-5 |
| Prompt Tuning | N/A | 1.0e-2 to 3.0e-2 |
QLoRA: quantized low-rank adaptation
QLoRA combines 4-bit quantization with LoRA to enable fine-tuning of very large models on consumer hardware. This can reduce memory requirements by up to 4x compared to standard LoRA.
Load the model in 4-bit
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
Attach a LoRA adapter and train
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = SFTConfig(learning_rate=2.0e-4)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
The equivalent via CLI:
python trl/scripts/sft.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name trl-lib/Capybara \
--load_in_4bit \
--use_peft \
--lora_r 32 \
--lora_alpha 16 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--output_dir Llama-2-7b-QLoRA
BitsAndBytesConfig parameters
| Parameter | Description |
|---|---|
| bnb_4bit_quant_type | Quantization data type: "nf4" (recommended) or "fp4". |
| bnb_4bit_compute_dtype | Compute dtype for 4-bit layers. Use torch.bfloat16 for stability. |
| bnb_4bit_use_double_quant | Nested quantization; saves ~0.4 bits per parameter. |
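For intuition, 4-bit quantization stores each weight as one of 16 discrete levels plus a scale factor. The toy absmax quantizer below is illustrative only: the real NF4 scheme used by bitsandbytes places its 16 levels at normal-distribution quantiles and quantizes per block rather than per tensor.

```python
import torch

def quantize_absmax_4bit(x):
    # Map values onto the signed integer levels -7..7, scaled by the
    # tensor's absolute maximum (absmax scaling).
    scale = x.abs().max() / 7
    q = torch.clamp(torch.round(x / scale), -7, 7).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

x = torch.randn(16)
q, scale = quantize_absmax_4bit(x)
# Round-trip error is bounded by half a quantization step
max_err = (x - dequantize(q, scale)).abs().max().item()
print(q.dtype, max_err <= scale.item() / 2 + 1e-6)
```

The coarse grid is why a higher-precision compute dtype (bnb_4bit_compute_dtype) matters: weights are dequantized to bfloat16 before each matmul.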
8-bit quantization
For slightly higher precision at reduced memory savings:
from transformers import BitsAndBytesConfig, AutoModelForCausalLM
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
LoRA configuration reference
from peft import LoraConfig
peft_config = LoraConfig(
    r=16,                                 # LoRA rank
    lora_alpha=32,                        # LoRA scaling factor (typically 2x rank)
    lora_dropout=0.05,                    # Dropout probability
    bias="none",                          # Bias training strategy
    task_type="CAUSAL_LM",                # Task type
    target_modules=["q_proj", "v_proj"],  # Modules to apply LoRA to
    modules_to_save=None,                 # Additional modules to fully train
)
| Parameter | Description |
|---|---|
| r | LoRA rank. Typical values: 8, 16, 32, 64. Higher rank = more parameters. |
| lora_alpha | Scaling factor, typically 2x the rank. Controls the magnitude of LoRA updates. |
| lora_dropout | Dropout probability for LoRA layers. Typical range: 0.05–0.1. |
| target_modules | Which linear layers to apply LoRA to (see below). |
| modules_to_save | Additional modules to fully train, e.g., ["embed_tokens", "lm_head"]. |
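As a sanity check on these sizes: LoRA replaces updates to a d_out x d_in weight with two rank-r factors, so each adapted layer adds r * (d_in + d_out) trainable parameters. A quick pure-Python estimate (4096 is a hypothetical hidden size, not taken from any model above):

```python
def lora_param_count(d_in, d_out, r):
    # LoRA factors: A is (r x d_in), B is (d_out x r)
    return r * (d_in + d_out)

full = 4096 * 4096                       # one full-rank 4096x4096 projection
lora = lora_param_count(4096, 4096, 16)  # the same layer with a rank-16 adapter
print(lora, f"{lora / full:.2%}")        # 131072 0.78%
```

Doubling r doubles the adapter's parameter count linearly, which is why ranks of 8-64 stay cheap even across every linear layer of a large model.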
Target module selection
# Minimal: most memory efficient
target_modules = ["q_proj", "v_proj"]

# Attention projections only
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]

# All linear layers: best performance, higher memory
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
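Valid target_modules strings are the leaf names of the model's nn.Linear layers, which you can discover by enumerating named_modules(). A minimal sketch on a toy attention block (a stand-in for a real checkpoint, which you would load with AutoModelForCausalLM instead):

```python
import torch.nn as nn

class ToyAttention(nn.Module):
    # Toy module using the projection names common to Llama/Qwen-style models
    def __init__(self, d=8):
        super().__init__()
        self.q_proj = nn.Linear(d, d)
        self.k_proj = nn.Linear(d, d)
        self.v_proj = nn.Linear(d, d)
        self.o_proj = nn.Linear(d, d)

def linear_module_names(model):
    # Leaf names of every nn.Linear: the strings target_modules is matched against
    return sorted({name.split(".")[-1] for name, mod in model.named_modules()
                   if isinstance(mod, nn.Linear)})

print(linear_module_names(ToyAttention()))  # ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```

Running the same helper on a loaded checkpoint shows exactly which names your architecture exposes, since naming conventions differ between model families.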
Prompt tuning
Prompt tuning learns soft prompts (continuous embeddings) prepended to the input while keeping the entire model frozen. It is particularly parameter-efficient for large models.
from peft import PromptTuningConfig, PromptTuningInit, TaskType
from trl import SFTConfig, SFTTrainer
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path="Qwen/Qwen2-0.5B",
)
training_args = SFTConfig(
    learning_rate=2.0e-2,  # Prompt tuning uses a higher LR (1e-2 to 3e-2)
)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
| Feature | Prompt tuning | LoRA |
|---|---|---|
| Parameters trained | ~0.001% | ~0.1–1% |
| Memory usage | Minimal | Low |
| Training speed | Fastest | Fast |
| Model modification | None | Adapter layers |
| Best for | Large models, many tasks | General fine-tuning |
| Learning rate | Higher (1e-2 to 3e-2) | Standard (1e-4 to 3e-4) |
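Mechanically, prompt tuning just concatenates a small trainable embedding matrix in front of the frozen input embeddings; everything downstream of the embedding layer is unchanged. A torch-only sketch (dimensions are illustrative, not taken from any model above):

```python
import torch
import torch.nn as nn

num_virtual_tokens, d_model = 8, 16
# The only trainable parameters: one embedding vector per virtual token
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model))

input_embeds = torch.randn(1, 5, d_model)  # frozen embeddings of 5 real tokens
prompted = torch.cat(
    [soft_prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1), input_embeds],
    dim=1,
)
print(prompted.shape)  # torch.Size([1, 13, 16])
```

This is why num_virtual_tokens fully determines the parameter budget: num_virtual_tokens * d_model weights, regardless of model size.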
Saving and loading PEFT models
# Save adapter weights only (a few MB rather than several GB)
trainer.save_model("path/to/adapters")

# Load for inference
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
model = PeftModel.from_pretrained(base_model, "path/to/adapters")

# Optionally merge adapters into the base model for faster inference
model = model.merge_and_unload()
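Numerically, merging folds the low-rank update delta_W = (lora_alpha / r) * B @ A into the base weight, so the merged model computes identical outputs with no extra matmuls at inference. A torch-only sketch with illustrative shapes:

```python
import torch

d, r, alpha = 6, 2, 4
W = torch.randn(d, d)   # frozen base weight
A = torch.randn(r, d)   # LoRA down-projection
B = torch.randn(d, r)   # LoRA up-projection
x = torch.randn(d)

adapter_out = W @ x + (alpha / r) * (B @ (A @ x))  # adapter path during training
merged_W = W + (alpha / r) * (B @ A)               # weight after merging
print(torch.allclose(adapter_out, merged_W @ x, atol=1e-5))  # True
```

The trade-off: a merged model is faster to serve but can no longer be "unmerged" into separate, swappable adapters.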
Pushing to Hub
# Share adapters on the Hugging Face Hub
model.push_to_hub("username/model-name-lora")

# Load from Hub
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, "username/model-name-lora")
Multi-GPU training
PEFT works with TRL’s multi-GPU support through Accelerate:
accelerate config
accelerate launch trl/scripts/sft.py \
--model_name_or_path Qwen/Qwen2-0.5B \
--dataset_name trl-lib/Capybara \
--use_peft \
--lora_r 32 \
--lora_alpha 16
For QLoRA across multiple GPUs, the quantized base model is automatically sharded:
accelerate launch trl/scripts/sft.py \
--model_name_or_path meta-llama/Llama-2-70b-hf \
--load_in_4bit \
--use_peft \
--lora_r 32
Resources
SFT with LoRA/QLoRA notebook: a complete working example with both LoRA and QLoRA.
PEFT documentation: the official PEFT library documentation.
LoRA paper: the original LoRA methodology and results.
QLoRA paper: efficient fine-tuning of quantized language models.