Overview

LoRA (Low-Rank Adaptation) enables efficient fine-tuning of large language models by training small adapter matrices instead of all model parameters. This dramatically reduces memory requirements and training time while maintaining performance.
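
Conceptually, LoRA freezes the pretrained weight matrix and learns a low-rank correction on top of it. The sketch below illustrates the idea with made-up shapes (a 4096x4096 projection and rank 64); it is not tied to any particular model:
import torch

d, r, alpha = 4096, 64, 16
W = torch.randn(d, d)          # frozen pretrained weight (not trained)
A = torch.randn(r, d) * 0.01   # trainable low-rank factor, r x d
B = torch.zeros(d, r)          # trainable low-rank factor, d x r (zero-init)

def forward(x):
    # Frozen projection plus the low-rank update, scaled by alpha / r
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# Trainable parameters: 2 * d * r = 524,288 vs. d * d = 16,777,216 for the full matrix
print(A.numel() + B.numel(), W.numel())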

LoraArguments

Configure LoRA training with these parameters:
from dataclasses import dataclass, field
from typing import List

@dataclass
class LoraArguments:
    lora_r: int = 64
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_target_modules: List[str] = field(
        default_factory=lambda: ["c_attn", "c_proj", "w1", "w2"]
    )
    lora_weight_path: str = ""
    lora_bias: str = "none"
    q_lora: bool = False
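
In a typical training script these arguments are parsed from the command line together with the model, data, and training arguments. A minimal sketch using transformers.HfArgumentParser (the surrounding argument classes in your script may differ):
from transformers import HfArgumentParser, TrainingArguments

# Hypothetical wiring; finetune.py usually also defines ModelArguments and DataArguments.
parser = HfArgumentParser((TrainingArguments, LoraArguments))
training_args, lora_args = parser.parse_args_into_dataclasses()
print(lora_args.lora_r, lora_args.lora_target_modules)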

Core Parameters

LoRA Rank

lora_r
int
default:"64"
Rank of the LoRA update matrices. Controls the size of adapter weights:
  • Lower (8-16): Fewer parameters, faster training, may underfit
  • Medium (32-64): Balanced performance and efficiency (recommended)
  • Higher (128+): More expressive, closer to full fine-tuning
--lora_r 64

LoRA Alpha

lora_alpha
int
default:"16"
Scaling factor for LoRA updates. The update is multiplied by lora_alpha / lora_r before being added to the frozen weights, so it acts like an effective learning-rate multiplier for the adapters:
  • Typical values: 16, 32, 64
  • Higher values increase the influence of LoRA updates
  • Usually set to lora_r or lora_r / 2
--lora_alpha 16
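
As a quick sanity check on the formula, the defaults above scale each update by 0.25:
lora_alpha, lora_r = 16, 64
scaling = lora_alpha / lora_r   # 0.25
# Doubling lora_alpha to 32 doubles the scaling to 0.5 without adding any trainable parameters.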

LoRA Dropout

lora_dropout
float
default:"0.05"
Dropout probability for LoRA layers:
  • 0.0: No dropout
  • 0.05-0.1: Light regularization (recommended)
  • 0.1-0.3: Stronger regularization
--lora_dropout 0.05

Target Modules

lora_target_modules
list[str]
default:"[\"c_attn\", \"c_proj\", \"w1\", \"w2\"]"
List of module names to apply LoRA to. For Qwen models:
  • c_attn: Attention query/key/value projections
  • c_proj: Attention output projection
  • w1, w2: FFN layers
--lora_target_modules c_attn c_proj w1 w2

Common Configurations

Attention only (fastest, least parameters):
--lora_target_modules c_attn
Attention + output (balanced):
--lora_target_modules c_attn c_proj
Full coverage (best performance):
--lora_target_modules c_attn c_proj w1 w2
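
If you are unsure which module names exist in a given checkpoint, you can list the linear layers directly. A small inspection sketch (this loads the full model, so it needs enough memory):
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# Collect leaf names of linear-like modules; these are the candidates for --lora_target_modules
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if "Linear" in module.__class__.__name__
}
print(sorted(linear_names))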

Bias Training

lora_bias
str
default:"none"
Which bias parameters to train:
  • "none": No bias training (fastest)
  • "all": Train all bias parameters
  • "lora_only": Train only biases of LoRA modules
--lora_bias none

Quantized LoRA (QLoRA)

q_lora
bool
default:"False"
Enable QLoRA for 4-bit quantized fine-tuning:
  • Reduces memory usage by ~75%
  • Enables fine-tuning large models on consumer GPUs
  • Slight performance trade-off
--q_lora

QLoRA Configuration

When using QLoRA, the model is automatically loaded with 4-bit quantization:
from transformers import GPTQConfig

if lora_args.q_lora:
    quantization_config = GPTQConfig(
        bits=4,
        disable_exllama=True
    )
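
The resulting config is then passed to the model loader. A sketch of the loading call (the checkpoint name and device_map are illustrative; some QLoRA setups expect an already GPTQ-quantized Int4 checkpoint):
from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B",                       # illustrative; match the model used for training
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)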

Loading Pretrained LoRA

lora_weight_path
str
default:""
Path to pretrained LoRA weights to continue training:
--lora_weight_path ./output/checkpoint-1000
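
Internally this usually amounts to loading the existing adapter with PEFT before training resumes. A sketch assuming the PeftModel API, applied to an already loaded base model:
from peft import PeftModel

# Load the previous adapter on top of the base model and keep it trainable
model = PeftModel.from_pretrained(
    model,
    "./output/checkpoint-1000",
    is_trainable=True,
)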

Complete Examples

Standard LoRA Training

python finetune.py \
  --model_name_or_path Qwen/Qwen-7B \
  --data_path train.json \
  --output_dir ./output/lora \
  --use_lora \
  --lora_r 64 \
  --lora_alpha 16 \
  --lora_dropout 0.05 \
  --lora_target_modules c_attn c_proj w1 w2 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 4 \
  --learning_rate 1e-4 \
  --bf16

QLoRA Training (Memory Efficient)

python finetune.py \
  --model_name_or_path Qwen/Qwen-14B \
  --data_path train.json \
  --output_dir ./output/qlora \
  --use_lora \
  --q_lora \
  --lora_r 64 \
  --lora_alpha 16 \
  --lora_target_modules c_attn c_proj w1 w2 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --learning_rate 2e-4 \
  --gradient_checkpointing \
  --bf16

Minimal LoRA (Fastest)

python finetune.py \
  --model_name_or_path Qwen/Qwen-7B-Chat \
  --data_path train.json \
  --output_dir ./output/lora-minimal \
  --use_lora \
  --lora_r 8 \
  --lora_alpha 16 \
  --lora_target_modules c_attn \
  --num_train_epochs 5 \
  --per_device_train_batch_size 8 \
  --learning_rate 1e-4

LoRA Implementation

The LoRA configuration is applied using the PEFT library:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

if training_args.use_lora:
    # Determine which modules to save
    if lora_args.q_lora or is_chat_model:
        modules_to_save = None
    else:
        modules_to_save = ["wte", "lm_head"]  # For base models with new tokens
    
    # Create LoRA config
    lora_config = LoraConfig(
        r=lora_args.lora_r,
        lora_alpha=lora_args.lora_alpha,
        target_modules=lora_args.lora_target_modules,
        lora_dropout=lora_args.lora_dropout,
        bias=lora_args.lora_bias,
        task_type="CAUSAL_LM",
        modules_to_save=modules_to_save
    )
    
    # Prepare model for QLoRA if needed
    if lora_args.q_lora:
        model = prepare_model_for_kbit_training(
            model,
            use_gradient_checkpointing=training_args.gradient_checkpointing
        )
    
    # Apply LoRA
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

Trainable Parameters

LoRA dramatically reduces the number of trainable parameters:
Full Fine-tuning (Qwen-7B):
  - Trainable params: 7,721,000,000
  - Percentage: 100%

LoRA (r=64, 4 modules):
  - Trainable params: 83,886,080
  - Percentage: 1.09%

QLoRA (r=64, 4 modules, 4-bit):
  - Trainable params: 83,886,080
  - Percentage: 1.09%
  - Memory usage: ~25% of full fine-tuning
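
The adapter size follows directly from the rank: each targeted module of shape (d_out, d_in) adds r * (d_in + d_out) parameters. A back-of-the-envelope helper (the shapes and layer count below are placeholders, not the exact Qwen-7B dimensions):
def lora_param_count(r, module_shapes, num_layers):
    # r * (d_in + d_out) parameters per targeted module, per layer
    return num_layers * sum(r * (d_in + d_out) for d_in, d_out in module_shapes)

# Two hypothetical 4096x4096 projections per layer, 32 layers, rank 64
print(lora_param_count(64, [(4096, 4096), (4096, 4096)], num_layers=32))  # 33,554,432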

Hyperparameter Guidelines

Task-Based Recommendations

Instruction Following / Chat:
--lora_r 64 \
--lora_alpha 16 \
--lora_target_modules c_attn c_proj w1 w2
Domain Adaptation:
--lora_r 32 \
--lora_alpha 32 \
--lora_target_modules c_attn c_proj
Task-Specific (Classification, etc.):
--lora_r 16 \
--lora_alpha 32 \
--lora_target_modules c_attn

Memory Constraints

24GB GPU (e.g., RTX 3090):
--q_lora \
--lora_r 64 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--gradient_checkpointing
40GB GPU (e.g., A100):
--use_lora \
--lora_r 64 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4
80GB GPU (e.g., A100 80GB):
--use_lora \
--lora_r 128 \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 2

Merging LoRA Weights

After training, merge the LoRA adapters into the base model:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    device_map="auto",
    trust_remote_code=True
)

# Load and merge LoRA
model = PeftModel.from_pretrained(base_model, "./output/lora/checkpoint-1000")
merged_model = model.merge_and_unload()

# Save merged model
merged_model.save_pretrained("./output/merged-model")
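
It is often convenient to save the tokenizer alongside the merged weights so the output directory can be loaded on its own:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
tokenizer.save_pretrained("./output/merged-model")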
