Overview
LoRA (Low-Rank Adaptation) enables efficient fine-tuning of large language models by training small adapter matrices instead of all model parameters. This dramatically reduces memory requirements and training time while maintaining performance.

LoraArguments
Configure LoRA training with these parameters:

Core Parameters
LoRA Rank
Rank of the LoRA update matrices. Controls the size of adapter weights:
- Lower (8-16): Fewer parameters, faster training, may underfit
- Medium (32-64): Balanced performance and efficiency (recommended)
- Higher (128+): More expressive, closer to full fine-tuning
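To see why rank drives adapter size: for a weight matrix of shape (d_out, d_in), LoRA adds a (d_out, r) matrix and an (r, d_in) matrix, so the trainable parameter count grows linearly with r. A minimal sketch (the 4096 hidden size is an illustrative assumption, not taken from this document):

```python
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one weight matrix:
    a (d_out, r) matrix plus an (r, d_in) matrix."""
    return d_out * r + r * d_in

# Illustrative 4096x4096 projection (hidden size is an assumption):
full = 4096 * 4096
for r in (8, 32, 128):
    lora = lora_param_count(4096, 4096, r)
    print(f"r={r:>3}: {lora:,} adapter params ({100 * lora / full:.1f}% of the full matrix)")
```

Even at r=128, the adapter remains a small fraction of the original matrix, which is where the memory savings come from.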
LoRA Alpha
Scaling factor for LoRA updates. The effective learning rate multiplier is lora_alpha / lora_r:
- Typical values: 16, 32, 64
- Higher values increase the influence of LoRA updates
- Usually set to lora_r or lora_r / 2
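The scaling rule above is simple enough to compute directly; a minimal sketch:

```python
def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    """Effective multiplier applied to the LoRA update (lora_alpha / lora_r)."""
    return lora_alpha / lora_r

print(lora_scaling(32, 32))  # alpha == r keeps updates at unit scale: 1.0
print(lora_scaling(16, 32))  # alpha == r / 2 halves their influence: 0.5
```

This is why alpha is usually set relative to the rank: it keeps the update magnitude stable when you change lora_r.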
LoRA Dropout
Dropout probability for LoRA layers:
- 0.0: No dropout
- 0.05-0.1: Light regularization (recommended)
- 0.1-0.3: Stronger regularization
Target Modules
List of module names to apply LoRA to. For Qwen models:
- c_attn: Attention query/key/value projections
- c_proj: Attention output projection
- w1, w2: FFN layers
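Putting the parameters above together, a configuration could look like the following sketch. It assumes the Hugging Face PEFT library; the module names match the Qwen naming described above, and the specific values (r=32, alpha=32, dropout=0.05) are the recommended ranges from this page, not mandated defaults:

```python
from peft import LoraConfig  # assumes the Hugging Face PEFT library

config = LoraConfig(
    r=32,                 # medium rank: balanced performance and efficiency
    lora_alpha=32,        # alpha == r, so the update scale is 1.0
    lora_dropout=0.05,    # light regularization
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # attention + FFN (Qwen naming)
    task_type="CAUSAL_LM",
)
```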
Common Configurations
Attention only (fastest, fewest parameters): target only the attention modules (c_attn, c_proj).

Bias Training
Which bias parameters to train:
- "none": No bias training (fastest)
- "all": Train all bias parameters
- "lora_only": Train only biases of LoRA modules
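In PEFT's LoraConfig these options map onto the bias argument; a minimal sketch (PEFT assumed):

```python
from peft import LoraConfig  # assumes the Hugging Face PEFT library

# bias accepts "none", "all", or "lora_only":
config = LoraConfig(
    r=32,
    lora_alpha=32,
    bias="lora_only",  # train only the biases of modules wrapped by LoRA
    task_type="CAUSAL_LM",
)
```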
Quantized LoRA (QLoRA)
Enable QLoRA for 4-bit quantized fine-tuning:
- Reduces memory usage by ~75%
- Enables fine-tuning large models on consumer GPUs
- Slight performance trade-off
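Enabling the 4-bit quantization described above could look like this sketch. It assumes the transformers and bitsandbytes libraries; the model name is illustrative, and the NF4/double-quantization settings are the common QLoRA choices rather than values taken from this page:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Common QLoRA quantization settings (assumptions, not page-specified values):
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",              # NF4 data type used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",  # illustrative model name
    quantization_config=bnb_config,
)
```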
QLoRA Configuration
When using QLoRA, the model is automatically loaded with 4-bit quantization.

Loading Pretrained LoRA
Path to pretrained LoRA weights to continue training:
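Resuming from an existing adapter could look like the following sketch, assuming PEFT; the paths and model name are illustrative:

```python
from peft import PeftModel  # assumes the Hugging Face PEFT library
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B")  # illustrative name

# is_trainable=True keeps the adapter weights trainable so training can continue.
model = PeftModel.from_pretrained(base, "path/to/lora_checkpoint", is_trainable=True)
```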