Overview
Qwen fine-tuning uses Hugging Face Transformers' TrainingArguments with additional custom arguments for model loading, data processing, and LoRA training.
Argument Classes
Four argument classes configure different aspects of training:
ModelArguments
Specifies which model to fine-tune. Provide a Hugging Face model ID or a local path to a model checkpoint:
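A minimal sketch of the model argument in a launch command. The script name and model ID are illustrative; substitute your own checkpoint:

```shell
# Point the trainer at a Hub model ID or a local checkpoint directory.
python finetune.py \
  --model_name_or_path Qwen/Qwen-7B-Chat
# or: --model_name_or_path /path/to/local/checkpoint
```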
DataArguments
Configure training and evaluation data.
Path to training data JSON file:
Path to evaluation data JSON file (optional):
Use lazy data loading to reduce memory usage:
Enable for very large datasets that don't fit in memory.
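A sketch of the data arguments together. The flag names follow the conventions of Qwen's finetune.py script and may differ in your version; the file paths are placeholders:

```shell
# Training data is a JSON file; eval data and lazy loading are optional.
python finetune.py \
  --data_path data/train.json \
  --eval_data_path data/eval.json \
  --lazy_preprocess True
```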
TrainingArguments
Extends the standard Hugging Face TrainingArguments with Qwen-specific options.
Core Training Parameters
Directory for saving model checkpoints and outputs:
Number of training epochs:
Batch size per GPU during training:
Batch size per GPU during evaluation:
Number of steps to accumulate gradients before updating:
Effective batch size = per-device batch size × gradient_accumulation_steps × number of GPUs
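The core parameters above map onto standard Hugging Face TrainingArguments flags; the values below are examples, not recommendations:

```shell
# With these settings the effective batch size is 2 × 8 × num_gpus.
python finetune.py \
  --output_dir output_qwen \
  --num_train_epochs 3 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8
```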
Learning Rate
Initial learning rate:
Learning rate schedule:
linear: Linear decay
cosine: Cosine annealing
constant: No decay
Number of warmup steps:
Warmup ratio (alternative to warmup_steps):
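A sketch of the learning-rate flags as they appear in a Hugging Face launch command; values are illustrative:

```shell
python finetune.py \
  --learning_rate 3e-4 \
  --lr_scheduler_type cosine \
  --warmup_ratio 0.01
# Alternatively, specify an absolute count: --warmup_steps 100
```

Set either warmup_ratio or warmup_steps, not both; warmup_steps takes precedence when both are given.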
Optimization
Optimizer to use:
adamw_torch: PyTorch AdamW
adamw_hf: Hugging Face AdamW
adafactor: Adafactor (memory efficient)
Weight decay coefficient:
Adam beta1 parameter:
Adam beta2 parameter:
Maximum gradient norm for clipping:
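The optimizer settings above correspond to standard TrainingArguments flags; the values shown are examples only:

```shell
python finetune.py \
  --optim adamw_torch \
  --weight_decay 0.1 \
  --adam_beta1 0.9 \
  --adam_beta2 0.95 \
  --max_grad_norm 1.0
```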
Model Configuration
Maximum sequence length (input + output):
Enable LoRA fine-tuning:
Directory for caching downloaded models:
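A sketch of the Qwen-specific model options. The model_max_length and use_lora flag names follow Qwen's finetune.py conventions (verify against your script); the cache path is a placeholder:

```shell
python finetune.py \
  --model_max_length 2048 \
  --use_lora True \
  --cache_dir /path/to/model_cache
```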
Checkpointing
When to save checkpoints:
steps: Every save_steps steps
epoch: Every epoch
no: No saving
Save checkpoint every N steps:
Maximum number of checkpoints to keep:
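The checkpointing options combine as in this illustrative fragment (standard TrainingArguments flags):

```shell
# Save every 1000 steps, keeping at most 10 checkpoints on disk.
python finetune.py \
  --save_strategy steps \
  --save_steps 1000 \
  --save_total_limit 10
```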
Evaluation
When to run evaluation:
steps: Every eval_steps steps
epoch: Every epoch
no: No evaluation
Evaluate every N steps:
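A sketch of the evaluation flags; eval_steps only takes effect when the strategy is steps, and evaluation requires an eval dataset to be configured:

```shell
python finetune.py \
  --evaluation_strategy steps \
  --eval_steps 500
```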
Logging
Log metrics every N steps:
TensorBoard log directory:
Reporting integrations:
tensorboard
wandb
none
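The logging options map onto standard TrainingArguments flags; the directory path is a placeholder:

```shell
# Log metrics every 10 steps to a TensorBoard log directory.
python finetune.py \
  --logging_steps 10 \
  --logging_dir output_qwen/logs \
  --report_to tensorboard
```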
Performance
Use FP16 mixed precision:
Use BF16 mixed precision (recommended for modern GPUs):
Enable gradient checkpointing to reduce memory:
Path to DeepSpeed config file:
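A sketch combining the performance options. The DeepSpeed config filename is a placeholder; fp16 and bf16 are mutually exclusive:

```shell
python finetune.py \
  --bf16 True \
  --gradient_checkpointing True \
  --deepspeed ds_config_zero2.json
# On GPUs without bfloat16 support, use --fp16 True instead of --bf16.
```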
Distributed Training
Local rank for distributed training (set automatically)
Find unused parameters in DDP:
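For distributed runs, the local rank is typically injected by the launcher rather than set by hand. A sketch assuming a torchrun launch:

```shell
# torchrun spawns one process per GPU and sets the local rank automatically.
torchrun --nproc_per_node 4 finetune.py \
  --ddp_find_unused_parameters False
```

Leaving ddp_find_unused_parameters off avoids the extra graph traversal DDP otherwise performs each step.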