## Overview
Fine-tuning adapts pre-trained language models to your specific use case, improving performance on domain-specific tasks while using far fewer resources than training from scratch. This guide covers:

- **Llama 3.2 Fine-tuning**: fine-tune the 1B and 3B models with LoRA
- **Gemma 3 Fine-tuning**: fine-tune the 270M to 27B models efficiently
## Why Fine-Tune?
- Performance
- Cost
- Privacy

### Task-Specific Accuracy
Fine-tuning improves performance on specialized tasks:

| Task | Base Model | Fine-tuned | Improvement |
|---|---|---|---|
| Domain Q&A | 62% | 89% | +27% |
| Code generation | 45% | 78% | +33% |
| Instruction following | 71% | 94% | +23% |
| Custom format | 38% | 92% | +54% |
## Parameter-Efficient Fine-Tuning

### LoRA (Low-Rank Adaptation)
LoRA adds small trainable matrices to existing model layers, dramatically reducing:

- Training time: 3-5x faster
- Memory usage: 3-4x less VRAM
- Storage: Adapters are 10-100MB vs full model GBs
- Cost: Train on free Google Colab
### How LoRA Works

Instead of updating every weight, LoRA freezes the base model and trains a pair of small low-rank matrices per target layer; their product approximates the full weight update.

#### Technical Details

- `r` (rank): size of the low-rank matrices (typically 8-64)
- `alpha`: scaling factor (typically 16-32)
- `target_modules`: which layers to adapt
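The update rule can be sketched in a few lines of plain Python (no ML libraries, purely illustrative): a frozen weight matrix `W` is combined with the product of the two trained matrices, scaled by `alpha / r`.

```python
# Minimal sketch of the LoRA update rule using plain Python lists.
# For a frozen weight W (d_out x d_in), LoRA trains B (d_out x r) and
# A (r x d_in) and uses:  W_effective = W + (alpha / r) * (B @ A)

def matmul(X, Y):
    """Naive matrix multiply, for illustration only."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_effective_weight(W, A, B, alpha, r):
    delta = matmul(B, A)          # low-rank update, rank <= r
    scale = alpha / r             # LoRA scaling factor
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 frozen weight, rank-1 adapter (r=1, alpha=2 -> scale 2.0)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]                # d_out x r
A = [[0.5, 0.5]]                  # r x d_in
print(lora_effective_weight(W, A, B, alpha=2, r=1))
# -> [[2.0, 1.0], [0.0, 1.0]]
```

Because `B` is initialized to zero in real implementations, the effective weight starts out identical to the base model and drifts only as the adapter trains.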
## Llama 3.2 Fine-Tuning
### Quick Start

### Complete Implementation
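A hedged end-to-end sketch with Unsloth and TRL is below. It requires a CUDA GPU plus the `unsloth`, `trl`, and `datasets` packages; the model id, toy dataset, and hyperparameters are illustrative starting points, not a prescription.

```python
# Hedged sketch of an end-to-end LoRA run with Unsloth + TRL.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# 1. Load the base model in 4-bit (fits in ~2GB VRAM)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # assumed hub id
    max_seq_length=2048,
    load_in_4bit=True,
)

# 2. Attach LoRA adapters (only these small matrices are trained)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# 3. Toy dataset with a single "text" field; substitute your own
train_data = Dataset.from_list([
    {"text": "### Question: What is LoRA?\n"
             "### Answer: A parameter-efficient fine-tuning method."},
])

# 4. Train (newer TRL versions take processing_class= instead of tokenizer=)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_data,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
model.save_pretrained("lora-adapters")  # saves only the small adapter
```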
### Model Selection

Two sizes are available: Llama 3.2 1B and Llama 3.2 3B.

#### Llama 3.2 1B Instruct
- Parameters: 1.2B
- Context: 128K tokens
- VRAM (4-bit): ~2GB
- Training time: ~20 min (100 steps, Colab T4)
- Use case: Fast, lightweight tasks
### Training Configuration

Configure the fine-tuning process: batch size, learning rate, and step count.
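One possible set of training arguments for TRL's trainer is sketched below; the values mirror the defaults discussed in the hyperparameter table later in this guide and should be tuned from there.

```python
# Illustrative SFT training arguments (TRL); values are starting points.
from trl import SFTConfig

training_args = SFTConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    max_steps=60,
    learning_rate=2e-4,
    warmup_steps=5,
    logging_steps=1,
    optim="adamw_8bit",              # memory-saving 8-bit optimizer
    output_dir="outputs",
)
```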
### LoRA Configuration

Add LoRA adapters to the frozen base model.
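With Unsloth this looks roughly as follows; it assumes `model` was loaded via `FastLanguageModel.from_pretrained`, and the `r`/`alpha` values shown are the typical defaults, not requirements.

```python
# Sketch: attach LoRA adapters with Unsloth.
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # rank of the low-rank matrices
    lora_alpha=16,         # scaling factor
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```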
### Dataset Preparation
#### ShareGPT Format
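The ShareGPT schema stores each example as a list of turns with `"from"`/`"value"` keys, for instance:

```python
# One example in the ShareGPT conversation schema.
example = {
    "conversations": [
        {"from": "human", "value": "What is LoRA?"},
        {"from": "gpt", "value": "LoRA is a parameter-efficient fine-tuning method."},
    ]
}

roles = [turn["from"] for turn in example["conversations"]]
print(roles)  # -> ['human', 'gpt']
```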
#### Custom Dataset
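If your data lives in another shape, a small converter gets it into the ShareGPT layout. The field names here (`"prompt"`, `"response"`) are assumptions about your raw data, not a required schema:

```python
# Sketch: convert simple prompt/response rows into the ShareGPT layout.
def to_sharegpt(rows):
    return [
        {"conversations": [
            {"from": "human", "value": row["prompt"]},
            {"from": "gpt", "value": row["response"]},
        ]}
        for row in rows
    ]

raw = [{"prompt": "Hi", "response": "Hello!"}]
converted = to_sharegpt(raw)
print(converted[0]["conversations"][1]["value"])  # -> Hello!
```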
### Google Colab Setup
Open a notebook at Google Colab and enable a GPU runtime; a free T4 is enough for the 1B model.
## Gemma 3 Fine-Tuning
### Model Sizes

| Model | Profile | VRAM |
|---|---|---|
| 270M | Ultra-lightweight | ~1GB |
| 1B | Fast & efficient | ~2GB |
| 4B | Balanced | ~6GB |
| 12B | High quality | ~16GB |
| 27B | Best performance | ~32GB |
### Implementation
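The same Unsloth recipe applies to Gemma 3. A hedged sketch is below; the hub id is an assumption, so check Unsloth's model listings for current names.

```python
# Hedged sketch: loading Gemma 3 for LoRA fine-tuning with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # assumed hub id
    max_seq_length=2048,
    load_in_4bit=True,   # ~5GB VRAM per the memory table below
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Training then proceeds exactly as in the Llama example.
```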
### Gemma-Specific Notes
#### Chat Template

Gemma uses its own chat format: each message is wrapped in `<start_of_turn>` / `<end_of_turn>` markers, so training data must be rendered with the matching template.
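Written out by hand for illustration, the template looks like this; in practice, call `tokenizer.apply_chat_template(...)` rather than formatting strings yourself:

```python
# Gemma-style chat template, hand-rolled for illustration only.
def format_gemma(messages):
    out = ""
    for m in messages:
        # Gemma names the assistant role "model"
        role = "model" if m["role"] == "assistant" else "user"
        out += f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n"
    out += "<start_of_turn>model\n"   # cue the model to respond
    return out

prompt = format_gemma([{"role": "user", "content": "Hello"}])
print(prompt)
```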
#### Memory Requirements
Approximate VRAM needs with 4-bit quantization:
| Model | 4-bit | 8-bit | Full (16-bit) |
|---|---|---|---|
| 270M | 0.5GB | 0.8GB | 1.5GB |
| 1B | 1.5GB | 2.5GB | 4.5GB |
| 4B | 5GB | 9GB | 16GB |
| 12B | 14GB | 25GB | 48GB |
| 27B | 30GB | 55GB | 108GB |
#### Model Selection Guide
Choose based on use case:
- 270M: Edge devices, real-time inference, simple tasks
- 1B: Chatbots, content moderation, classification
- 4B: Code generation, summarization, Q&A (Colab-friendly)
- 12B: Complex reasoning, multi-turn conversations
- 27B: Production apps requiring GPT-3.5 level quality
## Advanced Topics

### Evaluation During Training
#### Add Validation Set
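One way to do this, sketched under the assumption that `dataset` and `model` come from the Unsloth/TRL setup shown earlier, is to carve out a held-out split and evaluate every few steps:

```python
# Sketch: hold out a validation split and evaluate during training.
from trl import SFTTrainer, SFTConfig

split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model=model,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    args=SFTConfig(
        per_device_train_batch_size=2,
        max_steps=60,
        eval_strategy="steps",   # "evaluation_strategy" in older transformers
        eval_steps=20,
        output_dir="outputs",
    ),
)
```

Rising eval loss while train loss keeps falling is the classic overfitting signal; stop or reduce `max_steps` when you see it.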
### Inference After Fine-Tuning
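A generation sketch, assuming `model` and `tokenizer` from the Unsloth training run above; the prompt and generation settings are illustrative:

```python
# Sketch: generate with the fine-tuned model (Unsloth).
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)   # enable fast generation kernels

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize LoRA in one sentence."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```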
### Merging LoRA with Base Model
#### Create Full Model
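Merging folds the trained low-rank update into the base weights so the model can be served without the adapter. A PEFT-based sketch, with placeholder paths and hub id:

```python
# Sketch: merge LoRA adapters into the base weights with PEFT.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "lora-adapters")  # adapter dir from training
merged = model.merge_and_unload()                         # folds B@A into W

merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct"
).save_pretrained("merged-model")
```

The merged directory is a regular full-size checkpoint, so it loads anywhere the base model does.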
### Quantization Options
#### 4-bit (Default)

- VRAM: 25% of full model
- Speed: 90% of full model
- Quality: 98-99% of full model
- Best for: most use cases

8-bit and full-precision loading are also available; see the memory table above for their VRAM costs.
## Best Practices
### Data Quality
High-quality data is more important than quantity:
- ✅ 1,000 high-quality examples > 10,000 mediocre examples
- ✅ Clean, consistent formatting
- ✅ Diverse range of topics/styles
- ✅ Representative of actual use case
- ❌ Avoid duplicate or near-duplicate examples
- ❌ Don’t include low-quality or incorrect data
### Training Duration

Don't overtrain. Typical steps by dataset size:
- 1K examples: 100-300 steps
- 10K examples: 300-1000 steps
- 100K examples: 1000-3000 steps
### Hyperparameter Tuning
Start with defaults, then adjust:
| Parameter | Default | Increase if… | Decrease if… |
|---|---|---|---|
| `learning_rate` | 2e-4 | Loss plateaus | Loss explodes |
| `r` (rank) | 16 | Need better quality | Out of memory |
| `max_steps` | 60 | Underfitting | Overfitting |
| `batch_size` | 2 | Have more VRAM | Out of memory |
### Testing & Validation

Evaluate thoroughly before deploying: check accuracy on held-out examples from your target task, output formatting, and regressions on general prompts.
## Troubleshooting
### Out of Memory (OOM)
Solutions:

- Reduce the batch size: `per_device_train_batch_size=1`
- Use 4-bit quantization: `load_in_4bit=True`
- Reduce the sequence length: `max_seq_length=1024`
- Use a smaller LoRA rank: `r=8`
### Poor Quality Results
Possible causes:

- Too few training steps: increase `max_steps`
- Low-quality data: clean and curate the dataset
- Wrong chat template: use the correct template for the model
- Learning rate too high: reduce to 1e-4 or 5e-5
- Overfitting: add a validation set and reduce steps
### Slow Training
Speed up training:
- Use smaller model (1B instead of 3B)
- Reduce sequence length
- Use gradient checkpointing: `gradient_checkpointing=True`
- Ensure CUDA is actually being used: `torch.cuda.is_available()` should return `True`
## Resources
- **Unsloth GitHub**: fast fine-tuning library
- **Gemma 3 Blog**: Gemma 3 fine-tuning guide
- **Example Code**: complete fine-tuning scripts
- **Tutorial**: step-by-step fine-tuning tutorial
