When to Fine-tune
Consider fine-tuning when:
- You have domain-specific images (faces, anime, medical images, etc.)
- The pre-trained model doesn’t perform well on your images
- You want to specialize the model for a particular type of degradation
- You have a small custom dataset (hundreds to thousands of images)
Fine-tuning typically requires much less training time (tens of thousands of iterations vs. millions) because you start from a well-trained model.
Two Fine-tuning Approaches
Real-ESRGAN supports two fine-tuning strategies:
On-the-fly Degradation
Only high-resolution images are required; low-quality images are generated during training.
Best for: General super-resolution with synthetic degradations
Paired Data
Use your own paired high-resolution and low-resolution images.
Best for: Specific degradation types or real-world degraded images
Method 1: On-the-fly Degradation
This method generates degraded images during training using Real-ESRGAN’s degradation model.
Step 1: Prepare Dataset
Only high-resolution images are needed. Follow the standard dataset preparation steps.
Step 2: Download Pre-trained Models
Download the Real-ESRGAN pre-trained generator and discriminator weights from the project’s GitHub releases page.
Step 3: Configure Fine-tuning
Modify options/finetune_realesrgan_x4plus.yml:
Key configuration options for fine-tuning
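An illustrative excerpt of the fields most worth checking; the key names follow the repo's finetune_realesrgan_x4plus.yml, while the experiment name, dataset paths, and values here are placeholders:

```yaml
name: finetune_RealESRGANx4plus_custom   # experiment name (placeholder)
model_type: RealESRGANModel
scale: 4

datasets:
  train:
    type: RealESRGANDataset
    dataroot_gt: datasets/custom              # your HR images
    meta_info: datasets/custom/meta_info.txt

path:
  # Initialize both networks from the released weights.
  pretrain_network_g: experiments/pretrained_models/RealESRGAN_x4plus.pth
  pretrain_network_d: experiments/pretrained_models/RealESRGAN_x4plus_netD.pth

train:
  optim_g:
    type: Adam
    lr: !!float 1e-4   # lower learning rate for fine-tuning
  total_iter: 100000
```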
Step 4: Start Fine-tuning
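A sketch of the commands involved: fetch the pre-trained weights from Step 2 into experiments/pretrained_models/, then launch the repo's training script. The generator URL is the well-known v0.1.0 release; the release tag for the discriminator weights is an assumption, so check the project's releases page.

```shell
# Step 2 recap: fetch the pre-trained generator weights.
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth \
    -P experiments/pretrained_models
# Discriminator weights (release tag may differ; see the releases page):
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.3/RealESRGAN_x4plus_netD.pth \
    -P experiments/pretrained_models

# Launch fine-tuning; --auto_resume continues from the latest state if interrupted.
python realesrgan/train.py -opt options/finetune_realesrgan_x4plus.yml --auto_resume
```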
Method 2: Paired Data
Use this method when you have paired low-quality and high-quality images.
Step 1: Prepare Paired Dataset
Organize your data into two folders: one holding the high-resolution (GT) images and one holding the matching low-resolution (LQ) images.
Step 2: Download Pre-trained Models
Same as Method 1: download both the generator and discriminator weights.
Step 3: Configure Fine-tuning
Modify options/finetune_realesrgan_x4plus_pairdata.yml:
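An illustrative dataset section (folder and file paths are placeholders):

```yaml
datasets:
  train:
    name: custom_pairs
    type: RealESRGANPairedDataset    # reads pre-made LQ/GT pairs
    dataroot_gt: datasets/custom
    dataroot_lq: datasets/custom
    meta_info: datasets/custom/meta_info.txt
    io_backend:
      type: disk
```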
The key difference is using RealESRGANPairedDataset instead of RealESRGANDataset. This dataset type reads pre-made LQ/HQ pairs instead of generating degradation on-the-fly.
Step 4: Start Fine-tuning
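The paired dataset needs a meta-info text file listing each GT/LQ pair. The repo ships scripts/generate_meta_info_pairdata.py for this; a minimal shell sketch (paths illustrative, dummy files standing in for real images) shows the expected one-pair-per-line format:

```shell
# Build a meta-info file with one "gt_path, lq_path" line per pair.
mkdir -p datasets/custom/gt datasets/custom/lq
touch datasets/custom/gt/0001.png datasets/custom/lq/0001.png  # stand-ins

ls datasets/custom/gt | while read name; do
    echo "gt/${name}, lq/${name}"
done > datasets/custom/meta_info.txt
```

Training then launches the same way as Method 1, pointing at the paired config: `python realesrgan/train.py -opt options/finetune_realesrgan_x4plus_pairdata.yml --auto_resume`.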
Fine-tuning Tips
How many iterations?
Fine-tuning typically needs fewer iterations:
- Small dataset (100-500 images): 10,000-30,000 iterations
- Medium dataset (500-2000 images): 30,000-100,000 iterations
- Large dataset (2000+ images): 100,000-200,000 iterations
Learning rate selection
Use lower learning rates for fine-tuning:
- From scratch: 2e-4
- Fine-tuning: 1e-4 or 5e-5
- Small dataset: 5e-5 or 1e-5
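In the BasicSR-style option file these settings live under the train section; an illustrative fragment for a small dataset (values are examples, not recommendations from the repo):

```yaml
train:
  optim_g:
    type: Adam
    lr: !!float 5e-5   # lower than the 2e-4 used from scratch
  optim_d:
    type: Adam
    lr: !!float 5e-5
  total_iter: 30000    # fewer iterations for a small dataset
```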
Preventing overfitting
With small datasets, prevent overfitting by:
- Using data augmentation (already in RealESRGAN)
- Reducing training iterations
- Monitoring validation loss
- Using a lower learning rate
- Keeping more of the pre-trained model frozen (advanced)
Testing multiple checkpoints
Different checkpoints may perform better on your data, so run inference with several saved checkpoints and compare visual quality to choose the best one.
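For instance, looping over a few saved iterations (the experiment name and paths are illustrative; --model_path is the inference script's override for the default weights):

```shell
# Run the same validation images through several checkpoints and compare outputs.
for iter in 10000 20000 30000; do
    python inference_realesrgan.py -n RealESRGAN_x4plus \
        --model_path experiments/finetune_RealESRGANx4plus/models/net_g_${iter}.pth \
        -i inputs/val -o results/iter_${iter}
done
```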
Domain-specific degradations
Customize the degradation process for your domain. In the config file, adjust the degradation parameters to match the degradation seen in your target domain.
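The parameter names below come from Real-ESRGAN's option files; the values are illustrative:

```yaml
# Blur applied by RealESRGANDataset (first degradation stage).
blur_kernel_size: 21
kernel_list: ['iso', 'aniso', 'generalized_iso', 'generalized_aniso', 'plateau_iso', 'plateau_aniso']
kernel_prob: [0.45, 0.25, 0.12, 0.03, 0.12, 0.03]
blur_sigma: [0.2, 3]

# Noise and compression applied on the fly by RealESRGANModel.
gaussian_noise_prob: 0.5
noise_range: [1, 30]
jpeg_range: [30, 95]
```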
Monitoring Fine-tuning
Watch for these signs during fine-tuning:
Good Signs
- Losses decrease initially then stabilize
- Validation metrics improve
- Visual quality improves on your test images
- Model generalizes to unseen images
Warning Signs
- Losses continue decreasing but validation worsens (overfitting)
- GAN becomes unstable (mode collapse)
- Results look worse than pre-trained model
- Artifacts appear in outputs
If you see warning signs, try:
- Resuming from an earlier checkpoint
- Reducing learning rate
- Adding more training data
- Stopping training earlier
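Resuming from an earlier checkpoint is done through the path section of the option file (experiment name and state filename are illustrative; BasicSR saves these under training_states/):

```yaml
path:
  # Restore model, optimizer, and scheduler state from a saved iteration.
  resume_state: experiments/finetune_RealESRGANx4plus/training_states/10000.state
```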
Using Fine-tuned Models
After fine-tuning completes, point the inference script at your new generator weights (net_g_latest.pth or a specific iteration’s checkpoint) instead of the released model.
Example: Fine-tuning for Anime
Real-ESRGAN includes a variant fine-tuned for anime images. Here’s how it was done:
Customize Degradation
Adjust degradation to match anime characteristics:
- Sharper edges (less blur)
- Less noise (anime is typically clean)
- Different compression artifacts
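Following the list above, an anime-oriented config might narrow the degradation ranges. The parameter names match the Real-ESRGAN options; the values are illustrative guesses, not the settings used for the released anime model:

```yaml
blur_sigma: [0.2, 1.5]   # sharper edges: less blur
noise_range: [1, 10]     # anime sources are typically clean
jpeg_range: [60, 95]     # milder compression artifacts
```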
Next Steps
Inference Guide
Use your fine-tuned model for super-resolution
Python API
Integrate your model into applications