Training Process
The training is divided into two distinct stages that share the same data synthesis process and training pipeline but differ in their loss functions.
Stage 1: Train Real-ESRNet
Train Real-ESRNet with L1 loss, initialized from a pre-trained ESRGAN model. This stage provides a stable foundation and prevents mode collapse.
- Uses L1 loss only
- Starts from pre-trained ESRGAN weights
- Results in a stable base model
Stage 2: Train Real-ESRGAN
Train Real-ESRGAN initialized from the Real-ESRNet weights, using a combination of L1 loss, perceptual loss, and GAN loss to improve perceptual quality.
Why Two Stages?
This two-stage approach offers several advantages:
- Stability: Starting with L1 loss provides a stable baseline before introducing adversarial training
- Quality: The combination of losses in stage 2 improves perceptual quality while maintaining fidelity
- Convergence: Pre-training with L1 loss helps the GAN training converge more reliably
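As a rough sketch of the two objectives (the weights and the toy pixel lists below are hypothetical; the real loss weights live in the training configs, and the perceptual and adversarial terms are computed by separate networks):

```python
def l1_loss(pred, target):
    """Mean absolute error -- the only loss used in stage 1 (Real-ESRNet)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def stage2_loss(pred, target, perceptual, adversarial,
                w_l1=1.0, w_per=1.0, w_gan=0.1):
    """Stage 2 (Real-ESRGAN) combines L1, perceptual, and GAN losses.
    `perceptual` and `adversarial` stand in for feature-space and
    discriminator-based terms; the weights here are illustrative only."""
    return w_l1 * l1_loss(pred, target) + w_per * perceptual + w_gan * adversarial

# Toy values standing in for flattened image tensors.
sr = [0.2, 0.5, 0.9]   # super-resolved output
hr = [0.0, 0.5, 1.0]   # ground-truth high-resolution image
print(l1_loss(sr, hr))                # stage 1 objective
print(stage2_loss(sr, hr, 0.3, 0.7))  # stage 2 objective
```

Because the stage 2 objective is a weighted sum, the L1 term keeps the model anchored to the ground truth (fidelity) while the perceptual and GAN terms push toward sharper, more realistic textures.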
Training Requirements
Hardware
- Multiple GPUs recommended (examples use 4 GPUs)
- Single GPU training is supported but slower
- Adequate disk space for datasets
Datasets
Real-ESRGAN is trained on:
- DF2K: Combination of the DIV2K and Flickr2K datasets
- OST: Outdoor Scene Training dataset
The degradation process simulates real-world image degradation, including blur, noise, compression artifacts, and downsampling.
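A minimal sketch of such a degradation pipeline on a 1-D "signal" standing in for an image row (all functions here are simplified stand-ins: a moving-average blur for random blur kernels, Gaussian noise for the noise models, naive subsampling for resizing; JPEG compression is omitted):

```python
import random

def blur(signal, k=3):
    """Moving-average blur (stand-in for a randomly sampled blur kernel)."""
    half = k // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def add_noise(signal, sigma=0.05, rng=None):
    """Additive Gaussian noise (stand-in for Gaussian/Poisson noise)."""
    rng = rng or random.Random(0)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def downsample(signal, scale=4):
    """Naive subsampling (stand-in for random-interpolation resizing)."""
    return signal[::scale]

def degrade(hr, scale=4):
    """Blur -> noise -> downsample, applied once for illustration."""
    return downsample(add_noise(blur(hr)), scale)

hr = [float(i % 8) for i in range(32)]  # toy high-resolution row
lr = degrade(hr)
print(len(hr), len(lr))  # 32 8
```

In the actual pipeline, the degradation parameters are randomized per sample so the network sees a wide variety of realistic corruptions rather than a single fixed one.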
Training Modes
Debug Mode
Test your configuration before full training.
Full Training
Run the complete training process. The --auto_resume flag automatically resumes training from the last checkpoint if interrupted.
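As an illustration, the two modes are typically invoked as below; the script path, config filename, and launcher flags follow the upstream Real-ESRGAN repository and may differ in your checkout:

```shell
# Debug mode: runs only a few iterations to validate the configuration.
python realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --debug

# Full training on 4 GPUs with automatic checkpoint resume.
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 \
    realesrgan/train.py -opt options/train_realesrnet_x4plus.yml \
    --launcher pytorch --auto_resume
```

For single-GPU training, drop the `torch.distributed.launch` wrapper and the `--launcher` flag and call the training script directly.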
Next Steps
Dataset Preparation
Learn how to prepare and process training datasets
Train Real-ESRNet
Start with stage 1 training
Train Real-ESRGAN
Complete stage 2 for the final model
Fine-tuning
Adapt the model to your custom dataset