Prerequisites
Complete Real-ESRNet Training
Finish Real-ESRNet training (stage 1) and keep the trained model checkpoint available.
Configure Training Options
Modify the training configuration file `options/train_realesrgan_x4plus.yml`:
Pre-trained Model Path
The Real-ESRNet model from stage 1 is used as the generator initialization. Verify the pre-trained model path; if your Real-ESRNet model is saved elsewhere, update `pretrain_network_g` to point to the correct path.

Dataset Configuration
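As an illustration, the relevant config sections might look like the following sketch. All paths, dataset names, and the meta-info file below are examples, not the repo's exact defaults; adjust them to your setup:

```yaml
# Illustrative snippet of options/train_realesrgan_x4plus.yml (paths are examples)
datasets:
  train:
    name: DF2K+OST
    type: RealESRGANDataset
    dataroot_gt: datasets/DF2K             # folder with HR training images
    meta_info: datasets/DF2K/meta_info.txt # list of training image paths
  # Optional paired validation set
  val:
    name: validation
    type: PairedImageDataset
    dataroot_gt: datasets/val/gt           # HR validation images
    dataroot_lq: datasets/val/lq           # LR validation inputs

path:
  # Initialize the generator from the stage-1 Real-ESRNet checkpoint
  pretrain_network_g: experiments/pretrained_models/RealESRNet_x4plus.pth
  param_key_g: params_ema
  strict_load_g: true
```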
Update the dataset paths, similar to Real-ESRNet training.

Validation Configuration (Optional)
If you want validation during training, add a paired validation dataset and a `val` section (validation frequency, metrics) to the config.

Understanding the Loss Functions
Real-ESRGAN training combines three loss functions:

- L1 Loss: pixel-wise reconstruction loss for fidelity
- Perceptual Loss: feature-based loss for perceptual quality
- GAN Loss: adversarial loss for realistic details
Together, they balance:
- Fidelity: Accurate reconstruction (L1 loss)
- Perceptual quality: Natural appearance (Perceptual loss)
- Realistic details: Sharp, believable textures (GAN loss)
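In the training config these correspond to the `pixel_opt`, `perceptual_opt`, and `gan_opt` blocks. The weights below mirror the official config's defaults at the time of writing, but treat the whole snippet as illustrative:

```yaml
train:
  pixel_opt:
    type: L1Loss
    loss_weight: 1.0        # fidelity term
  perceptual_opt:
    type: PerceptualLoss    # VGG19 feature-space loss
    layer_weights:
      conv5_4: 1            # the official config weights several VGG layers
    perceptual_weight: 1.0
  gan_opt:
    type: GANLoss
    gan_type: vanilla
    loss_weight: 0.1        # adversarial term, kept small relative to L1/perceptual
```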
Debug Mode
Test your configuration before starting full training by appending the `--debug` flag to the training command.

Start Training
Once debug mode succeeds, start the full training.

Training Parameters
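A typical launch command looks like the following; the GPU count and port are examples, and you can append `--debug` on the first run to sanity-check the configuration over a few iterations:

```shell
# Launch distributed training on 4 GPUs (example values).
# Add --debug for a short sanity-check run before committing to full training.
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 \
    realesrgan/train.py -opt options/train_realesrgan_x4plus.yml \
    --launcher pytorch --auto_resume
```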
- `--nproc_per_node`: number of GPUs for distributed training
- `--master_port`: port for distributed training communication (choose a different port from stage 1 if running simultaneously)
- `--launcher pytorch`: use PyTorch distributed training
- `--auto_resume`: automatically resume from the last checkpoint
Training Output
Training artifacts are saved to the experiment directory, `experiments/<name>/`, where `name` comes from the config.

Key Files
- models/net_g_*.pth: Generator (Real-ESRGAN) checkpoints
- models/net_d_*.pth: Discriminator checkpoints
- models/net_g_400000.pth: Final Real-ESRGAN model (use this for inference)
- training_states/*.state: Training state for resuming
- visualization/: Sample super-resolution outputs
The final `net_g_400000.pth` is your trained Real-ESRGAN model, ready for inference. You don't need the discriminator for inference.

Monitoring Training
Monitor these key metrics during Real-ESRGAN training:

Important Metrics
- Generator L1 loss: Should be relatively stable
- Perceptual loss: Measures feature similarity
- GAN loss (Generator): Adversarial loss from the generator perspective
- GAN loss (Discriminator): Should balance with generator loss
- D_real and D_fake: Discriminator outputs (should stay balanced)
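If `use_tb_logger` is enabled in the config, BasicSR writes these metrics as TensorBoard event files, so you can watch the curves live. The log directory below is an assumption based on BasicSR's default layout:

```shell
# View loss curves and discriminator outputs during training
tensorboard --logdir tb_logger
```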
Interpreting GAN metrics
Healthy GAN training shows:
- Generator and discriminator losses that oscillate but remain relatively stable
- D_real (discriminator output on real images) around 0.5-0.7
- D_fake (discriminator output on fake images) around 0.3-0.5
Warning signs:
- One loss consistently dominating
- D_real or D_fake approaching 0 or 1
- Losses diverging or becoming unstable
Training Duration
Typical training time for Real-ESRGAN:
- 400K iterations on 4x V100 GPUs: ~1.5-2 days
- Single GPU training will take proportionally longer
Real-ESRGAN typically requires fewer iterations than Real-ESRNet (400K vs 1M) because it starts from the pre-trained Real-ESRNet model.
Using the Trained Model
After training completes, use your model for inference (see the Inference Guide below).

Troubleshooting
Mode collapse (GAN failure)
Signs of mode collapse:
- Generator produces similar outputs for different inputs
- Discriminator loss goes to 0
- Generated images lack diversity
Solutions:
- Restart from an earlier checkpoint
- Adjust GAN loss weight in the config
- Verify Real-ESRNet pre-training was successful
Out of memory errors
GAN training uses more memory due to the discriminator. Reduce batch size:
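For example, halving the per-GPU batch size roughly halves activation memory. The official config uses 12; the values below are just a starting point:

```yaml
datasets:
  train:
    batch_size_per_gpu: 6   # official config uses 12; lower until training fits
    num_worker_per_gpu: 4   # dataloader workers; reduce if CPU memory is also tight
```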
Discriminator too strong/weak
Adjust the balance between generator and discriminator:
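Two knobs worth trying are the discriminator's learning rate and the adversarial loss weight. The values below are illustrative starting points, not prescriptions:

```yaml
train:
  optim_d:
    type: Adam
    lr: !!float 5e-5   # default is 1e-4; lower to weaken a too-strong discriminator
  gan_opt:
    type: GANLoss
    gan_type: vanilla
    loss_weight: 0.05  # default is 0.1; lower if GAN artifacts dominate,
                       # raise if outputs look too smooth
```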
Artifacts in output
If the model produces artifacts:
- Check earlier checkpoints (100K, 200K) - sometimes earlier iterations are better
- Verify the Real-ESRNet pre-training was sufficient
- Consider adjusting perceptual loss weight
Next Steps
Fine-tune on Custom Data
Adapt your trained model to specific image domains
Inference Guide
Learn how to use your trained model for super-resolution
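As a preview of the inference guide, running your checkpoint through the repo's inference script typically looks like this; the experiment directory name and input/output folders are examples:

```shell
# Upscale images in inputs/ with the trained generator (paths are examples)
python inference_realesrgan.py -n RealESRGAN_x4plus \
    --model_path experiments/train_RealESRGANx4plus_400k/models/net_g_400000.pth \
    -i inputs -o results
```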