Prerequisites
Complete Real-ESRNet Training
Finish Real-ESRNet training (stage 1) and keep the trained model checkpoint available.
Configure Training Options
Modify the training configuration file `options/train_realesrgan_x4plus.yml`:
Pre-trained Model Path
The Real-ESRNet model from stage 1 is used as the generator initialization. Verify the pre-trained model path; if your Real-ESRNet model is saved elsewhere, update `pretrain_network_g` to point to the correct path.

Dataset Configuration
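As an illustration, the relevant config sections might look like the following sketch. All paths, dataset names, and the meta-info file below are examples, not the repo's exact defaults; adjust them to your setup:

```yaml
# Illustrative snippet of options/train_realesrgan_x4plus.yml (paths are examples)
datasets:
  train:
    name: DF2K+OST
    type: RealESRGANDataset
    dataroot_gt: datasets/DF2K             # folder with HR training images
    meta_info: datasets/DF2K/meta_info.txt # list of training image paths
  # Optional paired validation set
  val:
    name: validation
    type: PairedImageDataset
    dataroot_gt: datasets/val/gt           # HR validation images
    dataroot_lq: datasets/val/lq           # LR validation inputs

path:
  # Initialize the generator from the stage-1 Real-ESRNet checkpoint
  pretrain_network_g: experiments/pretrained_models/RealESRNet_x4plus.pth
  param_key_g: params_ema
  strict_load_g: true
```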
Update the dataset paths, similar to Real-ESRNet training.

Validation Configuration (Optional)
If you want validation during training, add a paired validation dataset and a `val` section (validation frequency, metrics) to the config.

Understanding the Loss Functions
Real-ESRGAN training combines three loss functions:

- L1 Loss: pixel-wise reconstruction loss for fidelity
- Perceptual Loss: feature-based loss for perceptual quality
- GAN Loss: adversarial loss for realistic details
Together, they balance:
- Fidelity: Accurate reconstruction (L1 loss)
- Perceptual quality: Natural appearance (Perceptual loss)
- Realistic details: Sharp, believable textures (GAN loss)
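In the training config these correspond to the `pixel_opt`, `perceptual_opt`, and `gan_opt` blocks. The weights below mirror the official config's defaults at the time of writing, but treat the whole snippet as illustrative:

```yaml
train:
  pixel_opt:
    type: L1Loss
    loss_weight: 1.0        # fidelity term
  perceptual_opt:
    type: PerceptualLoss    # VGG19 feature-space loss
    layer_weights:
      conv5_4: 1            # the official config weights several VGG layers
    perceptual_weight: 1.0
  gan_opt:
    type: GANLoss
    gan_type: vanilla
    loss_weight: 0.1        # adversarial term, kept small relative to L1/perceptual
```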
Debug Mode
Test your configuration before starting full training by appending the `--debug` flag to the training command.

Start Training
Once debug mode succeeds, start the full training.

Training Parameters
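A typical launch command looks like the following; the GPU count and port are examples, and you can append `--debug` on the first run to sanity-check the configuration over a few iterations:

```shell
# Launch distributed training on 4 GPUs (example values).
# Add --debug for a short sanity-check run before committing to full training.
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 \
    realesrgan/train.py -opt options/train_realesrgan_x4plus.yml \
    --launcher pytorch --auto_resume
```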
- `--nproc_per_node`: number of GPUs for distributed training
- `--master_port`: port for distributed training communication (choose a different port from stage 1 if running simultaneously)
- `--launcher pytorch`: use PyTorch distributed training
- `--auto_resume`: automatically resume from the last checkpoint
Training Output
Training artifacts are saved to the experiment directory, `experiments/<name>/`, where `name` comes from the config.

Key Files
- models/net_g_*.pth: Generator (Real-ESRGAN) checkpoints
- models/net_d_*.pth: Discriminator checkpoints
- models/net_g_400000.pth: Final Real-ESRGAN model (use this for inference)
- training_states/*.state: Training state for resuming
- visualization/: Sample super-resolution outputs
The final `net_g_400000.pth` is your trained Real-ESRGAN model, ready for inference. You don't need the discriminator for inference.

Monitoring Training
Monitor these key metrics during Real-ESRGAN training:

Important Metrics
- Generator L1 loss: Should be relatively stable
- Perceptual loss: Measures feature similarity
- GAN loss (Generator): Adversarial loss from the generator perspective
- GAN loss (Discriminator): Should balance with generator loss
- D_real and D_fake: Discriminator outputs (should stay balanced)
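If `use_tb_logger` is enabled in the config, BasicSR writes these metrics as TensorBoard event files, so you can watch the curves live. The log directory below is an assumption based on BasicSR's default layout:

```shell
# View loss curves and discriminator outputs during training
tensorboard --logdir tb_logger
```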
Interpreting GAN metrics
Healthy GAN training shows:
- Generator and discriminator losses that oscillate but remain relatively stable
- D_real (discriminator output on real images) around 0.5-0.7
- D_fake (discriminator output on fake images) around 0.3-0.5
Warning signs:
- One loss consistently dominating
- D_real or D_fake approaching 0 or 1
- Losses diverging or becoming unstable
Training Duration
Typical training time for Real-ESRGAN:
- 400K iterations on 4x V100 GPUs: ~1.5-2 days
- Single GPU training will take proportionally longer
Real-ESRGAN typically requires fewer iterations than Real-ESRNet (400K vs 1M) because it starts from the pre-trained Real-ESRNet model.
Using the Trained Model
After training completes, use your model for inference (see the Inference Guide below).

Troubleshooting
Mode collapse (GAN failure)
Signs of mode collapse:
- Generator produces similar outputs for different inputs
- Discriminator loss goes to 0
- Generated images lack diversity
Solutions:
- Restart from an earlier checkpoint
- Adjust GAN loss weight in the config
- Verify Real-ESRNet pre-training was successful
Out of memory errors
GAN training uses more memory due to the discriminator. Reduce batch size:
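For example, halving the per-GPU batch size roughly halves activation memory. The official config uses 12; the values below are just a starting point:

```yaml
datasets:
  train:
    batch_size_per_gpu: 6   # official config uses 12; lower until training fits
    num_worker_per_gpu: 4   # dataloader workers; reduce if CPU memory is also tight
```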
Discriminator too strong/weak
Adjust the balance between generator and discriminator:
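Two knobs worth trying are the discriminator's learning rate and the adversarial loss weight. The values below are illustrative starting points, not prescriptions:

```yaml
train:
  optim_d:
    type: Adam
    lr: !!float 5e-5   # default is 1e-4; lower to weaken a too-strong discriminator
  gan_opt:
    type: GANLoss
    gan_type: vanilla
    loss_weight: 0.05  # default is 0.1; lower if GAN artifacts dominate,
                       # raise if outputs look too smooth
```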
Artifacts in output
If the model produces artifacts:
- Check earlier checkpoints (100K, 200K) - sometimes earlier iterations are better
- Verify the Real-ESRNet pre-training was sufficient
- Consider adjusting perceptual loss weight
Next Steps
Fine-tune on Custom Data
Adapt your trained model to specific image domains
Inference Guide
Learn how to use your trained model for super-resolution
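As a preview of the inference guide, running your checkpoint through the repo's inference script typically looks like this; the experiment directory name and input/output folders are examples:

```shell
# Upscale images in inputs/ with the trained generator (paths are examples)
python inference_realesrgan.py -n RealESRGAN_x4plus \
    --model_path experiments/train_RealESRGANx4plus_400k/models/net_g_400000.pth \
    -i inputs -o results
```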