Real-ESRNet is the first stage of Real-ESRGAN training. It uses L1 loss to create a stable base model before adversarial training.

Prerequisites

1. Prepare Dataset: complete the dataset preparation steps and have your meta info file ready.
2. Download Pre-trained Model: download the ESRGAN pre-trained model as the starting point.

Download Pre-trained ESRGAN Model

Real-ESRNet training starts from a pre-trained ESRGAN model:
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.1/ESRGAN_SRx4_DF2KOST_official-ff704c30.pth -P experiments/pretrained_models
This downloads the ESRGAN model trained on DF2K and OST datasets.
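Before moving on, it can be worth a quick sanity check that the file downloaded completely. A small shell sketch (the path is taken from the wget command above; an interrupted download typically leaves a missing or truncated file):

```shell
# Sanity check: confirm the weights file exists and is non-empty before training.
check_weights() {
  [ -s "$1" ]  # true if the file exists with non-zero size
}

if check_weights experiments/pretrained_models/ESRGAN_SRx4_DF2KOST_official-ff704c30.pth; then
  echo "pre-trained model ready"
else
  echo "model missing or empty; re-run the wget command" >&2
fi
```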

Configure Training Options

Modify the training configuration file options/train_realesrnet_x4plus.yml:

Dataset Configuration

Update the dataset paths to match your prepared data:
train:
  name: DF2K+OST
  type: RealESRGANDataset
  dataroot_gt: datasets/DF2K  # modify to the root path of your folder
  meta_info: realesrgan/meta_info/meta_info_DF2Kmultiscale+OST_sub.txt  # modify to the path of your generated meta info txt
  io_backend:
    type: disk
  • dataroot_gt (string, required): Root directory containing your ground-truth images
  • meta_info (string, required): Path to the meta info text file you generated in dataset preparation
  • type (string, required): Dataset type; use RealESRGANDataset for on-the-fly degradation
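A frequent source of failed runs is a meta info entry that no longer matches a file on disk. A small stdlib-only sketch (function name and usage paths are illustrative) that checks every entry resolves under dataroot_gt:

```python
import os

def check_meta_info(dataroot_gt, meta_info_path):
    """Return the meta-info entries that do not resolve to files under dataroot_gt."""
    missing = []
    with open(meta_info_path) as f:
        for line in f:
            rel = line.strip().split(" ")[0]  # entries are paths relative to dataroot_gt
            if rel and not os.path.isfile(os.path.join(dataroot_gt, rel)):
                missing.append(rel)
    return missing

# Example usage with the paths from the config above:
# missing = check_meta_info("datasets/DF2K",
#                           "realesrgan/meta_info/meta_info_DF2Kmultiscale+OST_sub.txt")
# print(f"{len(missing)} missing entries")
```

Running this once before training is much cheaper than discovering a bad path partway through a multi-day job.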

Validation Configuration (Optional)

If you want to run validation during training, uncomment and modify these sections:
# Uncomment these for validation
val:
  name: validation
  type: PairedImageDataset
  dataroot_gt: path_to_gt
  dataroot_lq: path_to_lq
  io_backend:
    type: disk
And configure validation settings:
# Uncomment these for validation
# validation settings
val:
  val_freq: !!float 5e3
  save_img: true

  metrics:
    psnr: # metric name, can be arbitrary
      type: calculate_psnr
      crop_border: 4
      test_y_channel: false
Validation is optional but helps monitor training progress. Set val_freq to control how often validation runs (e.g., 5e3 means every 5000 iterations).
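For intuition about the psnr metric configured above: PSNR reduces to 10 · log10(MAX² / MSE). A minimal stdlib sketch on flat 8-bit pixel lists (crop_border and Y-channel handling, which calculate_psnr supports, are omitted here):

```python
import math

def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length 8-bit pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

gt  = [10, 20, 30, 40]
out = [20, 30, 40, 50]          # every pixel off by 10 -> MSE = 100
print(round(psnr(gt, out), 2))  # 10 * log10(255^2 / 100) = 28.13
```

Higher is better; well-trained 4x super-resolution models typically land in the mid-20s to low-30s dB depending on the test set.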

Debug Mode

Before starting the full training, test your configuration in debug mode to catch any issues:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --launcher pytorch --debug
Debug mode:
  • Runs a few training iterations to verify everything works
  • Checks data loading and model initialization
  • Validates file paths and configurations
  • Exits early without full training
Use this to catch configuration errors before committing to a long training run.

Start Training

Once debug mode runs successfully, start the full training:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --launcher pytorch --auto_resume

Training Parameters

  • --nproc_per_node (int): Number of GPUs to use for distributed training (e.g., 4 for 4 GPUs)
  • --master_port (int): Port for distributed training communication (e.g., 4321)
  • --launcher (string): Distributed training backend; use pytorch for PyTorch distributed
  • --auto_resume (boolean): Automatically resume training from the last checkpoint if interrupted
Always pass --auto_resume for long runs: if a multi-day job is interrupted, it picks up from the last saved training state instead of starting over.
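If you only have a single GPU, the distributed launcher is unnecessary; the training script can be invoked directly (a sketch based on the upstream training instructions, so verify it against your checkout):

```shell
# Single-GPU training without torch.distributed.launch
CUDA_VISIBLE_DEVICES=0 \
python realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --auto_resume
```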

Training Output

Training artifacts are saved to the experiments directory:
experiments/
└── train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/
    ├── models/
    │   ├── net_g_100000.pth
    │   ├── net_g_200000.pth
    │   └── net_g_1000000.pth    # Final model
    ├── training_states/
    │   └── 1000000.state
    └── visualization/

Key Files

  • models/net_g_*.pth: Generator model checkpoints saved at intervals
  • models/net_g_1000000.pth: The final Real-ESRNet model after 1M iterations
  • training_states/*.state: Training state for resuming (optimizer, scheduler, etc.)
  • visualization/: Sample outputs during training (if enabled)
The final model net_g_1000000.pth will be used as the initialization for Real-ESRGAN training in stage 2.
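The net_g_<iter>.pth naming makes it easy to pick out the most recent checkpoint programmatically, similar in spirit to what --auto_resume does with training states. A stdlib-only sketch (the file list is a hypothetical example):

```python
import re

def latest_checkpoint(filenames):
    """Pick the net_g_<iter>.pth with the highest iteration count, or None."""
    best, best_iter = None, -1
    for name in filenames:
        m = re.fullmatch(r"net_g_(\d+)\.pth", name)
        if m and int(m.group(1)) > best_iter:
            best, best_iter = name, int(m.group(1))
    return best

files = ["net_g_100000.pth", "net_g_1000000.pth", "net_g_200000.pth"]
print(latest_checkpoint(files))  # net_g_1000000.pth
```

Note the regex sorts numerically, not lexicographically, so net_g_1000000.pth correctly beats net_g_200000.pth.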

Monitoring Training

Training progress is logged to the console and tensorboard (if configured):
# View tensorboard logs
tensorboard --logdir experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/tensorboard
Monitor these metrics:
  • L1 loss: Should decrease steadily
  • Learning rate: Check the schedule is working
  • Validation metrics: PSNR/SSIM if validation is enabled
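Since stage 1 minimizes only L1 loss, it helps to know exactly what that logged number is: the mean absolute error between the network output and the ground truth. A minimal illustration on toy values:

```python
def l1_loss(pred, target):
    """Mean absolute error between two equal-length value lists."""
    assert len(pred) == len(target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

pred   = [0.2, 0.5, 0.9]
target = [0.0, 0.5, 1.0]
print(round(l1_loss(pred, target), 6))  # mean of 0.2, 0.0, 0.1 -> 0.1
```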

Training Duration

Typical training time for Real-ESRNet:
  • 1M iterations on 4x V100 GPUs: ~3-4 days
  • Single GPU training will take proportionally longer
The default configuration trains for 1,000,000 iterations. You can adjust this in the config file:
train:
  total_iter: 1000000
  warmup_iter: -1
For quick testing, reduce to 100,000 iterations, though results will be suboptimal.
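Under the defaults above, plus the checkpoint interval implied by the filenames in the output tree (net_g_100000.pth, net_g_200000.pth, ...), the schedule works out to:

```python
total_iter = 1_000_000   # train.total_iter from the config
val_freq   = 5_000       # val_freq from the validation settings (if enabled)
save_freq  = 100_000     # inferred from the checkpoint names; check your config

print(total_iter // val_freq)   # 200 validation passes over the run
print(total_iter // save_freq)  # 10 saved generator checkpoints
```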

Troubleshooting

CUDA out-of-memory errors: reduce the per-GPU batch size in the configuration:
datasets:
  train:
    batch_size_per_gpu: 12  # Reduce this (e.g., to 6 or 4)
Dataset or meta info not found: verify your paths:
  • Check dataroot_gt points to the correct directory
  • Ensure meta_info file exists and contains valid paths
  • Paths in meta_info should be relative to dataroot_gt
Fewer GPUs than the example assumes: adjust CUDA_VISIBLE_DEVICES and --nproc_per_node to match your hardware:
# For 2 GPUs (0 and 1)
CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch --nproc_per_node=2 ...
Slow training or low GPU utilization:
  • Use cropped sub-images (Step 2 of dataset preparation)
  • Increase num_worker_per_gpu for faster data loading
  • Keep the dataset on fast storage (SSD)
  • Check GPU utilization with nvidia-smi

Next Step

Train Real-ESRGAN

Continue to stage 2: Train Real-ESRGAN with perceptual and GAN losses
