Blind Super-Resolution Challenge
Real-world image degradation is complex and unpredictable. Images can suffer from:
- Unknown blur kernels
- Complex noise patterns
- JPEG compression artifacts
- Multiple combined degradations
- Various downsampling operations
Synthetic Data Degradation Pipeline
Real-ESRGAN’s key innovation is training exclusively on synthetic data that closely mimics real-world degradations. The training pipeline generates low-quality images through:
First-order Degradation
- Blur: Apply various blur kernels (isotropic/anisotropic Gaussian, generalized Gaussian)
- Downsampling: Use different algorithms (bilinear, bicubic, area)
- Noise: Add Gaussian noise with varying levels
- JPEG Compression: Apply compression with random quality factors
Second-order Degradation
The process repeats with different parameters to simulate multiple rounds of degradation, better representing real-world image processing chains.
Sinc Filters
Apply sinc filters to model ringing and overshoot artifacts that commonly result from image processing operations.
By generating degraded images on the fly during training, Real-ESRGAN learns to handle a wide spectrum of degradation types without requiring paired real-world training data.
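The pipeline above can be sketched with numpy. This is a simplified illustration, not the actual implementation: it uses a single isotropic Gaussian blur, nearest-style downsampling, and additive Gaussian noise, and it omits JPEG compression and sinc filtering entirely. Running it twice with freshly sampled parameters illustrates the second-order idea.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Isotropic Gaussian blur kernel (one of several kernel families used)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, kernel):
    """Naive 'same' convolution with reflected borders."""
    pad = kernel.shape[0] // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(
                padded[i:i + kernel.shape[0], j:j + kernel.shape[1]] * kernel)
    return out

def degrade_once(img, rng, scale=2):
    """One degradation round: blur -> downsample -> Gaussian noise.
    JPEG compression is omitted here; the real pipeline also encodes/decodes
    with a randomly chosen quality factor."""
    x = blur(img, gaussian_kernel(sigma=rng.uniform(0.5, 2.0)))
    x = x[::scale, ::scale]                       # stand-in for bilinear/bicubic/area resize
    x = x + rng.normal(0, rng.uniform(1, 10), x.shape)
    return np.clip(x, 0, 255)

# Second-order degradation: run the pipeline twice with fresh random parameters.
rng = np.random.default_rng(0)
hr = np.tile(np.linspace(0, 255, 64), (64, 1))   # toy 64x64 "HR" image
lr = degrade_once(degrade_once(hr, rng), rng)    # 64 -> 32 -> 16
```

Chaining two 2x rounds yields a 4x overall scale factor, matching the x4 models; each round draws its own blur sigma and noise level, which is what makes the second pass more than a repeat of the first.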
High-Order Degradation Modeling
The training uses a high-order degradation model that chains multiple degradation operations, applying the blur-resize-noise-compression pipeline repeatedly with independently sampled parameters.
Pure Synthetic Training
Real-ESRGAN achieves practical blind super-resolution without any real-world paired training data. All low-quality images are synthetically generated from high-quality images during training. This approach offers several advantages:
- Scalability: Easy to generate unlimited training data
- Flexibility: Can adjust degradation parameters for specific domains
- No alignment issues: No need for perfectly aligned HR/LR pairs
- Domain adaptation: Can retrain for specific image types (anime, faces, etc.)
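The on-the-fly pairing can be sketched as follows. The function name `synth_pair` and the simple subsample-plus-noise degradation are illustrative assumptions, not the project's API; the point is that the LR input is regenerated with fresh randomness on every call, so no LR dataset is ever stored.

```python
import numpy as np

def synth_pair(hr, rng, scale=4):
    """Hypothetical on-the-fly pair synthesis: each call produces a newly
    degraded LR input for the same HR target."""
    lr = hr[::scale, ::scale] + rng.normal(
        0, 3, (hr.shape[0] // scale, hr.shape[1] // scale))
    return np.clip(lr, 0, 255), hr

rng = np.random.default_rng(42)
hr = np.full((32, 32), 128.0)
lr1, _ = synth_pair(hr, rng)
lr2, _ = synth_pair(hr, rng)   # same HR, different degradation each time
```

Because every epoch sees a different degradation of the same HR image, the effective training set is far larger than the HR collection itself.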
Network Architecture Strategy
Real-ESRGAN uses two primary generator architectures:
RRDBNet (Large Models)
- Based on ESRGAN’s Residual-in-Residual Dense Block architecture
- Used for general purpose models (RealESRGAN_x4plus, RealESRNet_x4plus)
- Offers high quality at the cost of model size and inference speed
- Default configuration: 23 RRDB blocks with 64 base features
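The residual-in-residual structure can be sketched in a few lines. This is a structural illustration only, with a callable standing in for the real five-convolution dense block; the 0.2 residual scaling factor and the three inner blocks per RRDB follow the ESRGAN design.

```python
import numpy as np

def rrdb(x, dense_block, beta=0.2):
    """Structural sketch of a Residual-in-Residual Dense Block: three inner
    dense blocks, each residually scaled, wrapped in an outer residual.
    `dense_block` stands in for the real 5-conv dense block with LeakyReLU."""
    out = x
    for _ in range(3):
        out = out + beta * dense_block(out)  # inner residual scaling
    return x + beta * out                    # outer residual connection

# With a zero stand-in block the output reduces to x + beta * x.
x = np.ones(8)
y = rrdb(x, lambda t: np.zeros_like(t))
```

The nested residuals keep gradients flowing through 23 stacked blocks, which is what lets the large models train stably at this depth.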
SRVGGNetCompact (Lightweight Models)
- Compact VGG-style architecture for fast inference
- Used for anime videos and general-purpose lightweight models
- Significantly smaller and faster than RRDBNet
- Performs upsampling only in the final layer
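Doing all upsampling in the final layer typically means a pixel-shuffle rearrangement: the network computes features at low resolution (cheap) and only the last step expands them. A numpy sketch of the rearrangement, equivalent in layout to PyTorch's `nn.PixelShuffle`:

```python
import numpy as np

def pixel_shuffle(x, scale):
    """Rearranges a (C*scale^2, H, W) array into (C, H*scale, W*scale),
    the kind of final-layer upsampling a compact SR network relies on."""
    c, h, w = x.shape
    oc = c // (scale * scale)
    x = x.reshape(oc, scale, scale, h, w)
    x = x.transpose(0, 3, 1, 4, 2)           # interleave sub-pixel positions
    return x.reshape(oc, h * scale, w * scale)

feat = np.arange(16.0).reshape(4, 2, 2)      # 4 channels of a 2x2 feature map
img = pixel_shuffle(feat, 2)                 # -> 1 channel, 4x4
```

Since every convolution before this step runs at the input resolution, inference cost stays low even for large upscale factors.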
Model Selection Guidelines
Choose RRDBNet when:
- Quality is the top priority
- Computational resources are available
- Processing photos or complex natural images
Choose SRVGGNetCompact when:
- Speed is critical (real-time or video processing)
- Running on limited hardware
- Processing anime or cartoon content
- Model size needs to be minimal
Discriminator Architecture
Real-ESRGAN employs a U-Net discriminator with spectral normalization that:
- Provides multi-scale discrimination through its U-Net structure
- Uses skip connections to preserve fine details
- Applies spectral normalization for training stability
- Outputs a feature map rather than a single real/fake prediction
This design enables the discriminator to:
- Distinguish between real and generated images at multiple scales
- Provide more informative gradients for generator training
- Maintain stable adversarial training
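A toy sketch of the U-Net shape idea follows. This illustrates only the encode-decode-skip topology and the per-pixel output map; it has no learned weights, no spectral normalization, and only one resolution level, whereas the real discriminator has several.

```python
import numpy as np

def unet_disc_sketch(x):
    """Toy U-Net topology: encode by downsampling, decode by upsampling,
    add the skip connection, and emit a per-pixel realness map the same
    size as the input rather than a single scalar."""
    skip = x
    down = x[::2, ::2]                     # encoder: halve resolution
    up = np.kron(down, np.ones((2, 2)))    # decoder: restore resolution
    return up + skip                       # skip connection preserves detail

score_map = unet_disc_sketch(np.ones((8, 8)))
```

Because every output pixel carries its own real/fake judgment, the generator receives spatially localized gradients instead of one global signal.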
Inference Features
The trained Real-ESRGAN models support practical features for deployment:
- Tile processing: Handle arbitrarily large images by processing in tiles
- Alpha channel support: Preserve transparency in RGBA images
- Grayscale images: Process both color and grayscale inputs
- 16-bit images: Support high bit-depth images
- Arbitrary output scales: Use --outscale to generate any desired output size
- Face enhancement: Optional integration with GFPGAN for face restoration
The inference implementation automatically handles images with different characteristics, making it practical for diverse real-world applications.
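Tile processing can be sketched as split-upscale-stitch. The function below is a simplified assumption of how such tiling works, not the project's implementation: notably, the real code also pads tiles with overlap so seams at tile borders are hidden, which this sketch omits.

```python
import numpy as np

def upscale_tiled(img, upscale_fn, tile=64, scale=4):
    """Sketch of tile-based inference: split the image into tiles, run the
    (memory-hungry) upscaler on each tile, and stitch the results so that
    arbitrarily large images fit in GPU memory."""
    h, w = img.shape
    out = np.zeros((h * scale, w * scale), dtype=float)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = img[y:y + tile, x:x + tile]   # edge tiles may be smaller
            out[y * scale:(y + patch.shape[0]) * scale,
                x * scale:(x + patch.shape[1]) * scale] = upscale_fn(patch)
    return out

# Nearest-neighbour stand-in for the network; a 100x100 image in 64-pixel tiles.
big = upscale_tiled(np.ones((100, 100)), lambda p: np.kron(p, np.ones((4, 4))))
```

Peak memory is now bounded by the tile size rather than the image size, which is what makes very large inputs tractable.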
Why It Works
Real-ESRGAN’s effectiveness stems from several key design choices:
- Comprehensive degradation modeling: The high-order degradation process covers the vast majority of real-world scenarios
- Strong generator architecture: Both RRDBNet and SRVGGNetCompact provide sufficient capacity to learn complex mappings
- Advanced discriminator: The U-Net discriminator provides rich multi-scale feedback
- Two-stage training: Separate L1 and GAN training stages balance sharpness and perceptual quality
- Domain-specific variants: Specialized models for anime, faces, and general content maximize performance
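The two-stage balance can be made concrete: the first stage trains with L1 loss alone (sharp but conservative), and the GAN stage fine-tunes with a weighted sum of pixel, perceptual, and adversarial terms. The weights below follow commonly published Real-ESRGAN fine-tune configurations but should be treated as assumptions here rather than quoted values.

```python
def gan_stage_loss(l1, perceptual, adversarial,
                   w_l1=1.0, w_percep=1.0, w_gan=0.1):
    """Sketch of the GAN-stage objective: a weighted sum of the pixel (L1),
    perceptual, and adversarial losses. Keeping the adversarial weight small
    preserves fidelity while still encouraging realistic texture."""
    return w_l1 * l1 + w_percep * perceptual + w_gan * adversarial

loss = gan_stage_loss(0.05, 0.8, 1.2)
```

The small adversarial weight is the knob that trades hallucinated texture against faithfulness; domain-specific variants can retune it.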