Generator Architectures
Real-ESRGAN offers two generator networks with different trade-offs between quality and efficiency.
RRDBNet Architecture
The RRDBNet (Residual-in-Residual Dense Block Network) is the primary architecture for high-quality super-resolution tasks, inherited from ESRGAN.
Architecture Parameters
Based on the source code in inference_realesrgan.py, Real-ESRGAN uses the following RRDBNet configurations:
RealESRGAN_x4plus / RealESRNet_x4plus (Standard Model)
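The constructor arguments can be written out as plain keyword dictionaries. The values below reflect our reading of the RRDBNet calls in inference_realesrgan.py: num_feat=64 and num_grow_ch=32 throughout, with num_block and scale varying per model. The dictionary name RRDBNET_CONFIGS is illustrative, and the values should be checked against the script version you use.

```python
# Keyword arguments for basicsr's RRDBNet, one entry per released model.
# RRDBNET_CONFIGS is an illustrative name, not part of Real-ESRGAN itself.
RRDBNET_CONFIGS = {
    "RealESRGAN_x4plus": dict(num_in_ch=3, num_out_ch=3, num_feat=64,
                              num_block=23, num_grow_ch=32, scale=4),
    "RealESRNet_x4plus": dict(num_in_ch=3, num_out_ch=3, num_feat=64,
                              num_block=23, num_grow_ch=32, scale=4),
    "RealESRGAN_x4plus_anime_6B": dict(num_in_ch=3, num_out_ch=3, num_feat=64,
                                       num_block=6, num_grow_ch=32, scale=4),
    "RealESRGAN_x2plus": dict(num_in_ch=3, num_out_ch=3, num_feat=64,
                              num_block=23, num_grow_ch=32, scale=2),
}

# With basicsr installed, a network would be built like this:
#   from basicsr.archs.rrdbnet_arch import RRDBNet
#   model = RRDBNet(**RRDBNET_CONFIGS["RealESRGAN_x4plus"])
```

The GAN model and the PSNR-oriented RealESRNet share one architecture and differ only in their trained weights.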
Network Structure
RRDBNet is built on a “Residual-in-Residual” design where dense blocks are nested within residual connections, enabling very deep networks with stable gradient flow.
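- Initial Convolution: Extracts base features from the input image (the x2 model first applies pixel unshuffle, trading spatial resolution for channels)
- RRDB Trunk: Stack of Residual-in-Residual Dense Blocks
- Trunk Convolution: Processes accumulated features
- Global Residual Connection: Adds the initial convolution's features to the trunk output
- Upsampling Layers: Two rounds of 2x nearest-neighbor interpolation, each followed by a 3x3 convolution
- Final Convolutions: A high-resolution convolution refines the upsampled features, and a last convolution produces the output RGB image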
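The data flow through these stages can be traced as tensor shapes (channels, height, width). This sketch rests on several assumptions taken from our reading of basicsr's rrdbnet_arch.py: the x2 model pixel-unshuffles its input by 2, and upsampling uses two rounds of 2x nearest-neighbor interpolation plus a padded 3x3 convolution. The function rrdbnet_shapes is a hypothetical helper.

```python
def rrdbnet_shapes(c, h, w, scale=4, num_feat=64, num_out_ch=3):
    """Trace (channels, height, width) through an RRDBNet-style generator.

    Assumes basicsr-style behaviour: pixel unshuffle for scale 2,
    two 2x nearest-neighbour upsampling stages, 3x3 convs with padding 1.
    """
    trace = [("input", (c, h, w))]
    if scale == 2:
        # Pixel unshuffle by 2: 4x more channels, half the spatial size.
        c, h, w = c * 4, h // 2, w // 2
        trace.append(("pixel_unshuffle", (c, h, w)))
    elif scale != 4:
        raise ValueError("this sketch only covers scale 2 and 4")
    c = num_feat
    trace.append(("conv_first", (c, h, w)))
    # The trunk, trunk conv and global residual keep the shape unchanged.
    trace.append(("trunk + conv_body + global residual", (c, h, w)))
    for name in ("conv_up1", "conv_up2"):
        h, w = h * 2, w * 2  # nearest-neighbour interpolation, then 3x3 conv
        trace.append((name, (c, h, w)))
    trace.append(("conv_hr", (c, h, w)))
    trace.append(("conv_last", (num_out_ch, h, w)))
    return trace


for stage, shape in rrdbnet_shapes(3, 64, 64):
    print(f"{stage:40s} {shape}")
```

All trunk computation happens at the input resolution (or half of it for x2). Only the last three convolutions run at output resolution.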
Residual-in-Residual Dense Block (RRDB)
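Each RRDB chains three residual dense blocks, with residual scaling applied at two levels:
- Dense Block: Each convolution receives the concatenated outputs of the block input and all preceding layers
- Local Residual: Each dense block adds its scaled output back to its own input
- Global Residual: The RRDB adds the scaled output of its three dense blocks back to the RRDB input
- Residual Scaling: Each residual branch is multiplied by β = 0.2 before the addition, which stabilizes training of very deep networks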
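In equation form, following our reading of the basicsr implementation:
- σ is a LeakyReLU with negative slope 0.2.
- [·] denotes channel-wise concatenation.
- W_1 to W_5 are 3x3 convolutions.
- The fifth convolution has no activation.

```latex
\begin{aligned}
x_1 &= \sigma(W_1 * x) \\
x_2 &= \sigma(W_2 * [x, x_1]) \\
x_3 &= \sigma(W_3 * [x, x_1, x_2]) \\
x_4 &= \sigma(W_4 * [x, x_1, x_2, x_3]) \\
\mathrm{RDB}(x) &= x + \beta \,\big(W_5 * [x, x_1, x_2, x_3, x_4]\big) \\
\mathrm{RRDB}(x) &= x + \beta \,\mathrm{RDB}_3\big(\mathrm{RDB}_2(\mathrm{RDB}_1(x))\big), \qquad \beta = 0.2
\end{aligned}
```

With num_feat = 64 and num_grow_ch = 32, the five convolutions take 64, 96, 128, 160 and 192 input channels.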
Model Sizes and Complexity
| Model | RRDB Blocks | Parameters | Use Case |
|---|---|---|---|
| RealESRGAN_x4plus | 23 | ~16.7M | General images, highest quality |
| RealESRGAN_x4plus_anime_6B | 6 | ~4.5M | Anime images, balanced quality/size |
| RealESRNet_x4plus | 23 | ~16.7M | PSNR-oriented (L1 loss, no GAN); smoother output |
The anime model uses only 6 RRDB blocks, roughly a quarter of the parameters of the 23-block model, yet performs well on anime content. This suggests that the capacity a model needs depends on the image domain.
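Counting weights by hand makes the parameter figures checkable. The sketch assumes the basicsr RRDBNet layout:
- 3x3 convolutions with bias.
- Five convolutions per dense block and three dense blocks per RRDB.
- conv_first before the trunk, and conv_body, conv_up1, conv_up2, conv_hr and conv_last after it.

conv_params and rrdbnet_params are illustrative helpers, not library functions. If the layout is as assumed, summing p.numel() over a PyTorch model's parameters should give the same totals.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Weights plus optional bias of a k x k Conv2d."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def rrdbnet_params(num_block, num_feat=64, num_grow_ch=32,
                   num_in_ch=3, num_out_ch=3, scale=4):
    """Count parameters of a basicsr-style RRDBNet (assumed layout)."""
    if scale == 2:
        num_in_ch *= 4  # pixel unshuffle feeds 4x the channels to conv_first
    # One residual dense block: four growth convs, then a fusion conv back to num_feat.
    rdb = sum(conv_params(num_feat + i * num_grow_ch, num_grow_ch) for i in range(4))
    rdb += conv_params(num_feat + 4 * num_grow_ch, num_feat)
    rrdb = 3 * rdb                                  # three dense blocks per RRDB
    total = conv_params(num_in_ch, num_feat)        # conv_first
    total += num_block * rrdb                       # RRDB trunk
    total += 4 * conv_params(num_feat, num_feat)    # conv_body, conv_up1, conv_up2, conv_hr
    total += conv_params(num_feat, num_out_ch)      # conv_last
    return total


print(rrdbnet_params(23))  # 16697987, RealESRGAN_x4plus / RealESRNet_x4plus
print(rrdbnet_params(6))   # 4467779, RealESRGAN_x4plus_anime_6B
```

Nearly all parameters sit in the trunk (719,424 per RRDB), so the total scales almost linearly with num_block.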
SRVGGNetCompact Architecture
SRVGGNetCompact is a lightweight VGG-style architecture designed for fast inference with a minimal memory footprint.
Architecture Parameters
From realesrgan/archs/srvgg_arch.py and inference_realesrgan.py:
realesr-animevideov3 (Extra Small Model)
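As with RRDBNet, the constructor arguments can be written as keyword dictionaries. The values reflect our reading of inference_realesrgan.py and should be verified against your copy. The two released models appear to differ only in num_conv. SRVGG_CONFIGS is an illustrative name.

```python
# Keyword arguments for SRVGGNetCompact; SRVGG_CONFIGS is an illustrative name.
SRVGG_CONFIGS = {
    "realesr-animevideov3": dict(num_in_ch=3, num_out_ch=3, num_feat=64,
                                 num_conv=16, upscale=4, act_type="prelu"),
    "realesr-general-x4v3": dict(num_in_ch=3, num_out_ch=3, num_feat=64,
                                 num_conv=32, upscale=4, act_type="prelu"),
}

# With the realesrgan package installed:
#   from realesrgan.archs.srvgg_arch import SRVGGNetCompact
#   model = SRVGGNetCompact(**SRVGG_CONFIGS["realesr-animevideov3"])
```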
Network Structure
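The SRVGGNetCompact architecture is intentionally simple: a single chain of 3x3 convolutions, each but the last followed by an activation, and one pixel-shuffle layer that performs all of the upsampling.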
Key Design Principles
Compact Design
- No dense connections or complex residual structures
- All convolutions operate on the low-resolution feature space
- Upsampling performed only at the final layer, via pixel shuffle
Residual Learning
- Adds the nearest-neighbor upsampled input to the output
- Network learns the residual (difference) rather than the full output
- Simplifies learning and improves convergence
Efficiency
- Minimal memory footprint during inference
- Fast processing suitable for video applications
- Significantly fewer parameters than RRDBNet
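The final step of the network can be sketched in pure Python on nested lists shaped (channels, height, width):
1. The last convolution outputs num_out_ch x upscale² channels.
2. Pixel shuffle rearranges those channels to full resolution.
3. A nearest-neighbor upsampled copy of the input is added, so the body only has to predict the residual.

The helper names are illustrative. pixel_shuffle follows the channel ordering documented for PyTorch's nn.PixelShuffle.

```python
def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) nested lists into (C, H*r, W*r), PyTorch-style."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c = c_in // (r * r)
    # out[ch][i][j] = in[ch*r*r + (i % r)*r + (j % r)][i // r][j // r]
    return [[[x[ch * r * r + (i % r) * r + (j % r)][i // r][j // r]
              for j in range(w * r)]
             for i in range(h * r)]
            for ch in range(c)]


def nearest_upsample(x, r):
    """Repeat every pixel r times along height and width."""
    return [[[plane[i // r][j // r] for j in range(len(plane[0]) * r)]
             for i in range(len(plane) * r)]
            for plane in x]


def srvgg_output(body_out, lr_image, r):
    """Pixel-shuffle the body output and add the nearest-upsampled input."""
    up = pixel_shuffle(body_out, r)
    base = nearest_upsample(lr_image, r)
    return [[[a + b for a, b in zip(row_u, row_b)]
             for row_u, row_b in zip(pu, pb)]
            for pu, pb in zip(up, base)]


# One-channel 1x1 input, 2x upscale: the body must emit 1 * 2 * 2 = 4 channels.
print(srvgg_output([[[1]], [[2]], [[3]], [[4]]], [[[10]]], 2))
```

If the body outputs zeros, the result is exactly the nearest-neighbor upscale. This is why the network only needs to learn corrections on top of it.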
Activation Functions
SRVGGNetCompact supports three activation types:
- PReLU (default): Learnable per-channel negative slope; used by the released models
- ReLU: Simple and fast, zero for negative values
- LeakyReLU: Fixed small negative slope (0.1)
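The three activations differ only in how they treat negative inputs. This scalar and per-channel sketch assumes PReLU holds one learnable slope per feature channel, as with nn.PReLU(num_parameters=num_feat). The function names are illustrative.

```python
def relu(x):
    """Zero for negative inputs."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.1):
    """Fixed slope for negative inputs (0.1, as described above)."""
    return x if x >= 0 else negative_slope * x


def prelu(feature_map, slopes):
    """Per-channel PReLU on (C, H, W) nested lists; one learned slope per channel."""
    return [[[v if v >= 0 else a * v for v in row] for row in plane]
            for plane, a in zip(feature_map, slopes)]


print(relu(-2.0), leaky_relu(-2.0))
print(prelu([[[-1.0, 2.0]], [[-4.0]]], [0.25, 0.5]))
```

Because PReLU's slopes are trained, the network can recover ReLU-like (slope near 0) or near-linear (slope near 1) behavior per channel. The cost is only num_feat extra parameters per activation layer.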
Model Comparison
| Model | Body Convs (num_conv) | Parameters | Speed | Use Case |
|---|---|---|---|---|
| realesr-animevideov3 | 16 | ~600K | Very Fast | Anime videos |
| realesr-general-x4v3 | 32 | ~1.2M | Fast | General images |
| RealESRGAN_x4plus (RRDBNet) | - | ~16.7M | Slower | Highest quality |
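SRVGGNetCompact has roughly 27x fewer parameters than the 23-block RRDBNet in the realesr-animevideov3 configuration (about 14x fewer for realesr-general-x4v3). It also runs several times faster, while still producing high-quality results for the content types it targets.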
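The same bookkeeping explains the size gap. This sketch assumes the SRVGGNetCompact layout as we understand it:
- 3x3 convolutions with bias.
- One PReLU slope per channel after every convolution except the last.
- A final convolution to num_out_ch x upscale² channels, followed by a parameter-free pixel shuffle.

The helper names are illustrative.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus bias of a k x k Conv2d."""
    return c_in * c_out * k * k + c_out


def srvgg_params(num_conv, num_feat=64, num_in_ch=3, num_out_ch=3,
                 upscale=4, act_type="prelu"):
    """Parameter count of an SRVGGNetCompact-style network (assumed layout)."""
    act = num_feat if act_type == "prelu" else 0  # PReLU: one slope per channel
    total = conv_params(num_in_ch, num_feat) + act                  # first conv + activation
    total += num_conv * (conv_params(num_feat, num_feat) + act)     # body convs + activations
    total += conv_params(num_feat, num_out_ch * upscale * upscale)  # last conv
    return total  # pixel shuffle itself has no parameters


print(srvgg_params(16))                     # 621424, realesr-animevideov3
print(srvgg_params(32))                     # 1213296, realesr-general-x4v3
print(round(16.7e6 / srvgg_params(16)))     # 27, ratio vs a ~16.7M RRDBNet
print(round(16.7e6 / srvgg_params(32)))     # 14
```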
Discriminator Architecture
Real-ESRGAN uses a U-Net discriminator with spectral normalization for stable adversarial training.
UNetDiscriminatorSN
From realesrgan/archs/discriminator_arch.py:
Architecture Details
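The discriminator follows a U-Net structure with downsampling and upsampling paths:
Downsampling Path
- A 3x3 input convolution, then three stride-2 4x4 convolutions that halve the resolution and double the channels (64 → 128 → 256 → 512)
Upsampling Path
- Three stages of 2x bilinear interpolation followed by a 3x3 convolution (512 → 256 → 128 → 64), each stage adding the matching downsampling features through a skip connection
Output Head
- Two further 3x3 convolutions and a final convolution that outputs a single-channel realness map at the input resolution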
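The feature-map shapes along these paths can be traced in a few lines of Python. The layout is our reading of UNetDiscriminatorSN and should be checked against the source; unet_disc_shapes is an illustrative helper. In this sketch the additive skip connections only line up when height and width are divisible by 8, and the function checks that explicitly.

```python
def unet_disc_shapes(h, w, num_feat=64):
    """Trace (C, H, W) through a UNetDiscriminatorSN-style network.

    Assumed layout: a stride-1 input conv, three stride-2 downsampling convs,
    three (2x bilinear upsample + 3x3 conv) stages with additive skips,
    then a head ending in a 1-channel conv.
    """
    down = [(num_feat, h, w)]                  # conv0: 3x3, stride 1
    for _ in range(3):                         # conv1..conv3: 4x4, stride 2
        c, hh, ww = down[-1]
        down.append((c * 2, hh // 2, ww // 2))
    up = []
    c, hh, ww = down[-1]
    for skip in reversed(down[:-1]):           # conv4..conv6 after 2x upsampling
        c, hh, ww = c // 2, hh * 2, ww * 2
        if (c, hh, ww) != skip:                # skip addition needs equal shapes
            raise ValueError(f"skip mismatch: {(c, hh, ww)} vs {skip}")
        up.append((c, hh, ww))
    out = (1, hh, ww)                          # conv7, conv8 keep 64 ch; conv9 -> 1 ch
    return down, up, out


down, up, out = unet_disc_shapes(256, 256)
print(down)  # encoder shapes
print(up)    # decoder shapes after each skip addition
print(out)   # per-pixel realness map
```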
Key Features
Spectral Normalization: Applied to all convolutional layers (except first and last) to stabilize GAN training by constraining the Lipschitz constant of the discriminator.
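Spectral normalization divides each weight by an estimate of its largest singular value, obtained by power iteration. The pure-Python sketch below shows the idea on a 2-D matrix.

As we understand it, PyTorch's torch.nn.utils.spectral_norm differs from this sketch in three ways:
- It works on the convolution weight reshaped to (out_channels, -1).
- It runs one iteration per forward pass by default (n_power_iterations=1).
- It keeps the vector u between calls.

spectral_normalize and its parameters are illustrative, not the library API.

```python
import math
import random


def spectral_normalize(weight, n_iter=20, seed=0):
    """Divide a 2-D weight matrix by a power-iteration estimate of its largest singular value."""
    if n_iter < 1:
        raise ValueError("need at least one power iteration")
    rows, cols = len(weight), len(weight[0])

    def normalize(vec):
        n = math.sqrt(sum(v * v for v in vec)) or 1e-12
        return [v / n for v in vec]

    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(rows)])
    for _ in range(n_iter):
        # v <- W^T u / ||W^T u||,  then  u <- W v / ||W v||
        v = normalize([sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)])
        u = normalize([sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)])
    # sigma = u^T W v, the estimated largest singular value
    sigma = sum(u[i] * weight[i][j] * v[j] for i in range(rows) for j in range(cols))
    return [[x / sigma for x in row] for row in weight], sigma


w_sn, sigma = spectral_normalize([[3.0, 0.0], [0.0, 1.0]])
print(sigma)                          # close to 3.0
print(spectral_normalize(w_sn)[1])    # close to 1.0 after normalization
```

After normalization the largest singular value is about 1. This bounds how sharply the discriminator's output can change with its input, which is the Lipschitz constraint described above.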
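U-Net Structure: The discriminator outputs a realness score for every pixel rather than a single score per image, and its encoder-decoder layout lets it:
- Capture both local and global information
- Provide feedback at multiple resolutions
- Preserve fine details through skip connections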
Architecture Selection Guide
When to Use Each Architecture
RRDBNet (23 blocks)
- Professional photo enhancement
- Maximum quality is required
- Batch processing with GPU available
- File size and speed are not critical
RRDBNet (6 blocks, anime model)
- Anime and illustration upscaling
- Balance between quality and efficiency
- Moderate computational resources
SRVGGNetCompact (realesr-general-x4v3)
- General-purpose fast upscaling
- Real-time or near-real-time requirements
- CPU inference or limited GPU memory
SRVGGNetCompact (realesr-animevideov3)
- Video super-resolution
- Anime/cartoon video processing
- Maximum speed and minimal memory
- Mobile or edge deployment
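One possible way to encode this selection guide as a decision function is sketched below. pick_model is a hypothetical helper, and its branching is our reading of the guide, not a rule from the Real-ESRGAN project. In particular, routing non-anime video to realesr-general-x4v3 is an assumption.

```python
def pick_model(is_video: bool, is_anime: bool, need_max_quality: bool = False) -> str:
    """Map the selection guide onto a released model name (illustrative only)."""
    if is_video:
        # Video favours the compact SRVGGNetCompact models for speed.
        return "realesr-animevideov3" if is_anime else "realesr-general-x4v3"
    if is_anime:
        return "RealESRGAN_x4plus_anime_6B"
    # Still photos: full RRDBNet when quality matters most, else the fast general model.
    return "RealESRGAN_x4plus" if need_max_quality else "realesr-general-x4v3"


print(pick_model(is_video=False, is_anime=False, need_max_quality=True))
```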
Model Files
Pre-trained model weights are available for all architectures:
- RealESRGAN_x4plus.pth: 4x RRDBNet for general images
- RealESRNet_x4plus.pth: 4x RRDBNet before GAN training
- RealESRGAN_x4plus_anime_6B.pth: 4x RRDBNet (6 blocks) for anime
- RealESRGAN_x2plus.pth: 2x RRDBNet for moderate upscaling
- realesr-animevideov3.pth: 4x SRVGGNetCompact for anime videos
- realesr-general-x4v3.pth: 4x SRVGGNetCompact for general use