Frame integrates Real-ESRGAN for AI-powered video upscaling, allowing you to increase video resolution while preserving and enhancing quality using machine learning models.

Overview

AI upscaling uses the Real-ESRGAN neural network to intelligently upscale video frames. Unlike traditional scaling algorithms (bicubic, Lanczos), AI upscaling:
  • Reconstructs details rather than interpolating pixels
  • Reduces compression artifacts
  • Enhances edges and textures
  • Produces sharper results, especially for anime and animation
Frame uses the realesr-animevideov3 models optimized for video and animation content. These models run locally on your GPU using Vulkan (NVIDIA, AMD) or Metal (Apple Silicon).

Upscale Modes

Frame supports two upscaling modes, plus a disabled default, selected via config.mlUpscale:
config.mlUpscale = 'none';       // Disabled (default)
config.mlUpscale = 'esrgan-2x';  // 2x upscale
config.mlUpscale = 'esrgan-4x';  // 4x upscale

2x Upscale (esrgan-2x)

  • Doubles resolution in both dimensions (4x pixel count)
  • Example: 1920x1080 → 3840x2160 (1080p → 4K)
  • Model: realesr-animevideov3-x2
  • Faster processing, lower VRAM usage

4x Upscale (esrgan-4x)

  • Quadruples resolution in both dimensions (16x pixel count)
  • Example: 960x540 → 3840x2160 (540p → 4K)
  • Model: realesr-animevideov3-x4
  • Slower processing, higher VRAM usage
Mode resolution (upscale.rs:136-145):
pub(crate) fn resolve_upscale_mode(
    mode: &str,
) -> Result<(&'static str, &'static str), ConversionError> {
    match mode {
        "esrgan-2x" => Ok(("2", "realesr-animevideov3-x2")),
        "esrgan-4x" => Ok(("4", "realesr-animevideov3-x4")),
        _ => Err(ConversionError::InvalidInput(
            format!("Invalid upscale mode: {}", mode)
        )),
    }
}
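The mapping can be exercised on its own. The sketch below is hedged and self-contained: it substitutes a plain String for Frame's ConversionError so the example compiles standalone, but the match arms mirror the code above.

```rust
// Self-contained sketch of the mode mapping (hypothetical: uses String
// instead of Frame's ConversionError so it compiles on its own).
fn resolve_upscale_mode(mode: &str) -> Result<(&'static str, &'static str), String> {
    match mode {
        "esrgan-2x" => Ok(("2", "realesr-animevideov3-x2")),
        "esrgan-4x" => Ok(("4", "realesr-animevideov3-x4")),
        _ => Err(format!("Invalid upscale mode: {}", mode)),
    }
}

fn main() {
    // The returned pair feeds the sidecar's -s (scale) and -n (model) flags.
    let (scale, model) = resolve_upscale_mode("esrgan-2x").unwrap();
    println!("-s {} -n {}", scale, model);
    // 'none' is handled upstream by skipping upscaling, so it does not resolve.
    assert!(resolve_upscale_mode("none").is_err());
}
```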

How AI Upscaling Works

The AI upscaling process involves three stages:

1. Frame Extraction (Decode)

FFmpeg extracts individual frames from the source video as PNG images.

Implementation (upscale.rs:356-373):
// Apply video filters (crop, rotate, flip) before extraction
let video_filters = build_video_filters(&task.config, false);
if !video_filters.is_empty() {
    dec_args.push("-vf".to_string());
    dec_args.push(video_filters.join(","));
}

// Force constant framerate to prevent drift
dec_args.push("-r".to_string());
dec_args.push(fps.to_string());
dec_args.push("-vsync".to_string());
dec_args.push("cfr".to_string());

dec_args.push(
    input_frames_dir.join("frame_%08d.png").to_string_lossy().to_string(),
);

2. AI Upscaling (Real-ESRGAN)

The realesrgan-ncnn-vulkan sidecar processes frames in batches.

Implementation (upscale.rs:456-480):
let upscaler_args = vec![
    "-v".to_string(),
    "-i".to_string(),
    sanitize_external_tool_path(&input_frames_dir),
    "-o".to_string(),
    sanitize_external_tool_path(&output_frames_dir),
    "-s".to_string(),
    scale.to_string(),  // "2" or "4"
    "-f".to_string(),
    "png".to_string(),
    "-m".to_string(),
    sanitize_external_tool_path(&models_path),
    "-n".to_string(),
    model_name.to_string(),  // "realesr-animevideov3-x2" or "x4"
    "-j".to_string(),
    compute_upscale_threads(...),  // Dynamic thread configuration
    "-g".to_string(),
    "0".to_string(),  // GPU ID
    "-t".to_string(),
    "0".to_string(),  // Tile size (auto)
];

3. Re-encoding (Encode)

FFmpeg combines the upscaled frames with the original audio and re-encodes.

Implementation (upscale.rs:22-133):
pub(crate) fn build_upscale_encode_args(
    output_frames_dir: &Path,
    source_file_path: &str,
    output_path: &str,
    source_fps: f64,
    config: &ConversionConfig,
    pixel_format: Option<String>,
) -> Vec<String> {
    let mut enc_args = vec![
        "-framerate".to_string(),
        source_fps.to_string(),
        "-start_number".to_string(),
        "1".to_string(),
        "-i".to_string(),
        output_frames_dir.join("frame_%08d.png").to_string_lossy().to_string(),
    ];

    // Second input: original file for audio/metadata
    enc_args.push("-i".to_string());
    enc_args.push(source_file_path.to_string());

    // Map upscaled video from first input
    enc_args.push("-map".to_string());
    enc_args.push("0:v:0".to_string());

    // Map audio from original file
    enc_args.push("-map".to_string());
    enc_args.push("1:a?".to_string());

    // Apply video codec settings
    add_video_codec_args(&mut enc_args, config);
    add_audio_codec_args(&mut enc_args, config);
    
    // ...
}
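The core of this step is the two-input stream mapping. The following is a trimmed-down, hypothetical sketch of just that portion (build_mapping_args is not a Frame function, and ConversionConfig-driven codec arguments are omitted):

```rust
// Hypothetical sketch: only the inputs and stream mapping of the encode
// command, without Frame's ConversionConfig codec handling.
fn build_mapping_args(frames_pattern: &str, source: &str, fps: f64) -> Vec<String> {
    vec![
        "-framerate".to_string(), fps.to_string(),
        "-start_number".to_string(), "1".to_string(),
        "-i".to_string(), frames_pattern.to_string(), // input 0: upscaled frames
        "-i".to_string(), source.to_string(),         // input 1: original file
        "-map".to_string(), "0:v:0".to_string(),      // video: the PNG sequence
        "-map".to_string(), "1:a?".to_string(),       // audio: original, optional
    ]
}

fn main() {
    let args = build_mapping_args("out/frame_%08d.png", "input.mp4", 23.976);
    // The '?' in 1:a? keeps ffmpeg from failing on sources with no audio.
    assert_eq!(args[9], "0:v:0");
    assert_eq!(args[11], "1:a?");
}
```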

Performance Optimization

Dynamic Thread Configuration

Frame automatically configures concurrent processing threads based on output resolution to optimize GPU VRAM usage.

Implementation (upscale.rs:148-171):
pub(crate) fn compute_upscale_threads(source_width: u32, source_height: u32, scale: u32) -> String {
    let output_pixels = (source_width as u64 * scale as u64) * (source_height as u64 * scale as u64);

    // proc: concurrent GPU inference frames — limited by VRAM
    // > 4K output (~8.3M px): ~500MB+ per frame → single concurrent frame
    // > 1080p output (~2M px): moderate pressure → 2 concurrent frames
    // ≤ 1080p output: lightweight, pipeline benefits from concurrency → 4
    let proc = if output_pixels > 8_294_400 {
        1
    } else if output_pixels > 2_073_600 {
        2
    } else {
        4
    };

    // load/save: I/O threads — limited by CPU cores
    let cpus = std::thread::available_parallelism().map(|n| n.get() as u32).unwrap_or(4);
    let io = cpus.div_ceil(2).clamp(1, 4);

    format!("{}:{}:{}", io, proc, io)
}
Thread format: load_threads:proc_threads:save_threads
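The heuristic can be worked through with the CPU count made explicit. This is a sketch for illustration: the cpus parameter is an assumption here, whereas the real function reads it from std::thread::available_parallelism().

```rust
// Deterministic sketch of the thread heuristic above; the CPU count is a
// parameter instead of being queried from the OS.
fn upscale_threads(source_width: u32, source_height: u32, scale: u32, cpus: u32) -> String {
    let output_pixels =
        (source_width as u64 * scale as u64) * (source_height as u64 * scale as u64);
    let proc = if output_pixels > 8_294_400 {
        1 // beyond 4K: one concurrent inference frame
    } else if output_pixels > 2_073_600 {
        2 // beyond 1080p, up to and including 4K: two frames
    } else {
        4 // 1080p or smaller: four frames
    };
    let io = cpus.div_ceil(2).clamp(1, 4);
    format!("{}:{}:{}", io, proc, io)
}

fn main() {
    // 1080p source in 2x mode lands exactly on 4K (8,294,400 px), so proc = 2.
    assert_eq!(upscale_threads(1920, 1080, 2, 8), "4:2:4");
    // The same source in 4x mode exceeds the 4K threshold, so proc drops to 1.
    assert_eq!(upscale_threads(1920, 1080, 4, 8), "4:1:4");
    // A 540p source in 2x mode stays at 1080p output, keeping proc = 4.
    assert_eq!(upscale_threads(960, 540, 2, 8), "4:4:4");
}
```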

Hardware Decode Acceleration

Enable hardware decoding during frame extraction for faster processing:
config.hwDecode = true;
config.mlUpscale = 'esrgan-2x';
Implementation (upscale.rs:313-322):
if task.config.hw_decode {
    if crate::conversion::utils::is_nvenc_codec(&task.config.video_codec) {
        dec_args.push("-hwaccel".to_string());
        dec_args.push("cuda".to_string());
    } else if crate::conversion::utils::is_videotoolbox_codec(&task.config.video_codec) {
        dec_args.push("-hwaccel".to_string());
        dec_args.push("videotoolbox".to_string());
    }
}

Progress Tracking

AI upscaling provides detailed progress updates through three phases.

Progress distribution:
  • 0-5%: Frame extraction (decode)
  • 5-90%: AI upscaling
  • 90-100%: Re-encoding
Implementation (upscale.rs:404-416, 531-542):
// Decode progress
let decode_progress = (current_frame as f64 / total_frames as f64) * 5.0;
app_clone.emit("conversion-progress", ProgressPayload {
    id: id_clone.clone(),
    progress: decode_progress.min(5.0),
});

// Upscale progress
let progress = 5.0 + (completed_frames as f64 / total_frames as f64) * 85.0;
app_clone.emit("conversion-progress", ProgressPayload {
    id: id_clone.clone(),
    progress: progress.min(90.0),
});

// Encode progress
let encode_progress = 90.0 + (current_frame as f64 / total_frames as f64) * 10.0;

System Requirements

GPU Requirements

Real-ESRGAN requires GPU acceleration via Vulkan or Metal.

Windows/Linux:
  • NVIDIA GPU with Vulkan support
  • AMD GPU with Vulkan support
  • Vulkan drivers installed
macOS:
  • Apple Silicon (M1/M2/M3) with Metal support
  • Intel Mac with Metal-capable GPU

VRAM Requirements

Minimum VRAM by output resolution:
| Output Resolution | 2x Mode | 4x Mode |
|-------------------|---------|---------|
| 1080p (2 MP)      | 2 GB    | 4 GB    |
| 1440p (3.7 MP)    | 3 GB    | 6 GB    |
| 4K (8.3 MP)       | 4 GB    | 8 GB    |
| 8K (33 MP)        | 8 GB    | 16 GB   |
Insufficient VRAM will cause the upscaling process to fail. Frame automatically reduces concurrent processing for higher resolutions to minimize VRAM usage.
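The table can be captured as a small lookup. This is a hypothetical helper, not a Frame API, and the figures are the documented minimums above, not measurements:

```rust
// VRAM lookup sketch: documented minimums by output size and mode.
fn min_vram_gb(output_megapixels: f64, scale: u32) -> u32 {
    let base = if output_megapixels <= 2.0 {
        2 // 1080p
    } else if output_megapixels <= 3.7 {
        3 // 1440p
    } else if output_megapixels <= 8.3 {
        4 // 4K
    } else {
        8 // 8K
    };
    // The 4x mode column is consistently double the 2x column.
    if scale == 4 { base * 2 } else { base }
}

fn main() {
    assert_eq!(min_vram_gb(8.3, 2), 4);   // 4K output, 2x mode
    assert_eq!(min_vram_gb(8.3, 4), 8);   // 4K output, 4x mode
    assert_eq!(min_vram_gb(33.0, 4), 16); // 8K output, 4x mode
}
```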

Runtime Validation

Frame validates the upscaling runtime before starting a conversion.

Implementation (upscale.rs:173-236):
pub(crate) async fn validate_upscale_runtime(
    app: &AppHandle,
    mode: &str,
) -> Result<(), ConversionError> {
    let (_, model_name) = resolve_upscale_mode(mode)?;

    // Check model files exist
    let models_path = app.path().resolve("resources/models", BaseDirectory::Resource)?;
    let model_param = models_path.join(format!("{}.param", model_name));
    let model_bin = models_path.join(format!("{}.bin", model_name));

    if !model_param.is_file() || !model_bin.is_file() {
        return Err(ConversionError::InvalidInput(
            format!("ML upscaling models are missing for '{}'. Run `bun run setup:upscaler` and rebuild.", mode)
        ));
    }

    // Test upscaler sidecar
    let output = app.shell().sidecar("realesrgan-ncnn-vulkan")?
        .args(["-h"])
        .output()
        .await?;

    if !output.status.success() {
        // Validate help output to confirm binary works
        // ...
    }

    Ok(())
}

Limitations and Constraints

Container Compatibility

// AI upscaling requires video container (not audio-only or GIF)
config.container = 'mp4'; // ✓ Supported
config.container = 'mkv'; // ✓ Supported
config.container = 'webm'; // ✓ Supported
config.container = 'mov'; // ✓ Supported
config.container = 'mp3'; // ✗ Not supported (audio-only)
config.container = 'gif'; // ✗ Not supported (video-only)
Validation (args.rs:566-570):
if (is_audio_only || is_video_only) && has_ml_upscale {
    return Err(ConversionError::InvalidInput(
        "ML upscaling requires an audio-capable video container".to_string(),
    ));
}

Processing Mode

AI upscaling requires re-encoding mode:
config.processingMode = 'reencode'; // ✓ Required
config.processingMode = 'copy';     // ✗ Not compatible
Validation (args.rs:578-582):
if is_copy_mode && has_ml_upscale {
    return Err(ConversionError::InvalidInput(
        "ML upscaling requires re-encoding mode".to_string(),
    ));
}

Combining with Other Features

AI upscaling can be combined with other Frame features:

With Video Filters

{
  mlUpscale: 'esrgan-2x',
  rotation: '90',        // Applied before upscaling
  flipHorizontal: true,  // Applied before upscaling
  crop: { enabled: true, x: 0, y: 0, width: 1280, height: 720 }
}
Filters are applied during frame extraction (upscale.rs:356-360).

With Hardware Encoding

{
  mlUpscale: 'esrgan-4x',
  videoCodec: 'h264_nvenc',
  hwDecode: true,
  nvencSpatialAq: true
}
Combines GPU upscaling with GPU encoding for maximum performance.

With Audio Processing

{
  mlUpscale: 'esrgan-2x',
  audioNormalize: true,
  audioVolume: 110,
  selectedAudioTracks: [0]
}
Audio processing is independent of upscaling.

Example Configurations

Upscale to 4K

{
  container: 'mp4',
  videoCodec: 'libx265',
  videoBitrateMode: 'crf',
  crf: 20,
  mlUpscale: 'esrgan-2x', // 1080p → 4K
  preset: 'medium',
  audioCodec: 'aac',
  audioBitrate: '192'
}

Fast Hardware-Accelerated Upscale

{
  container: 'mp4',
  videoCodec: 'h264_nvenc',
  videoBitrateMode: 'crf',
  quality: 50,
  mlUpscale: 'esrgan-2x',
  hwDecode: true,
  nvencSpatialAq: true,
  preset: 'fast'
}

Archive Quality with 4x Upscale

{
  container: 'mkv',
  videoCodec: 'libx265',
  videoBitrateMode: 'crf',
  crf: 18,
  mlUpscale: 'esrgan-4x', // 540p → 4K
  preset: 'slow',
  audioCodec: 'flac'
}

Troubleshooting

Missing models
Run the setup script to download the Real-ESRGAN models:
bun run setup:upscaler
Then rebuild the application to bundle the models.

Upscaler binary not found
Ensure the realesrgan-ncnn-vulkan binary is included in the build:
  • Check that tauri.conf.json includes the sidecar
  • Verify the binary has execute permissions on Linux/macOS
  • On Windows, check that antivirus isn't blocking the executable

Out of VRAM
The output resolution exceeds available VRAM. Try:
  • Using 2x instead of 4x mode
  • Upscaling at a lower source resolution
  • Closing other GPU-intensive applications
  • Upgrading GPU memory

GPU or driver errors
Graphics driver issues can usually be resolved by:
  • Updating GPU drivers to the latest version
  • On Linux, ensuring the Vulkan loader and drivers are installed
  • On macOS, keeping the system up to date

Related

  • Video Conversion: configure resolution and scaling algorithms
  • Batch Processing: upscale multiple videos efficiently
  • Presets: save upscaling configurations