Frame integrates Real-ESRGAN for AI-powered video upscaling, allowing you to increase video resolution while preserving and enhancing quality using machine learning models.

Overview

AI upscaling uses the Real-ESRGAN neural network to intelligently upscale video frames. Unlike traditional scaling algorithms (bicubic, Lanczos), AI upscaling:
  • Reconstructs details rather than interpolating pixels
  • Reduces compression artifacts
  • Enhances edges and textures
  • Produces sharper results, especially for anime and animation
Frame uses the realesr-animevideov3 models optimized for video and animation content. These models run locally on your GPU using Vulkan (NVIDIA, AMD) or Metal (Apple Silicon).

Upscale Modes

Frame supports two upscaling modes, plus a disabled default, selected via config.mlUpscale:
config.mlUpscale = 'none';       // Disabled (default)
config.mlUpscale = 'esrgan-2x';  // 2x upscale
config.mlUpscale = 'esrgan-4x';  // 4x upscale

2x Upscale (esrgan-2x)

  • Doubles resolution in both dimensions (4x pixel count)
  • Example: 1920x1080 → 3840x2160 (1080p → 4K)
  • Model: realesr-animevideov3-x2
  • Faster processing, lower VRAM usage

4x Upscale (esrgan-4x)

  • Quadruples resolution in both dimensions (16x pixel count)
  • Example: 960x540 → 3840x2160 (540p → 4K)
  • Model: realesr-animevideov3-x4
  • Slower processing, higher VRAM usage
Mode resolution (upscale.rs:136-145):
pub(crate) fn resolve_upscale_mode(
    mode: &str,
) -> Result<(&'static str, &'static str), ConversionError> {
    match mode {
        "esrgan-2x" => Ok(("2", "realesr-animevideov3-x2")),
        "esrgan-4x" => Ok(("4", "realesr-animevideov3-x4")),
        _ => Err(ConversionError::InvalidInput(
            format!("Invalid upscale mode: {}", mode)
        )),
    }
}
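The mapping can be exercised on its own. The sketch below is hedged and self-contained: it substitutes a plain String for Frame's ConversionError so the example compiles standalone, but the match arms mirror the code above.

```rust
// Self-contained sketch of the mode mapping (hypothetical: uses String
// instead of Frame's ConversionError so it compiles on its own).
fn resolve_upscale_mode(mode: &str) -> Result<(&'static str, &'static str), String> {
    match mode {
        "esrgan-2x" => Ok(("2", "realesr-animevideov3-x2")),
        "esrgan-4x" => Ok(("4", "realesr-animevideov3-x4")),
        _ => Err(format!("Invalid upscale mode: {}", mode)),
    }
}

fn main() {
    // The returned pair feeds the sidecar's -s (scale) and -n (model) flags.
    let (scale, model) = resolve_upscale_mode("esrgan-2x").unwrap();
    println!("-s {} -n {}", scale, model);
    // 'none' is handled upstream by skipping upscaling, so it does not resolve.
    assert!(resolve_upscale_mode("none").is_err());
}
```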

How AI Upscaling Works

The AI upscaling process involves three stages:

1. Frame Extraction (Decode)

FFmpeg extracts individual frames from the source video as PNG images.

Implementation (upscale.rs:356-373):
// Apply video filters (crop, rotate, flip) before extraction
let video_filters = build_video_filters(&task.config, false);
if !video_filters.is_empty() {
    dec_args.push("-vf".to_string());
    dec_args.push(video_filters.join(","));
}

// Force constant framerate to prevent drift
dec_args.push("-r".to_string());
dec_args.push(fps.to_string());
dec_args.push("-vsync".to_string());
dec_args.push("cfr".to_string());

dec_args.push(
    input_frames_dir.join("frame_%08d.png").to_string_lossy().to_string(),
);

2. AI Upscaling (Real-ESRGAN)

The realesrgan-ncnn-vulkan sidecar processes frames in batches.

Implementation (upscale.rs:456-480):
let upscaler_args = vec![
    "-v".to_string(),
    "-i".to_string(),
    sanitize_external_tool_path(&input_frames_dir),
    "-o".to_string(),
    sanitize_external_tool_path(&output_frames_dir),
    "-s".to_string(),
    scale.to_string(),  // "2" or "4"
    "-f".to_string(),
    "png".to_string(),
    "-m".to_string(),
    sanitize_external_tool_path(&models_path),
    "-n".to_string(),
    model_name.to_string(),  // "realesr-animevideov3-x2" or "x4"
    "-j".to_string(),
    compute_upscale_threads(...),  // Dynamic thread configuration
    "-g".to_string(),
    "0".to_string(),  // GPU ID
    "-t".to_string(),
    "0".to_string(),  // Tile size (auto)
];

3. Re-encoding (Encode)

FFmpeg combines the upscaled frames with the original audio and re-encodes.

Implementation (upscale.rs:22-133):
pub(crate) fn build_upscale_encode_args(
    output_frames_dir: &Path,
    source_file_path: &str,
    output_path: &str,
    source_fps: f64,
    config: &ConversionConfig,
    pixel_format: Option<String>,
) -> Vec<String> {
    let mut enc_args = vec![
        "-framerate".to_string(),
        source_fps.to_string(),
        "-start_number".to_string(),
        "1".to_string(),
        "-i".to_string(),
        output_frames_dir.join("frame_%08d.png").to_string_lossy().to_string(),
    ];

    // Second input: original file for audio/metadata
    enc_args.push("-i".to_string());
    enc_args.push(source_file_path.to_string());

    // Map upscaled video from first input
    enc_args.push("-map".to_string());
    enc_args.push("0:v:0".to_string());

    // Map audio from original file
    enc_args.push("-map".to_string());
    enc_args.push("1:a?".to_string());

    // Apply video codec settings
    add_video_codec_args(&mut enc_args, config);
    add_audio_codec_args(&mut enc_args, config);
    
    // ...
}
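The core of this step is the two-input stream mapping. The following is a trimmed-down, hypothetical sketch of just that portion (build_mapping_args is not a Frame function, and ConversionConfig-driven codec arguments are omitted):

```rust
// Hypothetical sketch: only the inputs and stream mapping of the encode
// command, without Frame's ConversionConfig codec handling.
fn build_mapping_args(frames_pattern: &str, source: &str, fps: f64) -> Vec<String> {
    vec![
        "-framerate".to_string(), fps.to_string(),
        "-start_number".to_string(), "1".to_string(),
        "-i".to_string(), frames_pattern.to_string(), // input 0: upscaled frames
        "-i".to_string(), source.to_string(),         // input 1: original file
        "-map".to_string(), "0:v:0".to_string(),      // video: the PNG sequence
        "-map".to_string(), "1:a?".to_string(),       // audio: original, optional
    ]
}

fn main() {
    let args = build_mapping_args("out/frame_%08d.png", "input.mp4", 23.976);
    // The '?' in 1:a? keeps ffmpeg from failing on sources with no audio.
    assert_eq!(args[9], "0:v:0");
    assert_eq!(args[11], "1:a?");
}
```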

Performance Optimization

Dynamic Thread Configuration

Frame automatically configures concurrent processing threads based on output resolution to optimize GPU VRAM usage.

Implementation (upscale.rs:148-171):
pub(crate) fn compute_upscale_threads(source_width: u32, source_height: u32, scale: u32) -> String {
    let output_pixels = (source_width as u64 * scale as u64) * (source_height as u64 * scale as u64);

    // proc: concurrent GPU inference frames — limited by VRAM
    // > 4K output (~8.3M px): ~500MB+ per frame → single concurrent frame
    // > 1080p output (~2M px): moderate pressure → 2 concurrent frames
    // ≤ 1080p output: lightweight, pipeline benefits from concurrency → 4
    let proc = if output_pixels > 8_294_400 {
        1
    } else if output_pixels > 2_073_600 {
        2
    } else {
        4
    };

    // load/save: I/O threads — limited by CPU cores
    let cpus = std::thread::available_parallelism().map(|n| n.get() as u32).unwrap_or(4);
    let io = cpus.div_ceil(2).clamp(1, 4);

    format!("{}:{}:{}", io, proc, io)
}
Thread format: load_threads:proc_threads:save_threads
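The heuristic can be worked through with the CPU count made explicit. This is a sketch for illustration: the cpus parameter is an assumption here, whereas the real function reads it from std::thread::available_parallelism().

```rust
// Deterministic sketch of the thread heuristic above; the CPU count is a
// parameter instead of being queried from the OS.
fn upscale_threads(source_width: u32, source_height: u32, scale: u32, cpus: u32) -> String {
    let output_pixels =
        (source_width as u64 * scale as u64) * (source_height as u64 * scale as u64);
    let proc = if output_pixels > 8_294_400 {
        1 // beyond 4K: one concurrent inference frame
    } else if output_pixels > 2_073_600 {
        2 // beyond 1080p, up to and including 4K: two frames
    } else {
        4 // 1080p or smaller: four frames
    };
    let io = cpus.div_ceil(2).clamp(1, 4);
    format!("{}:{}:{}", io, proc, io)
}

fn main() {
    // 1080p source in 2x mode lands exactly on 4K (8,294,400 px), so proc = 2.
    assert_eq!(upscale_threads(1920, 1080, 2, 8), "4:2:4");
    // The same source in 4x mode exceeds the 4K threshold, so proc drops to 1.
    assert_eq!(upscale_threads(1920, 1080, 4, 8), "4:1:4");
    // A 540p source in 2x mode stays at 1080p output, keeping proc = 4.
    assert_eq!(upscale_threads(960, 540, 2, 8), "4:4:4");
}
```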

Hardware Decode Acceleration

Enable hardware decoding during frame extraction for faster processing:
config.hwDecode = true;
config.mlUpscale = 'esrgan-2x';
Implementation (upscale.rs:313-322):
if task.config.hw_decode {
    if crate::conversion::utils::is_nvenc_codec(&task.config.video_codec) {
        dec_args.push("-hwaccel".to_string());
        dec_args.push("cuda".to_string());
    } else if crate::conversion::utils::is_videotoolbox_codec(&task.config.video_codec) {
        dec_args.push("-hwaccel".to_string());
        dec_args.push("videotoolbox".to_string());
    }
}

Progress Tracking

AI upscaling provides detailed progress updates through three phases.

Progress distribution:
  • 0-5%: Frame extraction (decode)
  • 5-90%: AI upscaling
  • 90-100%: Re-encoding
Implementation (upscale.rs:404-416, 531-542):
// Decode progress
let decode_progress = (current_frame as f64 / total_frames as f64) * 5.0;
app_clone.emit("conversion-progress", ProgressPayload {
    id: id_clone.clone(),
    progress: decode_progress.min(5.0),
});

// Upscale progress
let progress = 5.0 + (completed_frames as f64 / total_frames as f64) * 85.0;
app_clone.emit("conversion-progress", ProgressPayload {
    id: id_clone.clone(),
    progress: progress.min(90.0),
});

// Encode progress
let encode_progress = 90.0 + (current_frame as f64 / total_frames as f64) * 10.0;

System Requirements

GPU Requirements

Real-ESRGAN requires GPU acceleration via Vulkan or Metal.

Windows/Linux:
  • NVIDIA GPU with Vulkan support
  • AMD GPU with Vulkan support
  • Vulkan drivers installed
macOS:
  • Apple Silicon (M1/M2/M3) with Metal support
  • Intel Mac with Metal-capable GPU

VRAM Requirements

Minimum VRAM by output resolution:
| Output Resolution | 2x Mode | 4x Mode |
|-------------------|---------|---------|
| 1080p (2 MP)      | 2 GB    | 4 GB    |
| 1440p (3.7 MP)    | 3 GB    | 6 GB    |
| 4K (8.3 MP)       | 4 GB    | 8 GB    |
| 8K (33 MP)        | 8 GB    | 16 GB   |
Insufficient VRAM will cause the upscaling process to fail. Frame automatically reduces concurrent processing for higher resolutions to minimize VRAM usage.
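The table can be captured as a small lookup. This is a hypothetical helper, not a Frame API, and the figures are the documented minimums above, not measurements:

```rust
// VRAM lookup sketch: documented minimums by output size and mode.
fn min_vram_gb(output_megapixels: f64, scale: u32) -> u32 {
    let base = if output_megapixels <= 2.0 {
        2 // 1080p
    } else if output_megapixels <= 3.7 {
        3 // 1440p
    } else if output_megapixels <= 8.3 {
        4 // 4K
    } else {
        8 // 8K
    };
    // The 4x mode column is consistently double the 2x column.
    if scale == 4 { base * 2 } else { base }
}

fn main() {
    assert_eq!(min_vram_gb(8.3, 2), 4);   // 4K output, 2x mode
    assert_eq!(min_vram_gb(8.3, 4), 8);   // 4K output, 4x mode
    assert_eq!(min_vram_gb(33.0, 4), 16); // 8K output, 4x mode
}
```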

Runtime Validation

Frame validates the upscaling runtime before starting a conversion.

Implementation (upscale.rs:173-236):
pub(crate) async fn validate_upscale_runtime(
    app: &AppHandle,
    mode: &str,
) -> Result<(), ConversionError> {
    let (_, model_name) = resolve_upscale_mode(mode)?;

    // Check model files exist
    let models_path = app.path().resolve("resources/models", BaseDirectory::Resource)?;
    let model_param = models_path.join(format!("{}.param", model_name));
    let model_bin = models_path.join(format!("{}.bin", model_name));

    if !model_param.is_file() || !model_bin.is_file() {
        return Err(ConversionError::InvalidInput(
            format!("ML upscaling models are missing for '{}'. Run `bun run setup:upscaler` and rebuild.", mode)
        ));
    }

    // Test upscaler sidecar
    let output = app.shell().sidecar("realesrgan-ncnn-vulkan")?
        .args(["-h"])
        .output()
        .await?;

    if !output.status.success() {
        // Validate help output to confirm binary works
        // ...
    }

    Ok(())
}

Limitations and Constraints

Container Compatibility

// AI upscaling requires video container (not audio-only or GIF)
config.container = 'mp4'; // ✓ Supported
config.container = 'mkv'; // ✓ Supported
config.container = 'webm'; // ✓ Supported
config.container = 'mov'; // ✓ Supported
config.container = 'mp3'; // ✗ Not supported (audio-only)
config.container = 'gif'; // ✗ Not supported (video-only)
Validation (args.rs:566-570):
if (is_audio_only || is_video_only) && has_ml_upscale {
    return Err(ConversionError::InvalidInput(
        "ML upscaling requires an audio-capable video container".to_string(),
    ));
}

Processing Mode

AI upscaling requires re-encoding mode:
config.processingMode = 'reencode'; // ✓ Required
config.processingMode = 'copy';     // ✗ Not compatible
Validation (args.rs:578-582):
if is_copy_mode && has_ml_upscale {
    return Err(ConversionError::InvalidInput(
        "ML upscaling requires re-encoding mode".to_string(),
    ));
}

Combining with Other Features

AI upscaling can be combined with other Frame features:

With Video Filters

{
  mlUpscale: 'esrgan-2x',
  rotation: '90',        // Applied before upscaling
  flipHorizontal: true,  // Applied before upscaling
  crop: { enabled: true, x: 0, y: 0, width: 1280, height: 720 }
}
Filters are applied during frame extraction (upscale.rs:356-360).

With Hardware Encoding

{
  mlUpscale: 'esrgan-4x',
  videoCodec: 'h264_nvenc',
  hwDecode: true,
  nvencSpatialAq: true
}
Combines GPU upscaling with GPU encoding for maximum performance.

With Audio Processing

{
  mlUpscale: 'esrgan-2x',
  audioNormalize: true,
  audioVolume: 110,
  selectedAudioTracks: [0]
}
Audio processing is independent of upscaling.

Example Configurations

Upscale to 4K

{
  container: 'mp4',
  videoCodec: 'libx265',
  videoBitrateMode: 'crf',
  crf: 20,
  mlUpscale: 'esrgan-2x', // 1080p → 4K
  preset: 'medium',
  audioCodec: 'aac',
  audioBitrate: '192'
}

Fast Hardware-Accelerated Upscale

{
  container: 'mp4',
  videoCodec: 'h264_nvenc',
  videoBitrateMode: 'crf',
  quality: 50,
  mlUpscale: 'esrgan-2x',
  hwDecode: true,
  nvencSpatialAq: true,
  preset: 'fast'
}

Archive Quality with 4x Upscale

{
  container: 'mkv',
  videoCodec: 'libx265',
  videoBitrateMode: 'crf',
  crf: 18,
  mlUpscale: 'esrgan-4x', // 540p → 4K
  preset: 'slow',
  audioCodec: 'flac'
}

Troubleshooting

Missing models
Run the setup script to download the Real-ESRGAN models:
bun run setup:upscaler
Then rebuild the application to bundle the models.

Upscaler binary not found
Ensure the realesrgan-ncnn-vulkan binary is included in the build:
  • Check that tauri.conf.json includes the sidecar
  • Verify the binary has execute permissions on Linux/macOS
  • On Windows, check that antivirus isn't blocking the executable

Out of VRAM
The output resolution exceeds available VRAM. Try:
  • Using 2x instead of 4x mode
  • Upscaling at a lower source resolution
  • Closing other GPU-intensive applications
  • Upgrading GPU memory

GPU or driver errors
Graphics driver issues can usually be resolved by:
  • Updating GPU drivers to the latest version
  • On Linux, ensuring the Vulkan loader and drivers are installed
  • On macOS, keeping the system up to date

Related

  • Video Conversion: configure resolution and scaling algorithms
  • Batch Processing: upscale multiple videos efficiently
  • Presets: save upscaling configurations