Overview

AugmentationCfg defines data augmentation parameters for training image transforms. It controls random augmentations like resized cropping, color jitter, and random erasing.

Class Definition

@dataclass
class AugmentationCfg:
    scale: Tuple[float, float] = (0.9, 1.0)
    ratio: Optional[Tuple[float, float]] = None
    color_jitter: Optional[Union[float, Tuple[float, float, float], Tuple[float, float, float, float]]] = None
    re_prob: Optional[float] = None
    re_count: Optional[int] = None
    use_timm: bool = False
    color_jitter_prob: float = None
    gray_scale_prob: float = None
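
As a rough illustration of what the defaults amount to, the sketch below re-declares the dataclass locally so it runs without open_clip installed (in real code, import AugmentationCfg from open_clip instead). The helper explicit_fields is hypothetical and not part of the library; it only lists the fields whose value is not None.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple, Union


@dataclass
class AugmentationCfg:  # local stand-in mirroring the definition above
    scale: Tuple[float, float] = (0.9, 1.0)
    ratio: Optional[Tuple[float, float]] = None
    color_jitter: Optional[Union[float, Tuple[float, float, float], Tuple[float, float, float, float]]] = None
    re_prob: Optional[float] = None
    re_count: Optional[int] = None
    use_timm: bool = False
    color_jitter_prob: Optional[float] = None
    gray_scale_prob: Optional[float] = None


def explicit_fields(cfg: AugmentationCfg) -> dict:
    """Hypothetical helper: return only the fields whose value is not None."""
    return {name: value for name, value in asdict(cfg).items() if value is not None}


default_fields = explicit_fields(AugmentationCfg())
custom_fields = explicit_fields(
    AugmentationCfg(scale=(0.08, 1.0), re_prob=0.25, use_timm=True)
)
print(default_fields)
print(custom_fields)
```

With no arguments, only scale and use_timm carry a value; every other augmentation is left unset.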

Fields

scale
Tuple[float, float]
default:"(0.9, 1.0)"
Range of the random crop's area as a fraction of the original image area. Used in RandomResizedCrop.
  • First value: minimum crop scale (e.g., 0.08 = crop can cover as little as 8% of the original area)
  • Second value: maximum crop scale (e.g., 1.0 = crop can cover 100% of the original area)
Common values:
  • (0.08, 1.0): Standard ImageNet training
  • (0.9, 1.0): Light augmentation
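
Because scale is measured in area rather than side length, the numbers can be less intuitive than they look. The following sketch uses two hypothetical helpers (not part of open_clip or torchvision) to turn a scale range into pixel counts for a 224x224 image and to convert an area fraction into the side fraction of a square crop.

```python
import math


def crop_area_bounds(width, height, scale):
    """Return (min_pixels, max_pixels) covered by a crop sampled from the scale range."""
    area = width * height
    return scale[0] * area, scale[1] * area


def square_side_fraction(area_fraction):
    """Side length of a square crop, as a fraction of the side of a square image."""
    return math.sqrt(area_fraction)


imagenet_bounds = crop_area_bounds(224, 224, (0.08, 1.0))
light_bounds = crop_area_bounds(224, 224, (0.9, 1.0))
print(imagenet_bounds, light_bounds)
print(round(square_side_fraction(0.08), 3))  # side fraction for an 8% area crop
print(round(square_side_fraction(0.9), 3))   # side fraction for a 90% area crop
```

A minimum scale of 0.08 corresponds to a square crop of roughly 28% of the side length, while 0.9 corresponds to roughly 95%.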
ratio
Tuple[float, float]
default:"None"
Range of aspect ratio of the random crop. Used in RandomResizedCrop.
  • First value: minimum aspect ratio (e.g., 0.75 = 3:4)
  • Second value: maximum aspect ratio (e.g., 1.33 = 4:3)
If None, defaults to (3/4, 4/3) in torchvision.
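
How scale and ratio combine into a crop size can be sketched in plain Python. This is a simplified approximation of the sampling approach used by torchvision's RandomResizedCrop, not the library code itself; the function name sample_crop_size and its arguments are assumptions made for illustration.

```python
import math
import random


def sample_crop_size(width, height, scale, ratio=(3 / 4, 4 / 3), rng=None, attempts=10):
    """Sample (crop_w, crop_h); return None if no attempt fits inside the image."""
    if rng is None:
        rng = random.Random()
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        # Pick the crop area first, then the aspect ratio (width / height).
        target_area = area * rng.uniform(scale[0], scale[1])
        # Sampling in log space treats 1:2 and 2:1 symmetrically.
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = round(math.sqrt(target_area * aspect))
        crop_h = round(math.sqrt(target_area / aspect))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            return crop_w, crop_h
    return None  # torchvision falls back to a central crop in this case


rng = random.Random(0)
sizes = [sample_crop_size(640, 480, (0.08, 1.0), (0.75, 1.33), rng) for _ in range(1000)]
valid = [size for size in sizes if size is not None]
print(len(valid), valid[:3])
```

Crops that would not fit inside the image are rejected and resampled, which is why a very wide ratio range combined with a scale near 1.0 tends to end up in the fallback path.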
color_jitter
Union[float, Tuple[float, ...]]
default:"None"
Color jitter augmentation strength. Can be specified as:
  • float: Applied to brightness, contrast, saturation (e.g., 0.4)
  • Tuple[float, float, float]: (brightness, contrast, saturation)
  • Tuple[float, float, float, float]: (brightness, contrast, saturation, hue)
Values are typically in the range [0, 1], except hue, which torchvision limits to [0, 0.5]. Higher values give stronger augmentation.
Example: (0.4, 0.4, 0.4, 0.1) gives moderate jitter with slight hue variation.
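
The three accepted forms can be reduced to one canonical 4-tuple. The helper below is a hypothetical sketch (normalize_color_jitter is not an open_clip function) and assumes that a missing hue component means no hue jitter.

```python
def normalize_color_jitter(color_jitter):
    """Expand a color_jitter setting to (brightness, contrast, saturation, hue)."""
    if color_jitter is None:
        return None
    if isinstance(color_jitter, (int, float)):
        value = float(color_jitter)
        # Assumption: a single float applies to the first three components only.
        return (value, value, value, 0.0)
    values = tuple(float(v) for v in color_jitter)
    if len(values) == 3:
        return values + (0.0,)  # no hue jitter when hue is omitted
    if len(values) == 4:
        return values
    raise ValueError("color_jitter must be a float or a tuple of 3 or 4 floats")


print(normalize_color_jitter(0.4))
print(normalize_color_jitter((0.4, 0.3, 0.2)))
print(normalize_color_jitter((0.4, 0.4, 0.4, 0.1)))
```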
re_prob
float
default:"None"
Probability of applying random erasing to an image.
  • 0.0: No random erasing
  • 0.25: 25% chance of erasing per image
  • 1.0: Always apply random erasing
Requires use_timm=True.
re_count
int
default:"None"
Number of random erasing operations per image when random erasing is applied.
Requires use_timm=True.
use_timm
bool
default:"False"
Whether to use timm (PyTorch Image Models) augmentation transforms.
When True, enables advanced augmentations from timm:
  • RandAugment
  • Random erasing
  • More sophisticated augmentation pipelines
When False, uses simple torchvision-based augmentations.
color_jitter_prob
float
default:"None"
Probability of applying color jitter to an image.
  • 0.0: Never apply color jitter
  • 0.8: Apply color jitter 80% of the time (common default)
  • 1.0: Always apply color jitter
Only used when use_timm=False.
gray_scale_prob
float
default:"None"
Probability of converting the image to grayscale while keeping 3 output channels.
  • 0.0: Never grayscale
  • 0.2: 20% chance of grayscale (common default)
  • 1.0: Always grayscale
Only used when use_timm=False.
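
Both color_jitter_prob and gray_scale_prob gate a transform behind a per-image coin flip. The sketch below illustrates that idea on a single RGB pixel using only the standard library; apply_with_prob and to_grayscale_3ch are hypothetical names, and the luma weights are the common ITU-R BT.601 ones, assumed here for illustration.

```python
import random


def to_grayscale_3ch(pixel):
    """Convert an (r, g, b) pixel to gray, repeated across 3 channels."""
    r, g, b = pixel
    luma = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma weights
    return (luma, luma, luma)


def apply_with_prob(fn, prob, rng):
    """Wrap fn so it runs with probability prob and otherwise returns the input unchanged."""
    def wrapped(x):
        return fn(x) if rng.random() < prob else x
    return wrapped


rng = random.Random(0)
maybe_gray = apply_with_prob(to_grayscale_3ch, prob=0.2, rng=rng)
pixel = (1.0, 0.0, 0.0)
trials = 10_000
converted = sum(maybe_gray(pixel) != pixel for _ in range(trials))
print(converted / trials)  # fraction of draws that were converted
```

Over many samples the converted fraction approaches the configured probability, while any single image is either fully converted or left untouched.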

Examples

Standard ImageNet augmentation

from open_clip.transform import AugmentationCfg, PreprocessCfg, image_transform_v2

# ImageNet-style training augmentation
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    ratio=(0.75, 1.33),
    color_jitter=(0.4, 0.4, 0.4, 0.1),
    color_jitter_prob=0.8,
    gray_scale_prob=0.2
)

preprocess_cfg = PreprocessCfg(size=224)
train_transform = image_transform_v2(
    cfg=preprocess_cfg,
    is_train=True,
    aug_cfg=aug_cfg
)

Light augmentation

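from open_clip import AugmentationCfg

# Minimal augmentation for fine-tuning
aug_cfg = AugmentationCfg(
    scale=(0.9, 1.0),                   # Crops keep 90-100% of the image area
    color_jitter=(0.2, 0.2, 0.2, 0.0),  # Light jitter: brightness, contrast, saturation, hue
    color_jitter_prob=0.5,
)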

Strong augmentation with timm

from open_clip import AugmentationCfg

# Advanced augmentation with timm (requires the timm package to be installed)
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    color_jitter=0.4,
    re_prob=0.25,      # Random erasing
    re_count=1,
    use_timm=True      # Enable timm augmentations
)

No augmentation

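from open_clip import AugmentationCfg

# Training with as little augmentation as the config allows (the random crop is still applied)
aug_cfg = AugmentationCfg(
    scale=(1.0, 1.0),  # Crop area is always 100% of the image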
    ratio=None,
    color_jitter=None
)

Custom aspect ratio range

from open_clip import AugmentationCfg

# Allow more extreme aspect ratios
aug_cfg = AugmentationCfg(
    scale=(0.5, 1.0),
    ratio=(0.5, 2.0),  # From 1:2 to 2:1
    color_jitter=0.3
)

Grayscale augmentation

from open_clip import AugmentationCfg

# High grayscale probability for robustness
aug_cfg = AugmentationCfg(
    scale=(0.8, 1.0),
    gray_scale_prob=0.5  # 50% chance of grayscale
)

Usage with image_transform_v2

from torchvision.datasets import ImageFolder

from open_clip.transform import AugmentationCfg, PreprocessCfg, image_transform_v2

preprocess_cfg = PreprocessCfg(size=224)

# Create augmentation config
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    color_jitter=(0.4, 0.4, 0.4, 0.0),  # brightness, contrast, saturation, hue
    color_jitter_prob=0.8,
)

# Create training transform
train_transform = image_transform_v2(
    cfg=preprocess_cfg,
    is_train=True,
    aug_cfg=aug_cfg,
)

# Use with dataset
train_dataset = ImageFolder('data/train', transform=train_transform)

Augmentation Strategy Guide

Light Augmentation

  • scale: (0.9, 1.0)
  • color_jitter: 0.2
  • Best for: Fine-tuning, small datasets

Standard Augmentation

  • scale: (0.08, 1.0)
  • color_jitter: 0.4
  • Best for: Training from scratch

Strong Augmentation

  • use_timm: True
  • re_prob: 0.25
  • Best for: Large-scale training

Minimal Augmentation

  • scale: (0.95, 1.0)
  • No color jitter
  • Best for: High-quality datasets
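
The four strategies above can be kept as named presets and expanded into keyword arguments. This is a minimal sketch: AUGMENTATION_PRESETS and preset_kwargs are hypothetical names, and the values are copied from the lists above rather than taken from the library.

```python
AUGMENTATION_PRESETS = {
    "light": {"scale": (0.9, 1.0), "color_jitter": 0.2},
    "standard": {"scale": (0.08, 1.0), "color_jitter": 0.4},
    "strong": {"use_timm": True, "re_prob": 0.25},
    "minimal": {"scale": (0.95, 1.0), "color_jitter": None},
}


def preset_kwargs(name, **overrides):
    """Return a fresh kwargs dict for the named preset, with optional overrides applied."""
    if name not in AUGMENTATION_PRESETS:
        raise ValueError(f"unknown preset {name!r}; choose from {sorted(AUGMENTATION_PRESETS)}")
    kwargs = dict(AUGMENTATION_PRESETS[name])  # copy so the preset table is never mutated
    kwargs.update(overrides)
    return kwargs


strong = preset_kwargs("strong", re_count=1)
print(strong)
# The result can be unpacked into the config, e.g. AugmentationCfg(**strong)
```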

Notes

  • AugmentationCfg is only used when is_train=True in image_transform_v2()
  • color_jitter_prob and gray_scale_prob are ignored when use_timm=True
  • Random erasing (re_prob, re_count) requires use_timm=True
  • Default values provide minimal augmentation; increase for stronger regularization
  • For contrastive learning, stronger augmentation can improve performance, but the effect depends on the dataset; compare against the defaults before increasing it
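
The field interactions listed in these notes can be checked before a config is built. The function below is a hypothetical sketch (check_aug_kwargs is not provided by open_clip) that encodes only the rules stated on this page.

```python
def check_aug_kwargs(kwargs):
    """Return messages for field combinations that, per the notes above, have no effect."""
    use_timm = kwargs.get("use_timm", False)
    messages = []
    if use_timm:
        # These two fields only apply to the torchvision-based path.
        for name in ("color_jitter_prob", "gray_scale_prob"):
            if kwargs.get(name) is not None:
                messages.append(f"{name} is ignored when use_timm=True")
    else:
        # Random erasing is only available through timm.
        for name in ("re_prob", "re_count"):
            if kwargs.get(name) is not None:
                messages.append(f"{name} requires use_timm=True")
    return messages


print(check_aug_kwargs({"re_prob": 0.25}))
print(check_aug_kwargs({"use_timm": True, "gray_scale_prob": 0.2}))
print(check_aug_kwargs({"use_timm": True, "re_prob": 0.25, "re_count": 1}))
```

An empty list means the combination is consistent with the notes; it does not validate value ranges.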
