AugmentationCfg

Overview

AugmentationCfg defines data augmentation parameters for training image transforms. It controls random augmentations like resized cropping, color jitter, and random erasing.

Class Definition

from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass
class AugmentationCfg:
    scale: Tuple[float, float] = (0.9, 1.0)
    ratio: Optional[Tuple[float, float]] = None
    color_jitter: Optional[Union[float, Tuple[float, float, float], Tuple[float, float, float, float]]] = None
    re_prob: Optional[float] = None
    re_count: Optional[int] = None
    use_timm: bool = False
    color_jitter_prob: Optional[float] = None  # only used when use_timm=False
    gray_scale_prob: Optional[float] = None    # only used when use_timm=False

Fields

scale

Tuple[float, float]

default:"(0.9, 1.0)"

Range of the random crop's area as a fraction of the original image area. Used in RandomResizedCrop.

First value: minimum area fraction (e.g., 0.08 = the crop can cover as little as 8% of the original area)
Second value: maximum area fraction (e.g., 1.0 = the crop can cover the whole image)

Common values:

(0.08, 1.0): Standard ImageNet training
(0.9, 1.0): Light augmentation
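Because scale is a fraction of the image area rather than of the side length, each side of a square crop scales with the square root of the sampled value, so 0.08 keeps only about 28% of each side. The helper below is a hypothetical illustration (not an open_clip function) of the width/height formula that torchvision's RandomResizedCrop applies to one sampled area fraction and aspect ratio; the crop is then resized to the output size.

```python
import math
from typing import Tuple


def crop_size(width: int, height: int, area_fraction: float,
              aspect_ratio: float = 1.0) -> Tuple[int, int]:
    """Return (crop_width, crop_height) for a crop covering `area_fraction` of the image.

    Hypothetical helper: applies the width/height formula used by
    RandomResizedCrop to one already-sampled (scale, ratio) pair.
    """
    target_area = width * height * area_fraction
    crop_w = int(round(math.sqrt(target_area * aspect_ratio)))
    crop_h = int(round(math.sqrt(target_area / aspect_ratio)))
    return crop_w, crop_h


# Lower bound of the "standard ImageNet" range versus the full image, on 224x224
print(crop_size(224, 224, 0.08))  # (63, 63): about 28% of each side
print(crop_size(224, 224, 1.0))   # (224, 224)
```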

ratio

Tuple[float, float]

default:"None"

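Range of the aspect ratio (width / height) of the random crop. Used in RandomResizedCrop.

First value: minimum aspect ratio (e.g., 0.75 = 3:4)
Second value: maximum aspect ratio (e.g., 1.33 = 4:3)

If None, defaults to (3/4, 4/3), the default in both torchvision and timm. Only applied when use_timm=True; with use_timm=False the crop always uses the (3/4, 4/3) default.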
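A pure-Python sketch of how scale and ratio are sampled together, modeled on torchvision's RandomResizedCrop.get_params as we understand it: the aspect ratio is drawn log-uniformly (so r and 1/r are equally likely), up to 10 attempts are made to fit the crop inside the image, and a center crop is the fallback. sample_crop is a hypothetical name used only for illustration.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop(width: int, height: int,
                scale: Tuple[float, float] = (0.08, 1.0),
                ratio: Tuple[float, float] = (3 / 4, 4 / 3),
                rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Return (top, left, crop_height, crop_width) for one random crop."""
    rng = rng or random.Random()
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(10):
        target_area = area * rng.uniform(scale[0], scale[1])
        # Log-uniform sampling makes r and 1/r equally likely
        aspect = math.exp(rng.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback: center crop, clamping the image's own aspect ratio into the range
    in_ratio = width / height
    if in_ratio < min(ratio):
        w, h = width, int(round(width / min(ratio)))
    elif in_ratio > max(ratio):
        w, h = int(round(height * max(ratio))), height
    else:
        w, h = width, height
    return (height - h) // 2, (width - w) // 2, h, w


rng = random.Random(0)
for _ in range(3):
    print(sample_crop(224, 224, rng=rng))
```

With scale=(1.0, 1.0) and an image whose aspect ratio lies inside the ratio range, the sketch returns essentially the whole image, which is why that setting behaves like a near-identity crop.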

color_jitter

Union[float, Tuple[float, ...]]

default:"None"

Color jitter augmentation strength. Can be specified as:

float: the same strength for brightness, contrast, and saturation; hue is not jittered (e.g., 0.4)
Tuple[float, float, float]: (brightness, contrast, saturation)
Tuple[float, float, float, float]: (brightness, contrast, saturation, hue)

Values are typically in the range [0, 1], and higher values give stronger augmentation; hue must be in [0, 0.5]. Example: (0.4, 0.4, 0.4, 0.1) = moderate jitter with slight hue variation.

With use_timm=False, color jitter is only applied when color_jitter_prob is set, and color_jitter must then be a 4-tuple.
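The three accepted forms can be normalized to a single (brightness, contrast, saturation, hue) tuple. The sketch below does that, leaving hue at 0 when it is not given, and then lists the range each factor is sampled from under torchvision's ColorJitter convention: brightness, contrast, and saturation factors are drawn from [max(0, 1 - v), 1 + v], and the hue shift from [-v, v]. Both helper names are hypothetical.

```python
from typing import Dict, Sequence, Tuple, Union

ColorJitterSpec = Union[float, Sequence[float]]


def normalize_color_jitter(spec: ColorJitterSpec) -> Tuple[float, float, float, float]:
    """Expand a color_jitter value to (brightness, contrast, saturation, hue)."""
    if isinstance(spec, (int, float)):
        # A single float jitters brightness, contrast, saturation; hue stays at 0
        return (float(spec),) * 3 + (0.0,)
    values = tuple(float(v) for v in spec)
    if len(values) == 3:
        return values + (0.0,)
    if len(values) == 4:
        return values
    raise ValueError(f"color_jitter must be a float or a 3- or 4-tuple, got {spec!r}")


def jitter_ranges(spec: ColorJitterSpec) -> Dict[str, Tuple[float, float]]:
    """Sampling range of each ColorJitter factor for a given spec."""
    b, c, s, h = normalize_color_jitter(spec)
    ranges = {
        name: (max(0.0, 1.0 - v), 1.0 + v)
        for name, v in (("brightness", b), ("contrast", c), ("saturation", s))
    }
    ranges["hue"] = (-h, h)
    return ranges


print(jitter_ranges((0.4, 0.4, 0.4, 0.1)))
```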

re_prob

float

default:"None"

Probability of applying random erasing to each training image.

0.0: No random erasing
0.25: 25% chance of erasing per image
1.0: Always apply random erasing

Requires use_timm=True.
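re_prob acts as an independent per-image coin flip. The minimal simulation below (plain Python, not timm's code) makes the expected fraction of erased images concrete.

```python
import random
from typing import List


def erase_decisions(num_images: int, re_prob: float, seed: int = 0) -> List[bool]:
    """Simulate the per-image decision to apply random erasing."""
    rng = random.Random(seed)
    # random() is in [0, 1), so re_prob=0.0 never erases and re_prob=1.0 always does
    return [rng.random() < re_prob for _ in range(num_images)]


decisions = erase_decisions(10_000, re_prob=0.25)
print(sum(decisions) / len(decisions))  # close to 0.25
```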

re_count

int

default:"None"

Number of rectangles erased from an image when random erasing is applied. Requires use_timm=True.
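To show what re_count controls, the sketch below erases count rectangles from a single-channel image stored as nested lists. It is loosely modeled on timm's RandomErasing as we understand it; the defaults are assumptions: each rectangle's area is drawn from 2% to 1/3 of the image and divided by count, the aspect ratio is drawn log-uniformly from 0.3 to 1/0.3, and erased pixels are filled with Gaussian noise. random_erase is a hypothetical name.

```python
import math
import random
from typing import List


def random_erase(img: List[List[float]], count: int, rng: random.Random,
                 min_area: float = 0.02, max_area: float = 1 / 3,
                 min_aspect: float = 0.3) -> int:
    """Erase up to `count` rectangles in place; return how many were erased."""
    height, width = len(img), len(img[0])
    area = height * width
    log_aspect = (math.log(min_aspect), math.log(1 / min_aspect))
    erased = 0
    for _ in range(count):
        for _attempt in range(10):  # retry if the rectangle does not fit
            # Split the sampled area across `count` rectangles
            target_area = rng.uniform(min_area, max_area) * area / count
            aspect = math.exp(rng.uniform(*log_aspect))
            h = int(round(math.sqrt(target_area * aspect)))
            w = int(round(math.sqrt(target_area / aspect)))
            if 0 < h < height and 0 < w < width:
                top = rng.randint(0, height - h)
                left = rng.randint(0, width - w)
                for y in range(top, top + h):
                    for x in range(left, left + w):
                        img[y][x] = rng.gauss(0.0, 1.0)  # per-pixel noise fill
                erased += 1
                break
    return erased


image = [[0.0] * 32 for _ in range(32)]
print(random_erase(image, count=2, rng=random.Random(0)))
```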

use_timm

bool

default:"False"

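Whether to build the training transform with timm (PyTorch Image Models) via timm.data.create_transform. Requires the timm package. When True, the timm pipeline supports:

Custom aspect ratio ranges (ratio)
Color jitter applied to every image (color_jitter)
Random erasing (re_prob, re_count)

AugmentationCfg has no field for timm's auto-augment policies such as RandAugment, so these are not enabled through this config.

When False, uses a simple torchvision-based pipeline: RandomResizedCrop, with optional color jitter and grayscale controlled by color_jitter_prob and gray_scale_prob.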
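Which fields take effect depends on the branch that use_timm selects. The lookup below restates this page's notes in code, plus one assumption drawn from our reading of the open_clip source rather than from this page: on the torchvision branch, ratio is not forwarded to RandomResizedCrop, and color_jitter only takes effect through color_jitter_prob. It is a hypothetical summary, not an open_clip API.

```python
from typing import Any, Dict, Set

# Fields each branch reads (the torchvision set is an assumption)
TIMM_FIELDS = {"scale", "ratio", "color_jitter", "re_prob", "re_count"}
TORCHVISION_FIELDS = {"scale", "color_jitter", "color_jitter_prob", "gray_scale_prob"}


def ignored_fields(cfg: Dict[str, Any]) -> Set[str]:
    """Return the fields set to a non-None value that the selected branch would not apply."""
    use_timm = cfg.get("use_timm", False)
    set_fields = {k for k, v in cfg.items() if v is not None and k != "use_timm"}
    ignored = set_fields - (TIMM_FIELDS if use_timm else TORCHVISION_FIELDS)
    if not use_timm and not cfg.get("color_jitter_prob"):
        # Without timm, color_jitter only takes effect through color_jitter_prob
        ignored |= set_fields & {"color_jitter"}
    return ignored


print(sorted(ignored_fields({"scale": (0.08, 1.0), "ratio": (0.75, 1.33), "re_prob": 0.25})))
# ['ratio', 're_prob']
```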

color_jitter_prob

float

default:"None"

Probability of applying color jitter when use_timm=False. Requires color_jitter to be a 4-tuple (brightness, contrast, saturation, hue).

0.0: Never apply color jitter
0.8: Apply color jitter 80% of the time (common default)
1.0: Always apply color jitter

Only used when use_timm=False.
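color_jitter_prob wraps the jitter in a coin flip, in the spirit of torchvision's RandomApply. The pure-Python sketch below shows that pattern with a hypothetical brightness-only jitter on an RGB tuple; random_apply and brightness_jitter are illustrative names, not open_clip's implementation.

```python
import random
from typing import Callable, Tuple

Pixel = Tuple[float, float, float]
PixelTransform = Callable[[Pixel, random.Random], Pixel]


def random_apply(transform: PixelTransform, p: float) -> PixelTransform:
    """Return a transform that runs `transform` with probability p, else passes through."""
    def wrapped(pixel: Pixel, rng: random.Random) -> Pixel:
        return transform(pixel, rng) if rng.random() < p else pixel
    return wrapped


def brightness_jitter(pixel: Pixel, rng: random.Random, strength: float = 0.4) -> Pixel:
    """Scale all channels by a factor drawn from [max(0, 1 - strength), 1 + strength]."""
    factor = rng.uniform(max(0.0, 1.0 - strength), 1.0 + strength)
    return tuple(min(1.0, c * factor) for c in pixel)  # values capped at 1.0


jitter = random_apply(brightness_jitter, p=0.8)
rng = random.Random(0)
changed = sum(jitter((0.5, 0.5, 0.5), rng) != (0.5, 0.5, 0.5) for _ in range(10_000))
print(changed / 10_000)  # close to 0.8
```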

gray_scale_prob

float

default:"None"

Probability of converting an image to grayscale (replicated to 3 channels) when use_timm=False.

0.0: Never grayscale
0.2: 20% chance of grayscale (common default)
1.0: Always grayscale

Only used when use_timm=False.
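Grayscale conversion keeps three channels by replicating one luma value, so the tensor shape seen by the model does not change. The sketch below uses the ITU-R BT.601 weights (0.299, 0.587, 0.114), which torchvision's rgb_to_grayscale also uses up to rounding; to_grayscale_3ch and random_grayscale are hypothetical names.

```python
import random
from typing import Tuple

Pixel = Tuple[float, float, float]


def to_grayscale_3ch(pixel: Pixel) -> Pixel:
    """Replace an RGB pixel by its BT.601 luma, repeated on all three channels."""
    r, g, b = pixel
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return (luma, luma, luma)


def random_grayscale(pixel: Pixel, p: float, rng: random.Random) -> Pixel:
    """Apply to_grayscale_3ch with probability p (the role of gray_scale_prob)."""
    return to_grayscale_3ch(pixel) if rng.random() < p else pixel


print(to_grayscale_3ch((1.0, 0.0, 0.0)))  # (0.299, 0.299, 0.299)
```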

Examples

Standard ImageNet augmentation

from open_clip.transform import AugmentationCfg, PreprocessCfg, image_transform_v2

# ImageNet-style training augmentation
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    ratio=(0.75, 1.33),                 # only applied when use_timm=True
    color_jitter=(0.4, 0.4, 0.4, 0.1),  # (brightness, contrast, saturation, hue)
    color_jitter_prob=0.8,
    gray_scale_prob=0.2
)

preprocess_cfg = PreprocessCfg(size=224)
train_transform = image_transform_v2(
    cfg=preprocess_cfg,
    is_train=True,
    aug_cfg=aug_cfg
)

Light augmentation

# Minimal augmentation for fine-tuning
aug_cfg = AugmentationCfg(
    scale=(0.9, 1.0),                   # crops cover 90-100% of the image area
    color_jitter=(0.2, 0.2, 0.2, 0.0),  # light jitter, no hue shift (4-tuple needed with color_jitter_prob)
    color_jitter_prob=0.5
)

Strong augmentation with timm

# Advanced augmentation with timm
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    color_jitter=0.4,
    re_prob=0.25,      # Random erasing
    re_count=1,
    use_timm=True      # Enable timm augmentations
)

No augmentation

# Training without random zoom or color changes: scale=(1.0, 1.0) fixes the crop area at 100% of the image
aug_cfg = AugmentationCfg(
    scale=(1.0, 1.0),  # No scale variation
    ratio=None,
    color_jitter=None
)

Custom aspect ratio range

# Allow more extreme aspect ratios (ratio and a float color_jitter are applied by the timm path)
aug_cfg = AugmentationCfg(
    scale=(0.5, 1.0),
    ratio=(0.5, 2.0),  # From 1:2 to 2:1
    color_jitter=0.3,
    use_timm=True
)

Grayscale augmentation

# High grayscale probability for robustness
aug_cfg = AugmentationCfg(
    scale=(0.8, 1.0),
    gray_scale_prob=0.5  # 50% chance of grayscale
)

Usage with image_transform_v2

from torchvision.datasets import ImageFolder
from open_clip.transform import AugmentationCfg, PreprocessCfg, image_transform_v2

preprocess_cfg = PreprocessCfg(size=224)

# Create augmentation config
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    color_jitter=(0.4, 0.4, 0.4, 0.1),  # (brightness, contrast, saturation, hue)
    color_jitter_prob=0.8
)

# Create training transform
train_transform = image_transform_v2(
    cfg=preprocess_cfg,
    is_train=True,
    aug_cfg=aug_cfg
)

# Use with dataset
train_dataset = ImageFolder('data/train', transform=train_transform)

Augmentation Strategy Guide

Light Augmentation

scale: (0.9, 1.0)
color_jitter: 0.2
Best for: Fine-tuning, small datasets

Standard Augmentation

scale: (0.08, 1.0)
color_jitter: 0.4
Best for: Training from scratch

Strong Augmentation

use_timm: True
re_prob: 0.25
Best for: Large-scale training

Minimal Augmentation

scale: (0.95, 1.0)
No color jitter
Best for: High-quality datasets

Notes

Only used when is_train=True in image_transform_v2()
color_jitter_prob and gray_scale_prob are ignored when use_timm=True
Random erasing (re_prob, re_count) requires use_timm=True
Default values provide minimal augmentation; increase for stronger regularization
For contrastive learning, stronger augmentation may improve performance; the effect depends on the dataset and should be validated
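Several of these constraints can be checked before building the transform. The hypothetical validator below collects warnings for random erasing without timm, color_jitter_prob or gray_scale_prob under timm, out-of-range probabilities, and a badly ordered scale. It also flags color_jitter_prob without a 4-tuple color_jitter, which is an assumption based on our reading of the open_clip source.

```python
from typing import Any, Dict, List


def check_aug_cfg(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for an AugmentationCfg given as a dict."""
    problems = []
    use_timm = cfg.get("use_timm", False)
    for key in ("re_prob", "re_count"):
        if cfg.get(key) is not None and not use_timm:
            problems.append(f"{key} requires use_timm=True")
    for key in ("color_jitter_prob", "gray_scale_prob"):
        if cfg.get(key) is not None and use_timm:
            problems.append(f"{key} is ignored when use_timm=True")
    for key in ("re_prob", "color_jitter_prob", "gray_scale_prob"):
        value = cfg.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be in [0, 1], got {value}")
    lo, hi = cfg.get("scale") or (0.9, 1.0)
    if not 0.0 < lo <= hi <= 1.0:
        problems.append(f"scale should satisfy 0 < min <= max <= 1, got {(lo, hi)}")
    jitter = cfg.get("color_jitter")
    if cfg.get("color_jitter_prob") and not use_timm:
        # Assumption: the torchvision branch expects (brightness, contrast, saturation, hue)
        if not isinstance(jitter, (tuple, list)) or len(jitter) != 4:
            problems.append("color_jitter_prob needs a 4-tuple color_jitter")
    return problems


print(check_aug_cfg({"color_jitter": 0.4, "color_jitter_prob": 0.8}))
# ['color_jitter_prob needs a 4-tuple color_jitter']
```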
