## Overview
PreprocessCfg is a dataclass that encapsulates all preprocessing parameters for CLIP image transforms. It provides type-safe configuration for image resizing, normalization, and color mode settings.
## Class Definition
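The exact definition lives in open_clip's `transform.py` and may differ slightly between library versions; the sketch below is a stdlib-only approximation assembled from the fields and properties documented on this page:

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class PreprocessCfg:
    # Target output size: int for square, (height, width) for rectangular
    size: Union[int, Tuple[int, int]] = 224
    # Color mode; only 'RGB' is supported
    mode: str = 'RGB'
    # Per-channel normalization statistics (OpenAI CLIP dataset defaults)
    mean: Tuple[float, ...] = (0.48145466, 0.4578275, 0.40821073)
    std: Tuple[float, ...] = (0.26862954, 0.26130258, 0.27577711)
    # Resize interpolation: 'bicubic', 'bilinear', or 'random'
    interpolation: str = 'bicubic'
    # Resize strategy: 'shortest', 'longest', or 'squash'
    resize_mode: str = 'shortest'
    # Padding value (0-255) used when resize_mode='longest'
    fill_color: int = 0

    @property
    def num_channels(self) -> int:
        return 3  # RGB only

    @property
    def input_size(self) -> Tuple[int, int, int]:
        if isinstance(self.size, int):
            return (self.num_channels, self.size, self.size)
        return (self.num_channels,) + tuple(self.size)
```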
## Fields
### `size`

Target size for preprocessed images.

- `int`: square images (e.g., `224` → 224×224)
- `Tuple[int, int]`: rectangular images as `(height, width)` (e.g., `(384, 224)`)

Common values:

- `224`: base resolution for ViT-B and ViT-L models
- `336`: ViT-L/14@336px (high resolution)
- `384`: used by some higher-resolution variants
### `mode`

Color mode for image conversion. Currently only `'RGB'` is supported; images are automatically converted to RGB with 3 channels.

### `mean`

Mean values for normalization, one per channel (R, G, B). The defaults are the OpenAI CLIP dataset statistics:
- R: 0.48145466
- G: 0.4578275
- B: 0.40821073
Normalization is applied as `normalized = (image - mean) / std`.

### `std`

Standard deviation values for normalization, one per channel (R, G, B). The defaults are the OpenAI CLIP dataset statistics:
- R: 0.26862954
- G: 0.26130258
- B: 0.27577711
Normalization is applied as `normalized = (image - mean) / std`.

### `interpolation`

Interpolation method for resizing images. Options:

- `'bicubic'`: high-quality interpolation (recommended; the CLIP default)
- `'bilinear'`: faster but lower quality
- `'random'`: randomly choose between bicubic and bilinear (training only)
### `resize_mode`

Strategy for resizing images to the target size. Options:

- `'shortest'`: resize the shortest edge to the target size, then center-crop
- `'longest'`: resize the longest edge to the target size, then pad and center-crop
- `'squash'`: resize to the exact target size (may distort the aspect ratio)
### `fill_color`

Fill color value (0-255) for padding when using `resize_mode='longest'`.

- `0`: black padding (default)
- `255`: white padding
- Other values: shades of gray
## Properties

### `num_channels`

Number of image channels; always `3` for RGB.

### `input_size`

The full input tensor shape as `(channels, height, width)`, derived from `size`.
## Examples
### Create default config
### Custom image size
### Custom normalization
### Different resize modes
### High-resolution config
### Use with `image_transform_v2`
### Extract from model
### Merge configs
## Resize Mode Comparison

| Mode | Behavior | Use Case |
|---|---|---|
| `shortest` | Resize shortest edge, crop center | Default; preserves aspect ratio |
| `longest` | Resize longest edge, pad to square | When the entire image must stay visible |
| `squash` | Resize to exact dimensions | Fastest, but may distort |
## Common Configurations
### ViT-B/32
### ViT-L/14
### ViT-L/14@336
### ViT-H/14
## Notes

- Always use `PreprocessCfg` instead of passing individual parameters to transforms.
- The config object is returned by `create_model_and_transforms()`.
- Mean and std values should match the model's training data statistics.
- Use `bicubic` interpolation for best quality (it matches CLIP training).
- `resize_mode='shortest'` is the standard CLIP preprocessing approach.
## See Also

- `AugmentationCfg`: augmentation configuration for training
- `image_transform_v2()`: create transforms from a config
- `create_model_and_transforms()`: get a model together with its config
