Overview
Creates torchvision transform pipelines for preprocessing images before passing them to CLIP vision encoders. Accepts a PreprocessCfg configuration object for clean, declarative transform configuration.
Function Signature
Parameters
Preprocessing configuration object containing:
- size: Target image size (int or tuple)
- mean: Normalization mean values
- std: Normalization std values
- interpolation: Resize interpolation method
- resize_mode: How to resize images ('shortest', 'longest', 'squash')
- fill_color: Padding fill color
See PreprocessCfg for details.
is_train: Whether to create training (with augmentation) or inference (deterministic) transforms.
aug_cfg: Augmentation configuration for training. Only used when is_train=True. See AugmentationCfg for options.
Returns
Composed transform pipeline that can be applied to PIL Images. Includes:
- Training: Random crops, color jitter, normalization
- Inference: Resize, center crop, normalization
Examples
Create inference transform
Create training transform with augmentation
Use with model preprocess config
Different resize modes
Training with timm augmentations
Non-square images
Transform Pipeline
Inference Mode (is_train=False)
- Resize based on resize_mode:
  - shortest: Resize shortest edge to target size
  - longest: Resize longest edge to target size
  - squash: Resize to exact dimensions
- Center Crop (or pad if needed)
- Convert to RGB
- To Tensor
- Normalize with mean/std
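The effect of the three resize modes on image dimensions can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name `resized_dims` is hypothetical, it assumes an integer target size, and it models only the resize step, before any center crop or padding.

```python
from typing import Tuple


def resized_dims(width: int, height: int, target: int, resize_mode: str) -> Tuple[int, int]:
    """Return (width, height) after the resize step, before crop or pad.

    Hypothetical helper for illustration; assumes an integer target size.
    """
    if resize_mode == "squash":
        # Aspect ratio is ignored: both edges are forced to the target.
        return target, target
    if resize_mode == "shortest":
        # Scale so the shortest edge equals the target.
        scale = target / min(width, height)
    elif resize_mode == "longest":
        # Scale so the longest edge equals the target.
        scale = target / max(width, height)
    else:
        raise ValueError(f"unknown resize_mode: {resize_mode!r}")
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for mode in ("shortest", "longest", "squash"):
        print(mode, resized_dims(640, 480, 224, mode))
```

For a 640x480 input and a target of 224, shortest gives (299, 224), so the following center crop trims the width. Longest gives (224, 168), so the height must be padded to reach a square output.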
Training Mode (is_train=True)
- Random Resized Crop with scale and ratio
- Convert to RGB
- Color Jitter (optional, based on aug_cfg)
- Grayscale (optional, based on aug_cfg)
- To Tensor
- Normalize with mean/std
- Random Erasing (optional, if using timm)
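To make the first training step more concrete, the following sketch shows how a random resized crop box can be sampled from a scale range and an aspect-ratio range. It follows the general approach used by random-resized-crop transforms, but it is a simplified illustration: the function name `sample_crop_box`, the default `scale` and `ratio` values, and the whole-image fallback are assumptions, not values taken from this library.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop_box(
    width: int,
    height: int,
    scale: Tuple[float, float] = (0.9, 1.0),       # illustrative area fraction range
    ratio: Tuple[float, float] = (3 / 4, 4 / 3),   # illustrative aspect-ratio range
    rng: Optional[random.Random] = None,
    attempts: int = 10,
) -> Tuple[int, int, int, int]:
    """Return a crop box as (left, top, crop_width, crop_height)."""
    rng = rng or random.Random()
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        # Sample the crop area as a fraction of the image area.
        target_area = area * rng.uniform(scale[0], scale[1])
        # Sample the aspect ratio uniformly in log space.
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    # Simplified fallback (an assumption of this sketch): use the whole image.
    return 0, 0, width, height


if __name__ == "__main__":
    print(sample_crop_box(640, 480, rng=random.Random(0)))
```

The sampled box is then resized to the target size, which is why the same source image yields a slightly different view on every pass through the training pipeline.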
Notes
- For most use cases, use the transforms returned by create_model_and_transforms()
- PreprocessCfg provides type-safe configuration compared to passing individual parameters
- Training transforms include random augmentation for better generalization
- Inference transforms are deterministic and optimized for consistent preprocessing
- Images are automatically converted to RGB mode
- Normalization uses ImageNet statistics by default
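The normalization step applies the same per-channel arithmetic in both modes: each channel value, already scaled to the range 0 to 1, has the channel mean subtracted and is divided by the channel std. The sketch below shows this on a single pixel. The helper name `normalize_pixel` is hypothetical, and the mean and std values are placeholders chosen for easy arithmetic, not the defaults of any configuration.

```python
from typing import List, Sequence


def normalize_pixel(
    rgb: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> List[float]:
    """Apply (value - mean) / std independently to each channel."""
    return [(c - m) / s for c, m, s in zip(rgb, mean, std)]


# Placeholder statistics for illustration only.
mean = (0.5, 0.5, 0.5)
std = (0.5, 0.5, 0.5)

# An 8-bit RGB pixel is first scaled to [0, 1], as a to-tensor step would do.
pixel_uint8 = (255, 128, 0)
scaled = [v / 255.0 for v in pixel_uint8]
normalized = normalize_pixel(scaled, mean, std)

if __name__ == "__main__":
    print(normalized)
```

With these placeholder statistics the output range is roughly -1 to 1. With dataset-specific statistics, each channel is instead centered and scaled by that dataset's values.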
See Also
- PreprocessCfg - Preprocessing configuration dataclass
- AugmentationCfg - Augmentation configuration dataclass
- create_model_and_transforms() - Get model with transforms
