Signature
Parameters
Model identifier, optionally with a schema prefix:
'ViT-B-32': Built-in model name. pretrained specifies CLIP weights source (tag or file path).
'hf-hub:org/repo': Loads config/weights from HuggingFace Hub. pretrained is IGNORED.
'local-dir:/path/to/folder': Loads config/weights from local directory. pretrained is IGNORED.
Source for CLIP weights (tag or file path), used ONLY if model_name has no schema. Can be a pretrained tag like 'openai', 'laion400m_e32', or a path to a checkpoint file.
Load the resolved pretrained weights if True, otherwise random init or tower overrides only.
Model precision. Options: 'fp32', 'fp16', 'bf16', 'pure_fp16', 'pure_bf16'.
Device to load model on. Can be 'cpu', 'cuda', or a torch.device object.
If True, JIT compile the model using torch.jit.script.
Force use of QuickGELU activation in model config instead of standard GELU.
Force use of custom text encoder architecture (CustomTextCLIP).
Override patch dropout value in model config. Values typically range from 0.0 to 1.0.
Override image size in model config. Can be a single int (square) or tuple (height, width).
Dictionary to override specific preprocessing parameters (mean, std, interpolation, resize_mode).
Override context length (max sequence length) in text config.
Load default base weights for image tower at creation if no CLIP weights loaded. Only effective for timm-based vision models.
Load default base weights for text tower at creation if no CLIP weights loaded. Only effective for HuggingFace-based text models.
Path to load weights specifically into image tower after model creation. Loads after full CLIP checkpoint.
Path to load weights specifically into text tower after model creation. Loads after full CLIP checkpoint.
Cache directory for downloaded weights. Defaults to ~/.cache/clip.
If True and model supports it, return dictionary output instead of tensors.
Raise error if no pretrained CLIP weights loaded when required.
Use weights_only=True for torch.load (safer, prevents arbitrary code execution).
Additional keyword arguments for model constructor (highest override priority).
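The interaction between the model identifier schema and pretrained described above can be sketched as follows. This is a minimal illustration of the documented rule, not the library's implementation: resolve_weight_source and its return format are hypothetical names introduced here, and only the two schema prefixes named above are assumed.

```python
from typing import Optional, Tuple

# Schema prefixes named in the parameter description above.
SCHEMAS = ("hf-hub:", "local-dir:")


def resolve_weight_source(
    model_name: str, pretrained: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (source_kind, location) for a model identifier.

    Illustrates the documented rule only; the real loader may differ.
    """
    for prefix in SCHEMAS:
        if model_name.startswith(prefix):
            # With a schema prefix, config and weights come from the
            # identifier itself, so `pretrained` is ignored.
            return prefix.rstrip(":"), model_name[len(prefix):]
    # No schema: built-in model name; `pretrained` (tag or file path)
    # selects the weights, or None if nothing was requested.
    return "builtin", pretrained


print(resolve_weight_source("ViT-B-32", "openai"))
print(resolve_weight_source("hf-hub:org/repo", "openai"))
print(resolve_weight_source("local-dir:/path/to/folder"))
```

In this sketch, passing pretrained alongside a schema-prefixed identifier has no effect, which mirrors the IGNORED notes in the parameter description.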
Returns
The created model instance (CLIP, CustomTextCLIP, or CoCa depending on configuration).
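The override behaviour described in the parameter list (an image size given as an int or a (height, width) tuple, and extra keyword arguments taking highest priority) can be sketched as below. This is an assumption-laden illustration: apply_overrides, normalize_image_size, and the flat config dictionary with its keys are hypothetical and do not reflect how the library stores its configuration.

```python
from typing import Any, Dict, Optional, Tuple, Union


def normalize_image_size(size: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Expand a single int to a square (height, width) tuple."""
    if isinstance(size, int):
        return (size, size)
    height, width = size
    return (height, width)


def apply_overrides(
    base_cfg: Dict[str, Any],
    force_image_size: Optional[Union[int, Tuple[int, int]]] = None,
    force_context_length: Optional[int] = None,
    **model_kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical merge: base config < forced overrides < model_kwargs."""
    cfg = dict(base_cfg)  # copy so the caller's config is left untouched
    if force_image_size is not None:
        cfg["image_size"] = normalize_image_size(force_image_size)
    if force_context_length is not None:
        cfg["context_length"] = force_context_length
    # Extra keyword arguments are applied last, so they win any conflict.
    cfg.update(model_kwargs)
    return cfg


base = {"image_size": (224, 224), "context_length": 77}
print(apply_overrides(base, force_image_size=336))
print(apply_overrides(base, force_image_size=336, image_size=(448, 448)))
```

Applying the keyword arguments last is what gives them the highest override priority in this sketch.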
