load_checkpoint
Loads weights from a checkpoint file into an existing CLIP model. Handles various checkpoint formats, state dict conversions, and position embedding resizing.

Signature

def load_checkpoint(
    model: Union[CLIP, CustomTextCLIP],
    checkpoint_path: str,
    strict: bool = True,
    weights_only: bool = True,
    device='cpu',
):
    ...

Parameters

model
Union[CLIP, CustomTextCLIP]
required
The model instance to load weights into. Must be a CLIP, CustomTextCLIP, or CoCa model.
checkpoint_path
str
required
Path to the checkpoint file. Supported formats:
  • PyTorch checkpoint (.pt, .pth, .bin)
  • SafeTensors (.safetensors)
  • NumPy/big_vision format (.npz, .npy) for SigLIP weights
strict
bool
default: True
If True, enforces that the keys in the checkpoint exactly match the model’s state dict. Set to False to allow partial weight loading or when loading weights with different key names.
weights_only
bool
default: True
If True, passes weights_only=True to torch.load, which is safer because it prevents arbitrary code execution during unpickling. Only applies to PyTorch checkpoint formats.
device
str
default: 'cpu'
Device to load checkpoint tensors onto initially. Usually 'cpu' to avoid OOM issues during loading.

Returns

incompatible_keys
NamedTuple | Dict
Information about keys that did not match during loading, as returned by PyTorch's load_state_dict, with two fields:
  • missing_keys: keys present in the model but missing from the checkpoint
  • unexpected_keys: keys present in the checkpoint but not in the model
An empty dict {} is returned when loading from the NumPy/big_vision format.

Example

import open_clip
import torch

# Create a model
model = open_clip.create_model('ViT-B-32', load_weights=False)

# Load checkpoint
incompatible = open_clip.load_checkpoint(
    model,
    checkpoint_path='path/to/checkpoint.pt',
    strict=True
)
print(f"Missing keys: {incompatible.missing_keys}")
print(f"Unexpected keys: {incompatible.unexpected_keys}")

# Load with non-strict mode (useful for partial loading)
incompatible = open_clip.load_checkpoint(
    model,
    checkpoint_path='path/to/image_encoder.pt',
    strict=False
)

# Load SafeTensors checkpoint
incompatible = open_clip.load_checkpoint(
    model,
    checkpoint_path='path/to/model.safetensors'
)

# Load SigLIP NumPy weights
model = open_clip.create_model('ViT-B-16-SigLIP', load_weights=False)
open_clip.load_checkpoint(
    model,
    checkpoint_path='path/to/siglip_weights.npz'
)

Checkpoint Format Handling

The function automatically handles various checkpoint formats:

PyTorch Checkpoints

# Supports both full checkpoints with 'state_dict' key
checkpoint = {
    'state_dict': {...},
    'epoch': 10,
    'optimizer': {...}
}

# And raw state dicts
checkpoint = {...}  # Direct state dict
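
Handling these two shapes can be sketched as follows (a simplified illustration; `unwrap_state_dict` is a hypothetical helper name, not the actual open_clip internal):

```python
# Sketch: extract a plain state dict from either checkpoint shape.
def unwrap_state_dict(checkpoint):
    """Return the raw state dict from a full training checkpoint or a bare one."""
    if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
        return checkpoint['state_dict']  # full checkpoint with training metadata
    return checkpoint  # already a bare state dict
```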

Module Prefix Removal

Automatically removes 'module.' prefix from keys (common in distributed training):
# Checkpoint has: 'module.visual.conv1.weight'
# Loads as: 'visual.conv1.weight'
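
A minimal sketch of this prefix handling (hypothetical helper name; the real logic lives inside load_checkpoint):

```python
# Sketch: drop a leading 'module.' prefix left by torch.nn.DataParallel /
# DistributedDataParallel wrappers, only when every key carries it.
def strip_module_prefix(state_dict):
    prefix = 'module.'
    if state_dict and all(k.startswith(prefix) for k in state_dict):
        return {k[len(prefix):]: v for k, v in state_dict.items()}
    return state_dict
```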

Position Embedding Resizing

Automatically resizes position embeddings if model and checkpoint have different sizes:
# Load 224px weights into 336px model
model = open_clip.create_model('ViT-L-14', force_image_size=336, load_weights=False)
open_clip.load_checkpoint(model, 'vit_l_14_224px.pt')
# Position embeddings automatically interpolated to 336px
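
The interpolation step can be sketched roughly as below, assuming a [1 + H*W, D] embedding with a leading class token (`resize_pos_embed` is a hypothetical helper; open_clip's internal resize logic differs in details such as antialiasing and non-square grids):

```python
import torch
import torch.nn.functional as F

# Sketch: spatially interpolate the grid portion of a ViT position embedding,
# leaving the class-token embedding untouched.
def resize_pos_embed(pos_embed, old_grid, new_grid):
    cls_tok, grid_tok = pos_embed[:1], pos_embed[1:]
    d = grid_tok.shape[-1]
    # [H*W, D] -> [1, D, H, W] so F.interpolate can resize spatially
    grid_tok = grid_tok.reshape(old_grid, old_grid, d).permute(2, 0, 1).unsqueeze(0)
    grid_tok = F.interpolate(grid_tok, size=(new_grid, new_grid),
                             mode='bicubic', align_corners=False)
    # back to [H'*W', D]
    grid_tok = grid_tok.squeeze(0).permute(1, 2, 0).reshape(new_grid * new_grid, d)
    return torch.cat([cls_tok, grid_tok], dim=0)

# e.g. ViT-L/14: 224px -> 16x16 patch grid, 336px -> 24x24 patch grid
pe_224 = torch.randn(1 + 16 * 16, 1024)
pe_336 = resize_pos_embed(pe_224, 16, 24)
```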

State Dict Conversion

Automatically converts state dicts from various sources:
  • OpenAI CLIP format
  • Hugging Face transformers format
  • timm format
  • OpenCLIP legacy format
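
As one concrete illustration of such a conversion, legacy-format state dicts keep text-tower tensors at the top level, while CustomTextCLIP expects them under a text. namespace. A hedged sketch of that remapping (the prefix list is abridged and the helper name hypothetical; the real converter covers more cases):

```python
# Sketch: move legacy top-level text-tower keys under the 'text.' namespace.
# Prefixes follow the legacy CLIP layout; this is an illustration, not the
# full open_clip converter.
def to_custom_text_keys(state_dict):
    text_prefixes = ('positional_embedding', 'text_projection',
                     'token_embedding', 'transformer', 'ln_final')
    return {('text.' + k if k.startswith(text_prefixes) else k): v
            for k, v in state_dict.items()}
```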

Notes

  • For SafeTensors format, the safetensors package must be installed: pip install safetensors
  • For NumPy/big_vision format (SigLIP), weights are loaded directly without returning incompatible keys
  • The function handles mismatches in logit_scale and logit_bias tensor shapes automatically
  • Position embeddings for both image and text are automatically resized if needed
