# Zero-Shot Evaluation
OpenCLIP supports zero-shot evaluation, where models are tested on classification tasks without any task-specific fine-tuning. This is one of CLIP's key capabilities.

## Zero-Shot Evaluation During Training
You can run zero-shot ImageNet evaluation automatically during training using the `--zeroshot-frequency` flag.
### Setup
To enable zero-shot evaluation during training, you need:

- ImageNet validation set: path to the validation split (not the training set)
- Zero-shot frequency: how often to run evaluation (in epochs)
### Command Example
The `--imagenet-val` path should point to the validation set of ImageNet, not the training set. The validation folder should contain subfolders for each class; if it doesn't, use this script to organize it.

### Parameters
- `--zeroshot-frequency N`: Run zero-shot evaluation every N epochs. Set to 0 to disable.
- `--imagenet-val PATH`: Path to the ImageNet validation set used for zero-shot evaluation.
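Putting these flags together, a training run that evaluates on ImageNet every epoch might be launched like this (a sketch: paths, model name, and batch size are placeholders, and other training flags are omitted; the `open_clip_train.main` entry point is the one the source-code section below refers to):

```bash
python -m open_clip_train.main \
    --train-data '/path/to/train_data.csv' \
    --imagenet-val '/path/to/imagenet/val' \
    --zeroshot-frequency 1 \
    --model ViT-B-32 \
    --batch-size 256
```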
## How Zero-Shot Evaluation Works
The zero-shot evaluation process in OpenCLIP follows these steps:

### 1. Building the Zero-Shot Classifier
The classifier is built by encoding text prompts for all ImageNet classes. OpenCLIP applies a set of prompt templates (e.g., "a photo of a {class}", "a picture of a {class}", etc.) to create multiple text descriptions for each class, then averages the resulting text embeddings into a single weight vector per class.
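The template-averaging logic can be sketched in miniature as follows. This is a toy illustration, not the real implementation: `embed_text` is a hypothetical stand-in for CLIP's text encoder, and the class and template lists are truncated for brevity.

```python
import math

# Toy class names and prompt templates (the real lists cover all 1000
# ImageNet classes and many more templates).
CLASSES = ["cat", "dog"]
TEMPLATES = ["a photo of a {}.", "a picture of a {}."]

def embed_text(prompt):
    # Hypothetical deterministic "embedding" so the example is runnable
    # without model weights; stands in for CLIP's text encoder.
    return [len(prompt), prompt.count("a"), sum(map(ord, prompt)) % 10]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def build_zero_shot_classifier(classes, templates):
    weights = []
    for cls in classes:
        # Encode every templated prompt for this class...
        embs = [normalize(embed_text(t.format(cls))) for t in templates]
        # ...then average the embeddings and re-normalize, giving one
        # unit-length weight vector per class.
        mean = [sum(col) / len(templates) for col in zip(*embs)]
        weights.append(normalize(mean))
    return weights

classifier = build_zero_shot_classifier(CLASSES, TEMPLATES)
```

The averaged, re-normalized vectors act as the weight matrix of a linear classifier that was never trained on ImageNet labels.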
### 2. Computing Image Features
For each image in the validation set, the image encoder produces a feature vector, which is normalized and compared (via dot product) against the classifier weights to produce per-class logits.

### 3. Computing Accuracy
Accuracy is computed using top-1 and top-5 metrics: a prediction counts as correct at top-k if the true class is among the k highest-scoring classes.
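A minimal numeric sketch of steps 2 and 3, with made-up feature vectors (OpenCLIP does the equivalent with torch tensors in `src/open_clip_train/zero_shot.py`):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def logits_for(image_feature, classifier):
    # Step 2: normalize the image feature, then take dot products with
    # each class weight vector to get per-class logits.
    img = normalize(image_feature)
    return [sum(a * b for a, b in zip(img, w)) for w in classifier]

def topk_accuracy(all_logits, targets, k):
    # Step 3: a sample is correct at k if its target class ranks among
    # the k highest-scoring classes.
    correct = 0
    for logits, target in zip(all_logits, targets):
        ranked = sorted(range(len(logits)), key=lambda i: -logits[i])
        if target in ranked[:k]:
            correct += 1
    return correct / len(targets)

# Two toy class weight vectors and two "image features".
classifier = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
features = [[0.9, 0.1], [0.2, 0.8]]
targets = [0, 1]
all_logits = [logits_for(f, classifier) for f in features]
top1 = topk_accuracy(all_logits, targets, 1)
```

With real models the same computation runs over the full 50,000-image validation set and 1000 classes, with k = 1 and k = 5.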
## Evaluating Pre-trained Checkpoints

### Local Checkpoint
Evaluate a local checkpoint on ImageNet:
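This can reuse the training entry point with no training data, along the following lines (a sketch; the checkpoint path and model name are placeholders):

```bash
python -m open_clip_train.main \
    --imagenet-val '/path/to/imagenet/val' \
    --model ViT-B-32 \
    --pretrained '/path/to/checkpoint.pt'
```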
### Hosted Checkpoint

Evaluate a pre-trained model from the model zoo:
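For hosted weights, `--pretrained` takes a tag instead of a path. The model/tag pair below is illustrative; `open_clip.list_pretrained()` lists the valid combinations:

```bash
python -m open_clip_train.main \
    --imagenet-val '/path/to/imagenet/val' \
    --model ViT-B-32-quickgelu \
    --pretrained laion400m_e32
```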
### Hugging Face Checkpoint

You can also evaluate checkpoints from Hugging Face:
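Hub-hosted checkpoints are referenced with the `hf-hub:` prefix on the model name; the repository below is one example:

```bash
python -m open_clip_train.main \
    --imagenet-val '/path/to/imagenet/val' \
    --model hf-hub:laion/CLIP-ViT-B-32-laion2B-s34B-b79K
```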
## Systematic Evaluation with CLIP_benchmark

For comprehensive evaluation across multiple datasets, we recommend using CLIP_benchmark, which provides:

- 40+ datasets for zero-shot classification
- Retrieval tasks (image-to-text and text-to-image)
- Multiple languages for multilingual models
- Standardized metrics for fair comparison
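A typical CLIP_benchmark invocation might look like the following (a sketch based on the CLIP_benchmark CLI; the dataset, model, and tag are examples):

```bash
clip_benchmark eval \
    --dataset cifar10 \
    --task zeroshot_classification \
    --model ViT-B-32-quickgelu \
    --pretrained laion400m_e32 \
    --output result.json \
    --batch_size 64
```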
## Evaluation Metrics
During zero-shot evaluation, the following metrics are logged:

| Metric | Description |
|---|---|
| `imagenet-zeroshot-val-top1` | Top-1 accuracy on the ImageNet validation set |
| `imagenet-zeroshot-val-top5` | Top-5 accuracy on the ImageNet validation set |
| `imagenetv2-zeroshot-val-top1` | Top-1 accuracy on ImageNet-V2 (if available) |
| `imagenetv2-zeroshot-val-top5` | Top-5 accuracy on ImageNet-V2 (if available) |
## Example Results During Training
When training with zero-shot evaluation enabled, the metrics above are written to the training logs each time the evaluation runs.

## Best Practices
## Implementation Details
### Source Code
The zero-shot evaluation implementation is in `src/open_clip_train/zero_shot.py`. Key functions:

- `zero_shot_eval()`: Main evaluation function called during training
- `run()`: Runs inference on the validation set
- `accuracy()`: Computes top-k accuracy
### Supported Datasets
By default, zero-shot evaluation supports:

- ImageNet-1k (ILSVRC2012 validation set)
- ImageNet-V2 (matched frequency variant)
## Next Steps
- Learn about benchmark results across 38 datasets
- Understand evaluation metrics in detail
- Explore pre-trained models on Hugging Face
