# Get Started in 5 Minutes
This guide will help you run your first video quality assessment using QualiVision’s pre-trained models.
**Prerequisites:** Python 3.8+ is required. CUDA is optional but recommended for GPU acceleration.
## Setup
### Clone the Repository
```bash
git clone https://github.com/RITIK-12/QualiVision.git
cd QualiVision
```
### Install Dependencies
Install all required packages using pip:

```bash
pip install -r requirements.txt
```
This will install PyTorch, transformers, and other necessary libraries. See the installation guide for detailed setup options.
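To verify the installation, you can run a quick import check (torch and transformers are both listed dependencies):

```python
# Quick import check after installation
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```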
### Prepare Your Data

Organize your test videos in the following structure:

```text
data/
└── test/
    ├── test_labels.csv
    └── videos/
        ├── video001.mp4
        ├── video002.mp4
        └── ...
```
Your `test_labels.csv` should include the following columns:

```csv
video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65
```
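Before running an evaluation, you can sanity-check the labels file. A minimal sketch, assuming pandas is available (install it separately if it is not pulled in by the dependencies above):

```python
# Sanity-check test_labels.csv against the expected column set.
# Assumes pandas; `pip install pandas` if it is not already present.
import pandas as pd

labels = pd.read_csv("data/test/test_labels.csv")
expected = {"video_name", "Prompt", "Traditional_MOS", "Alignment_MOS",
            "Aesthetic_MOS", "Temporal_MOS", "Overall_MOS"}
missing = expected - set(labels.columns)
assert not missing, f"test_labels.csv is missing columns: {missing}"
print(f"Found {len(labels)} labeled videos")
```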
## Run Your First Evaluation
### DOVER++ Model

Evaluate videos using the DOVER++ model with the ConvNeXt 3D architecture:

```bash
python scripts/evaluate.py \
    --model dover \
    --checkpoint models/dover_best.pt \
    --data path/to/test/data
```

Model weights will be downloaded automatically on first use if not present locally.
### V-JEPA2 Model

Evaluate videos using the V-JEPA2 model with the Vision Transformer architecture:

```bash
python scripts/evaluate.py \
    --model vjepa \
    --checkpoint models/vjepa_best.pt \
    --data path/to/test/data
```

V-JEPA2 requires more memory (~16 GB) but provides strong video representation learning.
## Understanding the Output
After evaluation completes, QualiVision generates comprehensive results:
### Predictions CSV
Contains quality scores for each video across all dimensions:

```csv
video_name,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,3.15,4.08,3.82,3.47,3.63
video002.mp4,4.52,4.19,4.76,4.13,4.40
```
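To explore the predictions programmatically, you can load the CSV with pandas. The timestamped file name below follows the pattern shown in the full example later in this guide; yours will differ:

```python
# Load and summarize a predictions CSV produced by evaluate.py.
import pandas as pd

preds = pd.read_csv("results/dover/predictions_DOVER_20250304_143022.csv")
print(preds.describe())                   # per-dimension score statistics
print(preds.nsmallest(5, "Overall_MOS"))  # the five lowest-scoring videos
```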
### Metrics Report
If ground truth labels are provided, you’ll see performance metrics:

```text
Evaluation Results:
------------------
SROCC: 0.8542
PLCC: 0.8731
VQualA Score: 0.8637
```
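In the sample output, the VQualA Score equals the mean of SROCC and PLCC. Treating that as the definition (an inference from the numbers above, not confirmed from the source), the metrics can be reproduced with scipy and numpy:

```python
# Sketch of the reported metrics using scipy/numpy. The VQualA formula
# below (mean of SROCC and PLCC) is inferred from the sample output.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def quality_metrics(preds: np.ndarray, targets: np.ndarray) -> dict:
    srocc = spearmanr(preds, targets)[0]   # rank correlation
    plcc = pearsonr(preds, targets)[0]     # linear correlation
    rmse = float(np.sqrt(np.mean((preds - targets) ** 2)))
    return {"SROCC": srocc, "PLCC": plcc, "RMSE": rmse,
            "VQualA": (srocc + plcc) / 2}
```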
### Summary Report
A detailed text report with model configuration and prediction statistics:

```text
QualiVision Model Evaluation Report
===================================
Model: DOVER
Checkpoint: models/dover_best.pt
Samples: 150

Prediction Statistics:
---------------------
Min: 2.1234
Max: 4.8765
Mean: 3.6542
Std: 0.7234
```
## Example: Complete Evaluation Workflow
Here’s a complete example showing how to evaluate a test dataset:
```bash
# Evaluate DOVER++ model with custom output directory
python scripts/evaluate.py \
    --model dover \
    --checkpoint models/dover_best.pt \
    --data data/test \
    --output results/dover \
    --batch-size 4
```
Expected console output:

```text
QualiVision Model Evaluation
============================
Model: DOVER
Checkpoint: models/dover_best.pt
Test CSV: data/test/test_labels.csv
Test videos: data/test/videos
Output: results/dover
Device: cuda

Initializing DOVER Model Evaluator
Checkpoint: models/dover_best.pt
Device: cuda
Loading DOVER checkpoint: models/dover_best.pt
✓ DOVER checkpoint loaded
✓ Model loaded successfully
GPU Memory: 2.3 GB / 24.0 GB

Evaluating on test dataset:
CSV: data/test/test_labels.csv
Videos: data/test/videos
Batch size: 4

Generating predictions...
Predicting: 100%|████████████████| 38/38 [02:15<00:00, 3.56s/it]
✓ Generated predictions for 150 samples
✓ Ground truth labels found, computing metrics

Evaluation Results:
------------------
SROCC: 0.8542
PLCC: 0.8731
RMSE: 0.3421
VQualA Score: 0.8637

✓ Predictions saved:
  CSV: results/dover/predictions_DOVER_20250304_143022.csv
  Excel: results/dover/predictions_DOVER_20250304_143022.xlsx
✓ Results saved: results/dover/results_DOVER_20250304_143022.json
✓ Summary report saved: results/dover/report_DOVER_20250304_143022.txt

✓ Evaluation completed successfully!
Final VQualA Score: 0.8637
```
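The saved JSON results file can be consumed by downstream tooling. Its exact schema is whatever `scripts/evaluate.py` writes, so a reasonable first step is to inspect the keys:

```python
# Peek at the saved results JSON (schema defined by scripts/evaluate.py).
import json

with open("results/dover/results_DOVER_20250304_143022.json") as f:
    results = json.load(f)
print(list(results.keys()))
```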
## Code Example: Using QualiVision Programmatically
You can also use QualiVision programmatically in your Python code:
```python
import sys
from pathlib import Path

import torch

# Make the repository root importable so `src` resolves as a package
sys.path.insert(0, str(Path(__file__).parent))

from src.models.dover_model import DOVERModel

# Initialize the model
model = DOVERModel(
    dover_weights_path="models/DOVER_plus_plus.pth",
    text_encoder_name="BAAI/bge-large-en-v1.5",
    device="cuda",
)

# Load the fine-tuned checkpoint
checkpoint = torch.load("models/dover_best.pt", map_location="cuda")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Run inference (video_tensor and text_prompts must be prepared beforehand)
with torch.no_grad():
    outputs = model(video_tensor, text_prompts)
    quality_scores = outputs.cpu().numpy()
```
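For a quick smoke test without real videos, you can feed a random tensor. The input shape below is an assumption (DOVER++'s actual preprocessing determines the real layout), so adapt it to your pipeline:

```python
# Hypothetical smoke test: random input in an assumed
# (batch, frames, channels, height, width) layout.
video_tensor = torch.randn(1, 16, 3, 224, 224, device="cuda")
text_prompts = ["A cat playing piano"]

with torch.no_grad():
    scores = model(video_tensor, text_prompts).cpu().numpy()
print(scores)
```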
## Full Evaluation Script from Source
Here’s the core evaluation logic from scripts/evaluate.py:

```python
def _predict_on_loader(self, test_loader) -> Dict[str, Any]:
    """Run predictions on data loader."""
    predictions = []
    targets = []
    video_names = []

    print("\nGenerating predictions...")
    with torch.no_grad():
        for i, batch in enumerate(tqdm(test_loader, desc="Predicting")):
            try:
                # Forward pass
                if self.model_type == 'dover':
                    outputs = self.model(batch['pixel_values_videos'], batch['prompts'])
                else:  # vjepa
                    outputs = self.model(batch['pixel_values_videos'], batch['text_emb'])

                # Extract predictions
                batch_predictions = outputs.cpu().numpy()
                batch_targets = batch['labels'].cpu().numpy()

                predictions.append(batch_predictions)
                targets.append(batch_targets)
                video_names.extend(batch['video_names'])

                # Memory cleanup
                del outputs
                if i % 10 == 0:
                    ultra_memory_cleanup()

            except Exception as e:
                print(f"⚠ Error processing batch {i}: {e}")

    # Concatenate all predictions
    predictions = np.concatenate(predictions, axis=0)
    targets = np.concatenate(targets, axis=0)

    return {
        'predictions': predictions,
        'targets': targets,
        'video_names': video_names
    }
```

From: `scripts/evaluate.py:188-239`
## Advanced Options
Adjust batch size based on your GPU memory:

```bash
# Smaller batch for limited memory
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
    --data data/test --batch-size 2

# Larger batch for more memory
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
    --data data/test --batch-size 8
```
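To see how much headroom you have before choosing a batch size, you can query free GPU memory directly (requires a CUDA-enabled PyTorch build; `torch.cuda.mem_get_info` needs PyTorch 1.10+):

```python
# Report free vs. total GPU memory to guide the --batch-size choice.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
else:
    print("No CUDA GPU detected; consider --device cpu")
```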
If you don’t have a GPU, you can run on CPU (slower):

```bash
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
    --data data/test --device cpu
```
### Custom CSV/Video Directory Names

If your data uses different file or directory names:

```bash
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
    --data data/test \
    --csv-name my_labels.csv \
    --video-dir my_videos
```
## Next Steps
- **Train Custom Models**: Fine-tune QualiVision on your own dataset
- **Model Architecture**: Deep dive into the DOVER++ and V-JEPA2 architectures
- **API Reference**: Explore the complete API documentation
- **Memory Optimization**: Optimize memory usage for your hardware
> **Memory Requirements:** DOVER++ requires ~12 GB of GPU memory and V-JEPA2 ~16 GB. If memory is limited, reduce the batch size or use gradient accumulation.