
TaobaoVDDataset

Dataset class for the TaobaoVD-GC video quality assessment dataset. It handles video loading, frame sampling, and preprocessing for both the DOVER++ (640x640) and V-JEPA2 (384x384) models.
from qualivision.utils.dataset import TaobaoVDDataset

dataset = TaobaoVDDataset(
    csv_file="train.csv",
    video_dir="/path/to/videos",
    num_frames=64,
    resolution=640,
    mode='train'
)

Parameters

csv_file
str
required
Path to CSV file with labels
video_dir
str
required
Directory containing video files
num_frames
int
default:"64"
Number of frames to sample from each video
resolution
int
default:"640"
Target resolution for videos (640 for DOVER++, 384 for V-JEPA2)
mode
str
default:"'train'"
Dataset mode: 'train', 'val', or 'test'
video_processor
AutoVideoProcessor
default:"None"
Optional video processor for V-JEPA2 model
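
To make the num_frames parameter concrete, here is a minimal sketch of uniform frame sampling. The sampling strategy that TaobaoVDDataset actually uses is not documented on this page, so the helper `uniform_frame_indices` is hypothetical and only illustrates one common way to pick a fixed number of frames from a video of arbitrary length.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int = 64) -> List[int]:
    """Pick `num_frames` indices spread evenly across `total_frames` frames.

    Hypothetical helper (not part of qualivision): the real dataset class
    may sample frames differently.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    # Evenly spaced positions from the first frame to the last, inclusive.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


# A 300-frame clip reduced to 64 frame indices.
indices = uniform_frame_indices(300, 64)
```

When a video has fewer frames than num_frames, this sketch repeats indices rather than failing, so the output length stays fixed.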

Attributes

MOS_COLS
List[str]
Column names for MOS scores: ['Traditional_MOS', 'Alignment_MOS', 'Aesthetic_MOS', 'Temporal_MOS', 'Overall_MOS']
has_labels
bool
Whether the dataset contains ground truth labels
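
One plausible way to derive a flag like has_labels is to check whether every MOS column appears in the CSV header. The sketch below assumes that approach; the helper `header_has_labels` and the `video_name` and `prompt` column names are illustrative assumptions, not confirmed details of the library.

```python
import csv
import io
from typing import List

MOS_COLS: List[str] = [
    "Traditional_MOS",
    "Alignment_MOS",
    "Aesthetic_MOS",
    "Temporal_MOS",
    "Overall_MOS",
]


def header_has_labels(csv_text: str) -> bool:
    """Return True if the CSV header contains every MOS column.

    Hypothetical helper: the real class may detect labels differently.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return all(col in header for col in MOS_COLS)


# Assumed layouts: a labelled training file and an unlabelled test file.
train_csv = (
    "video_name,prompt,Traditional_MOS,Alignment_MOS,"
    "Aesthetic_MOS,Temporal_MOS,Overall_MOS\n"
    "clip_001.mp4,a red kettle,3.2,4.0,3.5,3.8,3.6\n"
)
test_csv = "video_name,prompt\nclip_002.mp4,a blue mug\n"
```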

Methods

__len__()

Returns the number of samples in the dataset.
Returns: int - Number of samples

__getitem__(idx)

Get a single sample from the dataset.
Parameters:
  • idx (int): Sample index
Returns: Dict[str, Any] containing:
  • frames (torch.Tensor): Video frames with shape (C, T, H, W)
  • prompt (str): Text prompt for the video
  • video_name (str): Name of the video file
  • labels (torch.Tensor): MOS labels with shape (5,), or all zeros in test mode
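
The labels entry can be pictured as follows: five MOS values read in MOS_COLS order, or five zeros when no ground truth is available. This is a sketch under stated assumptions. It uses plain Python lists instead of torch.Tensor to stay dependency-free, and the helper `labels_from_row` is hypothetical.

```python
from typing import Dict, List

MOS_COLS: List[str] = [
    "Traditional_MOS",
    "Alignment_MOS",
    "Aesthetic_MOS",
    "Temporal_MOS",
    "Overall_MOS",
]


def labels_from_row(row: Dict[str, str], has_labels: bool) -> List[float]:
    """Build the 5-element label vector for one CSV row.

    Hypothetical helper: the real __getitem__ returns a torch.Tensor.
    """
    if not has_labels:
        # No ground truth (e.g. test mode): return placeholder zeros.
        return [0.0] * len(MOS_COLS)
    return [float(row[col]) for col in MOS_COLS]


row = {
    "Traditional_MOS": "3.2",
    "Alignment_MOS": "4.0",
    "Aesthetic_MOS": "3.5",
    "Temporal_MOS": "3.8",
    "Overall_MOS": "3.6",
}
train_labels = labels_from_row(row, has_labels=True)
test_labels = labels_from_row({}, has_labels=False)
```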

OptimizedGPUCollate

Collate function that batches video samples and encodes their text prompts, designed to keep GPU memory usage and processing time low.
from qualivision.utils.dataset import OptimizedGPUCollate

# text_encoder is assumed to be a SentenceTransformer instance created beforehand
collate_fn = OptimizedGPUCollate(
    text_encoder=text_encoder,
    device='cuda',
    max_frames=64
)

Parameters

video_processor
AutoVideoProcessor
default:"None"
Optional video processor for V-JEPA2
text_encoder
SentenceTransformer
default:"None"
Text encoder for prompt processing
device
str
default:"'cuda'"
Device to place tensors on
max_frames
int
default:"64"
Maximum number of frames per video

Methods

__call__(batch)

Collate a batch of samples.
Parameters:
  • batch (List[Dict[str, Any]]): List of samples from dataset
Returns: Dict[str, Any] containing:
  • pixel_values_videos (torch.Tensor): Batched video frames (B, C, T, H, W)
  • text_emb (torch.Tensor): Text embeddings (B, D)
  • labels (torch.Tensor): Batched labels (B, 5)
  • video_names (List[str]): List of video names
  • prompts (List[str]): List of original prompts
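
The mapping from per-sample keys (video_name, prompt) to per-batch keys (video_names, prompts) can be sketched as below. This covers only the non-tensor bookkeeping. Stacking frames into (B, C, T, H, W) and encoding prompts into text_emb are omitted, and the helper `collate_metadata` is hypothetical rather than the library's implementation.

```python
from typing import Any, Dict, List


def collate_metadata(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group per-sample fields into per-batch lists.

    Hypothetical helper: a real collate function would also stack the
    frame tensors and compute text embeddings.
    """
    return {
        "video_names": [sample["video_name"] for sample in batch],
        "prompts": [sample["prompt"] for sample in batch],
        "labels": [sample["labels"] for sample in batch],
    }


# Two illustrative samples; the values are made up for the example.
batch = [
    {"video_name": "clip_001.mp4", "prompt": "a red kettle", "labels": [3.2, 4.0, 3.5, 3.8, 3.6]},
    {"video_name": "clip_002.mp4", "prompt": "a blue mug", "labels": [2.9, 3.1, 3.0, 3.4, 3.1]},
]
collated = collate_metadata(batch)
```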

create_data_loaders

Create train and validation data loaders.
from qualivision.utils.dataset import create_data_loaders

# text_encoder is assumed to be a SentenceTransformer instance created beforehand
train_loader, val_loader = create_data_loaders(
    train_csv="train.csv",
    val_csv="val.csv",
    train_video_dir="/path/to/train",
    val_video_dir="/path/to/val",
    batch_size=4,
    num_frames=64,
    text_encoder=text_encoder
)

Parameters

train_csv
str
required
Path to training CSV file
val_csv
str
required
Path to validation CSV file
train_video_dir
str
required
Directory containing training videos
val_video_dir
str
required
Directory containing validation videos
batch_size
int
default:"4"
Batch size for data loading
num_frames
int
default:"64"
Number of frames per video
resolution
int
default:"640"
Target resolution for videos
video_processor
AutoVideoProcessor
default:"None"
Optional video processor for V-JEPA2
text_encoder
SentenceTransformer
default:"None"
Optional text encoder for prompt processing
device
str
default:"'cuda'"
Device for processing
num_workers
int
default:"4"
Number of worker processes for data loading

Returns

train_loader
DataLoader
Training data loader
val_loader
DataLoader
Validation data loader

create_test_loader

Create test data loader.
from qualivision.utils.dataset import create_test_loader

# text_encoder is assumed to be a SentenceTransformer instance created beforehand
test_loader = create_test_loader(
    test_csv="test.csv",
    test_video_dir="/path/to/test",
    batch_size=1,
    text_encoder=text_encoder
)

Parameters

test_csv
str
required
Path to test CSV file
test_video_dir
str
required
Directory containing test videos
batch_size
int
default:"1"
Batch size for data loading
num_frames
int
default:"64"
Number of frames per video
resolution
int
default:"640"
Target resolution for videos
video_processor
AutoVideoProcessor
default:"None"
Optional video processor for V-JEPA2
text_encoder
SentenceTransformer
default:"None"
Optional text encoder for prompt processing
device
str
default:"'cuda'"
Device for processing

Returns

test_loader
DataLoader
Test data loader with num_workers=0
