Dataset Utilities

TaobaoVDDataset

Dataset class for the TaobaoVD-GC video quality assessment dataset. It handles video loading, frame sampling, and preprocessing for both DOVER++ (640x640 input) and V-JEPA2 (384x384 input).

from qualivision.utils.dataset import TaobaoVDDataset

dataset = TaobaoVDDataset(
    csv_file="train.csv",
    video_dir="/path/to/videos",
    num_frames=64,
    resolution=640,
    mode='train'
)

Parameters

csv_file (str, required): Path to the CSV file with labels
video_dir (str, required): Directory containing the video files
num_frames (int, default 64): Number of frames to sample from each video
resolution (int, default 640): Target resolution for videos (640 for DOVER++, 384 for V-JEPA2)
mode (str, default 'train'): Dataset mode: 'train', 'val', or 'test'
video_processor (AutoVideoProcessor, default None): Optional video processor for the V-JEPA2 model
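The resolution value depends on which model the dataset feeds. A minimal sketch of choosing it from a model name; the MODEL_RESOLUTIONS table and resolution_for helper are hypothetical and not part of qualivision:

```python
# Hypothetical helper (not part of qualivision): map a backbone name to the
# resolution value documented for TaobaoVDDataset.
MODEL_RESOLUTIONS = {
    "dover++": 640,  # DOVER++ expects 640x640 frames
    "v-jepa2": 384,  # V-JEPA2 expects 384x384 frames
}


def resolution_for(model_name: str) -> int:
    """Return the target resolution for a supported model name (case-insensitive)."""
    key = model_name.strip().lower()
    if key not in MODEL_RESOLUTIONS:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {sorted(MODEL_RESOLUTIONS)}"
        )
    return MODEL_RESOLUTIONS[key]
```

For example, passing resolution=resolution_for("V-JEPA2") together with a video_processor would configure TaobaoVDDataset for V-JEPA2.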

Attributes

MOS_COLS (List[str]): Column names for the MOS scores: ['Traditional_MOS', 'Alignment_MOS', 'Aesthetic_MOS', 'Temporal_MOS', 'Overall_MOS']
has_labels (bool): Whether the dataset contains ground-truth labels
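A sketch of how MOS_COLS and has_labels could relate, assuming has_labels is True only when all five MOS columns appear in the CSV header, and that unlabeled rows fall back to zeros as __getitem__ does in test mode. It uses only the standard library; load_rows and labels_for are hypothetical names, not the dataset's actual internals:

```python
import csv
import io
from typing import Dict, List, Tuple

# Column names documented for TaobaoVDDataset.MOS_COLS.
MOS_COLS = ["Traditional_MOS", "Alignment_MOS", "Aesthetic_MOS", "Temporal_MOS", "Overall_MOS"]


def load_rows(csv_text: str) -> Tuple[List[Dict[str, str]], bool]:
    """Parse CSV text; report has_labels=True only if every MOS column is present."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    has_labels = all(col in fieldnames for col in MOS_COLS)
    return list(reader), has_labels


def labels_for(row: Dict[str, str], has_labels: bool) -> List[float]:
    """Return the five MOS values in MOS_COLS order, or zeros when labels are absent."""
    if not has_labels:
        return [0.0] * len(MOS_COLS)
    return [float(row[col]) for col in MOS_COLS]
```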

Methods

__len__()

Returns the number of samples in the dataset. Returns: int

__getitem__(idx)

Get a single sample from the dataset. Parameters:

idx (int): Sample index

Returns: Dict[str, Any] containing:

frames (torch.Tensor): Video frames with shape (C, T, H, W)
prompt (str): Text prompt for the video
video_name (str): Name of the video file
labels (torch.Tensor): MOS labels with shape (5,), or zeros in test mode
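Frame sampling reduces each clip to num_frames frames. A minimal sketch of one common strategy, uniform segment-center sampling, assuming the clip's total frame count is known; the rule TaobaoVDDataset actually uses is not documented here and may differ, and uniform_frame_indices is a hypothetical helper:

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int = 64) -> List[int]:
    """Pick num_frames indices spread evenly over a clip of total_frames.

    Each index is the center of one of num_frames equal-length segments.
    Clips shorter than num_frames repeat frames, so the output length is
    always num_frames.
    """
    if total_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    segment = total_frames / num_frames
    # min() guards against float rounding pushing an index past the last frame.
    return [min(int(segment * (i + 0.5)), total_frames - 1) for i in range(num_frames)]
```

The selected frames, if decoded as (T, H, W, C), would then typically be permuted into the (C, T, H, W) layout returned in frames.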

OptimizedGPUCollate

Collate function that batches video data and encodes text prompts on the target device, designed to reduce GPU memory use and speed up processing.

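from sentence_transformers import SentenceTransformer
from qualivision.utils.dataset import OptimizedGPUCollate

# The model name is an example assumption; use the text encoder
# your model checkpoint was trained with.
text_encoder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

collate_fn = OptimizedGPUCollate(
    text_encoder=text_encoder,
    device='cuda',
    max_frames=64
)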

Parameters

video_processor (AutoVideoProcessor, default None): Optional video processor for V-JEPA2
text_encoder (SentenceTransformer, default None): Text encoder for prompt processing
device (str, default 'cuda'): Device to place tensors on
max_frames (int, default 64): Maximum number of frames per video

Methods

__call__(batch)

Collate a batch of samples. Parameters:

batch (List[Dict[str, Any]]): List of samples from the dataset

Returns: Dict[str, Any] containing:

pixel_values_videos (torch.Tensor): Batched video frames (B, C, T, H, W)
text_emb (torch.Tensor): Text embeddings (B, D)
labels (torch.Tensor): Batched labels (B, 5)
video_names (List[str]): List of video names
prompts (List[str]): List of original prompts
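A simplified sketch of the batching __call__ performs, with NumPy arrays standing in for torch tensors and a caller-supplied encode function standing in for the text encoder (SentenceTransformer.encode returns a NumPy array by default and could fill that role). It omits device placement, video_processor handling, max_frames handling, and any GPU-specific optimizations; collate_sketch is a hypothetical name:

```python
from typing import Any, Callable, Dict, List

import numpy as np


def collate_sketch(
    batch: List[Dict[str, Any]],
    encode: Callable[[List[str]], np.ndarray],
) -> Dict[str, Any]:
    """Combine per-sample dicts into one batch dict with the documented keys.

    encode must map B prompts to a (B, D) array of text embeddings.
    """
    prompts = [sample["prompt"] for sample in batch]
    return {
        # (C, T, H, W) per sample -> (B, C, T, H, W)
        "pixel_values_videos": np.stack([sample["frames"] for sample in batch]),
        # one D-dimensional embedding per prompt -> (B, D)
        "text_emb": np.asarray(encode(prompts)),
        # (5,) per sample -> (B, 5)
        "labels": np.stack([sample["labels"] for sample in batch]),
        "video_names": [sample["video_name"] for sample in batch],
        "prompts": prompts,
    }
```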

create_data_loaders

Create train and validation data loaders.

from qualivision.utils.dataset import create_data_loaders

train_loader, val_loader = create_data_loaders(
    train_csv="train.csv",
    val_csv="val.csv",
    train_video_dir="/path/to/train",
    val_video_dir="/path/to/val",
    batch_size=4,
    num_frames=64,
    text_encoder=text_encoder
)

Parameters

train_csv (str, required): Path to the training CSV file
val_csv (str, required): Path to the validation CSV file
train_video_dir (str, required): Directory containing training videos
val_video_dir (str, required): Directory containing validation videos
batch_size (int, default 4): Batch size for data loading
num_frames (int, default 64): Number of frames per video
resolution (int, default 640): Target resolution for videos
video_processor (AutoVideoProcessor, default None): Optional video processor for V-JEPA2
text_encoder (SentenceTransformer, default None): Optional text encoder for prompt processing
device (str, default 'cuda'): Device for processing
num_workers (int, default 4): Number of worker processes for data loading

Returns

train_loader (DataLoader): Training data loader
val_loader (DataLoader): Validation data loader

create_test_loader

Create a test data loader.

from qualivision.utils.dataset import create_test_loader

test_loader = create_test_loader(
    test_csv="test.csv",
    test_video_dir="/path/to/test",
    batch_size=1,
    text_encoder=text_encoder
)

Parameters

test_csv (str, required): Path to the test CSV file
test_video_dir (str, required): Directory containing test videos
batch_size (int, default 1): Batch size for data loading
num_frames (int, default 64): Number of frames per video
resolution (int, default 640): Target resolution for videos
video_processor (AutoVideoProcessor, default None): Optional video processor for V-JEPA2
text_encoder (SentenceTransformer, default None): Optional text encoder for prompt processing
device (str, default 'cuda'): Device for processing

Returns

test_loader (DataLoader): Test data loader, created with num_workers=0
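Because test-mode labels are zeros, the test loader is typically used to produce predictions rather than to score them. A minimal sketch of collecting per-video scores and writing them to CSV, assuming a model (not shown) that returns five scores per video in MOS_COLS order; collect, write_predictions, and the output header are hypothetical, not a documented submission format:

```python
import csv
from typing import Dict, Iterable, List, Sequence, Tuple

MOS_COLS = ["Traditional_MOS", "Alignment_MOS", "Aesthetic_MOS", "Temporal_MOS", "Overall_MOS"]


def collect(
    batches: Iterable[Tuple[Sequence[str], Sequence[Sequence[float]]]],
) -> Dict[str, List[float]]:
    """Map each video name to its five predicted scores across all batches."""
    predictions: Dict[str, List[float]] = {}
    for video_names, scores in batches:
        for name, row in zip(video_names, scores):
            predictions[name] = list(row)
    return predictions


def write_predictions(path: str, predictions: Dict[str, Sequence[float]]) -> None:
    """Write one row per video: its name followed by the five MOS predictions."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["video_name", *MOS_COLS])
        for name, scores in predictions.items():
            if len(scores) != len(MOS_COLS):
                raise ValueError(f"{name}: expected {len(MOS_COLS)} scores, got {len(scores)}")
            writer.writerow([name, *scores])
```

In a test loop, each element passed to collect could be (batch["video_names"], outputs.tolist()) for a model output of shape (B, 5).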
