The project trains on a curated subset of YouTube-8M, containing approximately 4,000 videos across 4 main categories and 46 subcategories.

Categories

Animation

Cartoon, Naruto, Dragon Ball, One Piece, Bleach, Lego minifigure, Sonic the Hedgehog, The Walt Disney Company

Flat Content

Website, Chart, Map, Logo, Text, Typography, Screencast, Illustration, Poster

Gaming

Minecraft, Call of Duty, Grand Theft Auto V, World of Warcraft, League of Legends, Battlefield, FIFA 15, RuneScape

Natural Content

Animal, Pet, Fishing, Dog, Horse, Bird, Plant, Cat, Farm, Garden, Nature, Tree, Wildlife, Chicken

Category Mapping

Classes are encoded as integer labels for training:
{
  "Animation": 0,
  "Flat_Content": 1,
  "Gaming": 2,
  "Natural_Content": 3
}
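At inference time the mapping above runs in reverse: predicted class IDs must be decoded back to category names. A minimal sketch (the `CATEGORY_TO_ID` / `ID_TO_CATEGORY` names are illustrative, not from the project code):

```python
# Integer label mapping, as defined above.
CATEGORY_TO_ID = {
    "Animation": 0,
    "Flat_Content": 1,
    "Gaming": 2,
    "Natural_Content": 3,
}

# Invert the mapping once so predicted class IDs can be decoded to names.
ID_TO_CATEGORY = {v: k for k, v in CATEGORY_TO_ID.items()}

print(ID_TO_CATEGORY[2])  # → Gaming
```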

Dataset Split

| Split | Proportion | Purpose |
| --- | --- | --- |
| Train | 70% | Model training with augmentation |
| Validation | 20% | Hyperparameter tuning and early stopping |
| Test | 10% | Final evaluation and test-time augmentation (TTA) |
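A 70/20/10 split of the roughly 4,000 videos can be sketched as follows; `split_indices` is a hypothetical helper, not the project's actual splitting code:

```python
import random

def split_indices(n, train=0.7, val=0.2, seed=42):
    """Shuffle n sample indices and split them train/val/test (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(4000)
print(len(train_idx), len(val_idx), len(test_idx))  # 2800 800 400
```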

Directory Structure

After preprocessing, the data directory follows this layout:
data/processed/
├── train/
│   ├── Animation/
│   │   ├── Cartoon/
│   │   │   └── processed_data.pt
│   │   └── Naruto/
│   │       └── processed_data.pt
│   ├── Flat_Content/
│   ├── Gaming/
│   └── Natural_Content/
├── val/
│   ├── Animation/
│   ├── Flat_Content/
│   ├── Gaming/
│   └── Natural_Content/
└── test/
    ├── Animation/
    ├── Flat_Content/
    ├── Gaming/
    └── Natural_Content/
Each processed_data.pt file is a PyTorch serialized dictionary containing the preprocessed video tensor stack, integer labels, filenames, and a per-subcategory category mapping.
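Such a file can be inspected with `torch.load`, which returns the serialized dictionary. The sketch below builds a small stand-in file to show the round trip; the key names and tensor shape are illustrative assumptions and should be checked against the actual preprocessing output:

```python
import torch

# Build a small stand-in for one processed_data.pt file.
# Key names and the (videos, C, T, H, W) layout are assumptions, not project facts.
sample = {
    "data": torch.zeros(2, 3, 8, 112, 112),
    "labels": torch.tensor([0, 0]),            # integer category labels
    "filenames": ["vid_001.mp4", "vid_002.mp4"],
    "category_mapping": {"Animation": 0},
}
torch.save(sample, "processed_data.pt")

# Loading returns the plain Python dictionary that was serialized.
data = torch.load("processed_data.pt")
print(sorted(data.keys()))
```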

Class Imbalance Handling

The dataset has unequal sample counts across categories. The training dataloader uses a WeightedRandomSampler so that under-represented classes are drawn more often, giving the model a balanced view of all four categories. Per-class weights are computed from inverse class frequency in EnhancedPreExtractedFeaturesDataset._compute_class_weights() and clipped to the range [0.5, 10.0] to avoid extreme oversampling:
def _compute_class_weights(self):
    num_classes = len(self.class_counts)
    total_samples = len(self.labels)

    weights = []
    for class_id in range(num_classes):
        if class_id in self.class_counts:
            # Inverse-frequency weighting: rarer classes get larger weights.
            weight = total_samples / (num_classes * self.class_counts[class_id])
            # Clip to [0.5, 10.0] to avoid extreme oversampling.
            weight = min(max(weight, 0.5), 10.0)
        else:
            weight = 1.0  # class absent from the counts: neutral weight
        weights.append(weight)

    return torch.FloatTensor(weights)
The sampler is then constructed using per-sample weights derived from the class weights:
sample_weights = train_dataset.get_sample_weights()
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(train_dataset),
    replacement=True
)
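The body of get_sample_weights() is not shown here; a plausible implementation simply looks up each sample's class weight by its label. The free-function sketch below is an assumption about that behavior, not the project's actual method:

```python
def get_sample_weights(labels, class_weights):
    """Map each sample's label to its class weight (hypothetical sketch of
    the dataset's get_sample_weights method)."""
    return [class_weights[label] for label in labels]

# Example: class 1 is rare (weight 2.0), so its samples are drawn more often
# by the WeightedRandomSampler than samples of the common class 0 (weight 0.5).
labels = [0, 0, 0, 1]
class_weights = [0.5, 2.0, 1.0, 1.0]
print(get_sample_weights(labels, class_weights))  # [0.5, 0.5, 0.5, 2.0]
```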
The FocalLoss used during training also accepts per-class alpha weights, providing a second layer of imbalance correction at the loss level. See Optimization for details.
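Focal loss scales the per-sample cross-entropy by a factor of (1 - p_t)^gamma, down-weighting examples the model already classifies confidently. The sketch below is a minimal generic implementation with per-class alpha support, not the project's exact FocalLoss:

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    """Minimal focal loss sketch; `alpha` is an optional per-class weight tensor."""

    def __init__(self, alpha=None, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        log_probs = F.log_softmax(logits, dim=-1)
        # log p_t: log-probability of each sample's true class.
        log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        # Down-weight easy examples where p_t is close to 1.
        focal = (1.0 - pt) ** self.gamma * (-log_pt)
        if self.alpha is not None:
            focal = focal * self.alpha[targets]  # per-class weighting
        return focal.mean()

loss_fn = FocalLoss(alpha=torch.tensor([0.5, 2.0, 1.0, 1.0]), gamma=2.0)
loss = loss_fn(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```

With gamma = 0 and no alpha, this reduces to ordinary cross-entropy, which is a convenient sanity check when tuning.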
