The project trains on a curated subset of YouTube-8M, containing approximately 4,000 videos across 4 main categories and 46 subcategories.

Categories

Animation

Cartoon, Naruto, Dragon Ball, One Piece, Bleach, Lego minifigure, Sonic the Hedgehog, The Walt Disney Company

Flat Content

Website, Chart, Map, Logo, Text, Typography, Screencast, Illustration, Poster

Gaming

Minecraft, Call of Duty, Grand Theft Auto V, World of Warcraft, League of Legends, Battlefield, FIFA 15, RuneScape

Natural Content

Animal, Pet, Fishing, Dog, Horse, Bird, Plant, Cat, Farm, Garden, Nature, Tree, Wildlife, Chicken

Category Mapping

Classes are encoded as integer labels for training:
{
  "Animation": 0,
  "Flat_Content": 1,
  "Gaming": 2,
  "Natural_Content": 3
}
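At inference time the mapping above runs in reverse: predicted class IDs must be decoded back to category names. A minimal sketch (the `CATEGORY_TO_ID` / `ID_TO_CATEGORY` names are illustrative, not from the project code):

```python
# Integer label mapping, as defined above.
CATEGORY_TO_ID = {
    "Animation": 0,
    "Flat_Content": 1,
    "Gaming": 2,
    "Natural_Content": 3,
}

# Invert the mapping once so predicted class IDs can be decoded to names.
ID_TO_CATEGORY = {v: k for k, v in CATEGORY_TO_ID.items()}

print(ID_TO_CATEGORY[2])  # → Gaming
```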

Dataset Split

| Split | Proportion | Purpose |
| --- | --- | --- |
| Train | 70% | Model training with augmentation |
| Validation | 20% | Hyperparameter tuning and early stopping |
| Test | 10% | Final evaluation and test-time augmentation (TTA) |
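A 70/20/10 split of the roughly 4,000 videos can be sketched as follows; `split_indices` is a hypothetical helper, not the project's actual splitting code:

```python
import random

def split_indices(n, train=0.7, val=0.2, seed=42):
    """Shuffle n sample indices and split them train/val/test (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(4000)
print(len(train_idx), len(val_idx), len(test_idx))  # 2800 800 400
```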

Directory Structure

After preprocessing, the data directory follows this layout:
data/processed/
├── train/
│   ├── Animation/
│   │   ├── Cartoon/
│   │   │   └── processed_data.pt
│   │   └── Naruto/
│   │       └── processed_data.pt
│   ├── Flat_Content/
│   ├── Gaming/
│   └── Natural_Content/
├── val/
│   ├── Animation/
│   ├── Flat_Content/
│   ├── Gaming/
│   └── Natural_Content/
└── test/
    ├── Animation/
    ├── Flat_Content/
    ├── Gaming/
    └── Natural_Content/
Each processed_data.pt file is a PyTorch serialized dictionary containing the preprocessed video tensor stack, integer labels, filenames, and a per-subcategory category mapping.
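Such a file can be inspected with `torch.load`, which returns the serialized dictionary. The sketch below builds a small stand-in file to show the round trip; the key names and tensor shape are illustrative assumptions and should be checked against the actual preprocessing output:

```python
import torch

# Build a small stand-in for one processed_data.pt file.
# Key names and the (videos, C, T, H, W) layout are assumptions, not project facts.
sample = {
    "data": torch.zeros(2, 3, 8, 112, 112),
    "labels": torch.tensor([0, 0]),            # integer category labels
    "filenames": ["vid_001.mp4", "vid_002.mp4"],
    "category_mapping": {"Animation": 0},
}
torch.save(sample, "processed_data.pt")

# Loading returns the plain Python dictionary that was serialized.
data = torch.load("processed_data.pt")
print(sorted(data.keys()))
```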

Class Imbalance Handling

The dataset has unequal sample counts across categories. The training dataloader uses a WeightedRandomSampler so that under-represented classes are drawn more often, giving the model a balanced view of all four categories. Per-class weights are computed from inverse class frequency in EnhancedPreExtractedFeaturesDataset._compute_class_weights() and clipped to the range [0.5, 10.0] to avoid extreme oversampling:
def _compute_class_weights(self):
    num_classes = len(self.class_counts)
    total_samples = len(self.labels)

    weights = []
    for class_id in range(num_classes):
        if class_id in self.class_counts:
            # Inverse-frequency weighting: rarer classes get larger weights.
            weight = total_samples / (num_classes * self.class_counts[class_id])
            # Clip to [0.5, 10.0] to avoid extreme oversampling.
            weight = min(max(weight, 0.5), 10.0)
        else:
            weight = 1.0  # class absent from the counts: neutral weight
        weights.append(weight)

    return torch.FloatTensor(weights)
The sampler is then constructed using per-sample weights derived from the class weights:
sample_weights = train_dataset.get_sample_weights()
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(train_dataset),
    replacement=True
)
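The body of get_sample_weights() is not shown here; a plausible implementation simply looks up each sample's class weight by its label. The free-function sketch below is an assumption about that behavior, not the project's actual method:

```python
def get_sample_weights(labels, class_weights):
    """Map each sample's label to its class weight (hypothetical sketch of
    the dataset's get_sample_weights method)."""
    return [class_weights[label] for label in labels]

# Example: class 1 is rare (weight 2.0), so its samples are drawn more often
# by the WeightedRandomSampler than samples of the common class 0 (weight 0.5).
labels = [0, 0, 0, 1]
class_weights = [0.5, 2.0, 1.0, 1.0]
print(get_sample_weights(labels, class_weights))  # [0.5, 0.5, 0.5, 2.0]
```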
The FocalLoss used during training also accepts per-class alpha weights, providing a second layer of imbalance correction at the loss level. See Optimization for details.
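Focal loss scales the per-sample cross-entropy by a factor of (1 - p_t)^gamma, down-weighting examples the model already classifies confidently. The sketch below is a minimal generic implementation with per-class alpha support, not the project's exact FocalLoss:

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    """Minimal focal loss sketch; `alpha` is an optional per-class weight tensor."""

    def __init__(self, alpha=None, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        log_probs = F.log_softmax(logits, dim=-1)
        # log p_t: log-probability of each sample's true class.
        log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        # Down-weight easy examples where p_t is close to 1.
        focal = (1.0 - pt) ** self.gamma * (-log_pt)
        if self.alpha is not None:
            focal = focal * self.alpha[targets]  # per-class weighting
        return focal.mean()

loss_fn = FocalLoss(alpha=torch.tensor([0.5, 2.0, 1.0, 1.0]), gamma=2.0)
loss = loss_fn(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```

With gamma = 0 and no alpha, this reduces to ordinary cross-entropy, which is a convenient sanity check when tuning.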
