The project provides two ways to run inference on pre-classified video content. Both paths use the same underlying ensemble of four trained checkpoints and the same pre-extracted feature pipeline.

Flask web app

API-based inference with a browser UI. Start the server and classify videos through HTTP endpoints.

Command-line script

Interactive script for classifying individual videos or running batch accuracy tests from the terminal.

Prerequisites

Before running inference, ensure the following are available.

Python packages
pip install torch torchvision flask h5py numpy opencv-python requests yt-dlp
Required files
| File | Purpose |
| --- | --- |
| features_enhanced/test_features_multiscale.h5 | Pre-extracted test features (412 videos, shape [412, 73, 1280]) |
| models_enhanced/best_ensemble_model_{1-4}.pt | Trained model checkpoints |
| data/processed/test/{category}/{subcategory}/processed_data.pt | Video name index used to map filenames to h5 positions |
All four checkpoint files must be present for full ensemble accuracy. The classifier will skip any missing checkpoints and log a warning, but inference quality degrades with fewer ensemble members.
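As a pre-flight sanity check, a minimal sketch (the helper itself is hypothetical; file names come from the table above) could report which checkpoints are actually on disk, mirroring the classifier's skip-and-warn behavior:

```python
from pathlib import Path
import warnings

# Checkpoint file names from the project layout.
CHECKPOINTS = [Path(f"models_enhanced/best_ensemble_model_{i}.pt") for i in range(1, 5)]

def available_checkpoints(checkpoints):
    """Return checkpoints found on disk, warning about any that are missing."""
    present = []
    for ckpt in checkpoints:
        if ckpt.exists():
            present.append(ckpt)
        else:
            # Mirrors the classifier's behavior: skip and warn, don't crash.
            warnings.warn(f"missing checkpoint: {ckpt}; ensemble quality degrades")
    return present
```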

Directory layout

Place files relative to app.py (or test_already_extracted.py):
Flask Local New/
├── app.py
├── test_already_extracted.py
├── model_train_new.py
├── index.html
├── features_enhanced/
│   └── test_features_multiscale.h5
├── models_enhanced/
│   ├── best_ensemble_model_1.pt
│   ├── best_ensemble_model_2.pt
│   ├── best_ensemble_model_3.pt
│   └── best_ensemble_model_4.pt
└── data/
    └── processed/
        └── test/
            ├── Animation/
            │   └── {subcategory}/
            │       └── processed_data.pt
            ├── Flat_Content/
            │   └── {subcategory}/
            │       └── processed_data.pt
            ├── Gaming/
            │   └── {subcategory}/
            │       └── processed_data.pt
            └── Natural_Content/
                └── {subcategory}/
                    └── processed_data.pt
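The video-name index is built by walking this tree. A minimal sketch of collecting the per-subcategory files (the walk helper is hypothetical; the real loader also reads each processed_data.pt to recover filenames):

```python
from pathlib import Path

def find_processed_files(test_root):
    """Collect category/subcategory/processed_data.pt paths under the test directory."""
    # One glob level per tree depth: {category}/{subcategory}/processed_data.pt
    return sorted(Path(test_root).glob("*/*/processed_data.pt"))
```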

Pre-extracted features approach

Both inference paths use a pre-extracted features strategy rather than running the full CNN backbone at inference time. Features were extracted during training with a multi-scale EfficientNet-V2 pipeline and stored in a compressed HDF5 file.
  • Feature shape per video: [73, 1280] (73 frames, 1280-dim per frame)
  • Multi-scale extraction (scales 1.0, 0.85, 1.15 averaged) for spatial robustness
  • Loading a single video’s features reads only the relevant slice of the .h5 file
This makes inference fast: the temporal model (SuperEnhancedTemporalModel) runs in milliseconds on GPU, and feature loading from disk is the dominant cost.
For external videos (URLs or local files not in the test set), the script falls back to on-the-fly EfficientNet-B0 feature extraction. See Single-video classification for details.
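Reading a single slice can be sketched as follows (the dataset name "features" and the helper are assumptions; the real loader may use a different key):

```python
import h5py
import numpy as np

def load_video_features(h5_path, index, dataset="features"):
    """Read a single video's [73, 1280] feature slice without loading the whole file.

    h5py reads only the requested rows from disk, which keeps per-video
    loading cheap even for a large test-set file.
    """
    with h5py.File(h5_path, "r") as f:
        return np.asarray(f[dataset][index], dtype=np.float32)
```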

Selected checkpoints

Both inference paths load the same four checkpoints defined in SELECTED_CHECKPOINTS:
SELECTED_CHECKPOINTS = [
    "best_ensemble_model_1.pt",
    "best_ensemble_model_2.pt",
    "best_ensemble_model_3.pt",
    "best_ensemble_model_4.pt",
]
Each checkpoint is a SuperEnhancedTemporalModel with the following architecture:
| Parameter | Value |
| --- | --- |
| feature_dim | 1280 |
| hidden_dim | 768 |
| num_classes | 4 |
| num_lstm_layers | 4 |
| num_attention_heads | 12 |
| dropout | 0.4 |
| bidirectional | True |
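An illustrative stand-in wired with these hyperparameters (the actual SuperEnhancedTemporalModel architecture is not shown here and likely differs in its internals):

```python
import torch
import torch.nn as nn

class TemporalModelSketch(nn.Module):
    """Hypothetical sketch of a bidirectional-LSTM + attention classifier
    using the hyperparameters from the table above."""

    def __init__(self, feature_dim=1280, hidden_dim=768, num_classes=4,
                 num_lstm_layers=4, num_attention_heads=12, dropout=0.4):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=num_lstm_layers,
                            batch_first=True, bidirectional=True, dropout=dropout)
        # Bidirectional output is 2 * hidden_dim wide.
        self.attn = nn.MultiheadAttention(hidden_dim * 2, num_attention_heads,
                                          dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):  # x: [batch, 73, 1280]
        h, _ = self.lstm(x)
        a, _ = self.attn(h, h, h)
        return self.head(a.mean(dim=1))  # logits: [batch, num_classes]
```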
Checkpoint validation accuracies from configuration_analysis.json:
| Checkpoint | Best val accuracy | Weighted F1 |
| --- | --- | --- |
| best_ensemble_model_1.pt | 72.7% | 64.8% |
| best_ensemble_model_2.pt | 92.1% | 92.1% |
| best_ensemble_model_3.pt | 91.9% | 91.9% |
| best_ensemble_model_4.pt | 91.9% | 91.9% |
Model 1 was saved at an earlier training epoch (epoch 52) with lower accuracy. The ensemble averages probabilities across all four models, so models 2–4 dominate the final prediction.
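The averaging step can be sketched as follows (the ensemble_predict helper is hypothetical; each model is assumed to map [1, 73, 1280] features to [1, 4] logits):

```python
import torch

def ensemble_predict(models, features):
    """Average softmax probabilities across ensemble members.

    features: [1, 73, 1280] pre-extracted clip features.
    Returns the argmax class index and the mean probability vector.
    """
    probs = []
    with torch.no_grad():
        for model in models:
            logits = model(features)          # assumed shape: [1, 4]
            probs.append(torch.softmax(logits, dim=-1))
    # Averaging probabilities (not logits) lets stronger members dominate
    # without any single model vetoing the prediction.
    mean_probs = torch.stack(probs).mean(dim=0)
    return mean_probs.argmax(dim=-1).item(), mean_probs
```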

GPU vs CPU inference

Device selection is automatic. The classifier checks torch.cuda.is_available() at initialization:
device_arg = 'cuda' if torch.cuda.is_available() and device == 'cuda' else 'cpu'
_classifier = user_module.SingleVideoClassifier(checkpoint_paths=paths, device=device_arg)
This applies to both the Flask app (get_classifier()) and the CLI script (SingleVideoClassifier.__init__). If no CUDA device is detected, inference falls back to CPU automatically.

Lazy singleton loading

The Flask app uses lazy singletons to avoid loading models on every request:
_feature_loader = None
_classifier = None

def get_feature_loader():
    global _feature_loader
    if _feature_loader is None:
        _feature_loader = user_module.SingleVideoFeatureLoader(
            features_dir=str(FEATURES_DIR),
            processed_test_dir=str(PROCESSED_TEST_DIR)
        )
    return _feature_loader

def get_classifier(device="cuda"):
    global _classifier
    if _classifier is None:
        # ... build checkpoint paths ...
        device_arg = 'cuda' if torch.cuda.is_available() and device == 'cuda' else 'cpu'
        _classifier = user_module.SingleVideoClassifier(checkpoint_paths=paths, device=device_arg)
        _classifier.load_models()
    return _classifier
The first request to /api/videos or /api/classify triggers model loading. Subsequent requests reuse the same objects.

Next steps

Flask app

Configure and start the Flask server, then call the API endpoints.

CLI script

Run interactive single-video classification or batch testing from the terminal.
