
Prerequisites

Before starting, ensure you have the following installed:
| Requirement | Version | Notes |
| --- | --- | --- |
| Python | 3.8+ | 3.10 recommended |
| PyTorch | 2.0+ | Install the CUDA 11.8 build |
| CUDA | 11.8 | Required for GPU inference |
| cuDNN | Bundled | Comes with CUDA 11.8 |
GPU inference is strongly recommended. The classifier will automatically fall back to CPU if no CUDA device is detected, but inference will be slower.
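The fallback described above can be checked up front. This is an illustrative sketch using the public `torch.cuda.is_available()` API, not the app's exact code:

```python
# Illustrative device selection mirroring the documented CPU fallback.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # torch is not installed yet -- inference would run on CPU only.
    device = "cpu"

print(f"Running inference on: {device}")
```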

Setup

1. Clone the repository

git clone https://github.com/itsmanask/NVIDIA-Video-Classification-Project.git
cd NVIDIA-Video-Classification-Project

2. Create a virtual environment

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

3. Install dependencies

Install all required packages for the training and inference scripts:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install flask h5py numpy pandas opencv-python scikit-learn tqdm psutil matplotlib seaborn
For external video URL support (optional):
pip install yt-dlp requests
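To confirm everything installed cleanly, a quick sanity check helps. Note that some import names differ from the pip package names (`cv2` for opencv-python, `sklearn` for scikit-learn):

```python
import importlib.util

# Import names differ from pip package names for opencv-python (cv2)
# and scikit-learn (sklearn).
required = ["torch", "torchvision", "flask", "h5py", "numpy", "pandas",
            "cv2", "sklearn", "tqdm", "psutil", "matplotlib", "seaborn"]
missing = [name for name in required if importlib.util.find_spec(name) is None]

print("All dependencies present" if not missing else f"Missing: {missing}")
```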

4. Place model checkpoints and feature files

The Flask app expects the following layout inside the Flask Local New/ directory by default. If the local folders are not present, the app falls back to /mnt/data/ paths.
Flask Local New/
├── app.py
├── test_already_extracted.py
├── model_train_new.py
├── index.html
├── models_enhanced/
│   ├── best_ensemble_model_1.pt
│   ├── best_ensemble_model_2.pt
│   ├── best_ensemble_model_3.pt
│   └── best_ensemble_model_4.pt
├── features_enhanced/
│   └── test_features_multiscale.h5
└── data/
    └── processed/
        └── test/
            ├── Animation/
            ├── Flat_Content/
            ├── Gaming/
            └── Natural_Content/
The app will also search D:/features_enhanced and D:/models_enhanced on Windows if the local folders do not exist. See app.py lines 22–44 for the full fallback logic.
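The resolution order amounts to roughly the following. This is a simplified sketch of the documented behaviour, not a copy of the code in app.py:

```python
from pathlib import Path

def resolve_dir(local: str, fallbacks: list) -> Path:
    """Return the first existing directory, in documented fallback order."""
    for candidate in (local, *fallbacks):
        path = Path(candidate)
        if path.is_dir():
            return path
    # Nothing found: default to the local path so errors point somewhere sane.
    return Path(local)

models_dir = resolve_dir("models_enhanced",
                         ["/mnt/data/models_enhanced", "D:/models_enhanced"])
```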

5. Verify checkpoint configuration

The Flask app loads exactly these four checkpoints by default (defined in app.py):
SELECTED_CHECKPOINTS = [
    "best_ensemble_model_1.pt",
    "best_ensemble_model_2.pt",
    "best_ensemble_model_3.pt",
    "best_ensemble_model_4.pt",
]
Each checkpoint encodes the following model configuration:
| Parameter | Value |
| --- | --- |
| feature_dim | 1280 |
| hidden_dim | 768 |
| num_classes | 4 |
| num_lstm_layers | 4 |
| num_attention_heads | 12 |
| dropout | 0.4 |
| bidirectional | True |
To use a different set of checkpoints, edit the SELECTED_CHECKPOINTS list in app.py. Any .pt files found in models_enhanced/ are used as a fallback if none of the listed checkpoints exist.
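The selection-plus-fallback behaviour is roughly equivalent to the sketch below (an approximation of the documented logic, not the exact code in app.py):

```python
from pathlib import Path

SELECTED_CHECKPOINTS = [
    "best_ensemble_model_1.pt",
    "best_ensemble_model_2.pt",
    "best_ensemble_model_3.pt",
    "best_ensemble_model_4.pt",
]

def resolve_checkpoints(models_dir: Path) -> list:
    """Prefer the explicitly listed checkpoints; fall back to any .pt files."""
    listed = [models_dir / name for name in SELECTED_CHECKPOINTS
              if (models_dir / name).exists()]
    return listed or sorted(models_dir.glob("*.pt"))
```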

6. Start the Flask server

From inside Flask Local New/:
python app.py
On startup, the app prints the resolved paths for all directories:
Index HTML path: /path/to/Flask Local New/index.html
User script path: /path/to/Flask Local New/test_already_extracted.py
Features dir: /path/to/Flask Local New/features_enhanced
Models dir: /path/to/Flask Local New/models_enhanced
Processed test dir: /path/to/Flask Local New/data/processed/test
The server listens on http://0.0.0.0:5000. Open http://localhost:5000 in your browser to access the web UI.

Running Inference via the Flask API

The Flask app exposes two JSON endpoints:

List available videos

curl http://localhost:5000/api/videos
Returns a mapping of category names to lists of video filenames available in the test dataset:
{
  "success": true,
  "videos": {
    "Animation": ["vid_001.mp4", "vid_002.mp4"],
    "Gaming": ["vid_010.mp4"],
    "Flat_Content": ["vid_020.mp4"],
    "Natural_Content": ["vid_030.mp4"]
  }
}
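For scripted access, the response (fetched with any HTTP client, e.g. `requests.get("http://localhost:5000/api/videos").json()`) can be flattened into (category, filename) pairs. The payload below reuses the illustrative values above:

```python
import json

# Shape of the /api/videos response; values here mirror the example above.
payload = json.loads('''
{"success": true,
 "videos": {"Animation": ["vid_001.mp4", "vid_002.mp4"],
            "Gaming": ["vid_010.mp4"]}}
''')

videos = [(category, name)
          for category, names in payload["videos"].items()
          for name in names]
print(videos)
```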

Classify a dataset video

curl -X POST http://localhost:5000/api/classify \
  -F "selected_video=vid_001.mp4" \
  -F "use_tta=false"
Example response:
{
  "success": true,
  "result": {
    "category": "Animation",
    "confidence": 94.3,
    "all_scores": {
      "Animation": 94.3,
      "Flat_Content": 2.1,
      "Gaming": 1.8,
      "Natural_Content": 1.8
    },
    "model_scores": [
      {"model": "best_ensemble_model_1.pt", "confidence": 91.2, "probs": [91.2, 3.1, 2.9, 2.8]},
      {"model": "best_ensemble_model_2.pt", "confidence": 95.8, "probs": [95.8, 1.5, 1.4, 1.3]},
      {"model": "best_ensemble_model_3.pt", "confidence": 94.1, "probs": [94.1, 2.3, 2.0, 1.6]},
      {"model": "best_ensemble_model_4.pt", "confidence": 95.9, "probs": [95.9, 1.4, 1.4, 1.3]}
    ],
    "video_name": "vid_001.mp4"
  }
}
Set use_tta=true to enable Test-Time Augmentation (4 augmentation modes: original, reversed, speed-up, speed-down) for higher accuracy at the cost of ~4× inference time.
The /api/classify endpoint only works with videos that are already indexed in the test feature file. It does not accept raw video uploads or external URLs — that workflow is handled by the CLI script described below.
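A client can recover the top prediction directly from the all_scores map in the response. This helper is illustrative and assumes the response shape shown above:

```python
def top_prediction(result: dict) -> tuple:
    """Return the (category, confidence) pair with the highest ensemble score."""
    scores = result["all_scores"]
    category = max(scores, key=scores.get)
    return category, scores[category]

# Trimmed copy of the example response above.
example = {"category": "Animation", "confidence": 94.3,
           "all_scores": {"Animation": 94.3, "Flat_Content": 2.1,
                          "Gaming": 1.8, "Natural_Content": 1.8}}
print(top_prediction(example))  # ('Animation', 94.3)
```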

Running Single-Video Inference from the CLI

For interactive single-video inference (including external video URLs and local file paths), use test_already_extracted.py directly:
cd "Flask Local New"
python test_already_extracted.py
The script will:
  1. Scan the features_enhanced/ directory for the test HDF5 feature file
  2. Build an index of all available videos from the data/processed/test/ directory structure
  3. Prompt you to select a video by number, filename, URL, or local path

Select a dataset video by number

Enter video number or filename (or 'test' to run accuracy test, or paste a URL/local path): 5

Select a dataset video by filename

Enter video number or filename: vid_001.mp4

Classify an external video from a URL

Enter video number or filename: https://example.com/sample_video.mp4
When a URL or external file path is provided, the script:
  1. Downloads the video using yt-dlp (falls back to requests for direct MP4 links)
  2. Extracts 73 uniformly sampled frames and computes features with an EfficientNet-B0 backbone (matching the 1280-dim feature space)
  3. Runs ensemble inference and prints per-class confidence scores
For the best results on external videos, use yt-dlp which supports YouTube and hundreds of other video platforms. Install it with pip install yt-dlp.
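The uniform-sampling step can be sketched as follows. The 73-frame count comes from the docs above; the exact index computation here is an assumption, shown for illustration only:

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 73) -> list:
    """Spread num_samples frame indices evenly across the whole clip."""
    if total_frames <= 0:
        return []
    if total_frames == 1:
        return [0] * num_samples
    # Evenly spaced positions from the first frame to the last, inclusive.
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]

indices = uniform_frame_indices(730)
print(indices[0], indices[-1], len(indices))  # 0 729 73
```

Short clips produce repeated indices, which simply duplicates frames rather than failing.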

Run a batch accuracy test

Type test at the prompt to randomly sample N videos from the test set and report overall accuracy:
Enter video number or filename: test
How many random videos to test? (default=20): 50
Use TTA for all tests? (y/n, default=n): y
At the end, the script prints a summary and optionally saves results to a timestamped JSON file.

Model Architecture Quick Reference

The SingleVideoClassifier in test_already_extracted.py loads the SuperEnhancedTemporalModel defined in model_train_new.py. The model config is read from each checkpoint:
checkpoint = torch.load(checkpoint_path, map_location=self.device, weights_only=False)
config = checkpoint.get('model_config', checkpoint.get('config', {}))

feature_dim      = config.get('feature_dim', 1280)
hidden_dim       = config.get('hidden_dim', 768)
num_classes      = config.get('num_classes', 4)
num_lstm_layers  = config.get('num_lstm_layers', 4)
num_attention_heads = config.get('num_attention_heads', 12)
dropout          = config.get('dropout', 0.4)
bidirectional    = config.get('bidirectional', True)
The class order is fixed at load time:
self.class_names = ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content']
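To reproduce an ensemble decision outside the script, the per-model probability vectors can be combined by simple averaging. Whether the app averages or uses another combination rule is an assumption here; the values are illustrative:

```python
CLASS_NAMES = ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content']

def ensemble_predict(per_model_probs):
    """Average per-model probability vectors and return the top class."""
    n_models = len(per_model_probs)
    mean = [sum(col) / n_models for col in zip(*per_model_probs)]
    best = max(range(len(mean)), key=mean.__getitem__)
    return CLASS_NAMES[best], mean[best]

# Rows follow the fixed class order above (illustrative values).
probs = [[91.2, 3.1, 2.9, 2.8],
         [95.8, 1.5, 1.4, 1.3]]
print(ensemble_predict(probs))
```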

Troubleshooting

app.py looks for test_already_extracted.py in the same directory as itself, then in /mnt/data/. Make sure both files are in the same folder:
Flask Local New/
├── app.py
└── test_already_extracted.py   ← must be here
Ensure all four checkpoint files exist inside models_enhanced/:
ls models_enhanced/
# best_ensemble_model_1.pt
# best_ensemble_model_2.pt
# best_ensemble_model_3.pt
# best_ensemble_model_4.pt
If only some checkpoints are available, remove the missing names from SELECTED_CHECKPOINTS in app.py — the app will use whatever is present.
The feature loader expects a file matching test_features*.h5 in features_enhanced/. The canonical filename produced by the training pipeline is test_features_multiscale.h5.
The app automatically uses CPU if CUDA is unavailable, but you can also force CPU by changing the device argument in get_classifier():
_classifier = user_module.SingleVideoClassifier(
    checkpoint_paths=paths,
    device='cpu'   # force CPU
)
