
Prerequisites

Before starting, ensure you have the following installed:
| Requirement | Version | Notes |
| --- | --- | --- |
| Python | 3.8+ | 3.10 recommended |
| PyTorch | 2.0+ | Install the CUDA 11.8 build |
| CUDA | 11.8 | Required for GPU inference |
| cuDNN | Bundled | Comes with CUDA 11.8 |
GPU inference is strongly recommended. The classifier will automatically fall back to CPU if no CUDA device is detected, but inference will be slower.
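The fallback described above can be checked up front. This is an illustrative sketch using the public `torch.cuda.is_available()` API, not the app's exact code:

```python
# Illustrative device selection mirroring the documented CPU fallback.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # torch is not installed yet -- inference would run on CPU only.
    device = "cpu"

print(f"Running inference on: {device}")
```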

Setup

1. Clone the repository

git clone https://github.com/itsmanask/NVIDIA-Video-Classification-Project.git
cd NVIDIA-Video-Classification-Project

2. Create a virtual environment

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

3. Install dependencies

Install all required packages for the training and inference scripts:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install flask h5py numpy pandas opencv-python scikit-learn tqdm psutil matplotlib seaborn
For external video URL support (optional):
pip install yt-dlp requests
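To confirm everything installed cleanly, a quick sanity check helps. Note that some import names differ from the pip package names (`cv2` for opencv-python, `sklearn` for scikit-learn):

```python
import importlib.util

# Import names differ from pip package names for opencv-python (cv2)
# and scikit-learn (sklearn).
required = ["torch", "torchvision", "flask", "h5py", "numpy", "pandas",
            "cv2", "sklearn", "tqdm", "psutil", "matplotlib", "seaborn"]
missing = [name for name in required if importlib.util.find_spec(name) is None]

print("All dependencies present" if not missing else f"Missing: {missing}")
```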

4. Place model checkpoints and feature files

The Flask app expects the following layout inside the Flask Local New/ directory by default. If the local folders are not present, the app falls back to /mnt/data/ paths.
Flask Local New/
├── app.py
├── test_already_extracted.py
├── model_train_new.py
├── index.html
├── models_enhanced/
│   ├── best_ensemble_model_1.pt
│   ├── best_ensemble_model_2.pt
│   ├── best_ensemble_model_3.pt
│   └── best_ensemble_model_4.pt
├── features_enhanced/
│   └── test_features_multiscale.h5
└── data/
    └── processed/
        └── test/
            ├── Animation/
            ├── Flat_Content/
            ├── Gaming/
            └── Natural_Content/
The app will also search D:/features_enhanced and D:/models_enhanced on Windows if the local folders do not exist. See app.py lines 22–44 for the full fallback logic.
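The resolution order amounts to roughly the following. This is a simplified sketch of the documented behaviour, not a copy of the code in app.py:

```python
from pathlib import Path

def resolve_dir(local: str, fallbacks: list) -> Path:
    """Return the first existing directory, in documented fallback order."""
    for candidate in (local, *fallbacks):
        path = Path(candidate)
        if path.is_dir():
            return path
    # Nothing found: default to the local path so errors point somewhere sane.
    return Path(local)

models_dir = resolve_dir("models_enhanced",
                         ["/mnt/data/models_enhanced", "D:/models_enhanced"])
```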

5. Verify checkpoint configuration

The Flask app loads exactly these four checkpoints by default (defined in app.py):
SELECTED_CHECKPOINTS = [
    "best_ensemble_model_1.pt",
    "best_ensemble_model_2.pt",
    "best_ensemble_model_3.pt",
    "best_ensemble_model_4.pt",
]
Each checkpoint encodes the following model configuration:
| Parameter | Value |
| --- | --- |
| feature_dim | 1280 |
| hidden_dim | 768 |
| num_classes | 4 |
| num_lstm_layers | 4 |
| num_attention_heads | 12 |
| dropout | 0.4 |
| bidirectional | True |
To use a different set of checkpoints, edit the SELECTED_CHECKPOINTS list in app.py. Any .pt files found in models_enhanced/ are used as a fallback if none of the listed checkpoints exist.
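The selection-plus-fallback behaviour is roughly equivalent to the sketch below (an approximation of the documented logic, not the exact code in app.py):

```python
from pathlib import Path

SELECTED_CHECKPOINTS = [
    "best_ensemble_model_1.pt",
    "best_ensemble_model_2.pt",
    "best_ensemble_model_3.pt",
    "best_ensemble_model_4.pt",
]

def resolve_checkpoints(models_dir: Path) -> list:
    """Prefer the explicitly listed checkpoints; fall back to any .pt files."""
    listed = [models_dir / name for name in SELECTED_CHECKPOINTS
              if (models_dir / name).exists()]
    return listed or sorted(models_dir.glob("*.pt"))
```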

6. Start the Flask server

From inside Flask Local New/:
python app.py
On startup, the app prints the resolved paths for all directories:
Index HTML path: /path/to/Flask Local New/index.html
User script path: /path/to/Flask Local New/test_already_extracted.py
Features dir: /path/to/Flask Local New/features_enhanced
Models dir: /path/to/Flask Local New/models_enhanced
Processed test dir: /path/to/Flask Local New/data/processed/test
The server listens on http://0.0.0.0:5000. Open http://localhost:5000 in your browser to access the web UI.

Running Inference via the Flask API

The Flask app exposes two JSON endpoints:

List available videos

curl http://localhost:5000/api/videos
Returns a mapping of category names to lists of video filenames available in the test dataset:
{
  "success": true,
  "videos": {
    "Animation": ["vid_001.mp4", "vid_002.mp4"],
    "Gaming": ["vid_010.mp4"],
    "Flat_Content": ["vid_020.mp4"],
    "Natural_Content": ["vid_030.mp4"]
  }
}
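For scripted access, the response (fetched with any HTTP client, e.g. `requests.get("http://localhost:5000/api/videos").json()`) can be flattened into (category, filename) pairs. The payload below reuses the illustrative values above:

```python
import json

# Shape of the /api/videos response; values here mirror the example above.
payload = json.loads('''
{"success": true,
 "videos": {"Animation": ["vid_001.mp4", "vid_002.mp4"],
            "Gaming": ["vid_010.mp4"]}}
''')

videos = [(category, name)
          for category, names in payload["videos"].items()
          for name in names]
print(videos)
```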

Classify a dataset video

curl -X POST http://localhost:5000/api/classify \
  -F "selected_video=vid_001.mp4" \
  -F "use_tta=false"
Example response:
{
  "success": true,
  "result": {
    "category": "Animation",
    "confidence": 94.3,
    "all_scores": {
      "Animation": 94.3,
      "Flat_Content": 2.1,
      "Gaming": 1.8,
      "Natural_Content": 1.8
    },
    "model_scores": [
      {"model": "best_ensemble_model_1.pt", "confidence": 91.2, "probs": [91.2, 3.1, 2.9, 2.8]},
      {"model": "best_ensemble_model_2.pt", "confidence": 95.8, "probs": [95.8, 1.5, 1.4, 1.3]},
      {"model": "best_ensemble_model_3.pt", "confidence": 94.1, "probs": [94.1, 2.3, 2.0, 1.6]},
      {"model": "best_ensemble_model_4.pt", "confidence": 95.9, "probs": [95.9, 1.4, 1.4, 1.3]}
    ],
    "video_name": "vid_001.mp4"
  }
}
Set use_tta=true to enable Test-Time Augmentation (4 augmentation modes: original, reversed, speed-up, speed-down) for higher accuracy at the cost of ~4× inference time.
The /api/classify endpoint only works with videos that are already indexed in the test feature file. It does not accept raw video uploads or external URLs — that workflow is handled by the CLI script described below.
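A client can recover the top prediction directly from the all_scores map in the response. This helper is illustrative and assumes the response shape shown above:

```python
def top_prediction(result: dict) -> tuple:
    """Return the (category, confidence) pair with the highest ensemble score."""
    scores = result["all_scores"]
    category = max(scores, key=scores.get)
    return category, scores[category]

# Trimmed copy of the example response above.
example = {"category": "Animation", "confidence": 94.3,
           "all_scores": {"Animation": 94.3, "Flat_Content": 2.1,
                          "Gaming": 1.8, "Natural_Content": 1.8}}
print(top_prediction(example))  # ('Animation', 94.3)
```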

Running Single-Video Inference from the CLI

For interactive single-video inference (including external video URLs and local file paths), use test_already_extracted.py directly:
cd "Flask Local New"
python test_already_extracted.py
The script will:
  1. Scan the features_enhanced/ directory for the test HDF5 feature file
  2. Build an index of all available videos from the data/processed/test/ directory structure
  3. Prompt you to select a video by number, filename, URL, or local path

Select a dataset video by number

Enter video number or filename (or 'test' to run accuracy test, or paste a URL/local path): 5

Select a dataset video by filename

Enter video number or filename: vid_001.mp4

Classify an external video from a URL

Enter video number or filename: https://example.com/sample_video.mp4
When a URL or external file path is provided, the script:
  1. Downloads the video using yt-dlp (falls back to requests for direct MP4 links)
  2. Extracts 73 uniformly sampled frames and computes features with an EfficientNet-B0 backbone (matching the 1280-dim feature space)
  3. Runs ensemble inference and prints per-class confidence scores
For the best results on external videos, use yt-dlp which supports YouTube and hundreds of other video platforms. Install it with pip install yt-dlp.
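The uniform-sampling step can be sketched as follows. The 73-frame count comes from the docs above; the exact index computation here is an assumption, shown for illustration only:

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 73) -> list:
    """Spread num_samples frame indices evenly across the whole clip."""
    if total_frames <= 0:
        return []
    if total_frames == 1:
        return [0] * num_samples
    # Evenly spaced positions from the first frame to the last, inclusive.
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]

indices = uniform_frame_indices(730)
print(indices[0], indices[-1], len(indices))  # 0 729 73
```

Short clips produce repeated indices, which simply duplicates frames rather than failing.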

Run a batch accuracy test

Type test at the prompt to randomly sample N videos from the test set and report overall accuracy:
Enter video number or filename: test
How many random videos to test? (default=20): 50
Use TTA for all tests? (y/n, default=n): y
At the end, the script prints a summary and optionally saves results to a timestamped JSON file.

Model Architecture Quick Reference

The SingleVideoClassifier in test_already_extracted.py loads the SuperEnhancedTemporalModel defined in model_train_new.py. The model config is read from each checkpoint:
checkpoint = torch.load(checkpoint_path, map_location=self.device, weights_only=False)
config = checkpoint.get('model_config', checkpoint.get('config', {}))

feature_dim      = config.get('feature_dim', 1280)
hidden_dim       = config.get('hidden_dim', 768)
num_classes      = config.get('num_classes', 4)
num_lstm_layers  = config.get('num_lstm_layers', 4)
num_attention_heads = config.get('num_attention_heads', 12)
dropout          = config.get('dropout', 0.4)
bidirectional    = config.get('bidirectional', True)
The class order is fixed at load time:
self.class_names = ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content']
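To reproduce an ensemble decision outside the script, the per-model probability vectors can be combined by simple averaging. Whether the app averages or uses another combination rule is an assumption here; the values are illustrative:

```python
CLASS_NAMES = ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content']

def ensemble_predict(per_model_probs):
    """Average per-model probability vectors and return the top class."""
    n_models = len(per_model_probs)
    mean = [sum(col) / n_models for col in zip(*per_model_probs)]
    best = max(range(len(mean)), key=mean.__getitem__)
    return CLASS_NAMES[best], mean[best]

# Rows follow the fixed class order above (illustrative values).
probs = [[91.2, 3.1, 2.9, 2.8],
         [95.8, 1.5, 1.4, 1.3]]
print(ensemble_predict(probs))
```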

Troubleshooting

app.py looks for test_already_extracted.py in the same directory as itself, then in /mnt/data/. Make sure both files are in the same folder:
Flask Local New/
├── app.py
└── test_already_extracted.py   ← must be here
Ensure all four checkpoint files exist inside models_enhanced/:
ls models_enhanced/
# best_ensemble_model_1.pt
# best_ensemble_model_2.pt
# best_ensemble_model_3.pt
# best_ensemble_model_4.pt
If only some checkpoints are available, remove the missing names from SELECTED_CHECKPOINTS in app.py — the app will use whatever is present.
The feature loader expects a file matching test_features*.h5 in features_enhanced/. The canonical filename produced by the training pipeline is test_features_multiscale.h5.
The app automatically uses CPU if CUDA is unavailable, but you can also force CPU by changing the device argument in get_classifier():
_classifier = user_module.SingleVideoClassifier(
    checkpoint_paths=paths,
    device='cpu'   # force CPU
)
