## Prerequisites

Before starting, ensure you have the following installed:

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.8+ | 3.10 recommended |
| PyTorch | 2.0+ | Install the CUDA 11.8 build |
| CUDA | 11.8 | Required for GPU inference |
| cuDNN | Bundled | Comes with CUDA 11.8 |
GPU inference is strongly recommended. The classifier will automatically fall back to CPU if no CUDA device is detected, but inference will be slower.
## Setup

### Install dependencies

Install all packages required by the training and inference scripts. For external video URL support (optional), install `yt-dlp` and `requests` as well.
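The exact dependency list is not reproduced here; the following `requirements.txt` is a plausible sketch inferred from the tools this README mentions (PyTorch, Flask, HDF5 feature files, `yt-dlp`, `requests`). Treat every pin and package name as an assumption to be checked against the actual imports:

```text
# Core (versions per the Prerequisites table)
torch>=2.0       # install the CUDA 11.8 build for GPU inference
flask
h5py             # reads the test_features*.h5 feature files
numpy            # assumption: common companion to the above

# Optional: external video URL support
yt-dlp
requests
```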
### Place model checkpoints and feature files

The Flask app expects the following layout inside the `Flask Local New/` directory by default. If the local folders are not present, the app falls back to `/mnt/data/` paths.

### Verify checkpoint configuration
The Flask app loads exactly these four checkpoints by default (defined in `SELECTED_CHECKPOINTS` in `app.py`). Each checkpoint encodes the following model configuration:

| Parameter | Value |
|---|---|
| `feature_dim` | 1280 |
| `hidden_dim` | 768 |
| `num_classes` | 4 |
| `num_lstm_layers` | 4 |
| `num_attention_heads` | 12 |
| `dropout` | 0.4 |
| `bidirectional` | True |
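The default-plus-fallback checkpoint loading can be sketched as follows. The list contents and helper name below are hypothetical; the real `SELECTED_CHECKPOINTS` list lives in `app.py`:

```python
# Sketch of the checkpoint-selection behavior described in this section.
# The list entries are hypothetical placeholders, not the real filenames.
from pathlib import Path

SELECTED_CHECKPOINTS = [
    # "models_enhanced/example_checkpoint.pt",  # hypothetical name
]

def resolve_checkpoints(model_dir="models_enhanced"):
    """Return the configured checkpoints that exist, else any .pt files found."""
    existing = [p for p in SELECTED_CHECKPOINTS if Path(p).exists()]
    if existing:
        return existing
    # Fallback: use whatever .pt files are present in models_enhanced/
    return sorted(str(p) for p in Path(model_dir).glob("*.pt"))
```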
To use a different set of checkpoints, edit the `SELECTED_CHECKPOINTS` list in `app.py`. Any `.pt` files found in `models_enhanced/` are used as a fallback if none of the listed checkpoints exist.

## Running Inference via the Flask API
The Flask app exposes two JSON endpoints.

### List available videos

### Classify a dataset video

Set `use_tta=true` to enable Test-Time Augmentation (4 augmentation modes: original, reversed, speed-up, speed-down) for higher accuracy at the cost of ~4× inference time.
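A minimal client sketch for the classification endpoint, using only the standard library. The route names `/videos` and `/classify` and the JSON field names are assumptions; check `app.py` for the actual paths:

```python
# Hypothetical client for the two JSON endpoints (routes are assumptions).
import json
from urllib import request as urlrequest

BASE = "http://127.0.0.1:5000"

def list_videos_url():
    # Endpoint that lists available dataset videos (assumed route)
    return f"{BASE}/videos"

def classify_request(video, use_tta=False):
    """Build a POST request; use_tta=True enables 4-way test-time augmentation."""
    data = json.dumps({"video": video, "use_tta": use_tta}).encode()
    return urlrequest.Request(
        f"{BASE}/classify",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(classify_request("clip.mp4"))` once the Flask app is running.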
## Running Single-Video Inference from the CLI

For interactive single-video inference (including external video URLs and local file paths), use `test_already_extracted.py` directly. The script will:

- Scan the `features_enhanced/` directory for the test HDF5 feature file
- Build an index of all available videos from the `data/processed/test/` directory structure
- Prompt you to select a video by number, filename, URL, or local path
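The video-indexing step can be sketched as below. The one-subfolder-per-class layout under `data/processed/test/` and the `.mp4` extension are assumptions; the real indexing code is in `test_already_extracted.py`:

```python
# Sketch: number every video found under data/processed/test/ so the
# interactive prompt can accept a selection by number.
from pathlib import Path

def index_videos(root="data/processed/test"):
    """Map 1-based selection numbers to video paths (assumed .mp4 files)."""
    index = {}
    for i, video in enumerate(sorted(Path(root).rglob("*.mp4")), start=1):
        index[i] = str(video)
    return index
```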
### Select a dataset video by number

### Select a dataset video by filename

### Classify an external video from a URL
When given a URL, the script:

- Downloads the video using `yt-dlp` (falling back to `requests` for direct MP4 links)
- Extracts 73 frames uniformly and embeds them with an EfficientNet-B0 backbone (matching the 1280-dim feature space)
- Runs ensemble inference and prints per-class confidence scores
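The uniform 73-frame sampling step can be sketched as follows. The helper name is ours, not from the repo; only the frame count (73) comes from this README:

```python
# Sketch: pick 73 evenly spaced frame indices across a clip, endpoints included.
def uniform_indices(total_frames, num_frames=73):
    if total_frames <= 0:
        return []
    # Spacing between sampled indices; guard against num_frames == 1
    step = (total_frames - 1) / max(num_frames - 1, 1)
    return [round(i * step) for i in range(num_frames)]
```

For clips shorter than 73 frames this repeats indices, which keeps the input length fixed for the model.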
### Run a batch accuracy test

Type `test` at the prompt to randomly sample N videos from the test set and report overall accuracy.
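The batch test described above amounts to sampling and scoring; a minimal sketch (function names are ours, and the classifier/label lookups are stand-ins for the real inference code):

```python
# Sketch of the batch accuracy test: sample N videos, compare predictions
# against ground-truth labels, report the fraction correct.
import random

def batch_accuracy(video_names, classify_fn, true_label_fn, n=20):
    sample = random.sample(video_names, min(n, len(video_names)))
    correct = sum(classify_fn(v) == true_label_fn(v) for v in sample)
    return correct / len(sample)
```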
## Model Architecture Quick Reference

The `SingleVideoClassifier` in `test_already_extracted.py` loads the `SuperEnhancedTemporalModel` defined in `model_train_new.py`. The model config is read from each checkpoint.
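How the config might be recovered from a loaded checkpoint dict can be sketched as below. The `"config"` key and the helper name are assumptions (the real code uses `torch.load` in `test_already_extracted.py`); the default values come from the table in "Verify checkpoint configuration":

```python
# Sketch: read the model configuration out of a checkpoint dict, falling
# back to the documented defaults for any missing key.
DEFAULT_CONFIG = {
    "feature_dim": 1280,
    "hidden_dim": 768,
    "num_classes": 4,
    "num_lstm_layers": 4,
    "num_attention_heads": 12,
    "dropout": 0.4,
    "bidirectional": True,
}

def read_config(checkpoint: dict) -> dict:
    cfg = dict(DEFAULT_CONFIG)
    cfg.update(checkpoint.get("config", {}))  # "config" key is an assumption
    return cfg
```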
## Troubleshooting

### RuntimeError: Could not find 'test_already_extracted.py'

`app.py` looks for `test_already_extracted.py` in the same directory as itself, then in `/mnt/data/`. Make sure both files are in the same folder.

### RuntimeError: No checkpoints found in models_enhanced
Ensure all four checkpoint files exist inside `models_enhanced/`. If only some checkpoints are available, remove the missing names from `SELECTED_CHECKPOINTS` in `app.py`; the app will use whatever is present.

### No test_features*.h5 file found
The feature loader expects a file matching `test_features*.h5` in `features_enhanced/`. The canonical filename produced by the training pipeline is `test_features_multiscale.h5`.

### CUDA out of memory during inference
The app automatically uses CPU if CUDA is unavailable, but you can also force CPU by changing the `device` argument in `get_classifier()`.
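The device logic can be sketched as below; the helper name and signature are assumptions, since the real selection happens inside `get_classifier()` in `app.py`:

```python
# Sketch: pick the inference device, with an explicit CPU override.
def pick_device(force_cpu=False):
    if force_cpu:
        # Equivalent to hard-coding device="cpu" in get_classifier()
        return "cpu"
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"  # torch missing: inference could not run anyway
```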