test_already_extracted.py provides an interactive command-line interface for classifying videos. It supports dataset videos (via pre-extracted features), external video URLs, and local video files outside the dataset.

Running the script

python test_already_extracted.py
The script starts in interactive mode and displays the full list of available test videos grouped by category.

Interactive mode

After listing available videos, the script prompts:
Enter video number or filename (or 'test' to run accuracy test, or paste a URL/local path):
You can respond in one of four ways:
  • A number (e.g. 42) — selects the video at that position in the displayed list.
  • A filename (e.g. video_001.mp4) — looks up the video by name in the feature index.
  • A URL or local path — downloads/reads the video and extracts features on the fly.
  • test — runs a batch accuracy test on a random sample.
Quotes around filenames are stripped automatically, so pasting 'video_001.mp4' or "video_001.mp4" both work.
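The quote handling can be reproduced with plain string methods; a minimal sketch (the helper name below is illustrative, not from the script):

```python
def normalize_choice(raw: str) -> str:
    """Strip surrounding whitespace and any single or double quotes."""
    return raw.strip().strip("'\"")

# Both quoted forms resolve to the same filename.
print(normalize_choice("'video_001.mp4'"))   # video_001.mp4
print(normalize_choice('"video_001.mp4"'))   # video_001.mp4
```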

SingleVideoFeatureLoader

SingleVideoFeatureLoader manages reading pre-extracted features from the HDF5 file and building the filename-to-index mapping from processed_data.pt files.

Initialization

from test_already_extracted import SingleVideoFeatureLoader

loader = SingleVideoFeatureLoader(
    features_dir="/path/to/features_enhanced",
    processed_test_dir="/path/to/data/processed/test"
)
Parameters
features_dir
string
required
Path to the directory containing the HDF5 features file. The loader looks for any file matching test_features*.h5 inside this directory.
processed_test_dir
string
required
Path to data/processed/test. The loader iterates category and subcategory subdirectories in sorted order to map video filenames to their position in the .h5 file.
During initialization the loader:
  1. Finds test_features*.h5 in features_dir (exits if not found).
  2. Reads category_mapping from the file’s HDF5 attributes.
  3. Calls _build_video_index() to scan processed_data.pt files and construct video_index: dict[str, (int, int, str)] — mapping video filename to (h5_index, label, category_name).

list_available_videos()

all_videos, videos_by_category = loader.list_available_videos()
Returns
  • all_videos — flat list of all video filenames in sorted-category order.
  • videos_by_category — dict[str, list[str]] mapping category name to sorted video filenames.
Example output:
Animation (87 videos):
------------------------------------------------------------
     1. animation_clip_001.mp4
     2. animation_clip_002.mp4
   ...
Gaming (103 videos):
------------------------------------------------------------
    88. gaming_clip_001.mp4

load_video_features(video_name)

features, label, category_name = loader.load_video_features("video_001.mp4")
Parameters
video_name
string
required
Exact filename as it appears in video_index. Returns (None, None, None) if the name is not found.
Returns
  • features — torch.Tensor of shape [T, 1280] where T is the number of valid frames for that video (up to 73).
  • label — integer class index from the HDF5 labels array.
  • category_name — string category name (e.g. "Gaming").
# Load features and inspect
features, label, category_name = loader.load_video_features("video_020.mp4")
print(features.shape)   # torch.Size([73, 1280])
print(category_name)    # Gaming
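Under the hood this is an indexed read from the HDF5 file. The snippet below reproduces the idea against a tiny stand-in file; the dataset names 'features' and 'labels' are assumptions, and the real file may use different keys:

```python
import h5py
import numpy as np
import torch

# Build a small stand-in HDF5 file with the same shapes as the docs describe.
with h5py.File("demo_features.h5", "w") as f:
    f.create_dataset("features", data=np.random.rand(3, 73, 1280).astype("float32"))
    f.create_dataset("labels", data=np.array([2, 0, 1]))

h5_index = 0  # position from video_index
with h5py.File("demo_features.h5", "r") as f:
    features = torch.from_numpy(f["features"][h5_index])
    label = int(f["labels"][h5_index])

print(features.shape)  # torch.Size([73, 1280])
print(label)           # 2
```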

SingleVideoClassifier

SingleVideoClassifier loads the ensemble of trained checkpoints and provides standard and TTA prediction methods.

Initialization

from test_already_extracted import SingleVideoClassifier

classifier = SingleVideoClassifier(
    checkpoint_paths=[
        "/path/to/models_enhanced/best_ensemble_model_1.pt",
        "/path/to/models_enhanced/best_ensemble_model_2.pt",
        "/path/to/models_enhanced/best_ensemble_model_3.pt",
        "/path/to/models_enhanced/best_ensemble_model_4.pt",
    ],
    device="cuda"
)
classifier.load_models()
Parameters
checkpoint_paths
string[]
required
List of paths to .pt checkpoint files. Paths can be strings or Path objects. Order affects the per_model_scores breakdown but not the final ensemble result.
device
string
default:"cuda"
Preferred compute device. Actual device used is cuda only when torch.cuda.is_available() returns True; otherwise falls back to cpu regardless of this parameter.
After construction, call load_models() before calling any prediction method.
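The device fallback described above is equivalent to the standard pattern:

```python
import torch

# Mirrors the constructor's behavior: "cuda" is honored only when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```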

load_models()

Loads each checkpoint, reconstructs the SuperEnhancedTemporalModel from the saved model_config, and calls model.eval().
classifier.load_models()
# Output:
# Loading 4 model(s)...
#    Loading model 1/4: best_ensemble_model_1.pt
#       Val accuracy: 72.73%
#    Loading model 2/4: best_ensemble_model_2.pt
#       Val accuracy: 92.13%
# ...
# Feature dim: 1280
# Classes: ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content']
After loading, the following attributes are set:
  • classifier.models — list of loaded SuperEnhancedTemporalModel instances.
  • classifier.class_names — ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content'].
  • classifier.feature_dim — 1280.
  • classifier.num_classes — 4.

predict_standard(features)

Runs standard ensemble inference without augmentation.
probs = classifier.predict_standard(features)  # torch.Tensor, shape [4]
Each model processes features independently. Softmax probabilities are averaged across all models:
def predict_standard(self, features):
    all_model_predictions = []
    with torch.no_grad():
        features_batch = features.unsqueeze(0).to(self.device)  # [1, T, D]
        lengths = torch.tensor([features.shape[0]], device=self.device)
        for model in self.models:
            outputs = model(features_batch, lengths)  # [1, num_classes]
            probs = F.softmax(outputs, dim=1)
            all_model_predictions.append(probs.squeeze(0).cpu())
    ensemble_probs = torch.stack(all_model_predictions).mean(dim=0)
    return ensemble_probs
Returns torch.Tensor of shape [num_classes] with averaged softmax probabilities.
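Averaging per-model softmax outputs keeps the result a valid probability distribution. A toy check with random logits (standing in for the real models):

```python
import torch
import torch.nn.functional as F

# Fake logits: 4 models x 4 classes.
logits = torch.randn(4, 4)
ensemble_probs = F.softmax(logits, dim=1).mean(dim=0)

print(ensemble_probs.shape)  # torch.Size([4])
# The averaged vector still sums to ~1.0.
```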

predict_with_tta(features)

Runs Test-Time Augmentation using 4 augmentation modes. Each mode independently calls predict_standard, then probabilities are averaged.
probs = classifier.predict_with_tta(features)  # torch.Tensor, shape [4]
  • Mode 1: Original — features as-is.
  • Mode 2: Reverse — torch.flip(features, dims=[0]), reversed temporal order.
  • Mode 3: Speed up — subsample to T//2 frames uniformly (requires T > 10).
  • Mode 4: Speed down — upsample to int(T*1.5) frames by interpolating indices (requires T > 10).
def predict_with_tta(self, features):
    tta_predictions = []

    # Mode 1: Original
    tta_predictions.append(self.predict_standard(features))

    # Mode 2: Reverse
    features_reversed = torch.flip(features, dims=[0])
    tta_predictions.append(self.predict_standard(features_reversed))

    # Mode 3: Speed up (skip frames)
    if features.shape[0] > 10:
        indices = torch.linspace(0, features.shape[0]-1, features.shape[0]//2).long()
        tta_predictions.append(self.predict_standard(features[indices]))

    # Mode 4: Speed down (interpolate frames)
    if features.shape[0] > 10:
        indices = torch.linspace(0, features.shape[0]-1, int(features.shape[0]*1.5)).long()
        indices = indices.clamp(max=features.shape[0]-1)
        tta_predictions.append(self.predict_standard(features[indices]))

    return torch.stack(tta_predictions).mean(dim=0)
Returns torch.Tensor of shape [num_classes] with TTA-averaged probabilities.
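The index arithmetic in modes 3 and 4 can be checked in isolation. With T = 73 frames:

```python
import torch

T = 73
# Mode 3: uniform subsampling to T // 2 frames.
idx_fast = torch.linspace(0, T - 1, T // 2).long()
# Mode 4: upsampling to int(T * 1.5) frames, clamped to the valid range.
idx_slow = torch.linspace(0, T - 1, int(T * 1.5)).long().clamp(max=T - 1)

print(len(idx_fast))  # 36
print(len(idx_slow))  # 109
```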

classify_video(features, true_label, video_name, use_tta)

Runs inference and returns a structured result dictionary.
result = classifier.classify_video(
    features=features,
    true_label=true_label_for_model,  # or None for external videos
    video_name="video_020.mp4",
    use_tta=False
)
Parameters
features
Tensor
required
Frame features tensor of shape [T, feature_dim].
true_label
int
Ground-truth class index. Pass None for external videos where the true label is unknown. When provided, the result includes is_correct.
video_name
string
default:"Unknown"
Display name used in console output and the result dictionary.
use_tta
boolean
default:"false"
When True, calls predict_with_tta; otherwise calls predict_standard.
Result dictionary
result = {
    'video_name': 'video_020.mp4',
    'true_class': 'Gaming',             # None if true_label was None
    'predicted_class': 'Gaming',
    'predicted_confidence': 94.37,      # float, percentage
    'is_correct': True,                 # None if true_label was None
    'all_scores': {
        'Animation': 1.65,
        'Flat_Content': 2.45,
        'Gaming': 94.37,
        'Natural_Content': 1.53
    },
    'timestamp': '2026-03-17T10:23:45.123456',
    'tta_used': False,
    'ensemble_size': 4
}
video_name
string
The value passed as video_name.
true_class
string | null
Class name corresponding to true_label, or null if true_label was None.
predicted_class
string
Class name with the highest ensemble probability.
predicted_confidence
number
Confidence of the predicted class as a percentage.
is_correct
boolean | null
Whether the prediction matched true_label. null when true_label was not provided.
all_scores
object
Ensemble probability for every class, keyed by class name, as percentages.
timestamp
string
ISO 8601 timestamp of when inference ran.
tta_used
boolean
Whether TTA was applied.
ensemble_size
number
Number of models used in the ensemble.
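The predicted_class, predicted_confidence, and all_scores fields can be derived from the ensemble probability vector like this (a sketch, not the script's exact code):

```python
import torch

class_names = ['Animation', 'Flat_Content', 'Gaming', 'Natural_Content']
probs = torch.tensor([0.0165, 0.0245, 0.9437, 0.0153])  # example ensemble output

pred_idx = int(probs.argmax())
predicted_class = class_names[pred_idx]
predicted_confidence = float(probs[pred_idx]) * 100
all_scores = {name: float(p) * 100 for name, p in zip(class_names, probs)}

print(predicted_class)        # Gaming
print(predicted_confidence)   # ~94.37
```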

External video inference

When the input is a URL or a local file path not present in the feature index, the script downloads or reads the video and extracts features on the fly before classifying.

download_video_if_needed(url_or_path, tmp_dir)

local_path = download_video_if_needed(url_or_path, tmp_dir=None)
If url_or_path is an existing local file, it is returned unchanged. Otherwise:
  1. yt-dlp — attempts to download the best MP4 using yt_dlp.YoutubeDL. Works with YouTube URLs and other yt-dlp-supported platforms.
  2. requests fallback — if yt-dlp fails or is not installed, streams the URL directly via requests.get. This only works for direct .mp4 links.
try:
    import yt_dlp
    ydl_opts = {
        'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best',
        'outtmpl': out_path,
        'quiet': True,
        'merge_output_format': 'mp4',
        'noplaylist': True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url_or_path])
except Exception as e:
    print(f"yt_dlp failed or not available: {e}")
    # fallback to requests ...

extract_video_features(video_path, model_feature_dim, num_frames, device)

Extracts [T, 1280] features from a raw video file using an EfficientNet-B0 backbone.
features = extract_video_features(
    video_path=local_video,
    model_feature_dim=1280,
    num_frames=73,
    device=classifier.device
)
print(features.shape)  # torch.Size([73, 1280])
Parameters
video_path
string
required
Path to the local video file.
model_feature_dim
number
required
Expected output feature dimension. Should match classifier.feature_dim (1280). If the backbone output dimension differs, a linear projection is applied automatically.
num_frames
number
default:"73"
Number of frames to sample uniformly from the video. Uses the same count as the pre-extracted test features.
device
string
default:"cpu"
Compute device for the EfficientNet-B0 backbone. Accepts 'cpu', 'cuda', or a torch.device object.
The function:
  1. Opens the video with OpenCV and counts total frames.
  2. Computes num_frames uniformly spaced frame indices.
  3. Loads pretrained EfficientNet-B0 (torchvision.models.efficientnet_b0(pretrained=True)).
  4. Applies the same preprocessing as EfficientNet defaults: resize to 256, center crop to 224, normalize with ImageNet mean/std.
  5. Passes frames through eff.features (convolutional backbone) and applies global average pooling to get [T, 1280].
  6. Returns features on CPU as a torch.Tensor.
EfficientNet-B0 feature extraction can take 1–5 minutes for a typical video depending on hardware. The pre-extracted HDF5 approach is significantly faster for test-set videos.
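The steps above can be sketched as follows. This is a simplified version assuming OpenCV and torchvision are installed; the real function may differ in details such as batching or error handling:

```python
import torch

def uniform_frame_indices(total_frames, num_frames=73):
    """Uniformly spaced frame indices, as in step 2."""
    return torch.linspace(0, max(total_frames - 1, 0), num_frames).long().tolist()

def extract_features_sketch(video_path, num_frames=73, device="cpu"):
    import cv2
    import torchvision
    from torchvision import transforms

    # Step 4: EfficientNet-default preprocessing.
    preprocess = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    # Step 3: pretrained backbone in eval mode.
    eff = torchvision.models.efficientnet_b0(pretrained=True).to(device).eval()

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    feats = []
    with torch.no_grad():
        for idx in uniform_frame_indices(total, num_frames):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if not ok:
                continue
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            x = preprocess(rgb).unsqueeze(0).to(device)
            fmap = eff.features(x)  # conv backbone output, [1, 1280, 7, 7]
            # Step 5: global average pooling over the spatial dims.
            feats.append(fmap.mean(dim=[2, 3]).squeeze(0).cpu())
    cap.release()
    return torch.stack(feats)  # [T, 1280], on CPU (step 6)
```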

Full external video example

from test_already_extracted import (
    SingleVideoClassifier,
    download_video_if_needed,
    extract_video_features,
)
import tempfile

# Setup classifier
checkpoint_paths = [
    "models_enhanced/best_ensemble_model_1.pt",
    "models_enhanced/best_ensemble_model_2.pt",
    "models_enhanced/best_ensemble_model_3.pt",
    "models_enhanced/best_ensemble_model_4.pt",
]
classifier = SingleVideoClassifier(checkpoint_paths=checkpoint_paths, device="cuda")
classifier.load_models()

# Download or locate video
tmp_dir = tempfile.mkdtemp()
local_video = download_video_if_needed("https://example.com/video.mp4", tmp_dir=tmp_dir)

# Extract features
features = extract_video_features(
    local_video,
    model_feature_dim=classifier.feature_dim,
    num_frames=73,
    device=classifier.device
)

# Classify
result = classifier.classify_video(
    features=features,
    true_label=None,
    video_name=local_video,
    use_tta=False
)
print(result["predicted_class"], result["predicted_confidence"])

Batch test mode

Enter test at the video selection prompt to run an accuracy test on a random sample of the test set.
Enter video number or filename (or 'test' to run accuracy test, or paste a URL/local path): test

How many random videos to test? (default=20): 50
Use TTA for all tests? (y/n, default=n): n
The script samples the requested number of videos, classifies each one, and prints a summary:
======================================================================
BATCH TEST SUMMARY
======================================================================
Total tested: 50
Correct: 46
Incorrect: 4
Accuracy: 92.00%
======================================================================
At the end you are prompted to save all individual results to a timestamped JSON file:
Save batch results to JSON? (y/n, default=n): y
Result saved to: batch_test_results_20260317_102345.json
The saved file structure:
{
  "summary": {
    "total": 50,
    "correct": 46,
    "accuracy": 0.92
  },
  "results": [
    {
      "video_name": "video_020.mp4",
      "true_class": "Gaming",
      "predicted_class": "Gaming",
      "predicted_confidence": 94.37,
      "is_correct": true,
      "all_scores": { "Animation": 1.65, "Flat_Content": 2.45, "Gaming": 94.37, "Natural_Content": 1.53 },
      "timestamp": "2026-03-17T10:23:45.123456",
      "tta_used": false,
      "ensemble_size": 4
    }
  ]
}
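A saved batch file can be post-processed with the standard json module, for example to recompute accuracy from the individual results (the payload below is a small inline stand-in for a real results file):

```python
import json

payload = """
{
  "summary": {"total": 2, "correct": 1, "accuracy": 0.5},
  "results": [
    {"video_name": "a.mp4", "is_correct": true},
    {"video_name": "b.mp4", "is_correct": false}
  ]
}
"""
batch = json.loads(payload)
accuracy = sum(r["is_correct"] for r in batch["results"]) / len(batch["results"])
print(accuracy)  # 0.5
```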
