Overview
Uses the trained classifier to predict which folder each unsorted video belongs to. Outputs ranked predictions with confidence scores and optionally moves files to predicted folders. Location: source/predict.py
Input Files:
- artifacts/model_config.json: Model metadata
- artifacts/model.pt or artifacts/model.pkl: Trained model
- artifacts/unlabeled_embeddings.pt: Features for unsorted videos
Output Files:
- artifacts/predictions.json: Full predictions with confidence scores
Configuration Constants
Directory containing trained model and embeddings.
Directory where the videos are stored and into which they are moved.
Classes
MLP
Identical architecture to the MLP in train.py. Required for loading trained MLP models.
Parameters:
- Dimension of input features (typically 1024)
- Number of output categories
- Size of the first hidden layer
Functions
load_model(config)
Loads the trained model based on the configuration.
Parameters:
- config: Model configuration dictionary (from model_config.json)
Returns: tuple (model, model_type), where:
- model: Loaded model instance (sklearn or PyTorch)
- model_type: String indicating the model type ("knn", "logreg", or "mlp")
Loading behavior:
- "knn" or "logreg": Loads from model.pkl using pickle
- "mlp": Loads the MLP state dict from model.pt
predict_sklearn(model, features)
Generates predictions using sklearn models (k-NN or Logistic Regression).
Parameters:
- model: Trained sklearn classifier with a predict_proba method
- features: Feature matrix of shape [N, feature_dim]
Returns: tuple (predicted_labels, probabilities), where:
- predicted_labels: Array of shape [N] with predicted class indices
- probabilities: Array of shape [N, num_classes] with class probabilities
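The sklearn path reduces to predict_proba followed by an argmax. A sketch, assuming the classifier was trained on integer labels 0..num_classes-1: scikit-learn orders the predict_proba columns by model.classes_, so column j matches class index j only under that assumption.

```python
import numpy as np


def predict_sklearn(model, features):
    """Return (predicted_labels, probabilities) from any classifier with predict_proba.

    Sketch assuming integer training labels 0..num_classes-1, so that
    column j of predict_proba corresponds to class index j.
    """
    probabilities = np.asarray(model.predict_proba(features))  # shape [N, num_classes]
    predicted_labels = probabilities.argmax(axis=1)  # shape [N]
    return predicted_labels, probabilities
```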
predict_mlp(model, features, device="cpu")
Generates predictions using the MLP model.
Parameters:
- model: Trained MLP model instance
- features: Feature matrix of shape [N, feature_dim]
- device: Device to run inference on ("cuda" or "cpu")
Returns: tuple (predicted_labels, probabilities), where:
- predicted_labels: Array of shape [N] with predicted class indices
- probabilities: Array of shape [N, num_classes] with softmax probabilities
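After the forward pass, predict_mlp returns softmax probabilities and the most likely class. The hypothetical helper below sketches that post-processing in NumPy, starting from precomputed logits so it runs without PyTorch; in PyTorch the equivalent step would typically be torch.softmax(logits, dim=1) under torch.no_grad(), with the model and features on the chosen device.

```python
import numpy as np


def logits_to_predictions(logits):
    """Convert raw MLP outputs [N, num_classes] into (labels, softmax probabilities).

    Hypothetical helper written in NumPy; not taken from predict.py.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the per-row max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probabilities = exp / exp.sum(axis=1, keepdims=True)  # each row sums to 1
    predicted_labels = probabilities.argmax(axis=1)  # same argmax as the raw logits
    return predicted_labels, probabilities
```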
main()
Main execution function that orchestrates prediction and optional file moving.
Pipeline:
- Parse command-line arguments
- Load model configuration and trained model
- Load unlabeled video embeddings
- Generate predictions with confidence scores
- Display top-k predictions for each video
- Print summary statistics (videos per folder)
- Optionally move files to predicted folders (if the --move flag is set)
- Save full predictions to a JSON file
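The top-k display and the per-folder summary can be sketched as two small helpers. The function names and the class_names list (mapping class index to folder name) are hypothetical; the real script may obtain that mapping from model_config.json.

```python
from collections import Counter
from typing import List, Sequence, Tuple

import numpy as np


def top_k(probabilities: np.ndarray, class_names: Sequence[str], k: int) -> List[List[Tuple[str, float]]]:
    """For each video, return its k most likely (folder, confidence) pairs, best first."""
    k = min(k, probabilities.shape[1])  # cannot show more classes than exist
    # argsort of the negated probabilities gives descending order; keep k columns
    order = np.argsort(-probabilities, axis=1)[:, :k]
    return [
        [(class_names[j], float(row[j])) for j in idx]
        for row, idx in zip(probabilities, order)
    ]


def folder_summary(predicted_labels: np.ndarray, class_names: Sequence[str]) -> Counter:
    """Count how many videos were assigned to each folder (top-1 prediction)."""
    return Counter(class_names[int(i)] for i in predicted_labels)
```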
Command-Line Arguments
--move
Actually move files to predicted folders (default: False; predictions are only printed)
--threshold
Minimum confidence (0-1) required to auto-assign a video to a folder. Videos below the threshold are skipped.
--top-k
Number of top predictions to show for each video.
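A sketch of how these flags could be declared with argparse. The --threshold and --top-k defaults below are placeholders, since the documentation does not state them; argparse exposes --top-k as args.top_k.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the documented flags (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="Predict folders for unsorted videos")
    parser.add_argument(
        "--move",
        action="store_true",  # False unless the flag is given
        help="Actually move files to predicted folders (default: only print predictions)",
    )
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.0,  # hypothetical default: accept every prediction
        help="Minimum confidence (0-1) required to auto-assign a video",
    )
    parser.add_argument(
        "--top-k",
        type=int,
        default=3,  # hypothetical default
        help="Number of top predictions to show for each video",
    )
    return parser
```

The four usage examples correspond to argument lists such as [] (print only), ["--move"], ["--move", "--threshold", "0.8"] and ["--top-k", "5"], where 0.8 is an arbitrary illustrative value.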
Usage Examples
Print predictions only
Move all predictions
Move only high-confidence predictions
Show top 5 predictions
predictions.json Schema
The output JSON file contains detailed predictions for all unsorted videos. Each entry includes:
- Video filename
- Top predicted category folder
- Confidence score for the top prediction (0-1)
- Array of top-k predictions, each with folder name and confidence score
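The schema lists the fields but not their JSON key names. The sketch below uses hypothetical keys (filename, predicted_folder, confidence, top_k) and assumes the file is a JSON array with one record per video.

```python
import json
from typing import Dict, List, Sequence, Tuple


def build_prediction_entry(filename: str, ranked: Sequence[Tuple[str, float]]) -> Dict:
    """Build one record from a video's ranked (folder, confidence) list, best first.

    Key names are hypothetical, not taken from predict.py.
    """
    top_folder, top_confidence = ranked[0]
    return {
        "filename": filename,
        "predicted_folder": top_folder,
        "confidence": top_confidence,
        "top_k": [{"folder": f, "confidence": c} for f, c in ranked],
    }


def save_predictions(entries: List[Dict], path: str) -> None:
    """Write all records as a JSON array (indent=2 for readability)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
```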
File Moving Behavior
When --move is enabled:
- Validation: Checks that the destination folder exists and the filename is valid
- Conflict handling: Skips the file if it already exists in the destination folder
- Move: Uses shutil.move() to relocate the file (an os.rename() when source and destination are on the same filesystem, otherwise a copy followed by removal of the source)
- Feedback: Prints the status of each file moved
- Threshold filtering: Only moves files that meet the --threshold requirement
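A sketch of the per-file move logic under these rules, using a hypothetical move_prediction helper that returns a status string. It checks the threshold first, then that the destination folder exists, then skips on name conflicts; it omits folder-name (path traversal) validation.

```python
import shutil
from pathlib import Path


def move_prediction(video: Path, folder: str, confidence: float,
                    threshold: float, videos_dir: Path) -> str:
    """Move one video into its predicted folder; hypothetical helper returning a status."""
    if confidence < threshold:
        return "skipped: below threshold"
    dest_dir = videos_dir / folder
    if not dest_dir.is_dir():
        return "skipped: destination folder missing"
    dest = dest_dir / video.name
    if dest.exists():
        # Conflict handling: never overwrite an existing file
        return "skipped: already exists"
    # shutil.move renames within a filesystem, otherwise copies then deletes the source
    shutil.move(str(video), str(dest))
    print(f"Moved {video.name} -> {folder}/ ({confidence:.2f})")
    return "moved"
```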
Safeguards:
- Videos are moved, not copied (prevents duplicates)
- Skips files that already exist in the destination
- Validates folder names (prevents path traversal attacks)
- Only processes files matching the *.mp4 pattern
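The folder-name validation and the *.mp4 filter could look like the following. Both helpers are hypothetical; the check resolves the candidate path and accepts it only if it lands directly inside the videos directory, which rejects names such as "../x" or "a/b".

```python
from pathlib import Path
from typing import List


def is_safe_folder_name(name: str, videos_dir: Path) -> bool:
    """Accept only names that resolve to a direct subfolder of videos_dir (hypothetical check)."""
    if not name or name in (".", ".."):
        return False
    base = videos_dir.resolve()
    candidate = (base / name).resolve()
    # A safe name resolves to exactly one level below the videos directory
    return candidate.parent == base


def list_unsorted_videos(videos_dir: Path) -> List[Path]:
    """Return the top-level *.mp4 files in videos_dir, sorted by path."""
    return sorted(p for p in videos_dir.glob("*.mp4") if p.is_file())
```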