predict.py

Overview

Uses the trained classifier to predict which folder each unsorted video belongs to. Outputs ranked predictions with confidence scores and optionally moves files to predicted folders.

Location: source/predict.py

Input Files:

artifacts/model_config.json - Model metadata
artifacts/model.pt or artifacts/model.pkl - Trained model
artifacts/unlabeled_embeddings.pt - Features for unsorted videos

Output Files:

artifacts/predictions.json - Full predictions with confidence scores

Configuration Constants

ARTIFACTS_DIR

Path

default:"artifacts"

Directory containing trained model and embeddings.

DATA_DIR

Path

default:"data/Favorites/videos"

Directory where videos are stored and will be moved to.

Classes

MLP

Identical architecture to the MLP in train.py. Required for loading trained MLP models.

MLP(input_dim, num_classes, hidden_dim=256)

input_dim

int

required

Dimension of input features (typically 1024)

num_classes

int

required

Number of output categories

hidden_dim

int

default:"256"

Size of first hidden layer

Functions

load_model(config)

Loads the trained model based on configuration.

config

dict

required

Model configuration dictionary (from model_config.json)

Returns: tuple - (model, model_type) where:

model: Loaded model instance (sklearn or PyTorch)
model_type: String indicating model type ("knn", "logreg", or "mlp")

Supported Models:

"knn" or "logreg": Loads from model.pkl using pickle
"mlp": Loads MLP state dict from model.pt

with open("artifacts/model_config.json") as f:
    config = json.load(f)

model, model_type = load_model(config)
print(f"Loaded {model_type} model")
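The dispatch described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the "model_type" config key, the artifacts_dir parameter, and the build_mlp callable (a hypothetical factory that constructs an MLP matching the trained architecture) are assumptions.

```python
import pickle
from pathlib import Path


def load_model_sketch(config, artifacts_dir="artifacts", build_mlp=None):
    """Return (model, model_type) for the model named in the config."""
    model_type = config["model_type"]  # assumed key name
    artifacts = Path(artifacts_dir)

    if model_type in ("knn", "logreg"):
        # sklearn models are stored as a single pickle file.
        with open(artifacts / "model.pkl", "rb") as f:
            return pickle.load(f), model_type

    if model_type == "mlp":
        import torch  # imported lazily so the sklearn path does not need PyTorch

        # build_mlp is a hypothetical callable that rebuilds the MLP
        # with the same architecture that was used during training.
        model = build_mlp(config)
        state_dict = torch.load(artifacts / "model.pt", map_location="cpu")
        model.load_state_dict(state_dict)
        model.eval()  # switch to inference mode
        return model, model_type

    raise ValueError(f"Unsupported model type: {model_type!r}")
```

Because pickle can execute arbitrary code while loading, model.pkl should only be loaded from a trusted source.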

predict_sklearn(model, features)

Generates predictions using sklearn models (k-NN or Logistic Regression).

model

sklearn model

required

Trained sklearn classifier with predict_proba method

features

np.ndarray

required

Feature matrix of shape [N, feature_dim]

Returns: tuple - (predicted_labels, probabilities) where:

predicted_labels: Array of shape [N] with predicted class indices
probabilities: Array of shape [N, num_classes] with class probabilities

preds, probs = predict_sklearn(knn_model, features)
print(f"Predicted class: {preds[0]}, Confidence: {probs[0][preds[0]]:.2%}")
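A function with this contract could look roughly like the sketch below. It assumes that the columns of predict_proba line up with class indices 0..num_classes-1, which holds when the model was trained on integer-encoded labels; the real function may differ.

```python
def predict_sklearn_sketch(model, features):
    """Return (predicted_labels, probabilities) for an sklearn-style classifier."""
    # predict_proba returns an array of shape [N, num_classes].
    probabilities = model.predict_proba(features)
    # The most probable column per row is taken as the predicted class index
    # (assumes labels were encoded as 0..num_classes-1 at training time).
    predicted_labels = probabilities.argmax(axis=1)
    return predicted_labels, probabilities
```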

predict_mlp(model, features, device="cpu")

Generates predictions using MLP model.

model

MLP

required

Trained MLP model instance

features

np.ndarray

required

Feature matrix of shape [N, feature_dim]

device

str

default:"cpu"

Device to run inference on ("cuda" or "cpu")

Returns: tuple - (predicted_labels, probabilities) where:

predicted_labels: Array of shape [N] with predicted class indices
probabilities: Array of shape [N, num_classes] with softmax probabilities

preds, probs = predict_mlp(mlp_model, features, device="cuda")
for i, pred in enumerate(preds):
    print(f"Video {i}: {pred} (confidence: {probs[i][pred]:.2%})")
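To illustrate what "softmax probabilities" means here, the following plain-Python sketch converts raw model outputs (logits) for one video into probabilities that sum to 1. It is a worked illustration only, with made-up logits; the actual function presumably performs the equivalent step in PyTorch.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative values, not real model output
probs = softmax(logits)
# The predicted class is the index with the highest probability.
pred = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted class: {pred}, confidence: {probs[pred]:.2%}")
```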

main()

Main execution function that orchestrates prediction and optional file moving.

Pipeline:

Parse command-line arguments
Load model configuration and trained model
Load unlabeled video embeddings
Generate predictions with confidence scores
Display top-k predictions for each video
Print summary statistics (videos per folder)
Optionally move files to predicted folders (if --move flag)
Save full predictions to JSON file
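The ranking and threshold steps in this pipeline can be sketched as below. The function name rank_predictions and the use of >= for the threshold comparison are assumptions made for illustration; the script's own logic may differ in detail.

```python
def rank_predictions(probabilities, categories, top_k=3, threshold=0.0):
    """Return (status, best_folder, best_confidence, top_k_pairs) for one video."""
    # Pair each category with its probability and sort from most to least likely.
    ranked = sorted(zip(categories, probabilities), key=lambda pair: pair[1], reverse=True)
    top = ranked[:top_k]
    best_folder, best_conf = top[0]
    # Assumed rule: assign only when the top confidence meets the threshold.
    status = "ASSIGN" if best_conf >= threshold else "SKIP"
    return status, best_folder, best_conf, top


categories = ["funny", "food", "soccer"]
status, folder, conf, top = rank_predictions([0.03, 0.10, 0.87], categories, threshold=0.5)
# status == "ASSIGN", folder == "soccer"
```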

Console Output Example:

Predicting folders for 25 unsorted videos
Model: mlp | Categories: ['funny', 'food', 'soccer']
Confidence threshold: 0.5

  [ASSIGN] 7234567890123456.mp4                   → soccer       (87%)  [soccer: 87% | food: 10% | funny: 3%]
  [SKIP  ] 7234567890123457.mp4                   → funny        (42%)  [funny: 42% | food: 35% | soccer: 23%]
  [ASSIGN] 7234567890123458.mp4                   → food         (95%)  [food: 95% | soccer: 4% | funny: 1%]

============================================================
Summary:
  soccer         :   15 videos
  food           :    8 videos
  funny          :    1 videos
  SKIPPED        :    1 videos (below 50% threshold)
  TOTAL          :   25 videos

if __name__ == "__main__":
    main()
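The per-folder summary shown in the console output could be tallied along these lines. This is a sketch under assumptions: summarize and the (status, folder) pair layout are hypothetical names chosen for the example, and the counts reproduce the sample output above.

```python
from collections import Counter


def summarize(decisions):
    """Count assigned videos per folder, plus skipped and total counts."""
    # Each decision is a (status, folder) pair; skipped videos share one bucket.
    counts = Counter(
        folder if status == "ASSIGN" else "SKIPPED"
        for status, folder in decisions
    )
    counts["TOTAL"] = len(decisions)
    return counts


decisions = (
    [("ASSIGN", "soccer")] * 15
    + [("ASSIGN", "food")] * 8
    + [("ASSIGN", "funny")]
    + [("SKIP", "funny")]
)
counts = summarize(decisions)
for name in ("soccer", "food", "funny", "SKIPPED", "TOTAL"):
    print(f"  {name:<15}: {counts[name]:>4} videos")
```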

Command-Line Arguments

--move

--move

flag

Actually move files to predicted folders (default: False, just print predictions)

When enabled, moves videos from the root of the data directory (DATA_DIR) into the predicted category subfolders. Only files that meet the confidence threshold are moved.

python predict.py --move

--threshold

--threshold

float

default:"0.0"

Minimum confidence (0-1) required to auto-assign a video to a folder. Videos whose top prediction falls below the threshold are skipped.

Useful for only moving videos with high-confidence predictions:

python predict.py --move --threshold 0.7

--top-k

--top-k

int

default:"3"

Number of top predictions to show for each video.

Displays the k most likely categories with their confidence scores:

python predict.py --top-k 5
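The three arguments documented above map naturally onto argparse. The sketch below shows one way they might be declared, using the documented names and defaults; the help strings and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the flags documented for predict.py."""
    parser = argparse.ArgumentParser(description="Predict folders for unsorted videos")
    # A store_true flag is False unless --move is passed.
    parser.add_argument("--move", action="store_true",
                        help="move files to predicted folders instead of only printing")
    parser.add_argument("--threshold", type=float, default=0.0,
                        help="minimum confidence (0-1) required to assign a video")
    # argparse exposes --top-k as the attribute top_k.
    parser.add_argument("--top-k", type=int, default=3,
                        help="number of top predictions to show per video")
    return parser


args = build_parser().parse_args(["--move", "--threshold", "0.7"])
print(args.move, args.threshold, args.top_k)
```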

Usage Examples

Print predictions only

python predict.py

Displays predictions with confidence scores but doesn’t move any files.

Move all predictions

python predict.py --move

Moves all unsorted videos to their predicted folders (regardless of confidence).

Move only high-confidence predictions

python predict.py --move --threshold 0.8

Only moves videos where the model is at least 80% confident.

Show top 5 predictions

python predict.py --top-k 5

Displays the top 5 most likely categories for each video.

predictions.json Schema

The output JSON file contains detailed predictions for all unsorted videos:

[
  {
    "video": "7234567890123456.mp4",
    "predicted_folder": "soccer",
    "confidence": 0.87,
    "top_predictions": [
      {"folder": "soccer", "confidence": 0.87},
      {"folder": "food", "confidence": 0.10},
      {"folder": "funny", "confidence": 0.03}
    ]
  }
]

video

string

required

Video filename

predicted_folder

string

required

Top predicted category folder

confidence

float

required

Confidence score for top prediction (0-1)

top_predictions

array

required

Array of top-k predictions with folder name and confidence score
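As an illustration of consuming this schema, the sketch below parses records in the documented shape and lists the videos whose top confidence falls under a cutoff. The low_confidence helper is hypothetical, and the inline sample stands in for the contents of artifacts/predictions.json.

```python
import json


def low_confidence(predictions, cutoff):
    """Return filenames whose top prediction is below the cutoff."""
    return [p["video"] for p in predictions if p["confidence"] < cutoff]


# Inline sample in the documented schema; in practice this would be read
# from artifacts/predictions.json with json.load().
sample = json.loads("""
[
  {"video": "7234567890123456.mp4", "predicted_folder": "soccer", "confidence": 0.87,
   "top_predictions": [{"folder": "soccer", "confidence": 0.87},
                       {"folder": "food", "confidence": 0.10},
                       {"folder": "funny", "confidence": 0.03}]},
  {"video": "7234567890123457.mp4", "predicted_folder": "funny", "confidence": 0.42,
   "top_predictions": [{"folder": "funny", "confidence": 0.42},
                       {"folder": "food", "confidence": 0.35},
                       {"folder": "soccer", "confidence": 0.23}]}
]
""")

needs_review = low_confidence(sample, cutoff=0.5)
print(needs_review)
```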

File Moving Behavior

When --move is enabled:

Validation: Checks that destination folder exists and filename is valid
Conflict handling: Skips if file already exists in destination folder
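Atomic move: Uses shutil.move() for file relocation; the move is an atomic rename only when source and destination are on the same filesystem, otherwise the file is copied and the original is then removed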
Feedback: Prints status for each file moved
Threshold filtering: Only moves files meeting --threshold requirement

Safety features:

Videos are moved, not copied (prevents duplicates)
Skips files that already exist in destination
Validates folder names (prevents path traversal attacks)
Only processes files matching *.mp4 pattern
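The checks listed above could be combined roughly as in the following sketch. The move_video helper, its return strings, and the exact validation rules are assumptions chosen to illustrate the described behavior, not the script's actual code.

```python
import shutil
from pathlib import Path


def move_video(filename, folder, data_dir):
    """Move data_dir/filename into data_dir/folder and return a status string."""
    data_dir = Path(data_dir)
    # Reject folder names with path separators or parent references.
    if folder in ("", ".", "..") or Path(folder).name != folder:
        return "invalid folder"
    # Accept only bare *.mp4 filenames.
    if Path(filename).name != filename or not filename.endswith(".mp4"):
        return "invalid filename"

    src = data_dir / filename
    dest_dir = data_dir / folder
    dest = dest_dir / filename
    if not src.is_file():
        return "missing source"
    if not dest_dir.is_dir():
        return "missing folder"
    if dest.exists():
        # Never overwrite a file that is already in the destination.
        return "skipped (already exists)"

    shutil.move(str(src), str(dest))
    return "moved"
```

Returning a status string instead of raising keeps a batch run going when a single file cannot be moved, which matches the per-file feedback described above.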
