Overview

The TikTok Auto Collection Sorter stores video embeddings, predictions, and model metadata in four main artifact files:
  • labeled_embeddings.pt - Features and labels for training videos
  • unlabeled_embeddings.pt - Features for unsorted videos awaiting classification
  • predictions.json - Model predictions with confidence scores
  • model_config.json - Model metadata and configuration
All .pt files are Python dictionaries containing PyTorch tensors (plus metadata lists), saved with torch.save() and loaded with torch.load().
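The save/load round trip can be sketched with synthetic data; the dict keys below mirror labeled_embeddings.pt, but the tensor sizes and category names are illustrative, not the real dataset:

```python
import os
import tempfile

import torch

# Illustrative stand-in for labeled_embeddings.pt: same keys, toy sizes.
data = {
    "features": torch.randn(4, 1024),
    "labels": torch.tensor([0, 1, 1, 0]),
    "label_names": ["art", "soccer"],
    "video_paths": [f"video_{i}.mp4" for i in range(4)],
}

path = os.path.join(tempfile.gettempdir(), "example_embeddings.pt")
torch.save(data, path)

# weights_only=False is needed because the dict holds Python lists,
# not just tensors.
loaded = torch.load(path, weights_only=False)
print(loaded["features"].shape)  # torch.Size([4, 1024])
```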

labeled_embeddings.pt

Generated by extract_features.py:198-205. Contains extracted features from videos already sorted into folders.

File Structure

labeled_embeddings.pt (dict) - PyTorch dictionary containing training data.

Loading Example

import torch

# Load labeled embeddings
data = torch.load("artifacts/labeled_embeddings.pt", weights_only=False)

# Access components
features = data["features"]        # torch.Tensor (N, 1024)
labels = data["labels"]            # torch.Tensor (N,)
label_names = data["label_names"]  # list[str]
video_paths = data["video_paths"]  # list[str]

print(f"Loaded {len(features)} videos")
print(f"Categories: {label_names}")
print(f"Feature shape: {features.shape}")  # e.g. torch.Size([150, 1024])
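Once loaded, the features and labels tensors plug directly into standard PyTorch data utilities. A minimal sketch with synthetic stand-ins (train.py's actual batching setup is not documented here):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for data["features"] and data["labels"];
# the real tensors come from torch.load as shown above.
features = torch.randn(8, 1024)
labels = torch.randint(0, 5, (8,))

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for batch_features, batch_labels in loader:
    # Each batch pairs (4, 1024) features with (4,) labels.
    print(batch_features.shape, batch_labels.shape)
```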

unlabeled_embeddings.pt

Generated by extract_features.py:235-241. Contains extracted features from videos in the root folder (not yet sorted).

File Structure

unlabeled_embeddings.pt (dict) - PyTorch dictionary containing features for unsorted videos. Note: there are no labels or label_names keys, since these videos are unlabeled.

Loading Example

import torch

# Load unlabeled embeddings
data = torch.load("artifacts/unlabeled_embeddings.pt", weights_only=False)

features = data["features"]        # torch.Tensor (M, 1024)
video_paths = data["video_paths"]  # list[str]

print(f"Found {len(features)} unsorted videos to classify")
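Before running prediction, it can be worth confirming that the unlabeled feature width matches the trained model's input_dim from model_config.json. This check is hypothetical (whether predict.py performs it is not documented); synthetic values stand in for the loaded artifacts:

```python
import torch

features = torch.randn(3, 1024)   # stands in for data["features"]
config = {"input_dim": 1024}      # subset of model_config.json

# Feature width must match what the model was trained on.
if features.shape[1] != config["input_dim"]:
    raise ValueError(
        f"feature dim {features.shape[1]} != model input {config['input_dim']}"
    )
print("features are compatible with the model")
```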

predictions.json

Generated by predict.py:158-173. Contains model predictions for all unsorted videos.

File Structure

predictions.json (list[dict]) - JSON array of prediction objects, one per unsorted video.

Complete Example

[
  {
    "video": "tiktok_12345.mp4",
    "predicted_folder": "soccer",
    "confidence": 0.87,
    "top_predictions": [
      {"folder": "soccer", "confidence": 0.87},
      {"folder": "fitness", "confidence": 0.09},
      {"folder": "funny", "confidence": 0.04}
    ]
  },
  {
    "video": "tiktok_67890.mp4",
    "predicted_folder": "music",
    "confidence": 0.62,
    "top_predictions": [
      {"folder": "music", "confidence": 0.62},
      {"folder": "art", "confidence": 0.31},
      {"folder": "funny", "confidence": 0.07}
    ]
  }
]

Loading Example

import json

# Load predictions
with open("artifacts/predictions.json") as f:
    predictions = json.load(f)

for pred in predictions:
    video = pred["video"]
    folder = pred["predicted_folder"]
    conf = pred["confidence"]
    print(f"{video} → {folder} ({conf:.0%})")
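Low-confidence predictions are natural candidates for manual review. A sketch using the two example records from above; the 0.70 cutoff is an arbitrary illustration, not a threshold the sorter itself uses:

```python
predictions = [
    {"video": "tiktok_12345.mp4", "predicted_folder": "soccer", "confidence": 0.87},
    {"video": "tiktok_67890.mp4", "predicted_folder": "music", "confidence": 0.62},
]

# Flag anything below an (arbitrary) 70% confidence cutoff.
uncertain = [p for p in predictions if p["confidence"] < 0.70]
for p in uncertain:
    print(f"review {p['video']}: {p['predicted_folder']} ({p['confidence']:.0%})")
```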

model_config.json

Generated by train.py:220-225. Stores model metadata and configuration.

File Structure

model_config.json (dict) - JSON object containing model configuration and metadata.

Example

{
  "model_type": "mlp",
  "input_dim": 1024,
  "num_classes": 5,
  "hidden_dim": 256,
  "label_names": ["art", "fitness", "funny", "music", "soccer"],
  "feature_dim": 1024,
  "best_cv_accuracy": 0.89
}
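model_config.json is plain JSON and loads the same way as predictions.json. A standalone sketch: it writes the example config above to a temporary file only so the snippet runs on its own; in practice the file presumably lives at artifacts/model_config.json alongside the other artifacts:

```python
import json
import os
import tempfile

# Example config from above, written to a temp file so this runs standalone.
config = {
    "model_type": "mlp",
    "input_dim": 1024,
    "num_classes": 5,
    "hidden_dim": 256,
    "label_names": ["art", "fitness", "funny", "music", "soccer"],
    "feature_dim": 1024,
    "best_cv_accuracy": 0.89,
}
path = os.path.join(tempfile.gettempdir(), "model_config.json")
with open(path, "w") as f:
    json.dump(config, f)

# Loading mirrors predictions.json: a plain json.load.
with open(path) as f:
    loaded = json.load(f)

print(f"{loaded['model_type']} with {loaded['num_classes']} classes")
print(f"Best CV accuracy: {loaded['best_cv_accuracy']:.0%}")
```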

Data Flow Summary

  1. Feature Extraction (extract_features.py)
    • Input: Raw .mp4 video files
    • Output: labeled_embeddings.pt + unlabeled_embeddings.pt
  2. Model Training (train.py)
    • Input: labeled_embeddings.pt
    • Output: model.pt + model_config.json
  3. Prediction (predict.py)
    • Input: unlabeled_embeddings.pt + model.pt + model_config.json
    • Output: predictions.json
