Overview
The TikTok Auto Collection Sorter stores video embeddings and predictions in three main artifact files:
- labeled_embeddings.pt - Features and labels for training videos
- unlabeled_embeddings.pt - Features for unsorted videos awaiting classification
- predictions.json - Model predictions with confidence scores
The .pt files are PyTorch dictionaries serialized with torch.save() and loaded with torch.load().
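As a rough illustration of that save/load round trip, the sketch below writes a small dictionary to a temporary file and reads it back. The features key, the tensor sizes, and the folder names are made up for the example; only labels and label_names are key names mentioned on this page.

```python
import os
import tempfile

import torch

# Hypothetical stand-in for an embeddings artifact. The real key names and
# tensor sizes are defined by extract_features.py, not by this sketch.
artifact = {
    "features": torch.randn(4, 8),         # assumed key: 4 videos, 8-dim embeddings
    "labels": torch.tensor([0, 1, 1, 0]),  # integer class ids (made up)
    "label_names": ["cooking", "travel"],  # made-up folder names
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_embeddings.pt")
    torch.save(artifact, path)
    # map_location="cpu" lets the file load on machines without a GPU.
    restored = torch.load(path, map_location="cpu")

# The restored dictionary should carry the same tensors and metadata.
same_features = torch.equal(artifact["features"], restored["features"])
same_labels = torch.equal(artifact["labels"], restored["labels"])
```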
labeled_embeddings.pt
Generated by extract_features.py:198-205. Contains extracted features from videos already sorted into folders.
File Structure
PyTorch dictionary containing training data.
Loading Example
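A minimal loading sketch, written so that it does not assume the exact schema: it loads the dictionary and reports what each entry holds. The helper name summarize_artifact is hypothetical and is not part of the project.

```python
import torch


def summarize_artifact(path):
    """Load a .pt dictionary and describe each entry without assuming its schema."""
    data = torch.load(path, map_location="cpu")
    summary = {}
    for key, value in data.items():
        if torch.is_tensor(value):
            # e.g. "tensor(120, 512) torch.float32"
            summary[key] = f"tensor{tuple(value.shape)} {value.dtype}"
        elif isinstance(value, (list, tuple)):
            summary[key] = f"{type(value).__name__} of length {len(value)}"
        else:
            summary[key] = type(value).__name__
    return summary
```

Calling summarize_artifact("labeled_embeddings.pt") after feature extraction should list the labels and label_names entries alongside whichever key holds the feature tensor.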
unlabeled_embeddings.pt
Generated by extract_features.py:235-241. Contains extracted features from videos in the root folder (not yet sorted).
File Structure
PyTorch dictionary containing features for unsorted videos.
Note: There are no labels or label_names keys, since these videos are unlabeled.
Loading Example
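One way to load this file defensively is to check that the label keys really are absent, which catches the mistake of passing the labeled file by accident. This is a sketch only: load_unlabeled is a hypothetical helper, and the check relies solely on the two key names given in the note above.

```python
import torch

# Key names taken from the note above; they should NOT appear in this file.
LABEL_KEYS = ("labels", "label_names")


def load_unlabeled(path):
    """Load an unlabeled embeddings dictionary, rejecting files that carry labels."""
    data = torch.load(path, map_location="cpu")
    unexpected = [key for key in LABEL_KEYS if key in data]
    if unexpected:
        raise ValueError(
            f"{path} contains label keys {unexpected}; is this the labeled file?"
        )
    return data
```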
predictions.json
Generated by predict.py:158-173. Contains model predictions for all unsorted videos.
File Structure
JSON array of prediction objects, one per unsorted video.
Complete Example
Loading Example
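A minimal sketch of reading the predictions array and keeping only the confident ones. The field name confidence and the 0.8 threshold are assumptions for illustration; substitute the field names that predict.py actually writes.

```python
import json


def load_predictions(path="predictions.json"):
    """Read the predictions file and check that it is a JSON array."""
    with open(path, encoding="utf-8") as f:
        predictions = json.load(f)
    if not isinstance(predictions, list):
        raise ValueError("expected a JSON array of prediction objects")
    return predictions


def confident_only(predictions, threshold=0.8, key="confidence"):
    """Keep predictions whose (assumed) confidence field meets the threshold."""
    # Objects without the field are treated as having zero confidence.
    return [p for p in predictions if p.get(key, 0.0) >= threshold]
```

Filtering on confidence before moving any files is a cautious default, since low-confidence predictions are the ones most likely to need manual review.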
model_config.json
Generated by train.py:220-225. Stores model metadata and configuration.
File Structure
JSON object containing model configuration and metadata.
Example
Data Flow Summary
- Feature Extraction (extract_features.py)
  - Input: Raw .mp4 video files
  - Output: labeled_embeddings.pt + unlabeled_embeddings.pt
- Model Training (train.py)
  - Input: labeled_embeddings.pt
  - Output: model.pt + model_config.json
- Prediction (predict.py)
  - Input: unlabeled_embeddings.pt + model.pt + model_config.json
  - Output: predictions.json
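The dependencies between the three stages can be sketched as a small lookup table plus a check for missing inputs. The table is transcribed from the summary above; the STAGES name, the missing_inputs helper, and the idea of keeping all artifacts in one directory are assumptions for illustration. Raw .mp4 inputs are left out because their location is not specified here.

```python
from pathlib import Path

# Stage -> (required input artifacts, produced output artifacts).
# Raw .mp4 files for the first stage are intentionally omitted.
STAGES = {
    "extract_features.py": (
        [],
        ["labeled_embeddings.pt", "unlabeled_embeddings.pt"],
    ),
    "train.py": (
        ["labeled_embeddings.pt"],
        ["model.pt", "model_config.json"],
    ),
    "predict.py": (
        ["unlabeled_embeddings.pt", "model.pt", "model_config.json"],
        ["predictions.json"],
    ),
}


def missing_inputs(stage, artifact_dir="."):
    """Return the input artifacts a stage needs that are not present yet."""
    inputs, _outputs = STAGES[stage]
    root = Path(artifact_dir)
    return [name for name in inputs if not (root / name).exists()]
```

An empty list from missing_inputs("predict.py") would indicate that both feature extraction and training have already produced their outputs.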