Overview
The MLP class implements a simple multi-layer perceptron (feedforward neural network) used to classify TikTok videos based on their extracted embeddings. The model consists of two hidden layers with ReLU activations and dropout regularization.
Defined in train.py:31-45 and predict.py:27-41.
Class Definition
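The source is not reproduced on this page; the following is a minimal sketch consistent with the description here. The parameter names input_dim, num_classes, and hidden_dim come from this page, but the exact code in train.py:31-45 may differ in details.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Two hidden layers with ReLU and dropout, as described above.

    Sketch only: the real definition lives in train.py:31-45 and
    predict.py:27-41 and may differ in details.
    """

    def __init__(self, input_dim: int, num_classes: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),         # input_dim -> hidden_dim
            nn.ReLU(),
            nn.Dropout(0.3),                          # 30% dropout
            nn.Linear(hidden_dim, hidden_dim // 2),   # hidden_dim -> hidden_dim // 2
            nn.ReLU(),
            nn.Dropout(0.2),                          # 20% dropout
            nn.Linear(hidden_dim // 2, num_classes),  # logits per class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```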
Constructor Parameters
- input_dim: The dimensionality of input features. For this project, this is typically 1024 (512-d CLIP visual + 512-d CLIP text embeddings concatenated).
- num_classes: The number of output classes (TikTok folders/categories to predict). This corresponds to the number of folder categories in your labeled dataset.
- hidden_dim: The size of the first hidden layer. The second hidden layer is hidden_dim // 2 (integer division). Default: 256 (so the second layer becomes 128).

Architecture
The MLP uses a sequential architecture with the following layers:

- Linear (input_dim → hidden_dim): first linear transformation layer.
- ReLU: activation function introducing non-linearity.
- Dropout (p=0.3): regularization with 30% probability during training.
- Linear (hidden_dim → hidden_dim // 2): second linear transformation layer (e.g., 256 → 128 with the default hidden_dim).
- ReLU: second activation function.
- Dropout (p=0.2): regularization with 20% probability during training.
- Linear (hidden_dim // 2 → num_classes): final output layer producing logits for each class.

Forward Pass
The forward pass takes an input tensor of shape (batch_size, input_dim) and returns output logits of shape (batch_size, num_classes). No softmax is applied inside the model, since CrossEntropyLoss expects raw logits during training.
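As a quick shape check, a stand-in network with the default sizes (the class count of 7 is arbitrary here):

```python
import torch
import torch.nn as nn

# Stand-in with the documented default hidden_dim=256; 7 classes is arbitrary.
net = nn.Sequential(
    nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 7),
)
x = torch.randn(16, 1024)  # input tensor: (batch_size, input_dim)
logits = net(x)            # output logits: (batch_size, num_classes)
```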
Usage Example
Training
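A hedged sketch of a single training step on stand-in data, using the hyperparameters from the Training Details section below. The model is re-created inline so the snippet is self-contained; the real class lives in train.py:31-45, and 10 classes is illustrative.

```python
import torch
import torch.nn as nn

# Inline stand-in mirroring the MLP architecture described above.
hidden_dim = 256
model = nn.Sequential(
    nn.Linear(1024, hidden_dim), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(hidden_dim, hidden_dim // 2), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(hidden_dim // 2, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()  # class weights omitted in this sketch

features = torch.randn(32, 1024)      # one batch of concatenated CLIP embeddings
labels = torch.randint(0, 10, (32,))  # stand-in labels

model.train()                         # enable dropout
optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
```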
Inference
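A sketch of inference on a single embedding; again self-contained with a stand-in model, while predict.py:27-41 holds the real code.

```python
import torch
import torch.nn as nn

# Stand-in for a trained MLP; 10 classes is illustrative.
model = nn.Sequential(
    nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 10),
)
model.eval()                              # disable dropout for inference
embedding = torch.randn(1, 1024)          # one 1024-d video embedding
with torch.no_grad():
    logits = model(embedding)
    probs = torch.softmax(logits, dim=1)  # per-class probabilities
    predicted = int(probs.argmax(dim=1))  # index of the most likely category
```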
Training Details
When training the MLP (see train.py:48-96):
- Optimizer: Adam with learning rate 1e-3 and weight decay 1e-4
- Loss function: CrossEntropyLoss with class weights to handle imbalanced datasets
- Batch size: 32
- Early stopping: Patience of 15 epochs based on validation accuracy
- Device: Automatically uses CUDA if available, otherwise CPU
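The early-stopping bookkeeping can be sketched as follows. The patience is shortened to 3 so the example actually triggers (train.py uses 15), and the validation accuracies are stand-in values.

```python
# Patience-based early stopping on validation accuracy (sketch).
patience = 3  # train.py uses 15; shortened here for illustration
val_accuracies = [0.52, 0.60, 0.61, 0.61, 0.60, 0.59, 0.58]  # stand-in values

best_acc = 0.0
epochs_without_improvement = 0
stopped_at = None
for epoch, acc in enumerate(val_accuracies):
    if acc > best_acc:
        best_acc = acc
        epochs_without_improvement = 0  # improvement: reset (checkpoint here)
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch  # no improvement for `patience` epochs
            break
```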
Model Persistence
The trained model is saved using PyTorch's state dict, alongside a model_config.json describing the model configuration (see embeddings documentation).
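A hedged save/load sketch: the weights file name and the config keys below are assumptions, and only the model_config.json name comes from this page. A single Linear layer stands in for the trained MLP.

```python
import json
import os
import tempfile

import torch
import torch.nn as nn

tmp = tempfile.mkdtemp()
weights_path = os.path.join(tmp, "mlp.pt")            # file name is an assumption
config_path = os.path.join(tmp, "model_config.json")  # name from this page

# Stand-in for the trained MLP.
model = nn.Linear(1024, 10)

# Save: weights as a state dict, constructor arguments as JSON.
torch.save(model.state_dict(), weights_path)
with open(config_path, "w") as f:
    json.dump({"input_dim": 1024, "num_classes": 10, "hidden_dim": 256}, f)

# Load: rebuild the architecture from the config, then restore the weights.
with open(config_path) as f:
    cfg = json.load(f)
restored = nn.Linear(cfg["input_dim"], cfg["num_classes"])
restored.load_state_dict(torch.load(weights_path))
```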