
Overview

The MLP class implements a simple multi-layer perceptron (feedforward neural network) used to classify TikTok videos based on their extracted embeddings. The model consists of two hidden layers with ReLU activations and dropout regularization. Defined in train.py:31-45 and predict.py:27-41.

Class Definition

class MLP(torch.nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        # Architecture detailed below

Constructor Parameters

input_dim (int, required)
The dimensionality of the input features. For this project, this is typically 1024 (512-d CLIP visual + 512-d CLIP text embeddings, concatenated).

num_classes (int, required)
The number of output classes (TikTok folders/categories to predict). This corresponds to the number of folder categories in your labeled dataset.

hidden_dim (int, default: 256)
The size of the first hidden layer. The second hidden layer is hidden_dim // 2 (integer division), so the default of 256 gives a second layer of 128.
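Because the second hidden layer is derived by integer division, non-default values of hidden_dim simply round down:

```python
# The second hidden layer is hidden_dim // 2 (integer division):
for hidden_dim in (256, 300, 255):
    print(hidden_dim, "->", hidden_dim // 2)
# 256 -> 128, 300 -> 150, 255 -> 127
```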

Architecture

The MLP uses a sequential architecture with the following layers:
Layer 1: Linear
Input: input_dim. Output: hidden_dim. First linear transformation layer.

Layer 2: ReLU
Activation function introducing non-linearity.

Layer 3: Dropout(0.3)
Dropout regularization with 30% probability during training.

Layer 4: Linear
Input: hidden_dim. Output: hidden_dim // 2. Second linear transformation layer (e.g., 256 → 128 with the default hidden_dim).

Layer 5: ReLU
Second activation function.

Layer 6: Dropout(0.2)
Dropout regularization with 20% probability during training.

Layer 7: Linear
Input: hidden_dim // 2. Output: num_classes. Final output layer producing logits for each class.
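The layer stack above can be sketched as a single torch.nn.Sequential. This is a reconstruction from the layer descriptions, not the verbatim code from train.py:31-45:

```python
import torch

class MLP(torch.nn.Module):
    """Sketch of the two-hidden-layer MLP described above."""

    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(input_dim, hidden_dim),         # Layer 1
            torch.nn.ReLU(),                                # Layer 2
            torch.nn.Dropout(0.3),                          # Layer 3
            torch.nn.Linear(hidden_dim, hidden_dim // 2),   # Layer 4
            torch.nn.ReLU(),                                # Layer 5
            torch.nn.Dropout(0.2),                          # Layer 6
            torch.nn.Linear(hidden_dim // 2, num_classes),  # Layer 7
        )

    def forward(self, x):
        return self.net(x)  # raw logits, shape (batch, num_classes)
```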

Forward Pass

def forward(self, x):
    return self.net(x)
x (torch.Tensor)
Input tensor of shape (batch_size, input_dim).

Returns (torch.Tensor)
Output logits of shape (batch_size, num_classes).

Usage Example

Training

import torch
from train import MLP

# Initialize model
input_dim = 1024  # CLIP visual + text embeddings
num_classes = 5   # Number of TikTok folders
model = MLP(input_dim, num_classes, hidden_dim=256)

# Forward pass
batch = torch.randn(32, 1024)  # 32 videos
logits = model(batch)          # Shape: (32, 5)

Inference

import torch
import torch.nn.functional as F
from predict import MLP

# Load trained model
model = MLP(input_dim=1024, num_classes=5)
model.load_state_dict(torch.load("model.pt"))
model.eval()

# Predict
with torch.no_grad():
    features = torch.randn(10, 1024)  # 10 videos
    logits = model(features)
    probs = F.softmax(logits, dim=1)  # Convert to probabilities
    predictions = probs.argmax(dim=1)  # Get predicted class indices

Training Details

When training the MLP (see train.py:48-96):
  • Optimizer: Adam with learning rate 1e-3 and weight decay 1e-4
  • Loss function: CrossEntropyLoss with class weights to handle imbalanced datasets
  • Batch size: 32
  • Early stopping: Patience of 15 epochs based on validation accuracy
  • Device: Automatically uses CUDA if available, otherwise CPU
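The settings above can be sketched as follows. This is a hypothetical reconstruction, not the code from train.py:48-96; the synthetic tensors, the inverse-frequency weighting, and the short epoch count stand in for the project's real data loading and loop:

```python
import torch
import torch.nn as nn

# Minimal MLP matching the architecture described earlier (sketch)
class MLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# Device: CUDA if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MLP(input_dim=1024, num_classes=5).to(device)

# Synthetic data standing in for the real embedding dataset
train_features = torch.randn(128, 1024, device=device)
train_labels = torch.randint(0, 5, (128,), device=device)

# Class weights: inverse-frequency weighting to handle imbalance (assumption;
# the project may compute its weights differently)
counts = torch.bincount(train_labels, minlength=5).float()
class_weights = counts.sum() / (5 * counts.clamp(min=1))
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(train_features, train_labels),
    batch_size=32, shuffle=True,
)

best_val_acc, patience, stale = 0.0, 15, 0
for epoch in range(5):  # a real run would use many more epochs
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Early stopping on validation accuracy (validation split elided here)
    model.eval()
    with torch.no_grad():
        val_acc = (model(train_features).argmax(dim=1) == train_labels).float().mean().item()
    if val_acc > best_val_acc:
        best_val_acc, stale = val_acc, 0
    else:
        stale += 1
        if stale >= patience:
            break
```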

Model Persistence

The trained model is saved using PyTorch’s state dict:
# Save
torch.save(model.state_dict(), "artifacts/model.pt")

# Load
model = MLP(input_dim, num_classes, hidden_dim)
model.load_state_dict(torch.load("artifacts/model.pt"))
Model configuration is stored separately in model_config.json (see embeddings documentation).
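A minimal sketch of the config round-trip, assuming model_config.json holds the three constructor arguments; the actual fields are documented with the embeddings pipeline:

```python
import json
import os

# Save the hyperparameters needed to rebuild the model (assumed field names)
os.makedirs("artifacts", exist_ok=True)
config = {"input_dim": 1024, "num_classes": 5, "hidden_dim": 256}
with open("artifacts/model_config.json", "w") as f:
    json.dump(config, f, indent=2)

# At load time, rebuild the model with the same hyperparameters:
with open("artifacts/model_config.json") as f:
    cfg = json.load(f)
# model = MLP(cfg["input_dim"], cfg["num_classes"], cfg["hidden_dim"])
```

Storing the constructor arguments alongside the state dict avoids shape mismatches when loading weights at inference time.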
