This guide walks you through training the text-based emotion prediction models using the Jupyter notebooks included in the project.

Overview

The project includes two primary notebooks for model training:
  • emotion_prediction.ipynb - Trains the basic emotion classifier
  • labels_clasifier__textbased_emotion_prediction.ipynb - Trains the multi-label toxicity classifier

Prerequisites

Before training models, ensure you have:
  • Python 3.x installed
  • Virtual environment set up
  • All dependencies installed from requirements.txt

1. Create Virtual Environment

Create a new virtual environment for the project:
python3 -m venv "text_prediction"
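The new environment also needs to be activated before anything is installed into it. A minimal sketch, assuming a Unix-like shell with bash (the Windows activation path differs):

```shell
# Create the virtual environment, then activate it (Linux/macOS, bash):
python3 -m venv text_prediction
source text_prediction/bin/activate

# On Windows (PowerShell) the activation script is instead:
# text_prediction\Scripts\Activate.ps1
```

Once activated, `pip3 install` in the next step installs packages into this environment rather than system-wide.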

2. Install Dependencies

Install all required packages:
pip3 install -r requirements.txt

3. Launch Jupyter

Access the training notebooks in the notebooks/ directory:
cd notebooks/
jupyter notebook

4. Train Emotion Classifier

Open and run emotion_prediction.ipynb to train the basic emotion classification model. This notebook creates the emotion_classifier.model file.
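The resulting emotion_classifier.model is a standard pickle file, so it can be reloaded outside the notebook (for example, by the Flask app). A minimal sketch of the save/load round trip, using a stand-in dictionary in place of the real fitted classifier object, which only the notebook defines; the label names here are illustrative:

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier object; the notebook pickles its
# fitted model to emotion_classifier.model in the same way.
model = {"classes": ["joy", "anger", "sadness"]}

path = os.path.join(tempfile.gettempdir(), "emotion_classifier.model")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, the saved artifact is loaded back for inference:
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded["classes"])  # ['joy', 'anger', 'sadness']
```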

5. Train Multi-Label Toxicity Classifier

Open and run labels_clasifier__textbased_emotion_prediction.ipynb to train the neural network model that classifies text into 6 toxicity categories:
  • Toxic
  • Severe Toxic
  • Obscene
  • Threat
  • Insult
  • Identity Hate

This notebook produces the PyTorch model weights file model_26_87.12.pth.
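A .pth file of this kind holds a state_dict saved with torch.save; reloading it later means instantiating the same architecture first and then loading the weights into it. A sketch of that round trip with a tiny stand-in module (the real file is model_26_87.12.pth loaded into the base_line network described below):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Tiny stand-in module; the notebook saves base_line's weights the same way.
net = nn.Linear(4, 2)
path = os.path.join(tempfile.gettempdir(), "demo_weights.pth")
torch.save(net.state_dict(), path)

# Reloading: build the same architecture, then load the saved weights.
net2 = nn.Linear(4, 2)
net2.load_state_dict(torch.load(path, map_location="cpu"))
net2.eval()  # switch to inference mode before making predictions
```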

Model Artifacts

After training, the following files are generated in the models/ directory:
  • emotion_classifier.model: Pickle file containing the trained emotion classifier
  • model_26_87.12.pth: PyTorch weights for the multi-label toxicity classifier (87.12% accuracy)
  • vectorizer2.pickle: Text vectorizer for feature extraction
  • word_dict.json: Word-to-index mapping for embeddings
  • train_tensor.pt: Training data tensors
  • train_labels.pt: Training labels
  • test_tensor.pt: Test data tensors
  • test_labels.pt: Test labels
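word_dict.json is the vocabulary that turns raw text into index sequences for the embedding layer. A sketch with a hypothetical miniature vocabulary (the real file is generated during training); the encode helper, the max_len value, and the special tokens are illustrative assumptions, not the notebook's exact code:

```python
# Hypothetical miniature vocabulary; the real word_dict.json maps the
# full training vocabulary to indices in the same word -> index shape.
word_dict = {"<pad>": 0, "<unk>": 1, "this": 2, "is": 3, "toxic": 4}

def encode(text, word_dict, max_len=8):
    """Map each token to its index, padding/truncating to a fixed length."""
    ids = [word_dict.get(tok, word_dict["<unk>"]) for tok in text.lower().split()]
    ids = ids[:max_len]
    ids += [word_dict["<pad>"]] * (max_len - len(ids))
    return ids

print(encode("this is toxic", word_dict))  # [2, 3, 4, 0, 0, 0, 0, 0]
```

Out-of-vocabulary tokens fall back to the `<unk>` index, and `<pad>` fills short sequences so every input has the same length.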

Model Architecture

The multi-label classifier uses a neural network with the following structure:
import torch.nn as nn

class base_line(nn.Module):
    def __init__(self, fin, out):
        super(base_line, self).__init__()
        self.fc1 = nn.Linear(fin, 2048)
        self.fc2 = nn.Linear(2048, 1024)
        self.fc3 = nn.Linear(1024, 512)
        self.fc4 = nn.Linear(512, out)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Three ReLU-activated hidden layers, then a sigmoid output
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        return self.sigmoid(self.fc4(x))
    
The model uses:
  • 10-dimensional word embeddings
  • 4 fully connected layers (2048 → 1024 → 512 → 6)
  • ReLU activations between layers
  • A sigmoid output for multi-label classification
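Because the output layer is a sigmoid rather than a softmax, the model produces one independent probability per category, and a threshold (0.5 here, an illustrative choice) turns those into per-label yes/no predictions. A runnable sketch of the architecture above, with a small made-up input size for demonstration:

```python
import torch
import torch.nn as nn

class base_line(nn.Module):
    def __init__(self, fin, out):
        super(base_line, self).__init__()
        self.fc1 = nn.Linear(fin, 2048)
        self.fc2 = nn.Linear(2048, 1024)
        self.fc3 = nn.Linear(1024, 512)
        self.fc4 = nn.Linear(512, out)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        return self.sigmoid(self.fc4(x))

LABELS = ["Toxic", "Severe Toxic", "Obscene", "Threat", "Insult", "Identity Hate"]

model = base_line(fin=80, out=6)  # fin=80 is an illustrative input size
model.eval()

with torch.no_grad():
    probs = model(torch.randn(2, 80))  # batch of 2 feature vectors

print(probs.shape)           # torch.Size([2, 6]) -- one probability per label
preds = (probs > 0.5).int()  # independent 0/1 decision for each category
```

A text can therefore be flagged with several categories at once (e.g. both Toxic and Insult), which is exactly what the multi-label setup allows.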

Next Steps

After training your models, learn how to deploy the Flask application to start making predictions.
