Model Training

Model training takes the prepared facial landmark data and fits a Random Forest classifier that recognizes emotions.

Overview

The train_model.py script loads the preprocessed data from data.txt, splits it into training and testing sets, trains a Random Forest classifier, evaluates its performance, and saves the trained model to disk.

Prerequisites

Before training the model, ensure you have:

Completed the Data Preparation step

Generated data.txt with sufficient training samples

At least two different emotion classes in your dataset
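The last two prerequisites can be checked before training. The snippet below is a minimal sketch, assuming the last column of each row holds the integer emotion label (as described under Feature and Label Separation); `class_counts` is a hypothetical helper, not part of train_model.py.

```python
import numpy as np


def class_counts(data):
    """Return {label: sample_count}, using the last column as the label."""
    data = np.atleast_2d(data)
    labels, counts = np.unique(data[:, -1].astype(int), return_counts=True)
    return {int(label): int(count) for label, count in zip(labels, counts)}


# Tiny illustrative array: 3 feature columns + 1 label column.
# Real rows in data.txt are expected to be much wider.
example = np.array([
    [0.1, 0.2, 0.3, 0],
    [0.4, 0.5, 0.6, 1],
    [0.7, 0.8, 0.9, 1],
])

counts = class_counts(example)
print(counts)
if len(counts) < 2:
    print("Only one class present: add samples for another emotion.")
```

With the real file, the same helper could be called as `class_counts(np.loadtxt("data.txt"))`.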

Training Workflow

Verify data.txt exists

Ensure data.txt is present in the source/ directory:

ls -lh source/data.txt

Run the training script

Execute the training script:

cd source
python train_model.py

Review training results

The script will output:

Accuracy percentage
Confusion matrix showing prediction performance

Verify model file

Check that the model file was created. The script writes it to the current directory, so from inside source/ run:

ls -lh model

How It Works

Loading Training Data

The script loads the preprocessed data from data.txt:

import os

import numpy as np

data_file = "data.txt"

if not os.path.isfile(data_file):
    # English: "'data.txt' not found. Run 'prepare_data.py' first to generate it."
    raise FileNotFoundError(
        f"No se encontró '{data_file}'. Ejecuta primero 'prepare_data.py' para generarlo."
    )

data = np.loadtxt(data_file)

if data.ndim == 1:
    # Only one sample -> np.loadtxt returns a 1-D array; reshape to a single row
    data = data.reshape(1, -1)
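The reshape is needed because `np.loadtxt` drops the row dimension when the file has only one line. The sketch below illustrates this with an in-memory stand-in for a single-sample file (the values are made up), and shows the `ndmin=2` argument of `np.loadtxt` as one possible alternative to the manual reshape.

```python
import io

import numpy as np

# Stand-in for a data.txt that contains a single sample:
# three made-up feature values followed by a label.
single_row = "0.1 0.2 0.3 1\n"

default_load = np.loadtxt(io.StringIO(single_row))
forced_2d = np.loadtxt(io.StringIO(single_row), ndmin=2)

print(default_load.shape)  # 1-D: the row dimension is dropped
print(forced_2d.shape)     # 2-D: one row, four columns
```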

Feature and Label Separation

The data is split into features (X) and labels (y):

# Separate into features (X) and labels (y)
X = data[:, :-1]  # All columns except the last (136 facial landmark features)
y = data[:, -1].astype(int)  # Last column (emotion label: 0 or 1)
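The Troubleshooting section lists errors for a file with too few columns and for a dataset with fewer than two classes, but the excerpt above does not show those checks. A minimal sketch of how such validation might look is given here; `split_features_labels` and its English messages are hypothetical, not the script's actual code.

```python
import numpy as np


def split_features_labels(data):
    """Split a 2-D array into features (all but last column) and int labels."""
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("Expected at least one feature column and one label column.")
    X = data[:, :-1]
    y = data[:, -1].astype(int)
    if np.unique(y).size < 2:
        raise ValueError("At least two different classes are required.")
    return X, y


# Toy array with 3 feature columns; real rows are expected to have 136.
toy = np.array([
    [0.1, 0.2, 0.3, 0],
    [0.4, 0.5, 0.6, 1],
    [0.7, 0.8, 0.9, 1],
])
X, y = split_features_labels(toy)
print(X.shape, y)
```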

Train/Test Split Configuration

The dataset is divided into training (80%) and testing (20%) sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,        # 20% for testing
    random_state=42,      # Reproducible results
    shuffle=True,         # Randomize samples
    stratify=y,           # Maintain class proportions
)

stratify=y keeps the proportion of each emotion class in the training and testing sets as close as possible to its proportion in the full dataset.
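The effect of stratification is easiest to see on an imbalanced dataset. The following sketch uses synthetic labels (70 of class 0, 30 of class 1) and placeholder features rather than real landmark data, with the same split settings as above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 70 samples of class 0 and 30 of class 1.
y = np.array([0] * 70 + [1] * 30)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y
)

train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts)
print("test: ", test_counts)
```

Both subsets keep the 70/30 ratio of the full dataset. Without `stratify`, the ratio in each subset would depend on the random shuffle.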

Random Forest Classifier Parameters

The model uses a Random Forest classifier with the following parameters:

from sklearn.ensemble import RandomForestClassifier

rf_classifier = RandomForestClassifier(
    n_estimators=200,     # 200 decision trees
    max_depth=None,       # No depth limit (trees grow until pure)
    n_jobs=-1,            # Use all CPU cores
    random_state=42,      # Reproducible results
)

Parameter Explanation

Parameter	Value	Purpose
`n_estimators`	200	Number of decision trees in the forest. More trees generally improve accuracy but increase training time.
`max_depth`	None	Maximum depth of each tree. `None` lets nodes expand until all leaves are pure or contain fewer than `min_samples_split` samples.
`n_jobs`	-1	Number of parallel jobs. `-1` uses all available CPU cores for faster training.
`random_state`	42	Seed for reproducibility. Ensures consistent results across runs.

Training the Model

The classifier is trained on the training set:

rf_classifier.fit(X_train, y_train)

Evaluation Metrics

Accuracy Score

The model’s accuracy is calculated on the test set:

from sklearn.metrics import accuracy_score

y_pred = rf_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

A typical output might look like:

Accuracy: 87.50%

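As a rough guide, accuracy above 80% indicates good model performance, while accuracy below 70% suggests you may need more training data or better-quality images. With a small test set, each image shifts the percentage noticeably, so treat these thresholds as approximate.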
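An accuracy figure is easier to judge next to a trivial baseline that always predicts the most common class. The sketch below is an illustration on synthetic labels, not part of train_model.py; it uses scikit-learn's `DummyClassifier` for that baseline.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic test labels: 20 samples of class 0 and 8 of class 1.
y_test = np.array([0] * 20 + [1] * 8)
X_test = np.zeros((len(y_test), 1))  # features are ignored by this baseline

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_test, y_test)  # fitted on the same labels purely for illustration
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"Majority-class baseline: {baseline_accuracy * 100:.2f}%")
```

In practice the baseline would be fitted on the training labels and scored on the test set. If the Random Forest barely beats it, a high accuracy may mostly reflect class imbalance.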

Confusion Matrix

The confusion matrix shows how well the model distinguishes between emotions:

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, y_pred))

Example output:

[[12  2]
 [ 1 13]]

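In scikit-learn's confusion_matrix, rows are true labels and columns are predicted labels. Reading label 0 as happy and label 1 as sad, this means: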

12 happy images correctly classified as happy
2 happy images incorrectly classified as sad
1 sad image incorrectly classified as happy
13 sad images correctly classified as sad
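Overall accuracy and per-class recall can be derived from the matrix itself. The sketch below works through the arithmetic for the example matrix, relying on scikit-learn's convention that rows are true labels and columns are predictions.

```python
import numpy as np

# Example matrix from above: rows are true labels, columns are predictions.
cm = np.array([[12, 2],
               [1, 13]])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct predictions divided by the true count of that class (row sum).
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Recall per class:", np.round(recall_per_class, 3))
```

For this matrix, 25 of 28 test images are classified correctly, and the second class is recognized slightly more reliably than the first.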

Saving the Model

The trained model is serialized and saved:

import pickle

with open("./model", "wb") as f:
    pickle.dump(rf_classifier, f)

The model file can now be used by the testing script and the main application.
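How the testing script and the main application load the file is not shown here. A minimal round-trip sketch, assuming they use `pickle.load` on the same file, might look like the following; it trains on synthetic stand-in data and writes to a temporary directory rather than ./model.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; real features come from data.txt.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = np.array([0, 1] * 20)

clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model")
    # Save, then load the classifier back from disk.
    with open(model_path, "wb") as f:
        pickle.dump(clf, f)
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(clf.predict(X), loaded.predict(X))
print(same_predictions)
```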

Understanding Results

Good Results

Accuracy: 80-95%
Confusion matrix shows high diagonal values (correct predictions)
Minimal off-diagonal values (misclassifications)

Poor Results

If accuracy is below 70% or the confusion matrix shows many errors:

Insufficient Training Data

Add more images to both emotion categories. Aim for at least 50-100 images per emotion.

Poor Image Quality

Review your training images. Remove blurry, poorly lit, or obscured faces.

Imbalanced Dataset

Ensure you have roughly equal numbers of images for each emotion.

Similar Expressions

Some emotions may be hard to distinguish. Ensure your training images have clear, distinct expressions.

Customizing Training Parameters

You can modify the Random Forest parameters to experiment with performance:

rf_classifier = RandomForestClassifier(
    n_estimators=500,     # More trees: may improve accuracy, at the cost of training time
    max_depth=None,
    n_jobs=-1,
    random_state=42,
)
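One way to judge whether a parameter change helps is to compare cross-validated accuracy across settings. The sketch below is only an illustration: it uses synthetic data from `make_classification` instead of real landmark features, and the tree counts are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the landmark data (not real facial features).
X, y = make_classification(n_samples=120, n_features=20, random_state=42)

results = {}
for n_trees in (25, 100):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    results[n_trees] = scores.mean()
    print(f"n_estimators={n_trees}: mean CV accuracy {scores.mean():.3f}")
```

Averaging over folds gives a steadier comparison than a single train/test split, which matters when the dataset is small.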

Troubleshooting

“No se encontró ‘data.txt’” (“‘data.txt’ was not found”)

Problem: The data file doesn’t exist.

Solution: Run prepare_data.py first to generate the training data:

python prepare_data.py

“El archivo ‘data.txt’ no tiene suficiente número de columnas” (“The file ‘data.txt’ does not have enough columns”)

Problem: The data file is corrupted or empty.

Solution: Delete data.txt and re-run prepare_data.py with valid training images.

“Se necesita al menos dos clases diferentes” (“At least two different classes are needed”)

Problem: All training images are from the same emotion.

Solution: Add images to both happy/ and sad/ folders, then re-run prepare_data.py.

Low Accuracy

Problem: Model accuracy is below 70%.

Solution:

Add more diverse training images
Ensure images have clear, visible faces
Balance the number of images per emotion
Increase n_estimators to 300-500

Training Takes Too Long

Problem: Training is very slow.

Solution:

Reduce n_estimators to 100-150
Set max_depth=15 to limit tree growth
Ensure n_jobs=-1 is set to use all CPU cores
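The effect of these settings can be measured directly. The sketch below times a default-style forest against a reduced one on synthetic data and reports the deepest tree in each; `fit_and_measure` is a hypothetical helper, and timings on real landmark data will differ.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (not real facial features).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)


def fit_and_measure(**params):
    """Fit a forest and return (model, seconds elapsed, deepest tree depth)."""
    clf = RandomForestClassifier(n_jobs=-1, random_state=42, **params)
    start = time.perf_counter()
    clf.fit(X, y)
    elapsed = time.perf_counter() - start
    deepest = max(tree.get_depth() for tree in clf.estimators_)
    return clf, elapsed, deepest


default_clf, default_time, default_depth = fit_and_measure(n_estimators=200, max_depth=None)
reduced_clf, reduced_time, reduced_depth = fit_and_measure(n_estimators=100, max_depth=15)

print(f"200 trees, no depth limit: {default_time:.2f}s, deepest tree {default_depth}")
print(f"100 trees, max_depth=15:   {reduced_time:.2f}s, deepest tree {reduced_depth}")
```

If the unlimited trees never grow past the chosen limit, setting max_depth will not save time, and reducing n_estimators is the more effective change.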

Next Steps

Once you’ve successfully trained the model with satisfactory accuracy, proceed to Testing to verify real-time emotion recognition.
