The model training process takes the prepared facial landmark data and trains a Random Forest classifier to recognize emotions.

Overview

The train_model.py script loads the preprocessed data from data.txt, splits it into training and testing sets, trains a Random Forest classifier, evaluates its performance, and saves the trained model to disk.

Prerequisites

Before training the model, ensure you have:
Completed the Data Preparation step
Generated data.txt with sufficient training samples
At least two different emotion classes in your dataset

Training Workflow

1. Verify data.txt exists

Ensure data.txt is present in the source/ directory:
ls -lh source/data.txt

2. Run the training script

Execute the training script:
cd source
python train_model.py

3. Review training results

The script will output:
  • Accuracy percentage
  • Confusion matrix showing prediction performance

4. Verify model file

Check that the model file was created:
ls -lh source/model

How It Works

Loading Training Data

The script loads the preprocessed data from data.txt:
data_file = "data.txt"

if not os.path.isfile(data_file):
    # Message (Spanish): "'data.txt' not found. Run 'prepare_data.py' first to generate it."
    raise FileNotFoundError(
        f"No se encontró '{data_file}'. Ejecuta primero 'prepare_data.py' para generarlo."
    )

data = np.loadtxt(data_file)

if data.ndim == 1:
    # Only one sample -> reshape to (1, n_features)
    data = data.reshape(1, -1)

Feature and Label Separation

The data is split into features (X) and labels (y):
# Separate into features (X) and labels (y)
X = data[:, :-1]  # All columns except the last (136 facial landmark coordinate values)
y = data[:, -1].astype(int)  # Last column (emotion label: 0 or 1)
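The single-sample edge case handled above can be demonstrated with an in-memory row. This is a minimal sketch, using 4 features instead of the real 136 for brevity:

```python
import io

import numpy as np

# A data.txt-style row: feature values followed by one integer label
single_row = "0.1 0.2 0.3 0.4 1\n"
data = np.loadtxt(io.StringIO(single_row))

# With a single sample, np.loadtxt returns a 1-D array,
# so reshape to (1, n_features + 1) before slicing
if data.ndim == 1:
    data = data.reshape(1, -1)

X = data[:, :-1]             # feature columns
y = data[:, -1].astype(int)  # label column
print(X.shape, y)            # (1, 4) [1]
```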

Train/Test Split Configuration

The dataset is divided into training (80%) and testing (20%) sets:
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,        # 20% for testing
    random_state=42,      # Reproducible results
    shuffle=True,         # Randomize samples
    stratify=y,           # Maintain class proportions
)
stratify=y ensures both training and testing sets have proportional representation of each emotion class.
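The effect of stratify=y can be seen on a toy label set with a 2:1 class ratio. This sketch is not part of train_model.py; the arrays are synthetic stand-ins for the landmark data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 8 samples of class 0 ("happy") and 4 of class 1 ("sad")
X = np.arange(24).reshape(12, 2).astype(float)
y = np.array([0] * 8 + [1] * 4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, shuffle=True, stratify=y
)

# Both splits preserve the 2:1 class ratio
print(np.bincount(y_train), np.bincount(y_test))  # [6 3] [2 1]
```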

Random Forest Classifier Parameters

The model uses a Random Forest classifier with optimized parameters:
rf_classifier = RandomForestClassifier(
    n_estimators=200,     # 200 decision trees
    max_depth=None,       # No depth limit (trees grow until pure)
    n_jobs=-1,            # Use all CPU cores
    random_state=42,      # Reproducible results
)

Parameter Explanation

  • n_estimators = 200: Number of decision trees in the forest. More trees generally improve accuracy but increase training time.
  • max_depth = None: Maximum depth of each tree. None allows trees to expand until all leaves are pure.
  • n_jobs = -1: Number of parallel jobs. -1 uses all available CPU cores for faster training.
  • random_state = 42: Seed for reproducibility. Ensures consistent results across runs.

Training the Model

The classifier is trained on the training set:
rf_classifier.fit(X_train, y_train)

Evaluation Metrics

Accuracy Score

The model’s accuracy is calculated on the test set:
y_pred = rf_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")
A typical output might look like:
Accuracy: 87.50%
Accuracy above 80% indicates good model performance. Below 70% suggests you may need more training data or better quality images.

Confusion Matrix

The confusion matrix shows how well the model distinguishes between emotions:
print(confusion_matrix(y_test, y_pred))
Example output:
[[12  2]
 [ 1 13]]
This means:
  • 12 happy images correctly classified as happy
  • 2 happy images incorrectly classified as sad
  • 1 sad image incorrectly classified as happy
  • 13 sad images correctly classified as sad
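Per-class precision and recall can be derived from the example matrix above. This is a small sketch; train_model.py itself does not print these metrics:

```python
import numpy as np

# Example confusion matrix: rows are true classes, columns are predictions
cm = np.array([[12, 2],
               [1, 13]])

for i, name in enumerate(["happy", "sad"]):
    precision = cm[i, i] / cm[:, i].sum()  # correct / predicted as this class
    recall = cm[i, i] / cm[i, :].sum()     # correct / actually this class
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
# happy: precision=0.92 recall=0.86
# sad: precision=0.87 recall=0.93
```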

Saving the Model

The trained model is serialized and saved:
with open("./model", "wb") as f:
    pickle.dump(rf_classifier, f)
The model file can now be used by the testing script and the main application.
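To confirm the serialized file is usable, a pickle round-trip can be checked. This is a minimal sketch using a tiny synthetic dataset and a throwaway path model_check (both are illustrative, not part of the original script):

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic stand-in for the landmark features in data.txt
X = np.random.RandomState(0).rand(20, 4)
y = np.array([0, 1] * 10)

clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# Serialize and reload via pickle, as train_model.py does for "./model"
with open("model_check", "wb") as f:
    pickle.dump(clf, f)
with open("model_check", "rb") as f:
    restored = pickle.load(f)

# The reloaded classifier must reproduce the original predictions
assert np.array_equal(clf.predict(X), restored.predict(X))
```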

Understanding Results

Good Results

  • Accuracy: 80-95%
  • Confusion matrix shows high diagonal values (correct predictions)
  • Minimal off-diagonal values (misclassifications)

Poor Results

If accuracy is below 70% or the confusion matrix shows many errors:
  • Add more images to both emotion categories; aim for at least 50-100 images per emotion.
  • Review your training images and remove blurry, poorly lit, or obscured faces.
  • Ensure you have roughly equal numbers of images for each emotion.
  • Some emotions may be hard to distinguish, so make sure your training images have clear, distinct expressions.
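Class imbalance is a common cause of weak results and is easy to check from the label column. In this sketch, synthetic labels stand in for y = data[:, -1].astype(int), and the 0.5 ratio threshold is an illustrative choice:

```python
import numpy as np

# Synthetic labels standing in for the last column of data.txt
y = np.array([0] * 90 + [1] * 40)

counts = np.bincount(y)
print(counts.tolist())  # samples per class
if counts.min() / counts.max() < 0.5:
    print("Warning: classes are imbalanced; add images to the smaller class")
```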

Customizing Training Parameters

You can modify the Random Forest parameters to experiment with performance:
rf_classifier = RandomForestClassifier(
    n_estimators=500,     # More trees for better accuracy
    max_depth=None,
    n_jobs=-1,
    random_state=42,
)
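Rather than judging a parameter change on a single train/test split, cross-validation gives a steadier comparison. This sketch uses synthetic data from make_classification as a stand-in for the landmark features; it is not part of train_model.py:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the landmark feature matrix
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Compare forest sizes with 5-fold cross-validation
for n in (100, 200, 500):
    clf = RandomForestClassifier(n_estimators=n, n_jobs=-1, random_state=42)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"n_estimators={n}: mean accuracy {scores.mean():.3f}")
```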

Troubleshooting

“No se encontró ‘data.txt’” (“data.txt not found”)

Problem: The data file doesn’t exist.
Solution: Run prepare_data.py first to generate the training data:
python prepare_data.py

“El archivo ‘data.txt’ no tiene suficiente número de columnas” (“the file ‘data.txt’ doesn’t have enough columns”)

Problem: The data file is corrupted or empty.
Solution: Delete data.txt and re-run prepare_data.py with valid training images.

“Se necesita al menos dos clases diferentes” (“at least two different classes are needed”)

Problem: All training images are from the same emotion.
Solution: Add images to both happy/ and sad/ folders, then re-run prepare_data.py.

Low Accuracy

Problem: Model accuracy is below 70%.
Solution:
  1. Add more diverse training images
  2. Ensure images have clear, visible faces
  3. Balance the number of images per emotion
  4. Increase n_estimators to 300-500

Training Takes Too Long

Problem: Training is very slow.
Solution:
  • Reduce n_estimators to 100-150
  • Set max_depth=15 to limit tree growth
  • Ensure n_jobs=-1 is set to use all CPU cores

Next Steps

Once you’ve successfully trained the model with satisfactory accuracy, proceed to Testing to verify real-time emotion recognition.
