Overview
The train_model.py script loads the preprocessed data from data.txt, splits it into training and testing sets, trains a Random Forest classifier, evaluates its performance, and saves the trained model to disk.
Prerequisites
Before training the model, ensure you have:
- Completed the Data Preparation step
- Generated data.txt with sufficient training samples
- At least two different emotion classes in your dataset
Training Workflow
Review training results
The script will output:
- Accuracy percentage
- Confusion matrix showing prediction performance
How It Works
Loading Training Data
The script loads the preprocessed data from data.txt.
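A minimal sketch of this step, together with the feature/label separation described next. It assumes data.txt is a whitespace-delimited numeric table (as written by numpy.savetxt), one row per image, with the class label in the last column. The source does not specify the layout, so treat both the delimiter and the label position as assumptions. A tiny stand-in file is written first so the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Stand-in for the real data.txt: 4 samples, 3 features, label in the last column.
# (Assumed layout; adjust if prepare_data.py writes a different format.)
sample = np.array([
    [0.10, 0.20, 0.30, 0],
    [0.15, 0.25, 0.35, 0],
    [0.80, 0.70, 0.60, 1],
    [0.85, 0.75, 0.65, 1],
])
data_path = os.path.join(tempfile.mkdtemp(), "data.txt")
np.savetxt(data_path, sample)

# Load the table; ndmin=2 keeps a single-row file two-dimensional.
data = np.loadtxt(data_path, ndmin=2)

# Every column except the last is a feature; the last column is the label.
X = data[:, :-1]
y = data[:, -1].astype(int)

print(X.shape, y)
```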
Feature and Label Separation
The data is split into features (X) and labels (y).
Train/Test Split Configuration
The dataset is divided into training (80%) and testing (20%) sets. Setting stratify=y ensures both training and testing sets have proportional representation of each emotion class.
Random Forest Classifier Parameters
The model uses a Random Forest classifier with the following parameters:
Parameter Explanation
| Parameter | Value | Purpose |
|---|---|---|
| n_estimators | 200 | Number of decision trees in the forest. More trees generally improve accuracy but increase training time. |
| max_depth | None | Maximum depth of each tree. None allows trees to expand until all leaves are pure or contain fewer than min_samples_split samples. |
| n_jobs | -1 | Number of parallel jobs. -1 uses all available CPU cores for faster training. |
| random_state | 42 | Seed for reproducibility. Ensures consistent results across runs. |
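The 80/20 split and the parameters in the table can be sketched with scikit-learn as follows. The synthetic X and y stand in for the real features and labels, and the variable names are illustrative rather than taken from train_model.py. The random_state passed to train_test_split is an assumption; the source states a seed only for the classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 10 features, two balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = np.repeat([0, 1], 50)

# 80/20 split; stratify=y keeps the class proportions equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # seed here is an assumption
)

# Classifier configured with the values from the parameter table.
clf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    n_jobs=-1,
    random_state=42,
)

print(len(X_train), len(X_test))
print(np.bincount(y_train), np.bincount(y_test))
```

With balanced input, the stratified split leaves both subsets balanced as well, which is what makes the test-set accuracy comparable across classes.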
Training the Model
The classifier is trained on the training set.
Evaluation Metrics
Accuracy Score
The model’s accuracy is calculated on the test set.
Accuracy above 80% indicates good model performance. Accuracy below 70% suggests you may need more training data or better-quality images.
Confusion Matrix
Confusion Matrix
The confusion matrix shows how well the model distinguishes between emotions. For example, a run on 28 test images might give:
- 12 happy images correctly classified as happy
- 2 happy images incorrectly classified as sad
- 1 sad image incorrectly classified as happy
- 13 sad images correctly classified as sad
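To make the reading of the matrix concrete, the following sketch rebuilds the four counts listed above with scikit-learn. The label encoding (0 = happy, 1 = sad) is an assumption for illustration. Rows are true classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Assumed encoding for illustration: 0 = happy, 1 = sad.
y_true = np.array([0] * 14 + [1] * 14)
y_pred = np.array(
    [0] * 12 + [1] * 2    # 14 happy images: 12 predicted happy, 2 predicted sad
    + [0] * 1 + [1] * 13  # 14 sad images: 1 predicted happy, 13 predicted sad
)

cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(f"Accuracy: {acc * 100:.2f}%")
```

The diagonal entries (12 and 13) are the correct predictions, so the accuracy is 25 of 28 images, or about 89%.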
Saving the Model
The trained model is serialized and saved to disk. The model file can now be used by the testing script and the main application.
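A sketch of saving and reloading a fitted classifier, assuming the standard-library pickle module and a hypothetical output path. The source says only that the model is serialized, so train_model.py may use joblib or another format instead.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model so the example runs on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.repeat([0, 1], 20)
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# Hypothetical output location; the real script chooses its own path.
model_path = os.path.join(tempfile.mkdtemp(), "model")

with open(model_path, "wb") as f:
    pickle.dump(clf, f)

# Reload the file the way a consumer script might, and compare predictions.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same = np.array_equal(clf.predict(X), restored.predict(X))
print(same)
```

A pickled model should be loaded with the same scikit-learn version that wrote it, and only from a source you trust, because unpickling can execute arbitrary code.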
Understanding Results
Good Results
- Accuracy: 80-95%
- Confusion matrix shows high diagonal values (correct predictions)
- Minimal off-diagonal values (misclassifications)
Poor Results
If accuracy is below 70% or the confusion matrix shows many errors, check for the following causes:
Insufficient Training Data
Add more images to both emotion categories. Aim for at least 50-100 images per emotion.
Poor Image Quality
Review your training images. Remove blurry, poorly lit, or obscured faces.
Imbalanced Dataset
Ensure you have roughly equal numbers of images for each emotion.
Similar Expressions
Some emotions may be hard to distinguish. Ensure your training images have clear, distinct expressions.
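The data-quantity and balance checks above can be partly automated. This sketch counts samples per class and flags small or imbalanced classes. The threshold of 50 comes from the guidance above, while the 1.5 imbalance ratio, the function name, and the label list are illustrative assumptions.

```python
from collections import Counter


def check_class_balance(labels, min_per_class=50, max_ratio=1.5):
    """Return a list of warnings about small or imbalanced classes."""
    counts = Counter(labels)
    if not counts:
        return ["no samples found"]
    warnings = []
    for label, n in sorted(counts.items()):
        if n < min_per_class:
            warnings.append(f"class {label!r} has only {n} samples")
    largest, smallest = max(counts.values()), min(counts.values())
    if largest / smallest > max_ratio:
        warnings.append(
            f"imbalance: largest class is {largest / smallest:.1f}x the smallest"
        )
    return warnings


# Hypothetical label list: 120 happy images but only 30 sad ones.
labels = ["happy"] * 120 + ["sad"] * 30
issues = check_class_balance(labels)
for issue in issues:
    print(issue)
```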
Customizing Training Parameters
You can modify the Random Forest parameters, such as n_estimators and max_depth, to experiment with performance.
Troubleshooting
“No se encontró ‘data.txt’” (“‘data.txt’ was not found”)
Problem: The data file doesn’t exist.
Solution: Run prepare_data.py first to generate the training data.
“El archivo ‘data.txt’ no tiene suficiente número de columnas” (“The file ‘data.txt’ does not have enough columns”)
Problem: The data file is corrupted or empty.
Solution: Delete data.txt and re-run prepare_data.py with valid training images.
“Se necesita al menos dos clases diferentes” (“At least two different classes are required”)
Problem: All training images are from the same emotion.
Solution: Add images to both the happy/ and sad/ folders, then re-run prepare_data.py.
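The three errors above correspond to sanity checks on the input file. A hypothetical sketch of such checks follows. The function name, the English messages, and the assumed layout (numeric table, label in the last column) are illustrative and not taken from train_model.py.

```python
import os
import tempfile

import numpy as np


def load_and_validate(path):
    """Load a numeric table and apply three sanity checks (illustrative only)."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"'{path}' was not found")
    data = np.loadtxt(path, ndmin=2)
    # Need at least one feature column plus the label column.
    if data.size == 0 or data.shape[1] < 2:
        raise ValueError("not enough columns")
    X, y = data[:, :-1], data[:, -1]
    if len(np.unique(y)) < 2:
        raise ValueError("at least two different classes are required")
    return X, y


# Demonstration with a stand-in file that contains a single class.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
np.savetxt(path, np.array([[0.1, 0.2, 0], [0.3, 0.4, 0]]))
try:
    load_and_validate(path)
    message = "ok"
except ValueError as exc:
    message = str(exc)
print(message)
```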
Low Accuracy
Problem: Model accuracy is below 70%.
Solution:
- Add more diverse training images
- Ensure images have clear, visible faces
- Balance the number of images per emotion
- Increase n_estimators to 300-500
Training Takes Too Long
Problem: Training is very slow.
Solution:
- Reduce n_estimators to 100-150
- Set max_depth=15 to limit tree growth
- Ensure n_jobs=-1 is set to use all CPU cores
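The parameter suggestions above trade accuracy against training time. One way to compare settings is to time each fit and score it on held-out data, as in this sketch. The synthetic data from make_classification and the particular settings compared are stand-ins, so the numbers it prints say nothing about the emotion dataset.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the real features and labels.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Candidate settings drawn from the suggestions above.
settings = [
    {"n_estimators": 100, "max_depth": 15},
    {"n_estimators": 200, "max_depth": None},
    {"n_estimators": 300, "max_depth": None},
]

results = []
for params in settings:
    clf = RandomForestClassifier(n_jobs=-1, random_state=42, **params)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    results.append((params, clf.score(X_test, y_test), elapsed))

for params, accuracy, elapsed in results:
    print(f"{params}: accuracy={accuracy:.3f}, fit time={elapsed:.2f}s")
```

Running the same loop on the real training data shows whether extra trees still improve accuracy or only add training time.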

