Overview
EmoChat uses a Random Forest classifier to predict emotions from normalized facial landmarks. The model is trained on facial features extracted from labeled emotion images and achieves high accuracy through ensemble learning.

Random Forest Classifier
What is Random Forest?
Random Forest is an ensemble machine learning algorithm that:
- Creates multiple decision trees during training
- Each tree votes on the predicted class
- Final prediction is determined by majority vote
- Reduces overfitting compared to single decision trees
- Handles high-dimensional feature spaces well
Why Random Forest for Emotion Detection?
Random Forest is a good fit for this task because:
- Robust to noise in facial landmark detection
- Handles the 136-dimensional feature space efficiently
- Requires less data than deep learning approaches
- Fast inference suitable for real-time predictions
- Interpretable feature importance
Model Configuration
Hyperparameters
The model is configured in train_model.py:56 with the following parameters:

n_estimators
Number of decision trees in the forest. More trees generally improve accuracy but increase training time and model size.

max_depth
Maximum depth of each tree. None means nodes expand until all leaves are pure or contain fewer than min_samples_split samples. This allows trees to fully capture patterns in the data.

n_jobs
Number of CPU cores to use for training. -1 uses all available cores for parallel processing.

random_state
Seed for random number generation. Ensures reproducible results across training runs.
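Putting the parameters above together, the configuration might look like the following sketch (the seed value 42 is illustrative; the actual value lives in train_model.py:56):

```python
from sklearn.ensemble import RandomForestClassifier

# Mirrors the hyperparameters described above; random_state=42 is an
# illustrative choice, not necessarily the value used in train_model.py.
model = RandomForestClassifier(
    n_estimators=200,   # 200 decision trees in the forest
    max_depth=None,     # grow each tree until leaves are pure
    n_jobs=-1,          # use all available CPU cores
    random_state=42,    # reproducible training runs
)
```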
Training Data Structure
Data Format
Training data is stored in data.txt as a NumPy array with shape (n_samples, 137):
- Columns 0-135: Normalized facial landmark coordinates (136 features)
- Column 136: Emotion label (integer)
  - 0 = HAPPY
  - 1 = SAD
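A minimal sketch of slicing that array into features and labels (random values stand in for the real contents of data.txt):

```python
import numpy as np

# Hypothetical stand-in for data.txt: 10 samples, 136 landmark
# coordinates plus one trailing label column (0 = HAPPY, 1 = SAD).
rng = np.random.default_rng(0)
data = np.hstack([rng.random((10, 136)), rng.integers(0, 2, (10, 1))])

X = data[:, :136]   # columns 0-135: normalized landmark coordinates
y = data[:, 136]    # column 136: emotion label
print(X.shape, y.shape)  # (10, 136) (10,)
```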
Data Preparation
The prepare_data.py script processes raw images to create training data:
The script processes images from the following folder structure:
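The exact layout is not shown here, but since labels are assigned from alphabetical folder order (HAPPY = 0, SAD = 1), the mapping could be derived with a sketch like this (the data/HAPPY and data/SAD folder names are assumptions based on the label table above):

```python
import os
import tempfile

# Assumed layout: one sub-folder per emotion under the data directory,
# e.g. data/HAPPY/*.jpg and data/SAD/*.jpg. Alphabetical folder order
# determines the integer label: HAPPY -> 0, SAD -> 1.
root = tempfile.mkdtemp()
for name in ("SAD", "HAPPY"):
    os.makedirs(os.path.join(root, name))

labels = {name: i for i, name in enumerate(sorted(os.listdir(root)))}
print(labels)  # {'HAPPY': 0, 'SAD': 1}
```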
Training Process
Data Loading
The training script loads the preprocessed data:

Train/Test Split Strategy
Data is split into training and testing sets using stratified sampling:
- 20% of the data is reserved for testing; 80% is used for training
- A fixed random seed ensures a consistent split across runs for reproducibility
- Data is randomly shuffled before splitting to avoid ordering bias
- Stratification ensures both train and test sets have the same proportion of each emotion class, which is critical for balanced evaluation
Why Stratified Split?
If you have 100 happy samples and 20 sad samples:
- Without stratification: Test set might get 0 sad samples by chance
- With stratification: Test set gets ~16 happy and ~4 sad samples (same 80/20 ratio)
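The 100-happy / 20-sad example above can be reproduced with scikit-learn's train_test_split (random features stand in for the real landmarks; random_state=42 is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 "happy" (label 0) and 20 "sad" (label 1) samples, as in the example.
X = np.random.rand(120, 136)
y = np.array([0] * 100 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y,
    test_size=0.2,      # 20% held out for testing
    random_state=42,    # reproducible split
    stratify=y,         # preserve the 100:20 class ratio in both sets
)
print(int((y_te == 1).sum()))  # 4 sad samples land in the test set
```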
Model Training
Training is straightforward with scikit-learn:
- Creates 200 decision trees
- Each tree is trained on a random bootstrap sample of the data
- Each split considers a random subset of features
- Trees are grown to maximum depth
- All trees are stored in the ensemble
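The steps above boil down to a single fit() call. A minimal sketch under the configuration described earlier (toy random features stand in for the real landmark data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the (n_samples, 136) landmark feature matrix.
rng = np.random.default_rng(0)
X_train = rng.random((40, 136))
y_train = rng.integers(0, 2, 40)

model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
model.fit(X_train, y_train)
print(len(model.estimators_))  # 200 fitted trees stored in the ensemble
```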
Model Evaluation
Accuracy Metric
The model is evaluated on the held-out test set:

Confusion Matrix
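As a sketch covering both evaluation steps, accuracy on the held-out set and the confusion matrix (toy data; the real features come from data.txt):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Toy stand-in for the landmark features and binary emotion labels.
rng = np.random.default_rng(0)
X = rng.random((60, 136))
y = rng.integers(0, 2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
y_pred = model.predict(X_te)

print("accuracy:", accuracy_score(y_te, y_pred))
print(confusion_matrix(y_te, y_pred))  # rows: true class, columns: predicted
```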
The training script also prints a confusion matrix:

Making Predictions
Model Loading
The trained model is serialized using pickle and loaded at runtime:

Inference
Predictions are made by passing normalized facial landmarks to the predict() method, which:
- Passes features through all 200 decision trees
- Each tree votes for a class (0 or 1)
- Returns the majority vote as the prediction
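A sketch combining the pickle round-trip with a single predict() call on a (1, 136) feature vector; the training data, seed, and serialization-in-memory are illustrative (the app reads the model from a file on disk):

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Train a stand-in model on toy data, then round-trip it through pickle
# as the app does with its serialized model file.
rng = np.random.default_rng(0)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(rng.random((20, 136)), rng.integers(0, 2, 20))
loaded = pickle.loads(pickle.dumps(model))

# One frame's normalized landmarks -> shape (1, 136) for predict().
features = rng.random((1, 136))
label = int(loaded.predict(features)[0])    # majority vote: 0 = HAPPY, 1 = SAD
proba = loaded.predict_proba(features)[0]   # per-class vote fractions
print(label, proba)
```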
Prediction Output
Model Performance Considerations
Inference Speed
Random Forest provides fast predictions:
- Single prediction: < 1 ms on modern CPUs
- Suitable for real-time video processing at 1 FPS
- Parallel tree evaluation when using multiple cores
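Timings vary by machine, so the figure above is best verified locally; a quick sketch for measuring single-prediction latency (toy model, illustrative seed):

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in model with the same 200-tree / 136-feature shape.
rng = np.random.default_rng(0)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(rng.random((50, 136)), rng.integers(0, 2, 50))

sample = rng.random((1, 136))
model.predict(sample)  # warm-up call before timing

t0 = time.perf_counter()
model.predict(sample)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"single prediction: {elapsed_ms:.2f} ms")
```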
Memory Footprint
Model size depends on:
- Number of trees (200)
- Average tree depth (depends on data)
- Number of features (136)
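One way to see the tree-count effect is to compare the pickled size of a small ensemble against the full 200-tree one (toy data; absolute sizes will differ from the real model):

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training data with the same 136-feature shape.
rng = np.random.default_rng(0)
X, y = rng.random((50, 136)), rng.integers(0, 2, 50)

# Serialized size grows with the number of trees.
sizes = {}
for n in (10, 200):
    m = RandomForestClassifier(n_estimators=n, random_state=42).fit(X, y)
    sizes[n] = len(pickle.dumps(m))
print(sizes)  # more trees -> larger model file
```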
Training Requirements
Minimum Data
At least 2 classes with multiple samples each. More data improves accuracy.
Training Time
Typically 1-10 seconds depending on dataset size and CPU cores available.
Model Limitations
Extending the Model
To add more emotion classes:
1. Add training data in the data/ folder
2. Update the allowed emotions in prepare_data.py:25
3. Update the emotion labels in app.py:19
4. Retrain the model
Emotion folders are processed in alphabetical order, which determines the integer labels. Keep this consistent across preparation and prediction.
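The alphabetical-order rule can be illustrated with plain Python; note how adding a folder can shift the existing labels (ANGRY is a hypothetical new class):

```python
# Labels are assigned by alphabetical folder order, so adding an ANGRY
# folder renumbers the existing classes. The same mapping must be
# regenerated consistently in prepare_data.py and app.py.
before = {name: i for i, name in enumerate(sorted(["HAPPY", "SAD"]))}
after = {name: i for i, name in enumerate(sorted(["HAPPY", "SAD", "ANGRY"]))}
print(before)  # {'HAPPY': 0, 'SAD': 1}
print(after)   # {'ANGRY': 0, 'HAPPY': 1, 'SAD': 2}
```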
Next Steps
Emotion Recognition
Learn how facial landmarks are detected and extracted
Architecture
Understand the complete system architecture

