Overview
The prepare_data.py script processes images organized in emotion-specific folders, extracts facial landmark coordinates using OpenCV, and saves the normalized data to data.txt for training.
Directory Structure
Organize your training images in emotion-specific folders under a data/ directory.
The data/ directory should be located at the root of your project, parallel to the source/ folder.
Supported Emotions
Currently, EmoChat processes only two emotions:
- happy (label: 0)
- sad (label: 1)
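Putting the directory structure and labels together, the expected layout looks like this (the image file names are illustrative only):

```
project-root/
├── source/
└── data/
    ├── happy/
    │   ├── happy_001.jpg
    │   └── ...
    └── sad/
        ├── sad_001.jpg
        └── ...
```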
Data Preparation Workflow
Create the data directory
Create a data/ folder at your project root with happy/ and sad/ subfolders.
Add training images
Place facial images into the respective emotion folders. Ensure:
- Images contain clearly visible faces
- Good lighting conditions
- Various angles and expressions
- At least 20-30 images per emotion for better results
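As a quick sanity check before running prepare_data.py, you can count the images in each emotion folder. This is a minimal sketch; the default data/ path and the set of image extensions are assumptions, not something prepare_data.py enforces:

```python
import os

# Extensions assumed to be readable by OpenCV; adjust as needed
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(data_dir="data", emotions=("happy", "sad")):
    """Return a dict mapping each emotion folder to its image count."""
    counts = {}
    for emotion in emotions:
        folder = os.path.join(data_dir, emotion)
        files = os.listdir(folder) if os.path.isdir(folder) else []
        counts[emotion] = sum(
            1 for f in files if os.path.splitext(f)[1].lower() in IMAGE_EXTS
        )
    return counts
```

If any count falls below the 20-30 range above, add more images before running the script.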
How It Works
Loading Emotion Folders
The script identifies the valid emotion directories (happy and sad).
Extracting Facial Landmarks
For each image, the script:
- Reads the image using OpenCV
- Extracts facial landmarks (68 points × 2 coordinates = 136 features)
- Normalizes the coordinates
- Associates the data with an emotion label
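The normalization step can be illustrated as follows. This is a minimal sketch that assumes min-max normalization over the landmarks' bounding box; the exact scheme used by prepare_data.py may differ:

```python
def normalize_landmarks(points):
    """Scale 68 (x, y) landmark points into [0, 1] relative to their
    bounding box and flatten them into a 136-value feature vector."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against division by zero for degenerate point sets
    w = max(max(xs) - min_x, 1e-6)
    h = max(max(ys) - min_y, 1e-6)
    features = []
    for x, y in points:
        features.append((x - min_x) / w)
        features.append((y - min_y) / h)
    return features  # 68 points * 2 coordinates = 136 features
```

Normalizing this way makes the features independent of where the face sits in the frame and of the image resolution.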
Generating data.txt
The extracted features are saved to a text file, data.txt. Each sample contains:
- 136 facial landmark coordinates (features)
- 1 emotion label (0 for happy, 1 for sad)
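The file format can be sketched like this: one sample per line, 136 feature values followed by the integer label. The space delimiter and the 6-decimal formatting are assumptions about prepare_data.py's output, shown here only to make the layout concrete:

```python
def write_samples(path, samples):
    """Write (features, label) pairs, one sample per line:
    136 floats followed by the integer emotion label."""
    with open(path, "w") as fh:
        for features, label in samples:
            assert len(features) == 136, "expected 68 points x 2 coordinates"
            fh.write(" ".join(f"{v:.6f}" for v in features) + f" {label}\n")

def read_samples(path):
    """Read samples back as (features, label) pairs."""
    samples = []
    with open(path) as fh:
        for line in fh:
            values = line.split()
            samples.append(([float(v) for v in values[:-1]], int(values[-1])))
    return samples
```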
Troubleshooting
“No se encontró el directorio de datos” (“The data directory was not found”)
Problem: The data/ directory doesn’t exist.
Solution: Create the directory structure:
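One way to create the expected layout is from Python; this minimal sketch follows the folder names described above:

```python
import os

def create_data_dirs(root="data", emotions=("happy", "sad")):
    """Create data/happy and data/sad, skipping folders that already exist."""
    for emotion in emotions:
        os.makedirs(os.path.join(root, emotion), exist_ok=True)
```

Run it from your project root so data/ ends up parallel to source/.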
“No se encontraron carpetas ‘happy’ o ‘sad’” (“The ‘happy’ or ‘sad’ folders were not found”)
Problem: The emotion folders are missing or misnamed.
Solution: Ensure the folders are named happy and sad (matching is case-insensitive).
“No se generaron muestras” (“No samples were generated”)
Problem: No faces were detected in the images.
Solution:
- Verify images contain visible faces
- Ensure images are not corrupted
- Try images with better lighting and frontal faces
- Check that OpenCV can read your image formats
Low Sample Count
Problem: Only a few images were processed successfully.
Solution:
- Add more high-quality training images
- Ensure faces are clearly visible and well-lit
- Remove blurry or low-resolution images
Best Practices
Image Quality
Use clear, well-lit images with visible facial features for better landmark detection.
Dataset Size
Aim for at least 50-100 images per emotion for reliable model performance.
Diversity
Include multiple people, angles, and lighting conditions to improve generalization.
Balance
Keep roughly equal numbers of images for each emotion to avoid bias.
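After generating data.txt, you can verify the balance by tallying the label column. This sketch assumes the space-separated, label-last sample layout described above:

```python
from collections import Counter

def label_counts(path="data.txt"):
    """Count samples per emotion label (0 = happy, 1 = sad),
    assuming one space-separated sample per line, label last."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            values = line.split()
            if values:
                counts[int(values[-1])] += 1
    return counts
```

A large gap between the two counts is a sign the classifier will be biased toward the overrepresented emotion.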
Next Steps
Once you’ve successfully generated data.txt, proceed to Model Training to train your emotion recognition classifier.
