
Overview

EmoChat uses computer vision techniques to detect and classify human emotions in real time through facial analysis. The system captures facial expressions from a webcam feed, extracts facial landmarks, normalizes the features, and classifies them into emotion categories.

How It Works

The emotion recognition pipeline consists of four main stages:
  1. Face Detection - Detect faces in the video frame
  2. Landmark Extraction - Extract 68 facial landmark points
  3. Feature Normalization - Normalize coordinates for consistent analysis
  4. Emotion Classification - Predict emotion using the trained ML model
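The four stages above can be sketched as a simple composition. The stage callables here are placeholders for illustration, not EmoChat's actual function names:

```python
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)

def run_pipeline(
    frame: Any,
    detect: Callable[[Any], Optional[Box]],
    extract: Callable[[Any, Box], List[Tuple[float, float]]],
    normalize: Callable[[List[Tuple[float, float]]], List[float]],
    classify: Callable[[List[float]], str],
) -> Optional[str]:
    """Run the four-stage pipeline; returns None when no face is found."""
    box = detect(frame)           # 1. Face Detection
    if box is None:
        return None
    points = extract(frame, box)  # 2. Landmark Extraction
    features = normalize(points)  # 3. Feature Normalization
    return classify(features)     # 4. Emotion Classification
```

Each stage only depends on the previous one's output, which is why a missing face can short-circuit the whole pipeline early.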

Facial Landmark Detection

OpenCV Implementation

EmoChat uses OpenCV’s face detection and landmark extraction capabilities, specifically:
  • Haar Cascade Classifier for face detection
  • LBF (Local Binary Features) Model for 68-point facial landmark detection
The core implementation is in utils.py:59:
def get_face_landmarks(image, draw: bool = False, static_image_mode: bool = True) -> List[float]:
    """
    Extracts 2D facial landmarks using only OpenCV (no Mediapipe).
    Uses a Haar detector + FacemarkLBF (68 points) and returns a flat
    list [x1_norm, y1_norm, x2_norm, y2_norm, ...].
    """
The system automatically downloads the required model files (haarcascade_frontalface_default.xml and lbfmodel.yaml) from OpenCV’s repository if they don’t exist locally.
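A download-on-first-use helper along these lines could implement that behavior; the `ensure_file` name and the injectable `downloader` parameter are illustrative, not EmoChat's actual code:

```python
import os
import urllib.request

def ensure_file(path: str, url: str, downloader=urllib.request.urlretrieve) -> str:
    """Download url to path only if the file is not already present locally."""
    if not os.path.exists(path):
        downloader(url, path)
    return path
```

Such a helper would be called once per model file (haarcascade_frontalface_default.xml and lbfmodel.yaml) before loading the detectors.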

68 Facial Points

The LBF model detects 68 specific facial landmark points that capture:
  • Jawline contour (17 points)
  • Eyebrow shapes (10 points)
  • Nose bridge and tip (9 points)
  • Eye contours (12 points)
  • Mouth outline (20 points)
These landmarks provide comprehensive facial geometry data for emotion analysis.
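These groups follow the standard iBUG 300-W 68-point indexing. A sketch of the 0-based index ranges that matches the counts above:

```python
# iBUG 300-W 68-point layout (0-based index ranges)
LANDMARK_GROUPS = {
    "jawline": range(0, 17),    # 17 points
    "eyebrows": range(17, 27),  # 10 points (5 per brow)
    "nose": range(27, 36),      # 9 points (bridge + tip/nostrils)
    "eyes": range(36, 48),      # 12 points (6 per eye)
    "mouth": range(48, 68),     # 20 points (outer + inner lip)
}

# The groups partition all 68 points with no overlap
assert sum(len(r) for r in LANDMARK_GROUPS.values()) == 68
```

Knowing these ranges is useful when inspecting or visualizing specific regions, e.g. slicing `points[48:68]` for the mouth.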

Feature Extraction Process

Step 1: Face Detection

The Haar Cascade classifier scans the grayscale image to detect faces:
face_detector, facemark = _get_models()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
Parameters:
  • scaleFactor=1.1 - Image pyramid scaling factor
  • minNeighbors=5 - Minimum neighbors required for face detection (reduces false positives)
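To see what scaleFactor=1.1 means in practice: each pyramid level shrinks the image by about 10%, so the cascade effectively scans for faces at a geometric series of sizes. A quick illustration (not part of EmoChat's code):

```python
def pyramid_widths(image_width: int, min_width: int = 30, scale_factor: float = 1.1):
    """Effective image widths a Haar cascade scans, largest first."""
    widths = []
    w = float(image_width)
    while w >= min_width:
        widths.append(int(w))
        w /= scale_factor  # shrink ~10% per level at scale_factor=1.1
    return widths
```

A smaller scaleFactor produces more pyramid levels (better recall for in-between face sizes, but slower); a larger one skips sizes and runs faster.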

Step 2: Landmark Fitting

Once a face is detected, the LBF model fits 68 landmarks to the facial region:
ok, landmarks = facemark.fit(gray, faces)
if not ok or len(landmarks) == 0:
    return []

# Only use the first detected face
points = landmarks[0][0]  # shape: (68, 2)

Step 3: Feature Normalization

Raw landmark coordinates vary based on face position and size. EmoChat normalizes these coordinates to make them position and scale-invariant:
xs = points[:, 0]
ys = points[:, 1]

# Calculate bounding box
min_x, min_y = xs.min(), ys.min()
max_x, max_y = xs.max(), ys.max()

width = float(max_x - min_x)
height = float(max_y - min_y)

# Normalize each point to [0, 1] range
features: List[float] = []
for (x, y) in zip(xs, ys):
    features.append(float((x - min_x) / width))
    features.append(float((y - min_y) / height))
Why Normalization? Normalizing coordinates makes the model invariant to:
  • Face size (distance from camera)
  • Face position (location in frame)
  • Image resolution
This ensures consistent predictions regardless of how the user positions themselves.
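The normalization loop above can be expressed equivalently in vectorized NumPy. This is a sketch, assuming `points` is the (68, 2) array produced by the landmark fitting step:

```python
import numpy as np

def normalize_landmarks(points: np.ndarray) -> list:
    """Scale (N, 2) landmark coordinates into the unit square of their bounding box."""
    mins = points.min(axis=0)             # [min_x, min_y]
    spans = points.max(axis=0) - mins     # [width, height]
    normalized = (points - mins) / spans  # each column now in [0, 1]
    return normalized.flatten().tolist()  # interleaved [x1, y1, x2, y2, ...]
```

Because NumPy flattens in row-major order, the output preserves the same interleaved [x, y, x, y, ...] layout as the loop-based version.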

Feature Vector Output

The final feature vector contains 136 values (68 points × 2 coordinates):
[x1_norm, y1_norm, x2_norm, y2_norm, ..., x68_norm, y68_norm]
Each value is in the range [0, 1], representing normalized positions within the facial bounding box.

Currently Supported Emotions

EmoChat currently recognizes 2 core emotions:

Happy

Detected when facial features show:
  • Raised cheek muscles
  • Mouth corners elevated
  • Crow’s feet around eyes

Sad

Detected when facial features show:
  • Downturned mouth corners
  • Lowered eyebrows
  • Relaxed facial muscles
The emotion labels are defined in app.py:19 as:
emotions = ["HAPPY", "SAD"]
The model outputs an integer index (0 or 1) which maps to these labels.
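The index-to-label step is a plain list lookup. A minimal sketch with a guard for out-of-range indices (the guard is an illustrative addition, not in app.py):

```python
emotions = ["HAPPY", "SAD"]

def label_for(index: int) -> str:
    """Map a model output index to its emotion label, guarding bad indices."""
    if 0 <= index < len(emotions):
        return emotions[index]
    return "UNKNOWN"
```

Adding a new emotion would mean appending its label here and retraining the model with a matching class index.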

Real-time Processing

Webcam Integration

The JavaScript frontend (main.js) captures a frame from the webcam once per second:
// Start sending frames to backend every 1000ms
predictionInterval = setInterval(sendFrameForPrediction, 1000);

Frame Processing Flow

  1. Capture - JavaScript captures frame from webcam video element
  2. Encode - Frame is converted to JPEG and Base64 encoded
  3. Send - Data is sent to Flask /predict endpoint via HTTP POST
  4. Decode - Backend decodes Base64 to image array
  5. Extract - get_face_landmarks() extracts normalized features
  6. Predict - Model classifies the emotion
  7. Return - Emotion label is sent back to frontend
  8. Display - UI updates with detected emotion
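Steps 2 and 4 amount to a Base64 round trip over a data URL. A minimal Python sketch of both sides (the JPEG bytes here are placeholder data, and the function names are illustrative):

```python
import base64

def encode_frame(jpeg_bytes: bytes) -> str:
    """Frontend side (step 2): wrap JPEG bytes in a Base64 data URL."""
    return "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")

def decode_frame(data_url: str) -> bytes:
    """Backend side (step 4): strip the data-URL prefix and decode."""
    return base64.b64decode(data_url.split(",")[1])
```

The `split(',')[1]` mirrors the backend code below: everything before the comma is the `data:image/jpeg;base64,` prefix, and everything after is the payload.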
# Backend processing (app.py:35)
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()

    # Decode base64 image
    img_data = data['image'].split(',')[1]
    nparr = np.frombuffer(base64.b64decode(img_data), np.uint8)
    frame = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    
    # Extract features
    face_landmarks = get_face_landmarks(frame, draw=False, static_image_mode=True)
    
    # Predict emotion
    if len(face_landmarks) > 0:
        output = model.predict([face_landmarks])
        emotion = emotions[int(output[0])]
        return jsonify({'emotion': emotion})
    return jsonify({'emotion': 'No face detected'})
Performance Consideration: Processing occurs at 1 FPS to balance responsiveness with computational efficiency. This rate provides timely feedback without overwhelming the CPU.

Error Handling

No Face Detected

When no face is found in the frame:
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) == 0:
    return []  # Empty feature vector
The API returns: {"emotion": "No face detected"}

Invalid Input

The function validates image format before processing:
if image is None or image.ndim != 3 or image.shape[2] != 3:
    return []  # Requires 3-channel RGB/BGR image
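The same guard can be exercised directly with NumPy arrays. A quick illustration of which inputs pass (the helper name is illustrative):

```python
import numpy as np

def is_valid_color_image(image) -> bool:
    """Mirror the guard in get_face_landmarks: require an H x W x 3 array."""
    return image is not None and image.ndim == 3 and image.shape[2] == 3
```

Grayscale (2-D) and 4-channel (RGBA) inputs are rejected, matching what `cv2.imdecode(..., cv2.IMREAD_COLOR)` is expected to produce.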

Visualization (Optional)

For debugging, landmarks can be drawn on the image:
face_landmarks = get_face_landmarks(frame, draw=True)
This draws green circles at each of the 68 landmark positions:
if draw:
    for (x, y) in points:
        cv2.circle(image, (int(x), int(y)), 1, (0, 255, 0), -1)
This feature is used in test_model.py for real-time visualization during development.

Next Steps

ML Model

Learn how the Random Forest classifier is trained and makes predictions

Architecture

Understand the complete system architecture and data flow
