Overview
Feature engineering transforms raw data into meaningful predictors. This module creates three engineered features: engagement score, exam success rate, and learning consistency.
FeatureConfig
Configuration dataclass for feature engineering parameters. Implementation: src/features.py:9
Configuration Values
Defined in config.yaml:
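The concrete file contents are not reproduced here; a plausible sketch, with the weights and epsilon quoted in the formulas below (the key names are assumptions):

```yaml
features:
  engagement_weights:
    minutes_watched: 0.6
    days_on_platform: 0.3
    courses_started: 10.0
  epsilon: 1.0e-06   # smoothing term for exam_success_rate
```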
Core Function
add_engineered_features()
Creates three derived features from raw data. Implementation: src/features.py:17
Engineered Features
1. Engagement Score
Weighted combination of user activity metrics. Formula (weights recoverable from the worked example):

engagement_score = minutes_watched × 0.6 + days_on_platform × 0.3 + courses_started × 10.0

Example:
- User with 100 minutes watched, 50 days on platform, and 3 courses started:
engagement_score = (100 × 0.6) + (50 × 0.3) + (3 × 10.0) = 60 + 15 + 30 = 105
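The arithmetic above can be sketched as a small function (column names and weights taken from the worked example; the real implementation lives at src/features.py:17):

```python
def engagement_score(minutes_watched: float, days_on_platform: float,
                     courses_started: int) -> float:
    """Weighted combination of activity metrics (weights from config)."""
    return minutes_watched * 0.6 + days_on_platform * 0.3 + courses_started * 10.0

# Worked example from the text: 100 minutes, 50 days, 3 courses → ≈ 105
score = engagement_score(100, 50, 3)
```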
2. Exam Success Rate
Ratio of passed exams to started exams, with epsilon smoothing. Formula:

exam_success_rate = exams_passed / (exams_started + 1.0e-06)

The epsilon term (1.0e-06) prevents division by zero when a user has started no exams.
Example:
- User passed 4 out of 5 exams: 4 / (5 + 0.000001) ≈ 0.8
- User with no exams: 0 / (0 + 0.000001) = 0.0
3. Learning Consistency
Average minutes watched per day on platform. Formula:

learning_consistency = minutes_watched / days_on_platform

Example:
- 300 minutes over 30 days: 300 / 30 = 10 minutes/day
- 300 minutes over 3 days: 300 / 3 = 100 minutes/day
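Putting the three formulas together, add_engineered_features() could look roughly like this (a sketch only; the input column names are assumptions, and the actual implementation is at src/features.py:17):

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the three derived features described above (column names assumed)."""
    out = df.copy()
    out["engagement_score"] = (
        out["minutes_watched"] * 0.6
        + out["days_on_platform"] * 0.3
        + out["courses_started"] * 10.0
    )
    out["exam_success_rate"] = out["exams_passed"] / (out["exams_started"] + 1.0e-06)
    out["learning_consistency"] = out["minutes_watched"] / out["days_on_platform"]
    return out

# Rows mirror the worked examples in the text
df = pd.DataFrame({
    "minutes_watched": [100, 300],
    "days_on_platform": [50, 30],
    "courses_started": [3, 1],
    "exams_passed": [4, 0],
    "exams_started": [5, 0],
})
result = add_engineered_features(df)
```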
Feature Importance
These engineered features often outperform raw features:
- engagement_score: Combines multiple signals into a single metric
- exam_success_rate: Strong predictor of purchase intent
- learning_consistency: Distinguishes committed learners from browsers
IQRClipper Transformer
Custom scikit-learn transformer for outlier clipping. Implementation: src/features.py:40
IQR Method
- Q1: 25th percentile
- Q3: 75th percentile
- IQR: Q3 - Q1
- Bounds: [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
Clipping parameters are defined in config.yaml.
Usage Example
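The project's transformer lives at src/features.py:40; a minimal self-contained sketch of such a transformer and its use, assuming the standard scikit-learn fit/transform API and a default 1.5× multiplier:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class IQRClipper(BaseEstimator, TransformerMixin):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 by default."""

    def __init__(self, factor: float = 1.5):
        self.factor = factor

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1 = np.percentile(X, 25, axis=0)
        q3 = np.percentile(X, 75, axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.factor * iqr
        self.upper_ = q3 + self.factor * iqr
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)

# Usage: the extreme value 100 is clipped to the upper bound Q3 + 1.5*IQR
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
clipped = IQRClipper().fit_transform(X)
```

Because the bounds are learned in fit() and applied in transform(), the clipper can sit inside a scikit-learn Pipeline without leaking statistics from validation data into training.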
Related Modules
- Data loading: src/data.py:26 calls add_engineered_features()
- Preprocessing: Feature transformations applied in training pipeline
- Model training: Uses engineered features for predictions
Next Steps
- Data Loading: Learn how raw data is loaded and split
- Model Selection: See how features are used in model training