Overview
Feature engineering transforms raw data into meaningful features that improve model performance. This process includes date extraction, categorical encoding, and feature scaling.
Date Feature Extraction
Converting to DateTime
First, date columns are converted from strings to datetime objects.
Extracting Year and Month
Temporal features are extracted from the datetime columns:
- Created Year: Year when the offer was created
- Created Month: Month when the offer was created
- Close Year: Year when the offer was closed
- Close Month: Month when the offer was closed
Extracting year and month as separate features allows the model to capture seasonal patterns and temporal trends in lead conversion.
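The conversion and extraction steps above can be sketched with pandas as follows. This is a minimal illustration, not the pipeline's actual code: the source column names `Created Date` and `Close Date` and the sample rows are assumptions, and only the four output feature names come from the list above. The last step also drops the original columns, which the next section describes.

```python
import pandas as pd

# Hypothetical sample data; the column names "Created Date" and
# "Close Date" are assumptions, not taken from the real dataset.
df = pd.DataFrame({
    "Created Date": ["2023-01-15", "2023-06-30"],
    "Close Date": ["2023-03-01", "2024-02-10"],
})

date_cols = ["Created Date", "Close Date"]

# Convert strings to datetime; unparseable values become NaT
# instead of raising an error.
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Extract year and month as separate features.
df["Created Year"] = df["Created Date"].dt.year
df["Created Month"] = df["Created Date"].dt.month
df["Close Year"] = df["Close Date"].dt.year
df["Close Month"] = df["Close Date"].dt.month

# Remove the original datetime columns once the features exist.
df = df.drop(columns=date_cols)
```

Using `errors="coerce"` is a design choice made for this sketch: it keeps the conversion from failing on malformed dates, at the cost of producing missing values that need handling later.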
Dropping Original Date Columns
After extraction, the original datetime columns are removed.
Target Variable Mapping
Status Column Transformation
The Status column contains the target variable. To reduce class imbalance, the minority statuses are grouped into a single Other class, leaving three classes:
- Closed Won: Successfully converted leads
- Closed Lost: Lost opportunities
- Other: All other statuses (e.g., In Progress, Nurturing)
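One way to express this grouping in pandas is sketched below. The sample rows are illustrative, and the approach (keeping the two closed statuses and replacing everything else) is an assumption about how the mapping could be done, not the pipeline's confirmed implementation.

```python
import pandas as pd

# Illustrative rows; the status values come from the list above.
df = pd.DataFrame({
    "Status": ["Closed Won", "In Progress", "Closed Lost", "Nurturing"],
})

# Statuses that keep their own class; everything else becomes "Other".
keep = ["Closed Won", "Closed Lost"]

# Series.where keeps values where the condition is True and
# substitutes "Other" elsewhere (missing statuses also become "Other").
df["Status"] = df["Status"].where(df["Status"].isin(keep), "Other")
```

Checking `df["Status"].value_counts()` afterwards is a quick way to see how the three classes are distributed.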
Label Encoding
Why Label Encoding?
Machine learning models require numerical inputs. Label Encoding converts categorical variables into integer representations.
Implementation
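A minimal sketch of label encoding with scikit-learn's `LabelEncoder`, assuming the data is held in a pandas DataFrame. The sample rows and the `encoders` dictionary are illustrative additions, not part of the original pipeline.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative rows with two categorical columns.
df = pd.DataFrame({
    "Source": ["Inbound", "Outbound", "Referral", "Inbound"],
    "Status": ["Closed Won", "Other", "Closed Lost", "Other"],
})

# Keep one fitted encoder per column (hypothetical helper dict) so the
# integer codes can be mapped back to the original labels later.
encoders = {}
for col in ["Source", "Status"]:
    le = LabelEncoder()
    # Cast to str so mixed or missing values do not break the fit.
    df[col] = le.fit_transform(df[col].astype(str))
    encoders[col] = le

# Recover the original labels for a column when needed.
original_status = encoders["Status"].inverse_transform(df["Status"])
```

Keeping the fitted encoders around matters mainly for the target column, where predictions need to be translated back into readable status names.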
Encoded Features
The following categorical features are encoded:
- Source (Inbound, Outbound, etc.)
- City (Various cities)
- Loss Reason (Reasons for lost opportunities)
- Pain (Customer pain level)
- Discount code (Applied discount codes)
- Status (Closed Won, Closed Lost, Other)
- Use Case (Type of use case)
Each unique category is assigned a unique integer. For example, if Source has values ["Inbound", "Outbound", "Referral"], they are encoded as [0, 1, 2], because LabelEncoder assigns integers in sorted order of the category values.
Data Scaling
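After encoding, numerical features are scaled using StandardScaler to normalize their ranges. StandardScaler transforms each feature to have zero mean and unit variance. This matters most for scale-sensitive models such as linear or distance-based methods; tree-based gradient boosting is largely unaffected by feature scale, so the step is harmless there but not strictly required.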
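The scaling step can be sketched as follows. The choice of columns and the sample values are assumptions made for illustration; the source does not state which features are passed to the scaler.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative numeric features (hypothetical values).
df = pd.DataFrame({
    "Created Year": [2022, 2023, 2024],
    "Created Month": [1, 6, 12],
})

scaler = StandardScaler()

# fit_transform learns each column's mean and standard deviation,
# then returns (x - mean) / std as a NumPy array.
scaled = scaler.fit_transform(df)

# Wrap the array back into a DataFrame to keep the column names.
df_scaled = pd.DataFrame(scaled, columns=df.columns, index=df.index)
```

When the data is split into training and test sets, fitting the scaler on the training portion only and reusing it with `scaler.transform` on the test portion avoids leaking test statistics into training.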
Final Output
The fully preprocessed dataset is saved for model training.
Summary
The feature engineering pipeline:
- Extracts temporal features (year, month) from dates
- Maps the target variable to three classes (Closed Won, Closed Lost, Other)
- Encodes all categorical variables using LabelEncoder
- Scales numerical features using StandardScaler
- Outputs a clean, model-ready dataset