Overview
The Lead Scoring Model uses data from two separate CSV files that reflect different phases of the sales process. Data fusion combines these datasets into a single comprehensive dataset for analysis and model training.
Source Datasets
leads.csv
Contains data about all potential clients who have shown interest:
- Id: Unique identifier for the lead
- First Name: Lead’s first name
- Use Case: Type of use case for the potential client
- Source: Lead source (e.g., Inbound, Outbound)
- Status: Current status of the lead
- Discarded/Nurturing Reason: Reason for lead discard or nurturing
- Acquisition Campaign: Acquisition campaign that generated the lead
- Created Date: Lead creation date
- Converted: Whether the lead converted (1) or not (0)
- City: City of the lead
offers.csv
Contains data about clients who reached at least the demo meeting phase:
- Id: Unique identifier for the offer
- Use Case: Type of use case for the offer
- Status: Current status of the offer (target variable)
- Created Date: Offer creation date
- Close Date: Offer closing date
- Price: Offer price
- Discount code: Applied discount code
- Pain: Potential customer’s pain level
- Loss Reason: Reason for offer loss
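The two column lists above can double as a lightweight schema check before fusion. The following is a minimal sketch, assuming pandas; the constant names and the `missing_columns` helper are hypothetical and not part of the project.

```python
import pandas as pd

# Column names copied from the lists above.
LEADS_COLUMNS = [
    "Id", "First Name", "Use Case", "Source", "Status",
    "Discarded/Nurturing Reason", "Acquisition Campaign",
    "Created Date", "Converted", "City",
]
OFFERS_COLUMNS = [
    "Id", "Use Case", "Status", "Created Date", "Close Date",
    "Price", "Discount code", "Pain", "Loss Reason",
]


def missing_columns(df: pd.DataFrame, expected: list[str]) -> list[str]:
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [name for name in expected if name not in df.columns]
```

A possible usage, assuming the files sit in the working directory, is `missing_columns(pd.read_csv("leads.csv"), LEADS_COLUMNS)`, which returns an empty list when every documented column is present.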
Fusion Process
Clean the leads dataset
Remove null values and redundant columns from leads data:
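A minimal sketch of this cleaning step, assuming pandas. Two things here are assumptions rather than the project's actual code: the function name `clean_leads`, and the reading of "remove null values" as dropping rows that have no `Id`, since those rows could never match an offer. The dropped columns are the ones this step names.

```python
import pandas as pd

# Columns this step names as duplicating information in offers.csv.
REDUNDANT_LEAD_COLUMNS = ["Use Case", "Created Date", "Status", "Converted"]


def clean_leads(leads: pd.DataFrame) -> pd.DataFrame:
    """Drop redundant columns, then rows without a join key (hypothetical helper)."""
    # errors="ignore" keeps the call safe if a column was already removed.
    cleaned = leads.drop(columns=REDUNDANT_LEAD_COLUMNS, errors="ignore")
    # Assumption: "null values" refers to rows whose Id is missing.
    cleaned = cleaned.dropna(subset=["Id"])
    return cleaned.reset_index(drop=True)
```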
Columns like Use Case, Created Date, Status, and Converted from leads.csv are dropped because they duplicate information already present in offers.csv.
Merge the datasets
Perform a left join using the Id column as the key. This creates a unified dataset that combines offer details with lead information.
Resulting Dataset
After fusion, the combined dataset contains:
- All offer-related features (Use Case, Status, dates, Price, Discount code, Pain, Loss Reason)
- Relevant lead information (Source, City)
- No duplicate columns or irrelevant identifiers
The fused dataset is saved to data/interim/full_dataset.csv for further preprocessing.
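The merge and save described in this section might look like the sketch below, assuming pandas. It assumes the offers table is the left side of the join, so every offer row is kept and lead columns are filled in where an `Id` matches. The function names `fuse` and `save_full_dataset` are hypothetical.

```python
from pathlib import Path

import pandas as pd


def fuse(offers: pd.DataFrame, leads: pd.DataFrame) -> pd.DataFrame:
    """Left join on Id, keeping every offer row (assumed to be the left table)."""
    # validate="many_to_one" raises MergeError if an Id repeats in leads,
    # which guards against silently duplicating offer rows.
    return offers.merge(leads, on="Id", how="left", validate="many_to_one")


def save_full_dataset(full: pd.DataFrame,
                      path: str = "data/interim/full_dataset.csv") -> Path:
    """Write the fused dataset to CSV, creating the folder if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    full.to_csv(out, index=False)
    return out
```

Offers without a matching lead keep their row, with missing values in Source and City, so the later data cleaning step still needs to handle those gaps.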
Next Steps
After data fusion, the dataset undergoes:
- Data Cleaning - Handle missing values and duplicates
- Feature Engineering - Transform and create new features