System Requirements
Before installing the Lead Scoring Model, ensure your system meets these requirements:
- Python: Version 3.7 or higher (note that the pinned pandas 2.1.4 release requires Python 3.9 or newer)
- pip: Latest version recommended
- Operating System: Linux, macOS, or Windows
- Memory: At least 4GB RAM for model training
- Storage: 500MB for dependencies and data
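The checks that can be automated from this list can be sketched in a few lines of standard-library Python. This is a minimal sketch, not part of the project: the function name `check_requirements` and its parameters are hypothetical, and the RAM requirement is left out because the standard library has no portable way to read it.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 7)              # minimum version stated in the list above
MIN_FREE_BYTES = 500 * 1024**2   # 500MB for dependencies and data
SUPPORTED_OS = {"Linux", "Darwin", "Windows"}  # platform.system() reports macOS as "Darwin"


def check_requirements(version=None, system=None, free_bytes=None):
    """Return a list of problems; an empty list means every check passed.

    The arguments default to values read from the running system and exist
    only so the checks can be exercised with made-up inputs.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    if free_bytes is None:
        free_bytes = shutil.disk_usage(".").free  # free space on the current drive

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.7")
    if system not in SUPPORTED_OS:
        problems.append(f"Unsupported operating system: {system}")
    if free_bytes < MIN_FREE_BYTES:
        problems.append("Less than 500MB of free disk space")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

Running the script from the directory where you plan to install the project prints one warning per failed check and nothing when all checks pass.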
Clone the Repository
First, clone the project repository to your local machine:
Install Dependencies
The project uses several Python libraries for data processing, machine learning, and visualization.
Create a virtual environment (recommended)
Create and activate a virtual environment to isolate project dependencies:
Install required packages
Install all dependencies from requirements.txt by running pip install -r requirements.txt. This installs the following packages:
- matplotlib (3.7.2) - Data visualization
- missingno (0.5.2) - Missing data visualization
- numpy (1.23.5) - Numerical computing
- pandas (2.1.4) - Data manipulation and analysis
- scikit-learn (1.3.0) - Machine learning algorithms
- seaborn (0.13.0) - Statistical data visualization
- shimoku-api-python (1.4.1) - Dashboard integration
The scikit-learn package provides all 12 classification algorithms used in model comparison, including Gradient Boosting, Random Forest, SVM, and Neural Networks.
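One way to confirm that the pinned versions listed above are the ones actually installed is to compare them against the package metadata. The sketch below is an illustration, not a script shipped with the project: the names `PINNED`, `installed_version`, and `find_mismatches` are hypothetical, and it relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward.

```python
from importlib import metadata  # standard library in Python 3.8+

# Pinned versions copied from the package list above.
PINNED = {
    "matplotlib": "3.7.2",
    "missingno": "0.5.2",
    "numpy": "1.23.5",
    "pandas": "2.1.4",
    "scikit-learn": "1.3.0",
    "seaborn": "0.13.0",
    "shimoku-api-python": "1.4.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run it inside the activated virtual environment. No output means every package matches its pin.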
Project Structure
After installation, your project directory should look like this:
Prepare Your Data
The model requires two CSV files to be placed in the data/raw/ directory:
Add leads.csv
This file should contain lead information with these columns:
- Id: Unique identifier for the lead
- First Name: Lead’s first name
- Use Case: Type of use case for the potential client
- Source: Lead source (e.g., Inbound, Outbound)
- Status: Current status of the lead
- Discarded/Nurturing Reason: Reason for lead discard or nurturing
- Acquisition Campaign: Acquisition campaign that generated the lead
- Created Date: Lead creation date
- Converted: Target variable (1 = converted, 0 = not converted)
- City: City of the lead
Add offers.csv
This file should contain offer information with these columns:
- Id: Unique identifier for the offer
- Use Case: Type of use case for the offer
- Status: Current status of the offer (Closed Won, Closed Lost, etc.)
- Created Date: Offer creation date
- Close Date: Offer closing date
- Price: Offer price
- Discount code: Applied discount code
- Pain: Potential customer’s pain level
- Loss Reason: Reason for offer loss
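Before running the pipeline, it can help to confirm that both files exist and carry the columns listed above. The following is a minimal sketch using only the standard library. It assumes comma-delimited, UTF-8 files whose header row uses exactly the column names shown in this guide, and the helper names `missing_columns` and `check_raw_data` are hypothetical.

```python
import csv
from pathlib import Path

# Column names copied from the two lists above.
REQUIRED_COLUMNS = {
    "leads.csv": [
        "Id", "First Name", "Use Case", "Source", "Status",
        "Discarded/Nurturing Reason", "Acquisition Campaign",
        "Created Date", "Converted", "City",
    ],
    "offers.csv": [
        "Id", "Use Case", "Status", "Created Date", "Close Date",
        "Price", "Discount code", "Pain", "Loss Reason",
    ],
}


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header row."""
    # "utf-8-sig" reads plain UTF-8 and also strips a byte-order mark if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


def check_raw_data(raw_dir="data/raw"):
    """Map each file name to a problem description; an empty dict means both files look fine."""
    problems = {}
    for file_name, required in REQUIRED_COLUMNS.items():
        path = Path(raw_dir) / file_name
        if not path.is_file():
            problems[file_name] = "file not found"
            continue
        missing = missing_columns(path, required)
        if missing:
            problems[file_name] = "missing columns: " + ", ".join(missing)
    return problems


if __name__ == "__main__":
    for file_name, problem in check_raw_data().items():
        print(f"{file_name}: {problem}")
```

The check reads only the header row, so it stays fast on large exports. It does not validate the values themselves, such as whether Converted contains only 0 and 1.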
Optional: Shimoku Dashboard Setup
If you want to use the dashboard visualization features, configure your Shimoku API credentials:
Verify Installation
Test that everything is set up correctly:
Troubleshooting
ImportError: No module named 'sklearn'
This means scikit-learn wasn’t installed correctly. Try reinstalling it with pip install scikit-learn==1.3.0.
Version conflicts with numpy
If you encounter version conflicts, install the exact versions specified in requirements.txt, for example pip install numpy==1.23.5.
Permission denied errors
On macOS/Linux, you may need to use pip3 instead of pip, for example pip3 install -r requirements.txt.
Next Steps
Quickstart Guide
Train your first model and make predictions
Data Preprocessing
Learn about the data preparation pipeline