
System Requirements

Before installing the Lead Scoring Model, ensure your system meets these requirements:
  • Python: Version 3.9 or higher (required by pandas 2.1)
  • pip: Latest version recommended
  • Operating System: Linux, macOS, or Windows
  • Memory: At least 4GB RAM for model training
  • Storage: 500MB for dependencies and data

Clone the Repository

First, clone the project repository to your local machine:
git clone <repository-url>
cd lead-scoring-model

Install Dependencies

The project uses several Python libraries for data processing, machine learning, and visualization.
1. Create a virtual environment (recommended)

Create and activate a virtual environment to isolate project dependencies:
python3 -m venv venv
source venv/bin/activate
2. Install required packages

Install all dependencies from requirements.txt:
pip install -r requirements.txt
This installs the following packages:
  • matplotlib (3.7.2) - Data visualization
  • missingno (0.5.2) - Missing data visualization
  • numpy (1.23.5) - Numerical computing
  • pandas (2.1.4) - Data manipulation and analysis
  • scikit-learn (1.3.0) - Machine learning algorithms
  • seaborn (0.13.0) - Statistical data visualization
  • shimoku-api-python (1.4.1) - Dashboard integration
3. Verify installation

Confirm all packages installed correctly:
pip list
You should see all the packages listed above in the output.
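As an alternative to scanning pip list by eye, the same check can be done programmatically. The sketch below uses the standard library's importlib.metadata and the pinned versions from requirements.txt; the helper name is invented for illustration:

```python
# Sketch: confirm the pinned dependencies from requirements.txt are installed.
from importlib import metadata

PINNED = {
    "matplotlib": "3.7.2",
    "missingno": "0.5.2",
    "numpy": "1.23.5",
    "pandas": "2.1.4",
    "scikit-learn": "1.3.0",
    "seaborn": "0.13.0",
    "shimoku-api-python": "1.4.1",
}

def missing_packages(pinned):
    """Return the names of pinned packages that are not installed."""
    missing = []
    for name in pinned:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    gone = missing_packages(PINNED)
    print("Missing packages:", ", ".join(gone) if gone else "none")
```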
The scikit-learn package provides all 12 classification algorithms used in model comparison, including Gradient Boosting, Random Forest, SVM, and Neural Networks.
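The actual training logic lives in src/models/train_model.py. As a rough sketch of what comparing several of those classifiers looks like (the synthetic data and model selection here are illustrative, not the project's real pipeline):

```python
# Sketch only: compare a few of the scikit-learn classifiers named above
# on synthetic data; the real pipeline trains on the merged leads/offers data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # held-out accuracy
    print(f"{name}: {scores[name]:.3f}")
```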

Project Structure

After installation, your project directory should look like this:
.
├── README.md
├── data
│   ├── interim          # Intermediate processing files
│   ├── processed        # Final processed datasets
│   └── raw              # Original leads.csv and offers.csv
├── models               # Saved model artifacts
├── notebooks
│   └── data_preprocessing.ipynb
├── reports
│   ├── leads_report.html
│   ├── model_training.log
│   └── offers_report.html
├── requirements.txt
└── src
    ├── app.py           # Shimoku dashboard application
    ├── data
    │   ├── data_preprocessing.py
    │   └── data_cleaning.py
    ├── models
    │   ├── __init__.py
    │   └── train_model.py
    └── utils
        └── logger.py

Prepare Your Data

The model requires two CSV files to be placed in the data/raw/ directory:
1. Add leads.csv

This file should contain lead information with these columns:
  • Id: Unique identifier for the lead
  • First Name: Lead’s first name
  • Use Case: Type of use case for the potential client
  • Source: Lead source (e.g., Inbound, Outbound)
  • Status: Current status of the lead
  • Discarded/Nurturing Reason: Reason for lead discard or nurturing
  • Acquisition Campaign: Acquisition campaign that generated the lead
  • Created Date: Lead creation date
  • Converted: Target variable (1 = converted, 0 = not converted)
  • City: City of the lead
2. Add offers.csv

This file should contain offer information with these columns:
  • Id: Unique identifier for the offer
  • Use Case: Type of use case for the offer
  • Status: Current status of the offer (Closed Won, Closed Lost, etc.)
  • Created Date: Offer creation date
  • Close Date: Offer closing date
  • Price: Offer price
  • Discount code: Applied discount code
  • Pain: Potential customer’s pain level
  • Loss Reason: Reason for offer loss
Ensure both CSV files contain the Id column; the preprocessing script merges the two datasets on this unique identifier.
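A minimal sketch of that merge step with pandas, using toy stand-ins for the two CSV files (column names come from the lists above; the real logic is in src/data/data_preprocessing.py and may differ):

```python
import pandas as pd

# Toy stand-ins for data/raw/leads.csv and data/raw/offers.csv.
leads = pd.DataFrame({
    "Id": ["L1", "L2", "L3"],
    "Source": ["Inbound", "Outbound", "Inbound"],
    "Converted": [1, 0, 1],
})
offers = pd.DataFrame({
    "Id": ["L1", "L3"],
    "Status": ["Closed Won", "Closed Lost"],
    "Price": [1200.0, 800.0],
})

# A left merge keeps every lead and attaches offer details where Id matches.
merged = pd.merge(leads, offers, on="Id", how="left")
print(merged.shape)  # (3, 5)
```

A left merge (rather than inner) preserves leads with no matching offer, which matters when Converted is the prediction target.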

Optional: Shimoku Dashboard Setup

If you want to use the dashboard visualization features, configure your Shimoku API credentials:
export SHIMOKU_TOKEN="your_access_token"
export SHIMOKU_UNIVERSE_ID="your_universe_id"
export SHIMOKU_WORKSPACE_ID="your_workspace_id"
You can obtain these credentials from your Shimoku account settings.
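Inside the dashboard app, these credentials would then be read back from the environment. A minimal sketch (the variable names match the exports above; the helper function itself is invented for illustration):

```python
import os

def shimoku_config():
    """Read Shimoku credentials from the environment, failing fast if unset."""
    required = ["SHIMOKU_TOKEN", "SHIMOKU_UNIVERSE_ID", "SHIMOKU_WORKSPACE_ID"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing Shimoku credentials: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}
```

Failing fast with a named error beats letting the API client raise an opaque authentication failure later.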

Verify Installation

Test that everything is set up correctly:
python3 -c "import pandas, sklearn, numpy; print('Installation successful!')"
If you see “Installation successful!”, you’re ready to proceed to the quickstart guide.

Troubleshooting

If you see ModuleNotFoundError: No module named 'sklearn', scikit-learn wasn’t installed correctly. Try:
pip install --upgrade scikit-learn==1.3.0
If you encounter version conflicts, install the exact versions specified:
pip install numpy==1.23.5 --force-reinstall
On macOS/Linux, you may need to use pip3 instead of pip:
pip3 install -r requirements.txt

Next Steps

Quickstart Guide

Train your first model and make predictions

Data Preprocessing

Learn about the data preparation pipeline