Making Predictions - Lead Scoring Model

Once the model is trained, you can generate predictions for new leads to estimate their conversion probability. The prediction process outputs both class labels and probability scores.

Prediction Workflow

Load Test Data

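Predictions are generated on the held-out test set. get_training_data() reads the processed dataset, separates the Status label from the features, and reserves 20% of the rows for testing with a fixed random seed so the split is reproducible.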

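# From train_model.py:70-89
# Assumed module-level imports in train_model.py
import pandas as pd
from sklearn.model_selection import train_test_split

def get_training_data(self):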
    # Read processed dataset
    data = pd.read_csv("data/processed/full_dataset.csv")

    # Split data
    class_label = 'Status'
    X = data.drop([class_label], axis=1)
    y = data[class_label]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=42, shuffle=True, test_size=0.2
    )

    return X_train, X_test, y_train, y_test
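The split above can be tried in isolation. The following is a minimal sketch, assuming a small made-up DataFrame in place of data/processed/full_dataset.csv; the feature columns feature_a and feature_b are hypothetical, and only the Status label and the train_test_split arguments come from the excerpt.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the processed dataset (10 rows, 3 classes)
data = pd.DataFrame({
    "feature_a": range(10),
    "feature_b": range(10, 20),
    "Status": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
})

# Separate the label column from the features, as in get_training_data()
class_label = "Status"
X = data.drop(columns=[class_label])
y = data[class_label]

# Same arguments as the excerpt: 80/20 split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, shuffle=True, test_size=0.2
)

print(len(X_train), len(X_test))
```

Because random_state is fixed, repeated runs produce the same rows in the test set, which keeps reported metrics comparable between runs.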

Generate Predictions

The best-performing model (Gradient Boosting) is fit on the training set and then produces both class predictions and per-class probability scores for the test set.

# From train_model.py:175-181
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)

# Per-class probabilities; argmax picks the most likely class index for each lead
y_probabilities = best_model.predict_proba(X_test)
y_predicted = np.argmax(y_probabilities, axis=1)
predictions_df, probability_distribution = self.get_lead_distribution(
    X_test, y_predicted, y_probabilities
)
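To illustrate how predict_proba() and np.argmax() relate, here is a minimal sketch that assumes synthetic three-class data from scikit-learn's make_classification rather than the real lead dataset. The hyperparameters (n_estimators=20) are chosen only to keep the example fast and are not taken from train_model.py.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the processed lead dataset
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, shuffle=True, test_size=0.2
)

model = GradientBoostingClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)

# One row per lead, one column per class, ordered as in model.classes_
proba = model.predict_proba(X_test)

# argmax yields a column index; model.classes_ turns it back into a label
pred_from_proba = model.classes_[np.argmax(proba, axis=1)]

# Fraction of rows where this matches model.predict()
agreement = (pred_from_proba == model.predict(X_test)).mean()
print(proba.shape, agreement)
```

When the labels are already encoded as 0, 1, 2, the column index and the label coincide, so the argmax result can be used directly as the predicted class, as the excerpt does.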

Map Predictions to Classes

The numeric predictions are mapped to human-readable class labels.

# From train_model.py:129-133
# Mapping dictionary: encoded class -> readable label
mapping = {0: 'Closed Lost', 1: 'Closed Won', 2: 'Other'}

# Apply the mapping to y_predicted
y_predicted_mapped = np.vectorize(mapping.get)(y_predicted)
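The mapping step can be checked on a tiny array. This is a minimal sketch using the same mapping dictionary; the input values are made up for illustration.

```python
import numpy as np

mapping = {0: 'Closed Lost', 1: 'Closed Won', 2: 'Other'}

# Hypothetical encoded predictions for four leads
y_predicted = np.array([1, 0, 2, 1])

# np.vectorize applies mapping.get element by element
y_predicted_mapped = np.vectorize(mapping.get)(y_predicted)

print(y_predicted_mapped.tolist())
```

Note that mapping.get returns None for a key that is not in the dictionary, so an unexpected class index would surface as a missing label rather than as an error.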

Create Results DataFrame

The predictions are combined with original lead features into a comprehensive results table.

# From train_model.py:136-145
impact_df = pd.DataFrame({
    'Observation': range(1, len(X_test_original) + 1),
    'Use Case': X_test_original['Use Case'],
    'Discount code': X_test_original['Discount code'],
    'Loss Reason': X_test_original['Loss Reason'],
    'Source': X_test_original['Source'],
    'City': X_test_original['City'],
    'Predicted Class': y_predicted_mapped,
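    # NOTE: column 0 of predict_proba corresponds to the first entry of
    # best_model.classes_. With the mapping 0 -> 'Closed Lost', 1 -> 'Closed Won',
    # verify that this index matches the column name before relying on it.
    'Probability Closed-Won': y_probabilities[:, 0],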
})
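Since predict_proba() orders its columns by the fitted model's classes_ attribute, one way to avoid a hard-coded column index is to look the column up by class label. The sketch below assumes the label encoding 0, 1, 2 from the mapping step; probability_for_class is a hypothetical helper, not part of train_model.py, and the probabilities are made up.

```python
import numpy as np

def probability_for_class(classes, probabilities, target):
    """Return the probability column belonging to the class label `target`."""
    matches = np.flatnonzero(np.asarray(classes) == target)
    if matches.size == 0:
        raise ValueError(f"class {target!r} not found in {list(classes)!r}")
    return probabilities[:, matches[0]]

# Stand-in for best_model.classes_ and best_model.predict_proba(X_test)
classes = np.array([0, 1, 2])
y_probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
])

# Label 1 is 'Closed Won' under the mapping shown earlier
closed_won = probability_for_class(classes, y_probabilities, target=1)
print(closed_won.tolist())
```

Looking the column up by label keeps the 'Probability Closed-Won' column correct even if the class encoding changes.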

Running Predictions

To generate predictions, execute the main training pipeline:

# From train_model.py:195-197
if __name__ == "__main__":
    trainer = ModelTraining()
    trainer.run()

The run() method returns a dictionary containing:

predictions_df: DataFrame with predicted classes and features
probability_distribution: Distribution of probabilities across bins
accuracy_score: Model accuracy on the test set

The model uses predict_proba() to generate probability scores for each class, then selects the class with the highest probability using np.argmax().
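The source does not show how probability_distribution is computed inside get_lead_distribution(). As an illustration only, the following sketch assumes ten equal-width bins over the range 0 to 1 and counts how many leads fall into each; bin_probabilities is a hypothetical helper and the probabilities are made up.

```python
import numpy as np
import pandas as pd

def bin_probabilities(probabilities, n_bins=10):
    """Count probabilities per equal-width bin on [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # np.histogram treats the last bin as closed, so 1.0 is included
    counts, _ = np.histogram(probabilities, bins=edges)
    labels = [f"{lo:.1f}-{hi:.1f}" for lo, hi in zip(edges[:-1], edges[1:])]
    return pd.DataFrame({"bin": labels, "count": counts})

# Hypothetical Closed-Won probabilities for six leads
probs = np.array([0.05, 0.12, 0.48, 0.51, 0.93, 1.0])
distribution = bin_probabilities(probs)
print(distribution)
```

A table like this makes it easy to see whether the model's scores cluster near 0 and 1 or spread across the middle of the range.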

Output Format

Each prediction includes:

Observation: Sequential identifier for each lead
Lead Features: Use Case, Discount code, Loss Reason, Source, City
Predicted Class: One of three categories (Closed Won, Closed Lost, Other)
Probability Score: Confidence level for the Closed-Won prediction

The probability score is intended to represent the likelihood of a "Closed Won" outcome, which is the primary metric for lead prioritization.
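As a sketch of how the score might be used for prioritization, the example below sorts a made-up results table by the 'Probability Closed-Won' column and keeps the top leads. The column names follow the results DataFrame described above; the rows and the cutoff of two leads are assumptions for illustration.

```python
import pandas as pd

# Hypothetical slice of the results table
predictions_df = pd.DataFrame({
    "Observation": [1, 2, 3, 4],
    "Source": ["Web", "Referral", "Web", "Event"],
    "Predicted Class": ["Closed Won", "Closed Lost", "Closed Won", "Other"],
    "Probability Closed-Won": [0.72, 0.10, 0.91, 0.35],
})

# Highest conversion probability first
ranked = predictions_df.sort_values(
    "Probability Closed-Won", ascending=False
).reset_index(drop=True)

# Keep the two most promising leads
top_leads = ranked.head(2)
print(top_leads[["Observation", "Probability Closed-Won"]])
```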
