Overview

The Hospital Data Analysis Platform includes a custom logistic regression implementation for predicting patient risk factors and clinical outcomes. The SimpleLogisticModel class provides a lightweight, interpretable approach to binary classification.

SimpleLogisticModel Class

The SimpleLogisticModel implements logistic regression using gradient descent optimization.

Initialization

from modeling.predictive import SimpleLogisticModel

model = SimpleLogisticModel(lr=0.01, epochs=600)
Parameters:
  • lr (float): Learning rate for gradient descent (default: 0.01)
  • epochs (int): Number of training iterations (default: 600)

Methods

fit(X, y)

Train the model on labeled data.
model.fit(X_train, y_train)
Parameters:
  • X (pd.DataFrame): Feature matrix
  • y (pd.Series): Binary target labels (0 or 1)
Returns: The fitted model instance (allows method chaining)

Implementation Details:
  • Automatically extracts and stores feature column names
  • Initializes weights to zero
  • Uses clipped sigmoid function to prevent numerical overflow
  • Updates weights using batch gradient descent

predict_proba(X)

Predict class probabilities.
proba = model.predict_proba(X_test)
# Returns: [[P(class=0), P(class=1)], ...]
Parameters:
  • X (pd.DataFrame): Feature matrix
Returns: numpy array of shape (n_samples, 2) with probabilities for each class

predict(X)

Predict binary class labels using a 0.5 threshold on the positive-class probability.
predictions = model.predict(X_test)
# Returns: [0, 1, 1, 0, ...]
Parameters:
  • X (pd.DataFrame): Feature matrix
Returns: numpy array of predicted class labels (0 or 1)
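The behavior documented above can be reproduced in a minimal, self-contained sketch. The class below is an illustration inferred from this page (zero-initialized weights, clipped sigmoid, batch gradient descent, two-column probability output), not the platform's actual source; see modeling/predictive.py for the real implementation.

```python
import numpy as np

class SketchLogisticModel:
    """Minimal logistic-regression sketch mirroring the documented behavior."""

    def __init__(self, lr: float = 0.01, epochs: int = 600):
        self.lr = lr
        self.epochs = epochs
        self.weights = None  # weights[0] is the bias term

    @staticmethod
    def _sigmoid(z: np.ndarray) -> np.ndarray:
        # Clipped sigmoid to prevent overflow in np.exp
        return 1 / (1 + np.exp(-np.clip(z, -20, 20)))

    def fit(self, X, y):
        x = np.asarray(X, dtype=float)
        yv = np.asarray(y, dtype=float)
        self.weights = np.zeros(x.shape[1] + 1)  # zero-initialized
        for _ in range(self.epochs):
            preds = self._sigmoid(x @ self.weights[1:] + self.weights[0])
            err = preds - yv
            self.weights[0] -= self.lr * err.mean()            # bias update
            self.weights[1:] -= self.lr * (x.T @ err) / len(x)  # weight update
        return self  # enables method chaining

    def predict_proba(self, X):
        x = np.asarray(X, dtype=float)
        p1 = self._sigmoid(x @ self.weights[1:] + self.weights[0])
        return np.column_stack([1 - p1, p1])  # shape (n_samples, 2)

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)
```

On a small linearly separable dataset, `SketchLogisticModel().fit(X, y).predict(X)` recovers the labels, and each row of `predict_proba` sums to 1.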

Training Workflow

The platform provides train_predictive_models() to train both risk and outcome models:
from modeling.predictive import train_predictive_models

feature_cols = ["age", "pain_level", "bmi", "wait_time_min"]
artifacts = train_predictive_models(
    df=patient_data,
    feature_cols=feature_cols,
    risk_target="diagnosis",
    outcome_target="readmitted"
)
What it does:
  1. Feature Engineering: One-hot encodes categorical variables (hospital, gender)
  2. Normalization: Standardizes numeric features (zero mean, unit variance)
  3. Train/Test Split: 75/25 split with random seed 42 for reproducibility
  4. Risk Model: Predicts high-risk diagnoses (appendicitis, pregnancy)
  5. Outcome Model: Predicts readmission probability
Returns: ModelArtifacts containing:
  • risk_model: Trained SimpleLogisticModel for risk prediction
  • outcome_model: Trained SimpleLogisticModel for outcome prediction
  • X_test: Test feature set
  • y_risk_test: Test labels for risk model
  • y_outcome_test: Test labels for outcome model
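Steps 1–3 of the workflow can be sketched as follows. The helper name `prepare_features`, the exact categorical columns, and the split mechanics are assumptions for illustration; the platform may use a different split utility, in which case the same seed would produce a different partition.

```python
import numpy as np
import pandas as pd

def prepare_features(df: pd.DataFrame, feature_cols: list):
    """Sketch of steps 1-3: one-hot encode, standardize, 75/25 split."""
    # 1. One-hot encode the categorical variables (hospital, gender)
    encoded = pd.get_dummies(df, columns=["hospital", "gender"], dtype=float)
    # 2. Standardize numeric features to zero mean, unit variance
    cols = encoded[feature_cols]
    encoded[feature_cols] = (cols - cols.mean()) / cols.std()
    # 3. 75/25 split with a fixed seed for reproducibility
    rng = np.random.default_rng(42)
    idx = rng.permutation(len(encoded))
    cut = int(len(encoded) * 0.75)
    return encoded.iloc[idx[:cut]], encoded.iloc[idx[cut:]]
```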

Model Evaluation

Evaluate trained models with multiple metrics:
from modeling.predictive import evaluate_predictive_models

metrics = evaluate_predictive_models(artifacts)
Example Output:
{
    "risk_accuracy": 0.847,
    "risk_f1": 0.723,
    "risk_auc": 0.891,
    "outcome_accuracy": 0.782,
    "outcome_f1": 0.654,
    "outcome_auc": 0.812,
    "sample_count": 250.0
}

Metrics Explained

  • Accuracy: Proportion of correct predictions (TP + TN) / Total
  • F1 Score: Harmonic mean of precision and recall, balances false positives and false negatives
  • AUC (Area Under ROC Curve): Measures discrimination ability across all thresholds (0.5 = random, 1.0 = perfect)
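To make the definitions concrete, all three metrics can be computed from scratch with NumPy (the platform's evaluation function may use a library such as scikit-learn instead; this is only an illustration):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # (TP + TN) / Total
    return float(np.mean(y_true == y_pred))

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def auc(y_true, y_score):
    # Rank-based (Mann-Whitney) form of ROC AUC: the probability that a
    # random positive example is scored above a random negative one
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```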

Implementation Notes

Sigmoid Function

The model uses a numerically stable sigmoid:
@staticmethod
def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-np.clip(z, -20, 20)))
Clipping prevents overflow for extreme values.
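A quick check of why the clip matters: without it, `np.exp` overflows for large negative logits; with it, every output stays finite and the extremes saturate near 0 and 1.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    # Same clipped form as _sigmoid above
    return 1 / (1 + np.exp(-np.clip(z, -20, 20)))

out = sigmoid(np.array([-1000.0, 0.0, 1000.0]))
# out is finite everywhere: ~0 at the left extreme, 0.5 at zero, ~1 at the right
```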

Gradient Descent Update

Weights are updated once per epoch on the full batch (here `x` is the feature matrix as a NumPy array and `yv` the label vector):
logits = x @ self.weights[1:] + self.weights[0]
preds = self._sigmoid(logits)
err = preds - yv
self.weights[0] -= self.lr * err.mean()  # Bias update
self.weights[1:] -= self.lr * (x.T @ err) / len(x)  # Weight update
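This update is batch gradient descent on the mean log-loss: `err.mean()` and `(x.T @ err) / len(x)` are the loss gradients with respect to the bias and weights. A finite-difference check on toy data (illustration only, not platform code) confirms this:

```python
import numpy as np

def log_loss(w, x, yv):
    """Mean binary cross-entropy for weights w (w[0] is the bias)."""
    p = 1 / (1 + np.exp(-(x @ w[1:] + w[0])))
    return -np.mean(yv * np.log(p) + (1 - yv) * np.log(1 - p))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
yv = rng.integers(0, 2, size=8).astype(float)
w = rng.normal(size=4)

# Analytic gradient, matching the documented update terms
preds = 1 / (1 + np.exp(-(x @ w[1:] + w[0])))
err = preds - yv
grad = np.concatenate([[err.mean()], (x.T @ err) / len(x)])

# Central finite differences of the loss, one coordinate at a time
eps = 1e-6
num = np.array([
    (log_loss(w + eps * e, x, yv) - log_loss(w - eps * e, x, yv)) / (2 * eps)
    for e in np.eye(4)
])
# grad and num agree to numerical precision
```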

Source Reference

See modeling/predictive.py for the complete implementation.