Overview
The Hospital Data Analysis Platform includes a custom logistic regression implementation for predicting patient risk factors and clinical outcomes. The SimpleLogisticModel class provides a lightweight, interpretable approach to binary classification.
SimpleLogisticModel Class
The SimpleLogisticModel class implements logistic regression using gradient descent optimization.
Initialization
- lr (float): Learning rate for gradient descent (default: 0.01)
- epochs (int): Number of training iterations (default: 600)
Methods
fit(X, y)
Train the model on labeled data.
- X (pd.DataFrame): Feature matrix
- y (pd.Series): Binary target labels (0 or 1)
During training, fit:
- Extracts and stores the feature column names automatically
- Initializes weights to zero
- Uses a clipped sigmoid function to prevent numerical overflow
- Updates weights using batch gradient descent
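The training behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the function names clipped_sigmoid and fit_logistic, the clipping bound of 30, the intercept term, and the averaging of the gradient over samples are all assumptions.

```python
import numpy as np
import pandas as pd


def clipped_sigmoid(z, bound=30.0):
    """Sigmoid with the input clipped so np.exp cannot overflow.

    The bound of 30 is an assumed value, not taken from the source.
    """
    z = np.clip(z, -bound, bound)
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X: pd.DataFrame, y: pd.Series, lr: float = 0.01, epochs: int = 600):
    """Batch gradient descent for logistic regression (illustrative only)."""
    feature_names = list(X.columns)      # remember the training column order
    X_arr = X.to_numpy(dtype=float)
    y_arr = y.to_numpy(dtype=float)
    n_samples, n_features = X_arr.shape

    weights = np.zeros(n_features)       # weights start at zero
    bias = 0.0                           # assumed intercept term

    for _ in range(epochs):
        p = clipped_sigmoid(X_arr @ weights + bias)
        error = p - y_arr                # log-loss gradient with respect to the logit
        # One update per epoch, computed from the full training set.
        weights -= lr * (X_arr.T @ error) / n_samples
        bias -= lr * error.mean()

    return feature_names, weights, bias
```

Because every epoch uses all rows, the result is deterministic for a given input, which fits the interpretability goal stated in the overview.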
predict_proba(X)
Predict class probabilities.
- X (pd.DataFrame): Feature matrix
predict(X)
Predict binary class labels using a 0.5 threshold.
- X (pd.DataFrame): Feature matrix
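As an illustration of how the two prediction methods might relate, the sketch below computes probabilities from a weight vector and then thresholds them at 0.5. The function names, the explicit bias argument, the clipping bound, the column re-alignment by stored feature names, and the use of >= at the boundary are assumptions; the source only states the 0.5 threshold.

```python
import numpy as np
import pandas as pd


def predict_proba_sketch(X: pd.DataFrame, feature_names, weights, bias=0.0):
    """Return the positive-class probability for each row (illustrative only)."""
    # Re-select columns in the order seen during training (assumed behavior).
    X_aligned = X[list(feature_names)].to_numpy(dtype=float)
    z = np.clip(X_aligned @ np.asarray(weights, dtype=float) + bias, -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-z))


def predict_sketch(X: pd.DataFrame, feature_names, weights, bias=0.0, threshold=0.5):
    """Convert probabilities to 0/1 labels using the threshold."""
    proba = predict_proba_sketch(X, feature_names, weights, bias)
    return (proba >= threshold).astype(int)
```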
Training Workflow
The platform provides train_predictive_models() to train both risk and outcome models:
- Feature Engineering: One-hot encodes categorical variables (hospital, gender)
- Normalization: Standardizes numeric features (zero mean, unit variance)
- Train/Test Split: 75/25 split with random seed 42 for reproducibility
- Risk Model: Predicts high-risk diagnoses (appendicitis, pregnancy)
- Outcome Model: Predicts readmission probability
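The preprocessing steps in the list above could look roughly like this. It is a sketch under stated assumptions: prepare_features is a hypothetical helper, every non-categorical column is assumed numeric, and the split is done with a seeded NumPy permutation, which may differ from the mechanism used in the actual implementation.

```python
import numpy as np
import pandas as pd


def prepare_features(df, categorical=("hospital", "gender"), test_fraction=0.25, seed=42):
    """One-hot encode, standardize, and split a frame (illustrative only)."""
    # One-hot encode the categorical columns; other columns pass through.
    encoded = pd.get_dummies(df, columns=list(categorical), dtype=float)

    # Standardize each remaining column to zero mean and unit variance.
    for col in [c for c in df.columns if c not in categorical]:
        values = encoded[col].astype(float)
        std = values.std(ddof=0)
        encoded[col] = (values - values.mean()) / (std if std > 0 else 1.0)

    # Shuffle row positions with a fixed seed, then hold out 25% for testing.
    order = np.random.default_rng(seed).permutation(len(encoded))
    n_test = int(round(len(encoded) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return encoded.iloc[train_idx], encoded.iloc[test_idx]
```

Fixing the seed means repeated runs on the same data produce the same train and test rows, which is the reproducibility property the workflow calls out.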
Returns a ModelArtifacts object containing:
- risk_model: Trained SimpleLogisticModel for risk prediction
- outcome_model: Trained SimpleLogisticModel for outcome prediction
- X_test: Test feature set
- y_risk_test: Test labels for risk model
- y_outcome_test: Test labels for outcome model
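For orientation, a container with these five fields might be declared as shown here. Using a dataclass and loose Any types is an assumption; the actual definition may differ in kind and in type annotations.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ModelArtifacts:
    """Bundle of trained models and held-out data (illustrative sketch)."""
    risk_model: Any       # trained SimpleLogisticModel for risk prediction
    outcome_model: Any    # trained SimpleLogisticModel for outcome prediction
    X_test: Any           # test feature set
    y_risk_test: Any      # test labels for the risk model
    y_outcome_test: Any   # test labels for the outcome model
```

Grouping the models with their test data keeps evaluation self-contained: a caller can score either model without re-running the split.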
Model Evaluation
Trained models are evaluated with several metrics: accuracy, F1 score, and AUC.
Metrics Explained
- Accuracy: Proportion of correct predictions, (TP + TN) / Total, where TP and TN are the counts of true positives and true negatives
- F1 Score: Harmonic mean of precision and recall; balances false positives against false negatives
- AUC (Area Under ROC Curve): Measures discrimination ability across all thresholds (0.5 = random, 1.0 = perfect)
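The three metrics can be computed directly with NumPy, as in the sketch below. These helper functions are written for illustration and are not the platform's evaluation code; the AUC uses the pairwise-ranking definition (the fraction of positive/negative pairs ranked correctly, with ties counted as half), which is quadratic in the number of samples and so only suitable for small test sets.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined with a single class
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))
```

Accuracy and F1 take thresholded labels, while AUC takes the raw probabilities, since it measures ranking quality across all thresholds.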
Implementation Notes
Sigmoid Function
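The model uses a numerically stable sigmoid: the input is clipped to a bounded range before the exponential is evaluated, which prevents overflow.
Gradient Descent Update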
Weights are updated once per epoch by a batch gradient descent step scaled by the learning rate lr.
Source Reference
See modeling/predictive.py for the complete implementation.