Linear regression is the foundation of this project, implemented in three variants to demonstrate different modeling strategies. The multivariate model achieves strong baseline performance with a Test R² of 0.710 and a Test RMSE of 4.650, ranking 4th of the 9 models tested.
Performance: Linear Regression (Multivariate) provides a solid, interpretable baseline, though the Decision Tree (R² = 0.850) and Neural Network (R² = 0.806) models achieve higher accuracy.
Uses all 13 features from the Boston Housing dataset for comprehensive prediction.
Performance:
Train R²: 0.743
Test R²: 0.710
Test RMSE: 4.650
CV R² (mean±std): 0.688 ± 0.092
Minimal overfitting: the train-test R² gap is only 0.033 (0.743 vs. 0.710).
Code Implementation:
from sklearn.linear_model import LinearRegression

# Use all 13 features
# (X_train, y_train, X_test are assumed to come from an earlier train/test split)
lr_multi = LinearRegression()
lr_multi.fit(X_train, y_train)

# Make predictions
predictions = lr_multi.predict(X_test)
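The Performance figures listed earlier (train and test R², test RMSE, CV R² mean ± std) can be computed along the lines of the following minimal sketch. It is not the project's evaluation script: the Boston data is replaced by synthetic data from `make_regression` (since `load_boston` was removed in scikit-learn 1.2), and the 80/20 split, the `random_state` values, and the 5-fold CV are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical stand-in for the 13-feature housing data (synthetic, not Boston).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Assumed 80/20 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# R² on the training and test splits, and RMSE on the test split.
train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, model.predict(X_test))
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Cross-validated R² on the training split (5 folds is an assumption).
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5, scoring="r2")

print(f"Train R²: {train_r2:.3f}")
print(f"Test R²: {test_r2:.3f}")
print(f"Test RMSE: {test_rmse:.3f}")
print(f"CV R² (mean±std): {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
print(f"Train-test R² gap: {train_r2 - test_r2:.3f}")
```

On this synthetic data the target is nearly a noise-free linear function, so R² comes out close to 1; on the real housing features the printed values would be in the range reported above.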
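Feature Importance (five largest of the 13 coefficients, by magnitude):

| Feature | Coefficient | Impact |
|---|---|---|
| nox | -15.42 | Nitric oxide concentration (negative) |
| rm | +4.06 | Number of rooms (positive) |
| chas | +3.12 | Charles River proximity (positive) |
| dis | -1.38 | Distance to employment centers (negative) |
| ptratio | -0.91 | Pupil-teacher ratio (negative) |

Assuming the features were not standardized (the snippet above fits on X_train directly), each coefficient is expressed per unit of its own feature, so magnitudes are not directly comparable across features. For example, nox is measured in parts per 10 million and spans a narrow numeric range, which inflates the size of its coefficient.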
When to use: This is the recommended baseline model for house price prediction.