Overview

Linear regression is the foundation of this project, implemented in three variants to demonstrate different modeling strategies. The multivariate model achieves strong baseline performance with a Test R² of 0.710 and RMSE of 4.650, ranking 4th among all 9 models tested.
Performance: Linear Regression (Multivariate) provides a solid interpretable baseline, though Decision Tree (R² = 0.850) and Neural Network (R² = 0.806) models achieve superior accuracy.

Three Variants

Univariate Linear Regression

Uses only the rm (average number of rooms) feature, which has the strongest correlation with house prices.
Performance:
  • Train R²: 0.489
  • Test R²: 0.458
  • Test RMSE: 6.355
  • CV R² (mean±std): 0.452 ± 0.177
This model suffers from underfitting due to using only one feature. It’s too simple to capture the complexity of house pricing.
Code Implementation:
from sklearn.linear_model import LinearRegression

# Select only 'rm' feature
X_train_uni = X_train[['rm']]
X_test_uni = X_test[['rm']]

# Train univariate model
lr_uni = LinearRegression()
lr_uni.fit(X_train_uni, y_train)
When to use: Educational purposes to understand single-feature relationships. Not recommended for production.

Performance Comparison
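The comparison below reuses the univariate model above together with two models, lr_multi and lr_fs, whose training is not shown in this section. A minimal sketch of how they might be fit, assuming the same train/test split as above (the synthetic data and the three-feature subset here are stand-ins for illustration, not the project's actual selection):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's 13-feature housing data;
# in the real notebook X_train/y_train come from the earlier split.
rng = np.random.default_rng(0)
cols = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age',
        'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat']
X = pd.DataFrame(rng.normal(size=(200, 13)), columns=cols)
y = 5 * X['rm'] - 2 * X['lstat'] + rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Multivariate: fit on all 13 features
lr_multi = LinearRegression().fit(X_train, y_train)

# Feature selection: fit on a reduced subset (hypothetical choice here;
# the project's actual selection method is described elsewhere)
selected = ['rm', 'lstat', 'ptratio']
X_train_fs, X_test_fs = X_train[selected], X_test[selected]
lr_fs = LinearRegression().fit(X_train_fs, y_train)
```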

from sklearn.metrics import r2_score

# Univariate
y_pred_uni = lr_uni.predict(X_test_uni)
print(f"Univariate - R²: {r2_score(y_test, y_pred_uni):.3f}")
# Output: Univariate - R²: 0.458

# Multivariate (BEST)
y_pred_multi = lr_multi.predict(X_test)
print(f"Multivariate - R²: {r2_score(y_test, y_pred_multi):.3f}")
# Output: Multivariate - R²: 0.710

# Feature Selection
y_pred_fs = lr_fs.predict(X_test_fs)
print(f"Feature Selection - R²: {r2_score(y_test, y_pred_fs):.3f}")
# Output: Feature Selection - R²: 0.651

Key Insights

Why Multivariate Wins

  1. More information: Uses all 13 features to capture complex relationships
  2. Balanced fit: Train-test R² gap of only 0.033 indicates minimal overfitting
  3. Stable performance: Low cross-validation standard deviation (0.092)
  4. Practical accuracy: RMSE of 4.65 means predictions are typically within $4,650 of actual prices
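The stability claim above (a low cross-validation standard deviation) can be checked directly with cross_val_score; a minimal sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; in the project this would be the full X, y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = X @ rng.normal(size=13) + rng.normal(size=200)

# 5-fold cross-validated R²: the mean gauges accuracy, the std gauges stability
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
print(f"CV R²: {scores.mean():.3f} ± {scores.std():.3f}")
```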

Trade-offs

Variant           | Pros                  | Cons
Univariate        | Simple, interpretable | Underfits, poor accuracy
Multivariate      | Best accuracy, stable | Uses all features
Feature Selection | Reduced complexity    | Slightly lower accuracy

Mathematical Foundation

Linear regression models the relationship as:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Where:
  • y = predicted house price (medv)
  • β₀ = intercept
  • β₁...βₙ = coefficients for each feature
  • x₁...xₙ = feature values
  • ε = error term
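In scikit-learn terms, β₀ is the fitted model's intercept_ and β₁...βₙ are its coef_, so a prediction is just the dot product of coefficients and features plus the intercept. A minimal check on toy data (the values below are illustrative, not from the housing dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated exactly by y = 3 + 2·x₁ − 1·x₂ (no noise)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
y = 3 + 2 * X[:, 0] - 1 * X[:, 1]

lr = LinearRegression().fit(X, y)

# Manual prediction: ŷ = β₀ + β₁x₁ + β₂x₂
x_new = np.array([2.0, 1.0])
manual = lr.intercept_ + lr.coef_ @ x_new
assert np.isclose(manual, lr.predict(x_new.reshape(1, -1))[0])
print(f"β₀={lr.intercept_:.1f}, β={lr.coef_.round(1)}, prediction={manual:.1f}")
```

Because the toy data is exactly linear, OLS recovers the coefficients exactly and the manual dot product matches model.predict.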

Next Steps

Polynomial Regression

Explore non-linear relationships using polynomial features
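A minimal sketch of the idea, assuming the usual scikit-learn pipeline style: PolynomialFeatures expands the inputs (e.g. adding x²), and a plain linear fit on the expanded features captures the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data that a straight line cannot fit
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print(f"R² on quadratic data: {poly_model.score(X, y):.3f}")
```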

Gradient Descent

Learn about iterative optimization with SGDRegressor
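SGDRegressor fits the same linear model by iterative gradient updates rather than a closed-form solve. A minimal sketch on synthetic stand-in data (scaling is included because SGD is scale-sensitive):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; in the project this would be the housing features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = X @ rng.normal(size=13) + rng.normal(size=200)

# Standardize, then fit by stochastic gradient descent
sgd = make_pipeline(StandardScaler(),
                    SGDRegressor(max_iter=1000, tol=1e-3, random_state=42))
sgd.fit(X, y)
print(f"SGD R²: {sgd.score(X, y):.3f}")
```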
