System Requirements
Python Version
Python 3.12 or higher is required. The project uses language features and type hints that are not available in earlier versions.
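To confirm that the interpreter you plan to use meets this requirement, a short check like the following can help. It is a minimal sketch; the names `MIN_PYTHON` and `python_is_supported` are illustrative and not part of the project.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is supported")
    else:
        print(f"Python {current} is too old; 3.12 or higher is required")
```

Run it with the same interpreter you intend to use for the virtual environment, since several Python versions can coexist on one machine.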
Operating System
The project works on:
- Linux (Ubuntu, Debian, Fedora, etc.)
- macOS (10.15 Catalina or later)
- Windows (Windows 10/11 with WSL recommended)
Disk Space
- Minimum: 500 MB (dependencies + dataset)
- Recommended: 1 GB (includes cache and results)
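If you are unsure how much space is free, the standard library can report it. The sketch below compares free space in the current directory against the two thresholds above; treating 1 GB as 1024 MB is an assumption, and the function name is illustrative.

```python
import shutil

MIN_FREE_MB = 500            # minimum from the list above
RECOMMENDED_FREE_MB = 1024   # 1 GB, assumed here to mean 1024 MB


def disk_space_status(free_bytes):
    """Classify free space as 'insufficient', 'minimum', or 'recommended'."""
    free_mb = free_bytes / (1024 * 1024)
    if free_mb < MIN_FREE_MB:
        return "insufficient"
    if free_mb < RECOMMENDED_FREE_MB:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    # Check the drive that holds the current working directory.
    free = shutil.disk_usage(".").free
    print(f"Free space: {free / (1024 * 1024):.0f} MB ({disk_space_status(free)})")
```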
Installation Methods
Method 1: UV (Recommended)
UV is a fast, modern Python package installer that’s significantly faster than pip.
Method 2: pip
Traditional installation using pip and virtual environments.
Core Dependencies
The project requires these key packages:
Data Processing & Analysis
| Package | Version | Purpose |
|---|---|---|
| pandas | ≥3.0.1 | Data manipulation and analysis |
| numpy | ≥2.4.2 | Numerical computing and arrays |
| pathlib | ≥1.0.1 | File path operations |
Machine Learning
| Package | Version | Purpose |
|---|---|---|
| scikit-learn | ≥1.8.0 | ML algorithms (regression, preprocessing) |
| joblib | 1.5.3 | Model serialization and saving |
Visualization
| Package | Version | Purpose |
|---|---|---|
| matplotlib | ≥3.10.8 | Static plots and visualizations |
| seaborn | ≥0.13.2 | Statistical data visualization |
| plotly | ≥6.5.2 | Interactive plots |
Jupyter Environment
| Package | Version | Purpose |
|---|---|---|
| jupyter | ≥1.1.1 | Jupyter notebook interface |
| jupyterlab | 4.5.5 | Modern Jupyter environment |
| ipykernel | ≥7.2.0 | Python kernel for notebooks |
Data Acquisition
| Package | Version | Purpose |
|---|---|---|
| kagglehub | ≥1.0.0 | Automatic Kaggle dataset download |
The dataset (Boston Housing) is automatically downloaded from Kaggle using kagglehub on first run. No manual download or API keys are required.
Jupyter Setup
Starting Jupyter Notebook
Once started, Jupyter Notebook is available at http://localhost:8888 by default.
Configuring the Kernel
If Jupyter doesn’t recognize your virtual environment:
Jupyter Extensions (Optional)
Enhance your Jupyter experience:
Dataset Download
Automatic Download
The project uses kagglehub to automatically download the Boston Housing dataset.
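The download step might look roughly like the sketch below. It assumes the dataset handle matches the Kaggle URL listed under Manual Download and that the archive contains `BostonHousing.csv`; the helper names are illustrative, and the project's notebooks may structure this differently.

```python
from pathlib import Path

# Handle derived from the Kaggle URL in the Manual Download section (assumption).
DATASET_HANDLE = "arunjangir245/boston-housing-dataset"


def find_csv(directory, filename="BostonHousing.csv"):
    """Return the first file named `filename` under `directory`, or None."""
    matches = sorted(Path(directory).rglob(filename))
    return matches[0] if matches else None


def download_dataset():
    """Download the dataset (or reuse the cached copy) and locate the CSV."""
    import kagglehub  # imported lazily so find_csv works without kagglehub

    dataset_dir = kagglehub.dataset_download(DATASET_HANDLE)
    return find_csv(dataset_dir)


# Usage (requires network access on the first call):
# csv_path = download_dataset()
```

`kagglehub.dataset_download` returns the local directory that holds the files, so later calls reuse the cached copy instead of downloading again.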
Dataset Details
- Source: Kaggle - Boston Housing Dataset
- Size: ~25 KB (BostonHousing.csv)
- Samples: 506 rows
- Features: 13 input features + 1 target variable
- Cache Location: ~/.cache/kagglehub/
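A quick way to confirm that a downloaded file matches these details is to count its rows and columns. The sketch below uses only the standard library and assumes the CSV has a single header row; the function names are illustrative.

```python
import csv

EXPECTED_ROWS = 506
EXPECTED_COLUMNS = 14  # 13 input features + 1 target variable


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file with one header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = sum(1 for row in reader if row)  # skip blank lines
    return rows, len(header)


def matches_expected(path):
    """True if the file has the row and column counts listed above."""
    return csv_shape(path) == (EXPECTED_ROWS, EXPECTED_COLUMNS)
```

A mismatch usually points to an interrupted download or a different version of the dataset.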
Manual Download (Optional)
If automatic download fails, you can download the dataset manually:
- Visit: https://www.kaggle.com/datasets/arunjangir245/boston-housing-dataset
- Download BostonHousing.csv
- Update the notebook to point to your local file
Verifying Installation
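A verification script can be as simple as checking that each core dependency is importable and reporting its installed version. The sketch below is one possible version, not the project's own script; the package list mirrors the dependency tables above, and the import names (for example `sklearn` for scikit-learn) are the usual ones for those packages.

```python
import importlib.metadata
import importlib.util

# Distribution name -> import name.
CORE_PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "jupyterlab": "jupyterlab",
    "ipykernel": "ipykernel",
    "kagglehub": "kagglehub",
}


def check_packages(packages):
    """Map each distribution to its version, 'unknown', or None if missing."""
    report = {}
    for dist_name, import_name in packages.items():
        if importlib.util.find_spec(import_name) is None:
            report[dist_name] = None  # module cannot be imported
            continue
        try:
            report[dist_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            report[dist_name] = "unknown"  # importable, but no metadata found
    return report


if __name__ == "__main__":
    for name, version in check_packages(CORE_PACKAGES).items():
        status = f"OK ({version})" if version else "MISSING"
        print(f"{name:<14} {status}")
```

Any package reported as MISSING needs to be installed in the active environment before running the notebooks.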
Run a verification script to ensure everything is installed correctly.
Troubleshooting
Python Version Issues
Python 3.12 not available
Problem: Your system has an older Python version.
Solutions:
- Using pyenv (Linux/macOS)
- Using apt (Ubuntu/Debian)
- Download from python.org: Visit https://www.python.org/downloads/ and install Python 3.12+
Dependency Installation Issues
pip install fails with build errors
Problem: C++ compilation errors during installation (especially for numpy/scipy)
Solution 1 - Install build tools:
Solution 2 - Use pre-built wheels:
SSL certificate verification failed
Problem: kagglehub fails to download the dataset due to SSL errors
Solutions:
Out of memory during installation
Problem: Installation crashes with memory errors
Solution:
Jupyter Issues
Jupyter command not found
Problem: jupyter command not available after installation
Solution:
Kernel keeps dying
Problem: Jupyter kernel crashes during execution
Causes & Solutions:
- Out of memory: Close other applications, or reduce dataset size for testing
- Corrupted environment: Recreate virtual environment
- Conflicting packages: Clear pip cache
Can't import modules in notebook
Problem: ModuleNotFoundError even after installation
Solution:
- Check that you’re using the correct kernel (see Configuring the Kernel)
- Restart the Jupyter kernel: Kernel → Restart
- Verify installation in notebook:
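For the last step, a notebook cell along these lines shows which interpreter the kernel is actually running and whether a given module is visible to it. This is a sketch; the module names probed at the end are examples taken from the dependency tables.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_importable(module_name):
    """True if the kernel's interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# The path should point inside your project's virtual environment.
print("Interpreter:", sys.executable)
print("Virtual environment:", in_virtualenv())
for name in ("pandas", "sklearn", "kagglehub"):
    print(f"{name}: {'importable' if is_importable(name) else 'NOT importable'}")
```

If the interpreter path points outside your virtual environment, the notebook is using a different kernel from the one the packages were installed into.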
Dataset Issues
Dataset download is very slow
Problem: kagglehub download takes a long time
Solutions:
- Be patient: the first download can take 1-2 minutes depending on your connection
- Check your internet speed: speedtest-cli
- Use the manual download method (see Manual Download)
Permission denied when caching dataset
Problem: Can’t write to ~/.cache/kagglehub/
Solution:
Platform-Specific Notes
Windows
Using WSL:
- Use Anaconda for easier dependency management
- Some visualization libraries may render differently
- File paths use backslashes: results\model.joblib
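To keep notebooks portable between Windows and other platforms, paths can be built with `pathlib` instead of hard-coded separators. The sketch below reuses the `results/model.joblib` example from the list above; the variable names are illustrative.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Build the path from parts; Path picks the separator for the current OS.
model_path = Path("results") / "model.joblib"

# The "pure" variants show how the same parts render on each platform.
windows_style = PureWindowsPath("results") / "model.joblib"
posix_style = PurePosixPath("results") / "model.joblib"

print(windows_style)  # results\model.joblib
print(posix_style)    # results/model.joblib
```

Code that passes `model_path` to file-handling functions then works unchanged on Linux, macOS, and Windows.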
macOS
M1/M2 Apple Silicon: Some packages need Rosetta or native ARM builds:
Linux
Ubuntu 20.04 / Debian 10 users: Python 3.12 is not in the default repos; use the deadsnakes PPA:
Next Steps
Quick Start
Run your first analysis and train models
Project Structure
Understand the codebase organization
Data Exploration
Learn about the Boston Housing dataset
Model Training
Deep dive into model training workflow
Getting Help
If you encounter issues not covered here:
- Check the GitHub Issues
- Search existing discussions
- Create a new issue with:
  - Your Python version (python --version)
  - Operating system
  - Full error message
  - Steps to reproduce
installation and we’ll help you get up and running!