System Requirements
Before installing, verify your system meets these requirements:
Python Version
Python 3.8 or higher
Memory
Minimum 512MB RAM (1GB+ recommended)
Storage
500MB+ available disk space
Operating System
Linux, macOS, or Windows with WSL
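The requirements above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function name `check_requirements` and the result keys are illustrative assumptions, not part of the project.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 8)                 # "Python 3.8 or higher"
MIN_DISK_BYTES = 500 * 1024 * 1024  # "500MB+ available disk space"


def check_requirements(path="."):
    """Return a dict mapping each requirement to True/False for this machine.

    `path` is the directory whose filesystem is checked for free space
    (hypothetical parameter; defaults to the current directory).
    """
    free_bytes = shutil.disk_usage(path).free
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "disk": free_bytes >= MIN_DISK_BYTES,
        # WSL reports itself as "Linux"; native Windows reports "Windows".
        "os": platform.system() in ("Linux", "Darwin"),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'NOT MET'}")
```

RAM is left out because the standard library has no portable way to read it. Once the dependencies are installed, `psutil.virtual_memory().total` reports total memory in bytes.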
Installation Steps
Create Virtual Environment (Recommended)
Isolate dependencies using a virtual environment, then activate it. Your terminal prompt should change to indicate the virtual environment is active (e.g., (venv)).
Install Dependencies
Install all required packages from requirements.txt. This installs:
- numpy==1.26.4 - Numerical computing
- pandas==2.2.2 - Data manipulation
- scikit-learn==1.5.1 - Machine learning
- matplotlib==3.9.2 - Visualization
- psutil==6.0.0 - System monitoring
- joblib==1.4.2 - Parallel processing
- requests==2.32.3 - HTTP utilities
Dependency Details
Core Dependencies
numpy 1.26.4
Provides high-performance array operations and numerical computing primitives. Used throughout the pipeline for efficient data manipulation.
pandas 2.2.2
DataFrame library for structured data processing. Handles CSV ingestion, chunk processing, and data transformations.
scikit-learn 1.5.1
Machine learning framework providing:
- Linear regression models (baseline)
- Preprocessing utilities (scaling, encoding)
- Model evaluation metrics
- Incremental learning with partial_fit
matplotlib 3.9.2
Visualization library for generating benchmark plots:
- Latency vs accuracy charts
- Memory vs accuracy analysis
- 3D resource-accuracy visualizations
System Monitoring Dependencies
psutil 6.0.0
Cross-platform system monitoring for:
- Real-time memory tracking
- CPU utilization measurement
- Process resource profiling
joblib 1.4.2
Provides parallel execution capabilities:
- Multi-process benchmark sweeps
- Efficient serialization
- Progress tracking
Optional: Energy Monitoring
Energy telemetry requires Intel RAPL (Running Average Power Limit) support. This is optional and the pipeline works without it.
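Whether RAPL is usable on a given machine can be probed from Python before relying on energy telemetry. The sketch below assumes the standard Linux powercap sysfs location (`/sys/class/powercap/intel-rapl`) and its `energy_uj` counter files. The function name `rapl_status` and its return values are hypothetical, not part of the pipeline.

```python
import os
from pathlib import Path

# Assumed location: the Linux powercap sysfs tree for Intel RAPL.
RAPL_ROOT = Path("/sys/class/powercap/intel-rapl")


def rapl_status(root=RAPL_ROOT):
    """Classify RAPL as 'unavailable', 'no-permission', or 'readable'."""
    root = Path(root)
    if not root.is_dir():
        return "unavailable"
    # Each power zone (e.g. intel-rapl:0) exposes an energy_uj counter.
    counters = sorted(root.glob("intel-rapl:*/energy_uj"))
    if not counters:
        return "unavailable"
    if any(os.access(counter, os.R_OK) for counter in counters):
        return "readable"
    return "no-permission"


if __name__ == "__main__":
    print("RAPL status:", rapl_status())
```

A result of `no-permission` suggests the counters exist but the current user cannot read them, which is the case where access has to be enabled.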
Check RAPL Availability
On Linux systems with Intel CPUs, check that the RAPL interface is exposed.
Enable RAPL Access
For accurate energy measurements, you may need to enable access to the RAPL counters.
Configuration
Directory Structure
After installation, your project should have this structure:
Verify Data File
Ensure the NBA2K dataset is present. It should include the columns version, salary, b_day, draft_year, height, weight.
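A quick way to confirm the dataset has the listed columns is to read only its header row. This is a sketch under assumptions: the dataset is a CSV file with a header row, and `csv_path` is supplied by the caller because the file location is not stated here. The helper name `missing_columns` is hypothetical.

```python
import csv

# Column names taken from the list above.
EXPECTED_COLUMNS = ["version", "salary", "b_day", "draft_year", "height", "weight"]


def missing_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from the CSV header."""
    # utf-8-sig also accepts plain UTF-8 and strips a leading BOM if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]
```

An empty list means every expected column was found. A missing file raises `FileNotFoundError`, which doubles as the presence check.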
Platform-Specific Notes
Linux
The most straightforward installation. All features are supported.
macOS
- Using Homebrew
- Using pyenv
macOS does not support RAPL energy monitoring. The pipeline uses CPU-based estimation instead.
Windows (WSL Recommended)
Install WSL2 and Ubuntu.
Troubleshooting
pip install fails with SSL errors
Upgrade pip and try again, or use a different package index.
ImportError: No module named 'sklearn'
Verify the virtual environment is activated. If it is not, activate it.
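Activation can also be confirmed from inside Python, which helps when the shell prompt is ambiguous. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this reports `False`, packages installed into the environment, sklearn included, will not be importable from the interpreter being used.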
Permission denied when installing packages
Don’t use sudo with pip. Instead:
- Use a virtual environment (recommended)
- Or install with the --user flag: pip install --user -r requirements.txt
numpy/pandas build failures
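Building these packages from source requires a C compiler. Install build tools for your platform:
- Ubuntu/Debian: install the distribution's compiler toolchain (typically the build-essential package)
- macOS: install the Xcode Command Line Tools (xcode-select --install)
- Windows: download Microsoft C++ Build Tools from visualstudio.microsoft.com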
Tests fail with FileNotFoundError
Ensure you’re running tests from the correct directory:
Upgrading Dependencies
To upgrade to newer package versions:
Verifying Installation
Quick Verification Script
Create a file verify_install.py:
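The original script body is not reproduced here, so the following is only a sketch of what such a check might look like. It compares installed package versions with the pins listed in this guide using the standard `importlib.metadata` module. The names `PINNED`, `check_packages`, and `main` are assumptions for illustration.

```python
"""verify_install.py: compare installed packages with the pinned versions."""
from importlib import metadata

# Pins copied from the requirements.txt list in this guide.
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "matplotlib": "3.9.2",
    "psutil": "6.0.0",
    "joblib": "1.4.2",
    "requests": "2.32.3",
}


def check_packages(pinned=PINNED):
    """Return {package: (expected, installed or None)} for every mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


def main():
    problems = check_packages()
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages are installed.")


if __name__ == "__main__":
    main()
```

Run it with the virtual environment active (`python verify_install.py`). Any line it prints other than the final success message names a package to reinstall.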
Full Pipeline Test
Run a minimal pipeline test. Its output is written to test_artifacts/.
Next Steps
Quickstart Guide
Run your first pipeline in minutes
Configuration
Learn about all configuration options
Architecture
Understand the pipeline design
API Reference
Explore the Python API