System Requirements
Python Version
Python 3.10 or higher is required. The platform uses modern Python features and type hints that depend on 3.10+.
Python 3.10.x or higher
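The version requirement can also be checked from inside the interpreter. The following is a minimal sketch; the helper name `python_is_supported` is illustrative and not part of the platform.

```python
import sys

# Minimum interpreter version stated in this guide.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is actually running this script.
found = ".".join(str(part) for part in sys.version_info[:3])
status = "supported" if python_is_supported() else "too old, 3.10+ required"
print(f"Python {found}: {status}")
```

Running this with the same interpreter you plan to use for the platform avoids confusion when several Python versions are installed side by side.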
Hardware Requirements
Minimum
- 2 CPU cores
- 2GB RAM
- 1GB disk space
Recommended
- 4+ CPU cores
- 4GB+ RAM
- 5GB disk space
The platform is designed for CPU-first execution. GPU acceleration is intentionally out of scope to ensure broad deployment compatibility.
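The minimum figures above can be compared against the current machine. The sketch below makes some assumptions: "GB" is read as decimal gigabytes, logical CPUs stand in for cores, and the helper names are illustrative. RAM is read with psutil (one of the pinned dependencies) and skipped if psutil is not yet installed.

```python
import os
import shutil

GB = 10**9  # assumption: decimal gigabytes; the guide does not say GB vs GiB

# Minimum figures from the list above.
MINIMUM = {"cpu_cores": 2, "ram_bytes": 2 * GB, "free_disk_bytes": 1 * GB}


def hardware_shortfalls(measured, minimum=MINIMUM):
    """List every resource that was measured and falls below its minimum."""
    problems = []
    for name, required in minimum.items():
        value = measured.get(name)
        if value is not None and value < required:  # None means "could not measure"
            problems.append(f"{name}: found {value}, need at least {required}")
    return problems


def measure_host(path="."):
    """Collect CPU, RAM, and free-disk figures for the current machine."""
    try:
        import psutil  # listed under Visualization & Monitoring
        ram_bytes = psutil.virtual_memory().total
    except ImportError:
        ram_bytes = None  # RAM check is skipped when psutil is absent
    return {
        "cpu_cores": os.cpu_count(),  # logical CPUs; may be None on some systems
        "ram_bytes": ram_bytes,
        "free_disk_bytes": shutil.disk_usage(path).free,
    }
```

Calling `hardware_shortfalls(measure_host())` returns an empty list when every measured value meets the minimum. A machine sold as "2GB" can report slightly less usable memory, so treat a narrow RAM shortfall as a warning rather than a hard failure.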
Operating System
The platform is compatible with:
- Linux (Ubuntu 20.04+, CentOS 8+, etc.)
- macOS 11+
- Windows 10/11 with WSL2
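A rough way to see which of the listed platforms the interpreter is running on is sketched below. The function name is illustrative, and the WSL detection is a common heuristic (WSL kernel release strings contain "microsoft"), not something this guide specifies. Linux distribution versions are not checked.

```python
import platform


def describe_platform(system, release="", mac_version=""):
    """Map platform details onto the compatibility list above."""
    if system == "Linux":
        # Heuristic: WSL kernels report "microsoft" in their release string.
        if "microsoft" in release.lower():
            return "Linux under WSL (listed: Windows 10/11 with WSL2)"
        return "Linux (listed; check the distribution version yourself)"
    if system == "Darwin":
        major = int(mac_version.split(".")[0]) if mac_version else 0
        if major >= 11:
            return "macOS 11+ (listed)"
        return "macOS older than 11 (not listed)"
    if system == "Windows":
        return "native Windows (not listed; WSL2 is the listed option)"
    return f"{system or 'unknown'} (not listed)"


# Describe the machine running this script.
print(describe_platform(platform.system(), platform.release(), platform.mac_ver()[0]))
```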
Installation Steps
Install dependencies
Install all required packages from requirements.txt. This may take 2-5 minutes depending on your internet connection and system performance.
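The original command listing did not survive on this page. As a sketch, the standard pip invocation for a requirements file can be driven from Python as shown here; the helper names are illustrative, and the file is assumed to sit in the current working directory.

```python
import subprocess
import sys


def pip_install_command(requirements="requirements.txt", python=sys.executable):
    """Build the pip command for the given interpreter and requirements file."""
    # Using "<python> -m pip" installs into the interpreter that runs this script.
    return [python, "-m", "pip", "install", "-r", requirements]


def install_requirements(requirements="requirements.txt"):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(requirements), check=True)
```

Invoking pip through `sys.executable` rather than a bare `pip` avoids installing into a different Python environment than the one you are testing.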
Dependencies
The platform requires the following Python packages with pinned versions for reproducibility:
Core Dependencies
| Package | Version | Purpose |
|---|---|---|
| numpy | 1.26.4 | Numerical computing and array operations |
| pandas | 2.2.2 | Data manipulation and CSV processing |
| scikit-learn | 1.5.1 | Machine learning models and preprocessing |
Visualization & Monitoring
| Package | Version | Purpose |
|---|---|---|
| matplotlib | 3.9.2 | Plotting and visualization |
| seaborn | 0.13.2 | Statistical graphics |
| psutil | 6.0.0 | Hardware profiling and resource monitoring |
Deployment & Testing
| Package | Version | Purpose |
|---|---|---|
| skl2onnx | 1.17.0 | ONNX model export for production inference |
| pytest | 8.3.2 | Unit testing and validation |
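To confirm that the active environment matches the pinned versions in the tables above, the installed versions can be read with the standard library's `importlib.metadata`. This is a sketch; the function name `find_mismatches` is illustrative.

```python
from importlib import metadata

# Pinned versions copied from the dependency tables above.
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "matplotlib": "3.9.2",
    "seaborn": "0.13.2",
    "psutil": "6.0.0",
    "skl2onnx": "1.17.0",
    "pytest": "8.3.2",
}


def find_mismatches(pinned=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package off its pin.

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result from `find_mismatches()` means every listed package is installed at its pinned version.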
Configuration
Directory Structure
After installation, ensure the following structure exists:
The artifacts/ directory is automatically created when you run the pipeline. You don’t need to create it manually.
Data Directory Setup
Place your hospital CSV files in the test/ directory:
CSV Schema Requirements:
- Files must follow the expected column schema used in feature generation
- Column names should match the feature engineering expectations
- Missing columns will trigger schema drift errors
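A header check before running the pipeline can surface missing columns early. The sketch below reads only the first row of a CSV with the standard library. The column names used in the usage note are hypothetical; the real schema is whatever the platform's feature generation expects.

```python
import csv


def missing_columns(csv_path, expected_columns):
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])  # empty file gives an empty header
    present = {name.strip() for name in header}
    return [name for name in expected_columns if name not in present]
```

For example, `missing_columns("test/site_a.csv", ["patient_id", "age"])` (hypothetical file and columns) returns an empty list when both columns are present. With pandas installed, `pandas.read_csv(path, nrows=0).columns` gives the same header information.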
Permissions
File System Access
Ensure your runtime has sufficient permissions:
Python Package Installation
If you encounter permission errors during pip install, use one of these approaches:
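The original list of approaches did not survive on this page. One widely used way to avoid permission errors, shown here as a sketch rather than the guide's own instructions, is to install into a virtual environment created with the standard library's `venv` module. The `.venv` directory name and the helper name are assumptions.

```python
import venv
from pathlib import Path


def create_environment(env_dir=".venv", with_pip=True):
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # with_pip=True bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"
```

After activating the environment, packages install into a directory you own, so elevated privileges are not needed. pip's `--user` flag is another standard option when a virtual environment is not practical.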
Validation
Run the Test Suite
Verify your installation with the test suite:
All tests should pass with deterministic behavior via explicit seed control.
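To illustrate what explicit seed control looks like in a pytest-style test, here is a small self-contained sketch. It is not taken from the platform's test suite: the sampling helper is hypothetical, and it uses the standard library's `random` module where the real tests may seed numpy or scikit-learn instead.

```python
import random


def sample_row_indices(n_rows, k, seed):
    """Pick k distinct row indices using a generator seeded explicitly."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return sorted(rng.sample(range(n_rows), k))


def test_sampling_is_deterministic():
    # The same seed must always produce the same sample.
    first = sample_row_indices(1000, 10, seed=42)
    second = sample_row_indices(1000, 10, seed=42)
    assert first == second


def test_sample_has_requested_size():
    indices = sample_row_indices(1000, 10, seed=7)
    assert len(indices) == 10
    assert len(set(indices)) == 10  # no duplicates
```

Passing the seed as an argument, instead of relying on global state, keeps each test independent of the order in which tests run.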
Generate a Dataset Manifest
Test data loading and validation:
Run a Quick Pipeline Test
Execute the full pipeline to confirm everything works:
Troubleshooting
ImportError: No module named 'xxx'
Cause: Dependency not installed or wrong Python environment active
Solution:
Version Conflicts
Cause: Existing packages conflict with pinned versions
Solution:
Permission Denied on artifacts/
Cause: Insufficient write permissions
Solution:
Python 3.10+ Not Available
Solution:
Environment Variables (Optional)
Customize behavior with environment variables:
Most users don’t need to set environment variables. The platform uses sensible defaults from config.py.
Continuous Integration
The repository uses standard Python tooling:
- Testing Framework: pytest with unittest
- CI Target: Python 3.10 with pinned dependencies
- Deterministic Behavior: Explicit seed and threading environment controls
For CI/CD integration, ensure your pipeline uses the same pinned dependency versions to maintain reproducibility.
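The guide does not name the specific seed and threading variables it relies on. As a sketch under that caveat, the following builds an environment with commonly used controls: `PYTHONHASHSEED` for Python's hash randomization, and `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `MKL_NUM_THREADS` for the numerical libraries' thread pools. Which of these the platform actually reads is an assumption.

```python
import os


def deterministic_env(seed=0, threads=1, base=None):
    """Return a copy of the environment with seed and thread limits applied."""
    env = dict(os.environ if base is None else base)
    # Fixes str/bytes hash randomization; read only at interpreter startup.
    env["PYTHONHASHSEED"] = str(seed)
    # Assumed thread controls for OpenMP, OpenBLAS, and MKL backends.
    for name in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        env[name] = str(threads)
    return env
```

Because `PYTHONHASHSEED` only takes effect when an interpreter starts, pass the result to a child process, for example `subprocess.run([sys.executable, "-m", "pytest"], env=deterministic_env(), check=True)`, or set the same variables in the CI job definition.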
Next Steps
- Quick Start: Run your first pipeline in under 5 minutes
- Configuration Guide: Customize pipeline parameters and constraints
- CLI Reference: Detailed command documentation
- Operations Guide: Production deployment best practices
Getting Help
For issues or questions:
- Check troubleshooting sections in this guide
- Review error messages for schema drift or missing dependencies
- Verify your CSV files match expected schemas