
System Requirements

Python Version

Python 3.10 or higher is required. The platform uses modern Python features and type hints that depend on 3.10+.
Verify your Python version:
python --version
# or
python3 --version
Expected output: Python 3.10.x or higher
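The same check can be done programmatically, which is convenient in setup scripts; a minimal sketch:

```python
import sys

# The platform requires Python 3.10+ for its use of modern syntax and typing.
ok = sys.version_info >= (3, 10)
print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'too old, need 3.10+'}")
```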

Hardware Requirements

Minimum

  • 2 CPU cores
  • 2GB RAM
  • 1GB disk space

Recommended

  • 4+ CPU cores
  • 4GB+ RAM
  • 5GB disk space
The platform is designed for CPU-first execution. GPU acceleration is intentionally out of scope to ensure broad deployment compatibility.
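You can sanity-check a machine against the minimum tier with the standard library alone; this sketch covers cores and disk (once dependencies are installed, psutil, which the platform pins for hardware profiling, can also report RAM via `psutil.virtual_memory()`):

```python
import os
import shutil

# Thresholds from the "Minimum" tier above.
MIN_CORES = 2
MIN_DISK_GB = 1

cores = os.cpu_count() or 1
free_gb = shutil.disk_usage(".").free / 1024**3

print(f"CPU cores: {cores}, free disk: {free_gb:.1f} GB")
if cores < MIN_CORES:
    print("WARNING: fewer CPU cores than the documented minimum")
if free_gb < MIN_DISK_GB:
    print("WARNING: less free disk space than the documented minimum")
```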

Operating System

The platform is compatible with:
  • Linux (Ubuntu 20.04+, CentOS 8+, etc.)
  • macOS 11+
  • Windows 10/11 with WSL2

Installation Steps

Step 1: Access the project directory

Navigate to the task directory:
cd "Data Analysis for Hospitals/task"
Step 2: Create a virtual environment (recommended)

Isolate dependencies using a virtual environment:
python3 -m venv venv
source venv/bin/activate
Step 3: Install dependencies

Install all required packages from requirements.txt:
pip install -r requirements.txt
This may take 2-5 minutes depending on your internet connection and system performance.
Step 4: Verify installation

Confirm all packages are installed correctly:
python -c "import numpy, pandas, sklearn, matplotlib; print('All core dependencies installed')"

Dependencies

The platform requires the following Python packages with pinned versions for reproducibility:

Core Dependencies

Package        Version   Purpose
numpy          1.26.4    Numerical computing and array operations
pandas         2.2.2     Data manipulation and CSV processing
scikit-learn   1.5.1     Machine learning models and preprocessing

Visualization & Monitoring

Package        Version   Purpose
matplotlib     3.9.2     Plotting and visualization
seaborn        0.13.2    Statistical graphics
psutil         6.0.0     Hardware profiling and resource monitoring

Deployment & Testing

Package        Version   Purpose
skl2onnx       1.17.0    ONNX model export for production inference
pytest         8.3.2     Unit testing and validation
Version compatibility is critical. The CI pipeline uses pinned dependencies for deterministic execution. Version mismatches may cause serialization errors or numerical inconsistencies.
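Taken together, the pins above correspond to a requirements.txt along these lines (ordering illustrative; the file shipped with the project is authoritative):

```text
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.5.1
matplotlib==3.9.2
seaborn==0.13.2
psutil==6.0.0
skl2onnx==1.17.0
pytest==8.3.2
```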

Configuration

Directory Structure

After installation, ensure the following structure exists:
Data Analysis for Hospitals/task/
├── cli.py                    # Main command-line interface
├── config.py                 # Configuration parameters
├── requirements.txt          # Dependency specifications
├── test/                     # Input data directory
│   ├── general.csv          # Hospital general data
│   ├── prenatal.csv         # Prenatal care data
│   └── sports.csv           # Sports medicine data
├── artifacts/               # Output directory (auto-created)
├── ingestion/
├── preprocessing/
├── feature_engineering/
├── modeling/
├── anomaly_detection/
├── real_time/
├── deployment/
├── evaluation/
└── utils/
The artifacts/ directory is automatically created when you run the pipeline. You don’t need to create it manually.
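A quick way to confirm the structure is in place is to check for the required files from Python; a minimal sketch, run from the task directory (artifacts/ is deliberately excluded since the pipeline creates it):

```python
from pathlib import Path

# Files the guide says must exist after installation.
REQUIRED = [
    "cli.py",
    "config.py",
    "requirements.txt",
    "test/general.csv",
    "test/prenatal.csv",
    "test/sports.csv",
]

root = Path(".")  # run from Data Analysis for Hospitals/task/
missing = [p for p in REQUIRED if not (root / p).exists()]
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Directory structure looks complete")
```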

Data Directory Setup

Place your hospital CSV files in the test/ directory:
ls test/
# Expected output:
# general.csv  prenatal.csv  sports.csv
CSV Schema Requirements:
  • Files must follow the expected column schema used in feature generation
  • Column names should match the feature engineering expectations
  • Missing columns will trigger schema drift errors
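A lightweight pre-flight check can catch schema drift before the pipeline reports it. The column names below are placeholders, not the platform's real schema; substitute the columns your feature engineering actually expects:

```python
import pandas as pd

# Hypothetical expected columns -- replace with your real schema.
EXPECTED = {"hospital", "gender", "age", "diagnosis"}

def check_schema(path: str) -> set:
    """Return the set of expected columns missing from the CSV header."""
    # nrows=0 reads only the header row, so this is cheap even for large files.
    cols = set(pd.read_csv(path, nrows=0).columns)
    return EXPECTED - cols

# for f in ("test/general.csv", "test/prenatal.csv", "test/sports.csv"):
#     missing = check_schema(f)
#     print(f, "OK" if not missing else f"missing columns: {sorted(missing)}")
```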

Permissions

File System Access

Ensure your runtime has sufficient permissions:
# Check write permissions for output directory
touch artifacts/test_write.txt && rm artifacts/test_write.txt
If this fails, adjust permissions:
chmod -R u+w "Data Analysis for Hospitals/task/"

Python Package Installation

If you encounter permission errors during pip install, install into your user site-packages instead:
pip install --user -r requirements.txt

Validation

Run the Test Suite

Verify your installation with the test suite:
pytest
All tests should pass with deterministic behavior via explicit seed control.
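Determinism here means the same seed always yields the same results. A minimal illustration of the pattern (not the platform's actual test code):

```python
import numpy as np

def sample(seed: int) -> np.ndarray:
    """Draw from an explicitly seeded generator, never from global state."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=5)

def test_same_seed_same_output():
    # Two runs with the same seed must agree bit-for-bit.
    assert np.array_equal(sample(42), sample(42))

def test_different_seed_different_output():
    assert not np.array_equal(sample(42), sample(43))
```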

Generate a Dataset Manifest

Test data loading and validation:
python cli.py manifest
Expected output: JSON manifest with file metadata, row counts, and checksums.
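To see what goes into such a manifest, the core of the command can be sketched as follows (field names here are hypothetical; the real output of `python cli.py manifest` is authoritative):

```python
import csv
import hashlib
import json
from pathlib import Path

def manifest_entry(path: Path) -> dict:
    """File metadata, row count, and SHA-256 checksum for one CSV."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    with path.open(newline="") as f:
        rows = sum(1 for _ in csv.reader(f)) - 1  # exclude the header row
    return {"file": path.name, "bytes": path.stat().st_size,
            "rows": rows, "sha256": digest}

# print(json.dumps([manifest_entry(p) for p in Path("test").glob("*.csv")], indent=2))
```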

Run a Quick Pipeline Test

Execute the full pipeline to confirm everything works:
python cli.py run
As the pipeline runs, watch for three phases:
  • Data loading: messages about loading hospital data from CSV files
  • Model training: predictive model training and evaluation progress
  • Artifact generation: outputs written to the artifacts/ directory

Troubleshooting

ImportError: No module named ‘xxx’

Cause: Dependency not installed, or the wrong Python environment is active.
Solution:
# Ensure virtual environment is activated
source venv/bin/activate  # Linux/macOS
# or
.\venv\Scripts\Activate.ps1  # Windows

# Reinstall dependencies
pip install -r requirements.txt

Version Conflicts

Cause: Existing packages conflict with the pinned versions.
Solution:
# Create fresh virtual environment
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Permission Denied on artifacts/

Cause: Insufficient write permissions.
Solution:
mkdir -p "Data Analysis for Hospitals/task/artifacts"
chmod -R u+w "Data Analysis for Hospitals/task/artifacts"

Python 3.10+ Not Available

Solution (Debian/Ubuntu; on other systems, install Python 3.10+ with your platform's package manager):
sudo apt update
sudo apt install python3.10 python3.10-venv

Environment Variables (Optional)

Customize behavior with environment variables:
# Set custom random seed for reproducibility
export RANDOM_SEED=42

# Adjust hardware constraints
export MEMORY_LIMIT_MB=512
export COMPUTE_BUDGET=0.8

# Configure output directory
export OUTPUT_DIR=./custom_artifacts
Most users don’t need to set environment variables. The platform uses sensible defaults from config.py.
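The usual pattern behind such overrides is an environment lookup with a fallback default; a generic sketch using the variable names from the examples above (the default values here are illustrative, not those in config.py):

```python
import os

# Defaults are illustrative -- config.py holds the platform's real values.
RANDOM_SEED = int(os.environ.get("RANDOM_SEED", "42"))
MEMORY_LIMIT_MB = int(os.environ.get("MEMORY_LIMIT_MB", "1024"))
COMPUTE_BUDGET = float(os.environ.get("COMPUTE_BUDGET", "1.0"))
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "./artifacts")

print(RANDOM_SEED, MEMORY_LIMIT_MB, COMPUTE_BUDGET, OUTPUT_DIR)
```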

Continuous Integration

The repository uses standard Python tooling:
  • Testing Framework: pytest with unittest
  • CI Target: Python 3.10 with pinned dependencies
  • Deterministic Behavior: Explicit seed and threading environment controls
For CI/CD integration, ensure your pipeline uses the same pinned dependency versions to maintain reproducibility.
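One way to confirm that a CI environment (or your local one) matches the pins is to compare installed versions against the tables above; a sketch using only the standard library:

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the dependency tables in this guide.
PINS = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "matplotlib": "3.9.2",
    "seaborn": "0.13.2",
    "psutil": "6.0.0",
    "skl2onnx": "1.17.0",
    "pytest": "8.3.2",
}

for name, pinned in PINS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: NOT INSTALLED (expected {pinned})")
        continue
    status = "ok" if installed == pinned else f"MISMATCH (expected {pinned})"
    print(f"{name}: {installed} {status}")
```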

Next Steps

Quick Start

Run your first pipeline in under 5 minutes

Configuration Guide

Customize pipeline parameters and constraints

CLI Reference

Detailed command documentation

Operations Guide

Production deployment best practices

Getting Help

Before deploying to production:
  • Review docs/OPERATIONS.md for deployment considerations
  • Expand default benchmarks for production sign-off
  • Validate hardware estimates with device-calibrated measurements
For issues or questions:
  • Check troubleshooting sections in this guide
  • Review error messages for schema drift or missing dependencies
  • Verify your CSV files match expected schemas
