Get started with the Hospital Data Analysis Platform by running your first analytics pipeline. This guide walks you through the three main CLI commands and explains their output.

Prerequisites

Before you begin, ensure you have:
  • Python 3.10 or higher installed
  • Completed the installation steps
  • Hospital data CSV files in the test directory

Running Your First Pipeline

Step 1: Generate Dataset Manifest

Create a manifest of your hospital data files to validate schema and track data versions.
```shell
cd "Data Analysis for Hospitals/task"
python cli.py manifest
```

Example output:

```json
{
  "files": ["general.csv", "prenatal.csv", "sports.csv"],
  "total_records": 1500,
  "schema_version": "1.0",
  "checksums": {
    "general.csv": "a3b2c1d4...",
    "prenatal.csv": "e5f6a7b8...",
    "sports.csv": "c9d0e1f2..."
  }
}
```
This command validates your data files and generates version tracking information.
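The manifest logic can be approximated as follows. This is a minimal sketch, not the platform's actual implementation: the choice of SHA-256 as the checksum algorithm and the `build_manifest` name are assumptions; only the output shape follows the sample above.

```python
import csv
import hashlib
from pathlib import Path

def build_manifest(data_dir: str, schema_version: str = "1.0") -> dict:
    """Sketch of a manifest builder: hash each CSV and count its data rows."""
    files = sorted(p.name for p in Path(data_dir).glob("*.csv"))
    checksums, total_records = {}, 0
    for name in files:
        path = Path(data_dir) / name
        # Checksum over raw bytes so any edit to the file changes the manifest.
        checksums[name] = hashlib.sha256(path.read_bytes()).hexdigest()
        with path.open(newline="") as f:
            # Subtract one for the header row.
            total_records += max(sum(1 for _ in csv.reader(f)) - 1, 0)
    return {
        "files": files,
        "total_records": total_records,
        "schema_version": schema_version,
        "checksums": checksums,
    }
```

Serializing the returned dict with `json.dump` would yield a file with the same shape as `dataset_manifest.json` above.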
Step 2: Execute Full Analytics Pipeline

Run the complete pipeline including ingestion, preprocessing, feature engineering, modeling, and deployment monitoring.
```shell
python cli.py run
```
The pipeline executes these stages:
  1. Data Ingestion - Loads and merges hospital CSV files
  2. Preprocessing - Cleans and normalizes data
  3. Feature Engineering - Creates derived features (age_range, is_adult, bmi_risk)
  4. Model Training - Trains risk and outcome prediction models
  5. Anomaly Detection - Identifies outliers and generates early warnings
  6. Streaming Inference - Compares batch vs streaming performance
  7. Hardware Profiling - Adjusts batch sizes and tracks resource utilization
  8. CPU Inference - Measures inference latency and throughput
  9. ONNX Export - Serializes models for cross-platform deployment
  10. Monitoring - Generates deployment metrics and alert summaries
  11. Benchmarking - Runs repeated experiments with confidence intervals
  12. Hardware Experiments - Tests performance under different constraints
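One way to read the stage list above is as an ordered sequence of callables that each enrich a shared results dictionary. The control-flow sketch below is illustrative only; the stage functions are hypothetical stand-ins, not the pipeline's real internals.

```python
def run_pipeline(stages, context=None):
    """Run each (name, fn) stage in order, merging its metrics into context."""
    context = dict(context or {})
    for name, fn in stages:
        context[name] = fn(context)  # each stage sees every stage before it
    return context

# Hypothetical stages standing in for ingestion, preprocessing, and so on.
stages = [
    ("ingestion", lambda ctx: {"rows": 1500}),
    ("preprocessing", lambda ctx: {"rows_clean": ctx["ingestion"]["rows"] - 12}),
]
result = run_pipeline(stages)
```

Later stages (modeling, monitoring, benchmarking) would read their inputs out of the same accumulated context.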
Example output:

```json
{
  "reproducibility": {
    "random_seed": 42,
    "python_version": "3.10.12",
    "numpy_version": "1.26.4"
  },
  "predictive_metrics": {
    "risk_accuracy": 0.847,
    "risk_f1": 0.723,
    "risk_auc": 0.891,
    "outcome_accuracy": 0.812,
    "outcome_f1": 0.689,
    "outcome_auc": 0.856
  },
  "anomaly_alerts": {
    "total_alerts": 45,
    "alert_rate": 0.03
  },
  "detection_latency_s": 2.4,
  "streaming": {
    "batch_time_s": 0.124,
    "stream_time_s": 0.156,
    "stream_latency_ms_per_row": 0.104,
    "stream_throughput_rows_per_s": 9615.38
  },
  "hardware": {
    "adjusted_batch_size": 64,
    "compute_utilization": 0.73
  },
  "cpu_inference": {
    "inference_latency_ms": 12.5,
    "mean_probability": 0.342,
    "std_probability": 0.187
  },
  "onnx_exported": true,
  "deployment_monitoring": {
    "alert_count": 12,
    "alert_rate": 0.032,
    "high_risk_ratio": 0.15
  }
}
```
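Once `python cli.py run` finishes, metrics like these can be checked programmatically, for example to gate a CI job. A sketch, assuming the log has the shape shown above; the function name and the specific thresholds are illustrative choices, not part of the platform:

```python
import json

# Mirrors the relevant slice of the sample output above.
SAMPLE = """{
  "predictive_metrics": {"risk_auc": 0.891, "outcome_auc": 0.856},
  "anomaly_alerts": {"alert_rate": 0.03}
}"""

def passes_quality_gate(log: dict, min_auc: float = 0.85) -> bool:
    """True when both models clear the AUC bar and anomaly alerts stay rare."""
    metrics = log["predictive_metrics"]
    return (
        metrics["risk_auc"] >= min_auc
        and metrics["outcome_auc"] >= min_auc
        and log["anomaly_alerts"]["alert_rate"] < 0.05
    )

log = json.loads(SAMPLE)
# In practice, load the real artifact instead of SAMPLE, e.g. from
# artifacts/experiment_log.json after a pipeline run.
```

With the sample values, the gate passes at the default threshold but fails if `min_auc` is raised to 0.9.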
Step 3: Run Hardware-Constrained Experiments

Evaluate early warning system performance under different memory, compute, and latency constraints.
```shell
python cli.py early-warning-experiment
```
This command runs 27 experiment scenarios (3 memory limits × 3 compute budgets × 3 stream speeds) to evaluate:
  • Detection latency under resource constraints
  • Prediction accuracy vs hardware limits
  • False positive rates
  • Detection quality scores
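The 3 × 3 × 3 grid of scenarios can be generated with `itertools.product`. The specific constraint values below are illustrative, not the platform's actual defaults (those live in `config.py`):

```python
from itertools import product

# Illustrative constraint levels; three values per axis gives 27 scenarios.
memory_limits_mb = [128, 256, 512]
compute_budgets = [1_000, 10_000, 100_000]
stream_intervals_ms = [10, 50, 100]

scenarios = [
    {"memory_mb": m, "compute_budget": c, "stream_interval_ms": s}
    for m, c, s in product(memory_limits_mb, compute_budgets, stream_intervals_ms)
]
```

Each scenario dict then parameterizes one run of the early warning system.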
Example output:

```json
{
  "summary": {
    "total_scenarios": 27,
    "mean_detection_latency_s": 3.2,
    "mean_accuracy": 0.834,
    "mean_false_positive_rate": 0.042,
    "best_scenario": {
      "memory_mb": 256,
      "compute_budget": 10000,
      "stream_interval_ms": 10
    }
  },
  "benchmark": {
    "detection_latency_s": {
      "mean": 3.2,
      "std": 0.8,
      "ci_lower": 2.8,
      "ci_upper": 3.6
    },
    "prediction_accuracy": {
      "mean": 0.834,
      "std": 0.023,
      "ci_lower": 0.820,
      "ci_upper": 0.848
    }
  },
  "artifacts": {
    "results_csv": "artifacts/early_warning_experiment_results.csv",
    "plots": "artifacts/early_warning_plots.png"
  }
}
```
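The `ci_lower`/`ci_upper` fields in a benchmark block like the one above are consistent with a confidence interval computed over repeated runs. A sketch of how such a summary could be produced; the normal-approximation z-value of 1.96 (a 95% interval) is an assumption about the implementation, and the sample latencies are made up:

```python
from math import sqrt
from statistics import mean, stdev

def summarize(samples, z=1.96):
    """Mean, sample std, and normal-approximation confidence interval."""
    m, s = mean(samples), stdev(samples)
    half = z * s / sqrt(len(samples))  # half-width of the interval
    return {"mean": m, "std": s, "ci_lower": m - half, "ci_upper": m + half}

# e.g. detection latency measured once per repeated experiment run
latencies_s = [3.1, 3.4, 2.9, 3.5, 3.1]
summary = summarize(latencies_s)
```

A larger number of repeats shrinks the interval at a rate of one over the square root of the sample count.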

Understanding Artifacts

After running the pipeline, artifacts are written to `Data Analysis for Hospitals/task/artifacts/`:
| File | Description |
|------|-------------|
| experiment_log.json | Complete pipeline execution log with all metrics |
| dataset_manifest.json | Dataset version manifest with checksums |
| risk_model.onnx | Exported ONNX model for risk prediction |
| hardware_profile.csv | Hardware profiling results with operator-level metrics |
| early_warning_experiment_results.csv | Detailed results from constraint experiments |
| early_warning_plots.png | Visualization of performance across scenarios |

Common Workflows

Iterative Development

```shell
# 1. Validate data schema
python cli.py manifest

# 2. Run full pipeline
python cli.py run

# 3. Review artifacts and adjust config.py as needed

# 4. Re-run experiments
python cli.py early-warning-experiment
```

Production Deployment

```shell
# 1. Run pipeline with production config
python cli.py run

# 2. Export model to ONNX (automatically done in pipeline)

# 3. Validate ONNX model
python -c "import onnx; model = onnx.load('artifacts/risk_model.onnx'); onnx.checker.check_model(model)"

# 4. Deploy ONNX model to target runtime
# (Use ONNX Runtime, TensorRT, or other compatible inference engines)
```

Troubleshooting

Error: KeyError: 'column_name'

Solution: Verify your CSV files match the expected schema. Run `python cli.py manifest` to see which files are being loaded.

Required columns: age, height, weight, bmi, children, months, hospital, gender, diagnosis, blood_test
Error: MemoryError or slow execution

Solution: Reduce memory limits in config.py:

```python
hardware_memory_limit_mb: int = 128  # Lower from 256
```

The pipeline will automatically adjust batch sizes using auto_adjust_batch_size().
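The exact implementation of `auto_adjust_batch_size()` is not shown here, but the idea can be sketched as halving the batch until its estimated footprint fits the limit. The per-row memory cost below is a made-up figure for illustration, not the platform's real cost model:

```python
def auto_adjust_batch_size(requested: int, memory_limit_mb: int,
                           row_cost_mb: float = 0.5) -> int:
    """Halve the batch size until the estimated memory footprint fits.

    row_cost_mb is a hypothetical per-row estimate used only for this sketch.
    """
    batch = requested
    while batch > 1 and batch * row_cost_mb > memory_limit_mb:
        batch //= 2
    return batch

# With a 128 MB limit and 0.5 MB/row, a requested batch of 512 shrinks to 256.
```

Lowering `hardware_memory_limit_mb` therefore trades throughput (smaller batches) for a bounded memory footprint.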
Issue: Streaming latency exceeds requirements

Solution: Adjust chunk size in config.py:

```python
stream_chunk_size: int = 8   # Reduce from 16 for lower latency
stream_interval_ms: int = 5  # Reduce from 10 for faster updates
```

Note: Smaller chunks reduce latency but may decrease throughput.
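The latency side of that trade-off can be made concrete with a toy model: if rows arrive one per interval and inference runs once a chunk fills, the first row of a chunk waits for the rest of it. Both helpers below are illustrative, not platform code:

```python
def chunked(rows, chunk_size):
    """Yield successive fixed-size chunks of a row list (last may be short)."""
    for i in range(0, len(rows), chunk_size):
        yield rows[i:i + chunk_size]

def worst_case_wait_ms(chunk_size: int, interval_ms: int) -> int:
    """Queueing delay for the first row of a chunk: it waits for the
    remaining chunk_size - 1 rows to arrive, one per interval."""
    return (chunk_size - 1) * interval_ms
```

Under this model, the defaults (chunk 16, interval 10 ms) give a worst-case queueing delay of 150 ms, while the reduced settings (chunk 8, interval 5 ms) cut it to 35 ms, at the cost of more per-chunk overhead.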

Next Steps

  • Core Concepts: Understand the pipeline architecture and design philosophy
  • Configuration: Learn about all configuration options and tuning parameters
  • Modeling: Deep dive into predictive models and risk stratification
  • CLI Reference: Complete CLI command reference with all options
