## Prerequisites

Before you begin, ensure you have:

- Python 3.10 or higher installed
- Completed the installation steps
- Hospital data CSV files in the test directory
## Running Your First Pipeline

### Generate Dataset Manifest
Create a manifest of your hospital data files by running `python cli.py manifest`. This validates each file against the expected schema and generates version tracking information.
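Conceptually, manifest generation amounts to checksumming each data file and recording basic version info. The sketch below illustrates the idea; the field names (`path`, `sha256`, `rows`) are assumptions for illustration, not the project's actual manifest schema:

```python
import csv
import hashlib
from pathlib import Path

def build_manifest(data_dir: str) -> dict:
    """Checksum each CSV file and record basic version-tracking info."""
    entries = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        # Content hash lets later runs detect when a file has changed.
        digest = hashlib.sha256(csv_path.read_bytes()).hexdigest()
        with csv_path.open(newline="") as f:
            n_rows = sum(1 for _ in csv.reader(f)) - 1  # data rows, excluding header
        entries.append({"path": csv_path.name, "sha256": digest, "rows": n_rows})
    return {"files": entries}
```

A changed checksum or row count between runs signals that the dataset version has drifted since the last manifest.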
Expected output
### Execute Full Analytics Pipeline
Run the complete pipeline, including ingestion, preprocessing, feature engineering, modeling, and deployment monitoring.

The pipeline executes these stages:
- Data Ingestion - Loads and merges hospital CSV files
- Preprocessing - Cleans and normalizes data
- Feature Engineering - Creates derived features (age_range, is_adult, bmi_risk)
- Model Training - Trains risk and outcome prediction models
- Anomaly Detection - Identifies outliers and generates early warnings
- Streaming Inference - Compares batch vs streaming performance
- Hardware Profiling - Adjusts batch sizes and tracks resource utilization
- CPU Inference - Measures inference latency and throughput
- ONNX Export - Serializes models for cross-platform deployment
- Monitoring - Generates deployment metrics and alert summaries
- Benchmarking - Runs repeated experiments with confidence intervals
- Hardware Experiments - Tests performance under different constraints
Sample output
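As one illustration of the Feature Engineering stage, the derived features listed above (`age_range`, `is_adult`, `bmi_risk`) could be computed along these lines. The thresholds and bucket labels below are illustrative assumptions, not the project's actual rules:

```python
def engineer_features(record: dict) -> dict:
    """Add derived features to a patient record (illustrative thresholds)."""
    age, bmi = record["age"], record["bmi"]
    features = dict(record)
    # Bucket age into coarse decade ranges, e.g. "30-39".
    decade = (age // 10) * 10
    features["age_range"] = f"{decade}-{decade + 9}"
    features["is_adult"] = age >= 18
    # WHO-style BMI bands as a simple risk proxy (assumed mapping).
    if bmi < 18.5:
        features["bmi_risk"] = "underweight"
    elif bmi < 25:
        features["bmi_risk"] = "normal"
    elif bmi < 30:
        features["bmi_risk"] = "overweight"
    else:
        features["bmi_risk"] = "obese"
    return features
```

For example, `engineer_features({"age": 34, "bmi": 27.1})` would add `age_range="30-39"`, `is_adult=True`, and `bmi_risk="overweight"` under these assumed rules.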
### Run Hardware-Constrained Experiments
Evaluate early warning system performance under different memory, compute, and latency constraints.

This command runs 27 experiment scenarios (3 memory limits × 3 compute budgets × 3 stream speeds) to evaluate:
- Detection latency under resource constraints
- Prediction accuracy vs hardware limits
- False positive rates
- Detection quality scores
Experiment output
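The 27-scenario grid can be enumerated as a Cartesian product. The concrete constraint values below are placeholders; the project's actual limits live in its configuration:

```python
from itertools import product

# Placeholder constraint levels; the project's real values live in its config.
MEMORY_LIMITS_MB = [256, 512, 1024]
COMPUTE_BUDGETS = ["low", "medium", "high"]
STREAM_SPEEDS = [1.0, 2.0, 4.0]

# 3 memory limits x 3 compute budgets x 3 stream speeds = 27 scenarios
scenarios = [
    {"memory_mb": m, "compute": c, "stream_speed": s}
    for m, c, s in product(MEMORY_LIMITS_MB, COMPUTE_BUDGETS, STREAM_SPEEDS)
]
```

Each scenario dict would then parameterize one run of the early warning evaluation, with detection latency, accuracy, and false positive rate recorded per scenario.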
## Understanding Artifacts

After running the pipeline, artifacts are written to `Data Analysis for Hospitals/task/artifacts/`:
| File | Description |
|---|---|
| `experiment_log.json` | Complete pipeline execution log with all metrics |
| `dataset_manifest.json` | Dataset version manifest with checksums |
| `risk_model.onnx` | Exported ONNX model for risk prediction |
| `hardware_profile.csv` | Hardware profiling results with operator-level metrics |
| `early_warning_experiment_results.csv` | Detailed results from constraint experiments |
| `early_warning_plots.png` | Visualization of performance across scenarios |
## Common Workflows

### Iterative Development

### Production Deployment
## Troubleshooting

### Schema drift or missing columns
**Error:** `KeyError: 'column_name'`

**Solution:** Verify your CSV files match the expected schema. Run `python cli.py manifest` to see which files are being loaded.

Required columns: `age`, `height`, `weight`, `bmi`, `children`, `months`, `hospital`, `gender`, `diagnosis`, `blood_test`
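A quick way to audit a CSV header against the required columns above; the helper name is illustrative, not part of the project's API:

```python
# Required columns from the Troubleshooting notes above.
REQUIRED_COLUMNS = {
    "age", "height", "weight", "bmi", "children", "months",
    "hospital", "gender", "diagnosis", "blood_test",
}

def missing_columns(header: list[str]) -> set[str]:
    """Return the required columns absent from a CSV header row."""
    return REQUIRED_COLUMNS - set(header)
```

For example, `missing_columns(["age", "bmi"])` reports the eight other required columns, pointing directly at the fields that would trigger the `KeyError`.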
### Memory pressure under constrained hardware
**Error:** `MemoryError` or slow execution

**Solution:** Reduce memory limits in `config.py`. Note: the pipeline will automatically adjust batch sizes using `auto_adjust_batch_size()`.
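As a rough sketch of what an automatic adjustment like `auto_adjust_batch_size()` might do (its real signature and heuristic are not documented here, so both are assumptions), halving the batch until its estimated footprint fits the limit looks like:

```python
def auto_adjust_batch_size(
    batch_size: int,
    bytes_per_row: int,
    memory_limit_bytes: int,
    min_batch: int = 1,
) -> int:
    """Halve the batch size until its estimated footprint fits the memory limit.

    Illustrative heuristic; the project's actual logic may differ.
    """
    while batch_size > min_batch and batch_size * bytes_per_row > memory_limit_bytes:
        batch_size //= 2
    return max(batch_size, min_batch)
```

Under this heuristic, a batch of 1024 rows at 4 KiB per row against a 1 MiB limit would shrink to 256 rows.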
### High latency in streaming inference
**Issue:** Streaming latency exceeds requirements

**Solution:** Adjust chunk size in `config.py`. Note: smaller chunks reduce latency but may decrease throughput.

## Next Steps
- **Core Concepts**: Understand the pipeline architecture and design philosophy
- **Configuration**: Learn about all configuration options and tuning parameters
- **Modeling**: Deep dive into predictive models and risk stratification
- **CLI Reference**: Complete CLI command reference with all options