Overview

The config module defines the system-wide configuration using a dataclass that controls data paths, hardware constraints, experiment parameters, and reproducibility settings.

Data Classes

SystemConfig

Central configuration dataclass for the Hospital Data Analysis Platform.
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SystemConfig:
    random_seed: int = 42
    test_size: float = 0.25
    data_dir: Path = Path(__file__).resolve().parent.parent / "test"
    output_dir: Path = Path(__file__).resolve().parent / "artifacts"
    stream_chunk_size: int = 16
    stream_interval_ms: int = 10
    hardware_memory_limit_mb: int = 256
    hardware_compute_budget: int = 10_000
    benchmark_runs: int = 5
    confidence_level: float = 0.95
    feature_columns: list[str] = field(
        default_factory=lambda: ["age", "height", "weight", "bmi", "children", "months"]
    )
    target_risk: str = "diagnosis"
    target_outcome: str = "blood_test"
    experiment_memory_limits_mb: list[int] = field(default_factory=lambda: [64, 128, 256])
    experiment_compute_budgets: list[int] = field(default_factory=lambda: [2_000, 5_000, 10_000])
    experiment_stream_speeds_ms: list[int] = field(default_factory=lambda: [5, 10, 20])

Reproducibility Parameters

random_seed (int, default: 42): Random seed for reproducibility across Python, NumPy, and hashing operations.
test_size (float, default: 0.25): Fraction of the data held out for testing (0-1).
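A minimal sketch of how a seed like random_seed is typically applied across Python, NumPy, and hashing; seed_everything is a hypothetical helper for illustration, not the platform's actual seeding code, and NumPy is treated as optional here:

```python
import os
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None


def seed_everything(seed: int = 42) -> None:
    """Seed Python's RNG, NumPy's RNG (if present), and record the hash seed."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    # PYTHONHASHSEED only affects interpreters started after it is set,
    # but recording it documents the intended hashing behaviour.
    os.environ["PYTHONHASHSEED"] = str(seed)


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
assert first == second  # reseeding reproduces the same sequence
```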

Path Configuration

data_dir (Path, default: ../test): Directory containing input data files. Defaults to the test directory relative to the project root.
output_dir (Path, default: ./artifacts): Directory for output artifacts (CSV files, plots, models). Created automatically if it doesn't exist.

Streaming Parameters

stream_chunk_size (int, default: 16): Number of records processed per stream chunk.
stream_interval_ms (int, default: 10): Time interval between stream chunks, in milliseconds.
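How the two streaming parameters interact can be sketched with a hypothetical generator (stream_chunks is illustrative only, not the platform's streaming implementation): it slices the records into chunks of stream_chunk_size and sleeps stream_interval_ms between chunks to simulate arrival timing.

```python
import time
from typing import Iterator, Sequence


def stream_chunks(records: Sequence[dict], chunk_size: int = 16,
                  interval_ms: int = 10) -> Iterator[Sequence[dict]]:
    """Yield fixed-size chunks of records, pausing between chunks
    to simulate a data stream."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]
        time.sleep(interval_ms / 1000.0)


records = [{"id": i} for i in range(40)]
chunks = list(stream_chunks(records, chunk_size=16, interval_ms=1))
assert [len(c) for c in chunks] == [16, 16, 8]  # last chunk holds the remainder
```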

Hardware Constraints

hardware_memory_limit_mb (int, default: 256): Memory limit for hardware profiling, in megabytes.
hardware_compute_budget (int, default: 10000): Compute budget representing the maximum number of operations.
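The compute budget is an abstract operation counter. A hypothetical guard (not part of the platform's API) shows one way such a budget could be enforced: count operations and fail once the limit is exceeded.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the simulated compute budget is exhausted."""


class ComputeBudget:
    """Hypothetical guard: counts operations against a fixed budget."""

    def __init__(self, budget: int = 10_000) -> None:
        self.budget = budget
        self.used = 0

    def spend(self, ops: int = 1) -> None:
        if self.used + ops > self.budget:
            raise BudgetExceeded(f"{self.used + ops} > {self.budget} ops")
        self.used += ops


guard = ComputeBudget(budget=100)
for _ in range(100):
    guard.spend(1)
try:
    guard.spend(1)
except BudgetExceeded:
    pass  # the 101st operation exceeds the budget
```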

Benchmarking Parameters

benchmark_runs (int, default: 5): Number of repeated runs per benchmark experiment.
confidence_level (float, default: 0.95): Confidence level for statistical intervals (0-1).
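One way to turn the timings from benchmark_runs repetitions into an interval at confidence_level is a normal-approximation confidence interval for the mean; this is a sketch using only the standard library, and the platform's actual statistics code may differ (e.g. a t-interval, which is wider for small run counts).

```python
import math
import statistics


def confidence_interval(samples: list[float], level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for the mean of the samples."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)       # ~1.96 at 95%
    return mean - z * sem, mean + z * sem


timings = [10.2, 9.8, 10.5, 10.1, 9.9]  # e.g. 5 benchmark runs, in ms
low, high = confidence_interval(timings, level=0.95)
assert low < statistics.mean(timings) < high
```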

Feature and Target Configuration

feature_columns (list[str]): List of feature column names to use in analysis.
target_risk (str, default: "diagnosis"): Column name of the risk classification target.
target_outcome (str, default: "blood_test"): Column name of the outcome prediction target.

Experiment Parameter Sweeps

experiment_memory_limits_mb (list[int], default: [64, 128, 256]): Memory limits to test in hardware constraint experiments, in MB.
experiment_compute_budgets (list[int], default: [2000, 5000, 10000]): Compute budgets to test in hardware constraint experiments.
experiment_stream_speeds_ms (list[int], default: [5, 10, 20]): Stream intervals to test in experiments, in milliseconds.

Global Instance

CONFIG

The module provides a global CONFIG instance that is used throughout the platform:
CONFIG = SystemConfig()
CONFIG.output_dir.mkdir(parents=True, exist_ok=True)
The output directory is automatically created when the module is imported.
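Because SystemConfig is a dataclass, a per-experiment variant can be derived without mutating the shared CONFIG via dataclasses.replace. A sketch, using a trimmed-down stand-in for SystemConfig so it runs on its own:

```python
from dataclasses import dataclass, replace


@dataclass
class SystemConfig:  # trimmed stand-in for the real SystemConfig
    random_seed: int = 42
    benchmark_runs: int = 5


CONFIG = SystemConfig()

# Derive a variant without mutating the shared instance.
variant = replace(CONFIG, benchmark_runs=10)
assert variant.benchmark_runs == 10
assert CONFIG.benchmark_runs == 5  # the global default is untouched
```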

Usage Examples

Using the Default Configuration

from config import CONFIG

print(f"Random seed: {CONFIG.random_seed}")
print(f"Output directory: {CONFIG.output_dir}")
print(f"Feature columns: {CONFIG.feature_columns}")

Creating a Custom Configuration

from pathlib import Path
from config import SystemConfig

custom_config = SystemConfig(
    random_seed=123,
    hardware_memory_limit_mb=512,
    benchmark_runs=10,
    feature_columns=["age", "weight", "bmi"],
    output_dir=Path("./my_artifacts")
)

custom_config.output_dir.mkdir(parents=True, exist_ok=True)

Accessing Experiment Parameters

from config import CONFIG

# Generate all parameter combinations for experiments
for memory in CONFIG.experiment_memory_limits_mb:
    for compute in CONFIG.experiment_compute_budgets:
        for speed in CONFIG.experiment_stream_speeds_ms:
            print(f"Testing: {memory}MB, {compute} ops, {speed}ms interval")
