Overview

The config module defines the system-wide configuration using a dataclass that controls data paths, hardware constraints, experiment parameters, and reproducibility settings.

Data Classes

SystemConfig

Central configuration dataclass for the Hospital Data Analysis Platform.
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SystemConfig:
    random_seed: int = 42
    test_size: float = 0.25
    data_dir: Path = Path(__file__).resolve().parent.parent / "test"
    output_dir: Path = Path(__file__).resolve().parent / "artifacts"
    stream_chunk_size: int = 16
    stream_interval_ms: int = 10
    hardware_memory_limit_mb: int = 256
    hardware_compute_budget: int = 10_000
    benchmark_runs: int = 5
    confidence_level: float = 0.95
    feature_columns: list[str] = field(
        default_factory=lambda: ["age", "height", "weight", "bmi", "children", "months"]
    )
    target_risk: str = "diagnosis"
    target_outcome: str = "blood_test"
    experiment_memory_limits_mb: list[int] = field(default_factory=lambda: [64, 128, 256])
    experiment_compute_budgets: list[int] = field(default_factory=lambda: [2_000, 5_000, 10_000])
    experiment_stream_speeds_ms: list[int] = field(default_factory=lambda: [5, 10, 20])

Reproducibility Parameters

random_seed (int, default: 42): Random seed for reproducibility across Python, NumPy, and hashing operations.
test_size (float, default: 0.25): Fraction of the data held out for testing (0-1).
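A minimal sketch of how a seed like random_seed is typically applied across Python, NumPy, and hashing; seed_everything is a hypothetical helper for illustration, not the platform's actual seeding code, and NumPy is treated as optional here:

```python
import os
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None


def seed_everything(seed: int = 42) -> None:
    """Seed Python's RNG, NumPy's RNG (if present), and record the hash seed."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    # PYTHONHASHSEED only affects interpreters started after it is set,
    # but recording it documents the intended hashing behaviour.
    os.environ["PYTHONHASHSEED"] = str(seed)


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
assert first == second  # reseeding reproduces the same sequence
```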

Path Configuration

data_dir (Path, default: ../test): Directory containing input data files. Defaults to the test directory relative to the project root.
output_dir (Path, default: ./artifacts): Directory for output artifacts (CSV files, plots, models). Created automatically if it doesn't exist.

Streaming Parameters

stream_chunk_size (int, default: 16): Number of records processed per stream chunk.
stream_interval_ms (int, default: 10): Time interval between stream chunks, in milliseconds.
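How the two streaming parameters interact can be sketched with a hypothetical generator (stream_chunks is illustrative only, not the platform's streaming implementation): it slices the records into chunks of stream_chunk_size and sleeps stream_interval_ms between chunks to simulate arrival timing.

```python
import time
from typing import Iterator, Sequence


def stream_chunks(records: Sequence[dict], chunk_size: int = 16,
                  interval_ms: int = 10) -> Iterator[Sequence[dict]]:
    """Yield fixed-size chunks of records, pausing between chunks
    to simulate a data stream."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]
        time.sleep(interval_ms / 1000.0)


records = [{"id": i} for i in range(40)]
chunks = list(stream_chunks(records, chunk_size=16, interval_ms=1))
assert [len(c) for c in chunks] == [16, 16, 8]  # last chunk holds the remainder
```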

Hardware Constraints

hardware_memory_limit_mb (int, default: 256): Memory limit for hardware profiling, in megabytes.
hardware_compute_budget (int, default: 10000): Compute budget representing the maximum number of operations.
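The compute budget is an abstract operation counter. A hypothetical guard (not part of the platform's API) shows one way such a budget could be enforced: count operations and fail once the limit is exceeded.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the simulated compute budget is exhausted."""


class ComputeBudget:
    """Hypothetical guard: counts operations against a fixed budget."""

    def __init__(self, budget: int = 10_000) -> None:
        self.budget = budget
        self.used = 0

    def spend(self, ops: int = 1) -> None:
        if self.used + ops > self.budget:
            raise BudgetExceeded(f"{self.used + ops} > {self.budget} ops")
        self.used += ops


guard = ComputeBudget(budget=100)
for _ in range(100):
    guard.spend(1)
try:
    guard.spend(1)
except BudgetExceeded:
    pass  # the 101st operation exceeds the budget
```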

Benchmarking Parameters

benchmark_runs (int, default: 5): Number of repeated runs per benchmark experiment.
confidence_level (float, default: 0.95): Confidence level for statistical intervals (0-1).
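One way to turn the timings from benchmark_runs repetitions into an interval at confidence_level is a normal-approximation confidence interval for the mean; this is a sketch using only the standard library, and the platform's actual statistics code may differ (e.g. a t-interval, which is wider for small run counts).

```python
import math
import statistics


def confidence_interval(samples: list[float], level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for the mean of the samples."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)       # ~1.96 at 95%
    return mean - z * sem, mean + z * sem


timings = [10.2, 9.8, 10.5, 10.1, 9.9]  # e.g. 5 benchmark runs, in ms
low, high = confidence_interval(timings, level=0.95)
assert low < statistics.mean(timings) < high
```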

Feature and Target Configuration

feature_columns (list[str]): List of feature column names to use in analysis.
target_risk (str, default: "diagnosis"): Column name of the risk classification target.
target_outcome (str, default: "blood_test"): Column name of the outcome prediction target.

Experiment Parameter Sweeps

experiment_memory_limits_mb (list[int], default: [64, 128, 256]): Memory limits to test in hardware constraint experiments, in MB.
experiment_compute_budgets (list[int], default: [2000, 5000, 10000]): Compute budgets to test in hardware constraint experiments.
experiment_stream_speeds_ms (list[int], default: [5, 10, 20]): Stream intervals to test in experiments, in milliseconds.

Global Instance

CONFIG

The module provides a global CONFIG instance that is used throughout the platform:
CONFIG = SystemConfig()
CONFIG.output_dir.mkdir(parents=True, exist_ok=True)
The output directory is automatically created when the module is imported.
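Because SystemConfig is a dataclass, a per-experiment variant can be derived without mutating the shared CONFIG via dataclasses.replace. A sketch, using a trimmed-down stand-in for SystemConfig so it runs on its own:

```python
from dataclasses import dataclass, replace


@dataclass
class SystemConfig:  # trimmed stand-in for the real SystemConfig
    random_seed: int = 42
    benchmark_runs: int = 5


CONFIG = SystemConfig()

# Derive a variant without mutating the shared instance.
variant = replace(CONFIG, benchmark_runs=10)
assert variant.benchmark_runs == 10
assert CONFIG.benchmark_runs == 5  # the global default is untouched
```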

Usage Examples

Using the Default Configuration

from config import CONFIG

print(f"Random seed: {CONFIG.random_seed}")
print(f"Output directory: {CONFIG.output_dir}")
print(f"Feature columns: {CONFIG.feature_columns}")

Creating a Custom Configuration

from pathlib import Path
from config import SystemConfig

custom_config = SystemConfig(
    random_seed=123,
    hardware_memory_limit_mb=512,
    benchmark_runs=10,
    feature_columns=["age", "weight", "bmi"],
    output_dir=Path("./my_artifacts")
)

custom_config.output_dir.mkdir(parents=True, exist_ok=True)

Accessing Experiment Parameters

from config import CONFIG

# Generate all parameter combinations for experiments
for memory in CONFIG.experiment_memory_limits_mb:
    for compute in CONFIG.experiment_compute_budgets:
        for speed in CONFIG.experiment_stream_speeds_ms:
            print(f"Testing: {memory}MB, {compute} ops, {speed}ms interval")
