Overview
The config module defines the system-wide configuration using a dataclass that controls data paths, hardware constraints, experiment parameters, and reproducibility settings.
Data Classes
SystemConfig
Central configuration dataclass for the Hospital Data Analysis Platform.
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SystemConfig:
    random_seed: int = 42
    test_size: float = 0.25
    data_dir: Path = Path(__file__).resolve().parent.parent / "test"
    output_dir: Path = Path(__file__).resolve().parent / "artifacts"
    stream_chunk_size: int = 16
    stream_interval_ms: int = 10
    hardware_memory_limit_mb: int = 256
    hardware_compute_budget: int = 10_000
    benchmark_runs: int = 5
    confidence_level: float = 0.95
    feature_columns: list[str] = field(
        default_factory=lambda: ["age", "height", "weight", "bmi", "children", "months"]
    )
    target_risk: str = "diagnosis"
    target_outcome: str = "blood_test"
    experiment_memory_limits_mb: list[int] = field(default_factory=lambda: [64, 128, 256])
    experiment_compute_budgets: list[int] = field(default_factory=lambda: [2_000, 5_000, 10_000])
    experiment_stream_speeds_ms: list[int] = field(default_factory=lambda: [5, 10, 20])
Reproducibility Parameters
random_seed (int, default: 42): Random seed for reproducibility across Python, NumPy, and hashing operations
test_size (float, default: 0.25): Fraction of data to use for testing (0-1)
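As a minimal sketch of how these fields might be applied, a hypothetical helper (not part of config.py) can seed Python's random module; NumPy seeding would follow the same pattern with numpy.random.seed.

```python
import os
import random

def apply_seed(seed: int) -> None:
    """Seed Python's RNG; set PYTHONHASHSEED for child processes."""
    random.seed(seed)
    # Note: setting PYTHONHASHSEED at runtime only affects subprocesses,
    # not hashing in the current interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)

apply_seed(42)
value = random.randint(0, 100)  # deterministic given the seed
```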
Path Configuration
data_dir (Path): Directory containing input data files. Defaults to the test directory relative to the project root
output_dir (Path, default: "./artifacts"): Directory for output artifacts (CSV files, plots, models). Created automatically if it doesn't exist
Streaming Parameters
stream_chunk_size (int, default: 16): Number of records processed per stream chunk
stream_interval_ms (int, default: 10): Time interval between stream chunks in milliseconds
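A hypothetical sketch (the platform's actual streaming code may differ) of how these two fields could drive a chunked iterator over records:

```python
import time
from typing import Iterator

def stream(records: list, chunk_size: int = 16, interval_ms: int = 10) -> Iterator[list]:
    """Yield fixed-size chunks of records, pausing between chunks."""
    for i in range(0, len(records), chunk_size):
        yield records[i:i + chunk_size]
        time.sleep(interval_ms / 1000)  # simulate the stream interval

# 40 records in chunks of 16 -> sizes 16, 16, 8
chunks = list(stream(list(range(40)), chunk_size=16, interval_ms=0))
print([len(c) for c in chunks])  # → [16, 16, 8]
```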
Hardware Constraints
hardware_memory_limit_mb (int, default: 256): Memory limit for hardware profiling in megabytes
hardware_compute_budget (int, default: 10_000): Compute budget representing the maximum number of operations
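One plausible way the compute budget could be enforced is an operation counter that raises once the budget is exhausted; this is an illustrative sketch, not the platform's actual profiler.

```python
class ComputeBudget:
    """Track operations against a fixed budget and fail when exceeded."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.used = 0

    def spend(self, ops: int = 1) -> None:
        self.used += ops
        if self.used > self.budget:
            raise RuntimeError(f"compute budget of {self.budget} ops exceeded")

budget = ComputeBudget(10_000)
for _ in range(10_000):
    budget.spend()  # exactly at the limit: no error raised
```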
Benchmarking Parameters
benchmark_runs (int, default: 5): Number of repeated runs for benchmark experiments
confidence_level (float, default: 0.95): Confidence level for statistical intervals (0-1)
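To illustrate how these two parameters fit together, here is a hedged sketch of a normal-approximation confidence interval over repeated benchmark timings (z ≈ 1.96 corresponds to a 0.95 confidence level); the platform's statistics code may use a different method.

```python
import statistics

def confidence_interval(samples: list[float], z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI: mean ± z * standard error."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5  # standard error of the mean
    return (mean - z * sem, mean + z * sem)

timings = [1.02, 0.98, 1.05, 0.99, 1.01]  # seconds, one per benchmark run
low, high = confidence_interval(timings)
```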
Feature and Target Configuration
feature_columns (list[str], default: ["age", "height", "weight", "bmi", "children", "months"]): List of feature column names to use in analysis
target_risk (str, default: "diagnosis"): Column name for the risk classification target
target_outcome (str, default: "blood_test"): Column name for the outcome prediction target
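A small sketch of how the feature and target column names might be used to split a raw record into model inputs and a label; the record values below are made up for illustration, and the platform's real data loading may differ.

```python
FEATURES = ["age", "height", "weight", "bmi", "children", "months"]
TARGET_RISK = "diagnosis"

# Hypothetical record matching the configured column names
record = {"age": 54, "height": 170, "weight": 80, "bmi": 27.7,
          "children": 2, "months": 12, "diagnosis": "high", "blood_test": 1}

x = [record[col] for col in FEATURES]  # feature vector in column order
y = record[TARGET_RISK]                # risk classification label
```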
Experiment Parameter Sweeps
experiment_memory_limits_mb (list[int], default: [64, 128, 256]): Memory limits to test in hardware constraint experiments (in MB)
experiment_compute_budgets (list[int], default: [2000, 5000, 10000]): Compute budgets to test in hardware constraint experiments
experiment_stream_speeds_ms (list[int], default: [5, 10, 20]): Stream interval speeds to test in experiments (in milliseconds)
Global Instance
CONFIG
The module provides a global CONFIG instance that is used throughout the platform:
CONFIG = SystemConfig()
CONFIG.output_dir.mkdir(parents=True, exist_ok=True)
The output directory is automatically created when the module is imported.
Usage Examples
Using the Default Configuration
from config import CONFIG
print(f"Random seed: {CONFIG.random_seed}")
print(f"Output directory: {CONFIG.output_dir}")
print(f"Feature columns: {CONFIG.feature_columns}")
Creating a Custom Configuration
from pathlib import Path
from config import SystemConfig
custom_config = SystemConfig(
    random_seed=123,
    hardware_memory_limit_mb=512,
    benchmark_runs=10,
    feature_columns=["age", "weight", "bmi"],
    output_dir=Path("./my_artifacts"),
)
custom_config.output_dir.mkdir(parents=True, exist_ok=True)
Accessing Experiment Parameters
from config import CONFIG
# Generate all parameter combinations for experiments
for memory in CONFIG.experiment_memory_limits_mb:
    for compute in CONFIG.experiment_compute_budgets:
        for speed in CONFIG.experiment_stream_speeds_ms:
            print(f"Testing: {memory}MB, {compute} ops, {speed}ms interval")
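The nested loops above can equivalently be written with itertools.product; the values below mirror the configuration defaults.

```python
import itertools

memory_limits = [64, 128, 256]
compute_budgets = [2_000, 5_000, 10_000]
stream_speeds_ms = [5, 10, 20]

# Cartesian product of the three sweep lists: 3 * 3 * 3 = 27 combinations
combos = list(itertools.product(memory_limits, compute_budgets, stream_speeds_ms))
print(len(combos))  # → 27
```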