dataset_config

DatasetSpec

Dataclass defining the static properties of a dataset.

name

str

required

Human-readable dataset name

version

str

required

Dataset version identifier

train_path

str

required

Relative or absolute path to the training CSV file

test_path

str

required

Relative or absolute path to the test CSV file

expected_features

int

default:"784"

Number of feature columns (excluding the label column)

expected_min_rows

int

default:"100"

Minimum number of rows required for validation to pass

download_base_url

str

default:"https://pjreddie.com/media/files"

Base URL for downloading dataset files
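A minimal sketch of how `DatasetSpec` could be declared as a Python dataclass, assuming the field order, types, and defaults listed above. The source does not say whether the class is frozen, so `frozen=True` here is an assumption.

```python
from dataclasses import dataclass


# Sketch only: fields and defaults follow the table above;
# frozen=True is an assumption, not confirmed by the source.
@dataclass(frozen=True)
class DatasetSpec:
    name: str                      # Human-readable dataset name
    version: str                   # Dataset version identifier
    train_path: str                # Path to the training CSV file
    test_path: str                 # Path to the test CSV file
    expected_features: int = 784   # Feature columns, excluding the label
    expected_min_rows: int = 100   # Minimum rows required for validation
    download_base_url: str = "https://pjreddie.com/media/files"


spec = DatasetSpec(name="demo", version="v0",
                   train_path="data/train.csv", test_path="data/test.csv")
print(spec.expected_features, spec.expected_min_rows)  # 784 100
```

Only the four required fields must be supplied; the remaining three fall back to their defaults.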

FASHION_MNIST_SPEC

Pre-configured DatasetSpec for the Fashion-MNIST dataset.

name: "fashion-mnist"
version: "v1"
train_path: "Neural Network from Scratch/task/Data/fashion-mnist_train.csv"
test_path: "Neural Network from Scratch/task/Data/fashion-mnist_test.csv"
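The values above can be reproduced with a self-contained mirror of the dataclass (repeated so the snippet runs on its own; `frozen=True` is assumed). `dataclasses.replace` is one standard-library way to derive a variant, for example one that points at a different data directory; the `local_spec` paths are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetSpec:  # mirror of the documented fields (frozen=True assumed)
    name: str
    version: str
    train_path: str
    test_path: str
    expected_features: int = 784
    expected_min_rows: int = 100
    download_base_url: str = "https://pjreddie.com/media/files"


FASHION_MNIST_SPEC = DatasetSpec(
    name="fashion-mnist",
    version="v1",
    train_path="Neural Network from Scratch/task/Data/fashion-mnist_train.csv",
    test_path="Neural Network from Scratch/task/Data/fashion-mnist_test.csv",
)

# Hypothetical override: same dataset, different on-disk location.
local_spec = replace(
    FASHION_MNIST_SPEC,
    train_path="data/fashion-mnist_train.csv",
    test_path="data/fashion-mnist_test.csv",
)
```

`replace` returns a new instance and leaves `FASHION_MNIST_SPEC` unchanged, which works whether or not the real class is frozen.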

file_digest()

Compute the SHA256 hex digest of a dataset file.

file_digest(path: str | Path) -> str

path

str | Path

required

Path to the file

Returns: SHA256 hex digest string
Raises: FileNotFoundError if the file does not exist
Source: dataset_config.py:43
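A minimal sketch using the standard library's `hashlib`, assuming the file is hashed in fixed-size chunks so large CSVs are never held in memory at once. The `chunk_size` parameter and its 1 MiB default are hypothetical additions, not part of the documented signature.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of the file at `path`.

    Sketch: chunk_size is an illustrative extra parameter, not from the source.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Feed the hash incrementally so memory use stays bounded.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` performs the same chunked hashing on an open binary file.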

validate_dataset_file()

Validate dataset integrity at both file and tensor levels.

validate_dataset_file(
    path: str | Path,
    expected_features: int,
    expected_min_rows: int,
    expected_sha256: Optional[str] = None
) -> Tuple[int, int]

path

str | Path

required

Path to dataset CSV file

expected_features

int

required

Expected number of feature columns (excluding label column)

expected_min_rows

int

required

Minimum number of rows required

expected_sha256

Optional[str]

default:"None"

Optional SHA256 hash for file integrity verification

Returns: Tuple of (n_rows, n_cols) for the validated dataset
Raises:

FileNotFoundError if the file does not exist
ValueError if the file is empty, the hash does not match, the shape does not match, there are too few rows, the data contains NaN values, or any label falls outside the range [0, 9]

Source: dataset_config.py:52

load_dataset()

Load and preprocess a CSV dataset into normalized features and labels.

load_dataset(path: str | Path) -> Tuple[np.ndarray, np.ndarray]

path

str | Path

required

Path to dataset CSV file

Returns: Tuple of (X, y) where:

X: Normalized feature matrix (float32), with values divided by the maximum pixel value
y: Integer label vector (int32)

Source: dataset_config.py:95
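A minimal sketch of the load-and-normalize step, assuming one header row and the label in column 0. The documentation does not say whether "max pixel value" means the largest value present in the data or a fixed constant such as 255. This sketch takes the largest value in the feature matrix and guards against an all-zero input; both choices are assumptions.

```python
from pathlib import Path
from typing import Tuple

import numpy as np


def load_dataset(path) -> Tuple[np.ndarray, np.ndarray]:
    # Assumes one header row and the label in column 0.
    data = np.loadtxt(Path(path), delimiter=",", skiprows=1, ndmin=2)
    y = data[:, 0].astype(np.int32)
    X = data[:, 1:].astype(np.float32)
    # Divide by the largest pixel value present; skip if everything is zero.
    max_val = X.max()
    if max_val > 0:
        X /= max_val
    return X, y
```

Casting to float32 before dividing keeps the in-place division in single precision, matching the documented dtype of X.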

download_fashion_mnist()

Download the Fashion-MNIST train and test CSV files from a remote server.

download_fashion_mnist(
    spec: DatasetSpec = FASHION_MNIST_SPEC
) -> Dict[str, str]

spec

DatasetSpec

default:"FASHION_MNIST_SPEC"

Dataset specification providing the download base URL and target paths

Returns: Dictionary with keys "train_sha256" and "test_sha256" containing the computed hashes
Raises: requests.HTTPError if a download fails
Source: dataset_config.py:115
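A dependency-free sketch of the download step. The documented function uses `requests`; this sketch swaps in the standard library's `urllib.request`, so it raises `urllib.error.HTTPError` instead of `requests.HTTPError`. It also assumes, without confirmation from the source, that each remote file name equals the local file name, and it accepts any object with `train_path`, `test_path`, and `download_base_url` attributes (no default spec). `_sha256` is a hypothetical stand-in for `file_digest()`.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Dict


def _sha256(path: Path) -> str:
    # Hypothetical stand-in for file_digest().
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def download_fashion_mnist(spec) -> Dict[str, str]:
    """Sketch: fetch both CSVs named by `spec` and return their SHA256 hashes.

    Assumes the remote file name equals the local file name.
    """
    hashes = {}
    for split, target in (("train", spec.train_path), ("test", spec.test_path)):
        target = Path(target)
        target.parent.mkdir(parents=True, exist_ok=True)
        url = f"{spec.download_base_url.rstrip('/')}/{target.name}"
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(url) as resp, target.open("wb") as out:
            shutil.copyfileobj(resp, out)
        hashes[f"{split}_sha256"] = _sha256(target)
    return hashes
```

The returned hashes can be recorded and passed back later as `expected_sha256` to pin the exact files that were downloaded.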

ensure_dataset_ready()

Validate dataset availability and, optionally, download the dataset automatically if it is missing or invalid.

ensure_dataset_ready(
    spec: DatasetSpec,
    expected_features: int,
    expected_min_rows: int,
    auto_download: bool = False,
    expected_sha256: Optional[str] = None
) -> Tuple[int, int]

spec

DatasetSpec

required

Dataset specification

expected_features

int

required

Expected feature count

expected_min_rows

int

required

Minimum row count

auto_download

bool

default:"False"

If True, automatically download the dataset when validation fails

expected_sha256

Optional[str]

default:"None"

Optional SHA256 for integrity check

Returns: Tuple of (n_rows, n_cols) after successful validation
Raises:

Validation errors if auto_download=False and the dataset is invalid
RuntimeError if auto-download fails

Source: dataset_config.py:129
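The documented behavior can be sketched as a validate, download, revalidate sequence. To keep the snippet self-contained, the validation and download steps are passed in as keyword-only callables (`validate`, `download`), a hypothetical parameterization that is not in the documented signature. Checking only `spec.train_path`, and treating `FileNotFoundError` and `ValueError` as the "validation errors", are also assumptions.

```python
from typing import Callable, Optional, Tuple


def ensure_dataset_ready(
    spec,
    expected_features: int,
    expected_min_rows: int,
    auto_download: bool = False,
    expected_sha256: Optional[str] = None,
    *,
    validate: Callable[..., Tuple[int, int]],
    download: Callable[[object], dict],
) -> Tuple[int, int]:
    """Sketch of the validate -> download -> revalidate flow.

    `validate` and `download` are hypothetical injected stand-ins for
    validate_dataset_file() and download_fashion_mnist().
    """
    def check() -> Tuple[int, int]:
        # Assumption: only the training file is validated here.
        return validate(spec.train_path, expected_features,
                        expected_min_rows, expected_sha256)

    try:
        return check()
    except (FileNotFoundError, ValueError):
        if not auto_download:
            raise  # surface the original validation error unchanged
    try:
        download(spec)
    except Exception as exc:
        raise RuntimeError(f"Auto-download failed for {spec.name}") from exc
    # Revalidate so a bad download is still reported as a validation error.
    return check()
```

Chaining with `from exc` keeps the underlying network or HTTP error visible in the traceback of the `RuntimeError`.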
