
DatasetSpec

Dataclass defining the static properties of a dataset.
Fields:
  • name (str, required): Human-readable dataset name
  • version (str, required): Dataset version identifier
  • train_path (str, required): Relative or absolute path to training CSV file
  • test_path (str, required): Relative or absolute path to test CSV file
  • expected_features (int, default: 784): Number of feature columns (excluding label)
  • expected_min_rows (int, default: 100): Minimum rows required for validation
  • download_base_url (str, default: "https://pjreddie.com/media/files"): Base URL for downloading dataset files
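A minimal sketch of how this dataclass could be declared, reproducing the documented fields and defaults. Whether the real class is frozen or defines extra methods is not stated here, so a plain `@dataclass` is assumed.

```python
from dataclasses import dataclass


@dataclass
class DatasetSpec:
    """Static properties of a dataset (fields and defaults as documented)."""

    name: str          # human-readable dataset name
    version: str       # dataset version identifier
    train_path: str    # relative or absolute path to the training CSV
    test_path: str     # relative or absolute path to the test CSV
    expected_features: int = 784   # feature columns, label column excluded
    expected_min_rows: int = 100   # minimum rows for validation to pass
    download_base_url: str = "https://pjreddie.com/media/files"
```

Only the four required fields must be passed; the rest fall back to the Fashion-MNIST-friendly defaults.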

FASHION_MNIST_SPEC

Pre-configured DatasetSpec for Fashion-MNIST dataset.
  • name: “fashion-mnist”
  • version: “v1”
  • train_path: “Neural Network from Scratch/task/Data/fashion-mnist_train.csv”
  • test_path: “Neural Network from Scratch/task/Data/fashion-mnist_test.csv”
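Both paths in this spec are relative. Assuming the library hands these strings to the filesystem unchanged, Python resolves them against the current working directory, so they only point at the data when the process starts in the directory that contains `Neural Network from Scratch/`. The helper `resolve_data_path` below is hypothetical (not part of the documented API); it sketches one way to anchor a relative spec path to an explicit base directory instead.

```python
from pathlib import Path
from typing import Optional, Union

# Values copied from FASHION_MNIST_SPEC above.
TRAIN_PATH = "Neural Network from Scratch/task/Data/fashion-mnist_train.csv"
TEST_PATH = "Neural Network from Scratch/task/Data/fashion-mnist_test.csv"


def resolve_data_path(path: Union[str, Path], base_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: return an absolute path for a spec path.

    Absolute paths are returned unchanged; relative ones are joined to
    base_dir, or to the current working directory when base_dir is None.
    """
    p = Path(path)
    if p.is_absolute():
        return p
    base = base_dir if base_dir is not None else Path.cwd()
    return base / p
```

For example, `resolve_data_path(TRAIN_PATH, Path(__file__).parent)` would anchor the path to a script's own directory rather than to wherever the process was launched.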

file_digest()

Compute SHA256 hash digest for a dataset file.
file_digest(path: str | Path) -> str
Parameters:
  • path (str | Path, required): Path to the file
Returns: SHA256 hex digest string
Raises: FileNotFoundError if file does not exist
Source: dataset_config.py:43
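A minimal sketch of a SHA256 file digest that matches the documented contract (hex string out, `FileNotFoundError` for a missing file). Reading in fixed-size chunks keeps memory flat for large CSVs; the `chunk_size` parameter is a hypothetical addition, not part of the documented signature.

```python
import hashlib
from pathlib import Path
from typing import Union


def file_digest(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the SHA256 hex digest of the file at path."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"dataset file not found: {p}")
    h = hashlib.sha256()
    with p.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The chunk size does not affect the result: hashing a file in 1 MiB pieces yields the same digest as hashing its full contents at once.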

validate_dataset_file()

Validate dataset integrity at both file and tensor levels.
validate_dataset_file(
    path: str | Path,
    expected_features: int,
    expected_min_rows: int,
    expected_sha256: Optional[str] = None
) -> Tuple[int, int]
Parameters:
  • path (str | Path, required): Path to dataset CSV file
  • expected_features (int, required): Expected number of feature columns (excluding label column)
  • expected_min_rows (int, required): Minimum number of rows required
  • expected_sha256 (Optional[str], default: None): Optional SHA256 hash for file integrity verification
Returns: Tuple of (n_rows, n_cols) for the validated dataset
Raises:
  • FileNotFoundError if the file does not exist
  • ValueError if the file is empty, the SHA256 hash does not match, the shape does not match, there are too few rows, the data contains NaN values, or any label falls outside the range [0, 9]
Source: dataset_config.py:52
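A sketch of the two validation levels described above, assuming a CSV with one header row and the label in the first column (the Kaggle-style Fashion-MNIST layout). The `skip_header` parameter is hypothetical, and returning the column count including the label is an assumption, since the docs do not say which count `n_cols` reports.

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple, Union

import numpy as np


def validate_dataset_file(
    path: Union[str, Path],
    expected_features: int,
    expected_min_rows: int,
    expected_sha256: Optional[str] = None,
    skip_header: int = 1,  # hypothetical: number of header lines to skip
) -> Tuple[int, int]:
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"dataset file not found: {p}")
    if p.stat().st_size == 0:
        raise ValueError(f"dataset file is empty: {p}")

    # File level: optional SHA256 integrity check (reads the whole file).
    if expected_sha256 is not None:
        actual = hashlib.sha256(p.read_bytes()).hexdigest()
        if actual.lower() != expected_sha256.lower():
            raise ValueError(f"SHA256 mismatch for {p}: {actual} != {expected_sha256}")

    # Tensor level: parse into a 2-D float array; empty fields become NaN.
    data = np.atleast_2d(np.genfromtxt(p, delimiter=",", skip_header=skip_header))
    n_rows, n_cols = data.shape
    if n_cols != expected_features + 1:  # +1 for the label column
        raise ValueError(f"expected {expected_features + 1} columns, got {n_cols}")
    if n_rows < expected_min_rows:
        raise ValueError(f"expected at least {expected_min_rows} rows, got {n_rows}")
    if np.isnan(data).any():
        raise ValueError("dataset contains NaN values")
    labels = data[:, 0]  # assumption: label is the first column
    if labels.min() < 0 or labels.max() > 9:
        raise ValueError("labels out of range [0, 9]")
    return n_rows, n_cols
```

The cheap file-level checks run first, so a corrupted or truncated download is rejected before the more expensive parse.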

load_dataset()

Load and preprocess a CSV dataset into normalized features and labels.
load_dataset(path: str | Path) -> Tuple[np.ndarray, np.ndarray]
Parameters:
  • path (str | Path, required): Path to dataset CSV file
Returns: Tuple of (X, y) where:
  • X: Normalized feature matrix (float32), with pixel values divided by the maximum pixel value
  • y: Integer label vector (int32)
Source: dataset_config.py:95
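A minimal loading sketch, under the same layout assumptions as the validation example (one header row, label in the first column). It interprets "maximum pixel value" as the largest value found in the loaded features; dividing by a fixed 255 would be an equally plausible reading and gives the same result whenever some pixel reaches 255. The `skip_header` parameter is hypothetical.

```python
from pathlib import Path
from typing import Tuple, Union

import numpy as np


def load_dataset(path: Union[str, Path], skip_header: int = 1) -> Tuple[np.ndarray, np.ndarray]:
    data = np.loadtxt(path, delimiter=",", skiprows=skip_header, ndmin=2)
    y = data[:, 0].astype(np.int32)      # first column: class label
    X = data[:, 1:].astype(np.float32)   # remaining columns: pixel values
    max_pixel = X.max() if X.size else 0.0
    if max_pixel > 0:                    # avoid dividing an all-zero matrix by 0
        X /= max_pixel                   # scale non-negative pixels into [0, 1]
    return X, y
```

Casting to float32 before dividing keeps the normalized matrix in the documented dtype, since the in-place division preserves it.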

download_fashion_mnist()

Download the Fashion-MNIST train and test CSV files from a remote server.
download_fashion_mnist(
    spec: DatasetSpec = FASHION_MNIST_SPEC
) -> Dict[str, str]
Parameters:
  • spec (DatasetSpec, default: FASHION_MNIST_SPEC): Dataset specification with download URLs and target paths
Returns: Dictionary with keys "train_sha256" and "test_sha256" containing computed hashes
Raises: requests.HTTPError if download fails
Source: dataset_config.py:115
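A sketch of the download step using `requests` streaming. The remote naming scheme is not documented, so `remote_url` assumes each file lives at `download_base_url` plus the local file name; both helpers are hypothetical. Hashing the bytes while they are written gives the same SHA256 as hashing the finished file afterwards. `spec` is typed loosely because `FASHION_MNIST_SPEC` is not defined in this standalone snippet.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict


def remote_url(base_url: str, local_path: str) -> str:
    # Assumption: the remote file name equals the local file name.
    return f"{base_url.rstrip('/')}/{Path(local_path).name}"


def _download(url: str, dest: str, timeout: float = 60.0) -> str:
    """Stream url to dest and return the SHA256 of the bytes written."""
    import requests  # third-party; imported here so remote_url works without it

    target = Path(dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
        with target.open("wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
                digest.update(chunk)
    return digest.hexdigest()


def download_fashion_mnist(spec: Any) -> Dict[str, str]:
    # spec: any object with train_path, test_path and download_base_url attributes.
    return {
        "train_sha256": _download(remote_url(spec.download_base_url, spec.train_path), spec.train_path),
        "test_sha256": _download(remote_url(spec.download_base_url, spec.test_path), spec.test_path),
    }
```

Streaming with `iter_content` avoids holding a whole CSV in memory, and the returned hashes can be passed back as `expected_sha256` on later validations.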

ensure_dataset_ready()

Validate dataset availability and optionally auto-download if missing or invalid.
ensure_dataset_ready(
    spec: DatasetSpec,
    expected_features: int,
    expected_min_rows: int,
    auto_download: bool = False,
    expected_sha256: Optional[str] = None
) -> Tuple[int, int]
Parameters:
  • spec (DatasetSpec, required): Dataset specification
  • expected_features (int, required): Expected feature count
  • expected_min_rows (int, required): Minimum row count
  • auto_download (bool, default: False): If True, automatically download the dataset when validation fails
  • expected_sha256 (Optional[str], default: None): Optional SHA256 for integrity check
Returns: Tuple of (n_rows, n_cols) after successful validation
Raises:
  • Validation errors if auto_download=False and the dataset is invalid
  • RuntimeError if the auto-download fails
Source: dataset_config.py:129
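A sketch of the validate, download, revalidate control flow. To keep it self-contained and testable, the validator and downloader are passed in through hypothetical keyword-only parameters `validate` and `download`; in the real module they would presumably be `validate_dataset_file` and `download_fashion_mnist`. Two further assumptions: only the training file is checked, and a failure after downloading (including a failed revalidation) is reported as `RuntimeError`.

```python
from typing import Any, Callable, Optional, Tuple

Validator = Callable[[str, int, int, Optional[str]], Tuple[int, int]]
Downloader = Callable[[Any], Any]


def ensure_dataset_ready(
    spec: Any,
    expected_features: int,
    expected_min_rows: int,
    auto_download: bool = False,
    expected_sha256: Optional[str] = None,
    *,
    validate: Validator,    # hypothetical injection point
    download: Downloader,   # hypothetical injection point
) -> Tuple[int, int]:
    def check() -> Tuple[int, int]:
        # Assumption: only the training file is validated here.
        return validate(spec.train_path, expected_features, expected_min_rows, expected_sha256)

    try:
        return check()
    except (FileNotFoundError, ValueError):
        if not auto_download:
            raise  # re-raise the original validation error unchanged

    try:
        download(spec)
        return check()
    except Exception as exc:  # download or re-validation failed
        raise RuntimeError(f"auto-download failed for dataset {spec.name!r}") from exc
```

Re-raising the original error when `auto_download=False` keeps the specific `FileNotFoundError` or `ValueError` visible to the caller, while `raise ... from exc` preserves the underlying cause on the `RuntimeError`.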
