Overview
This module provides the configuration infrastructure for the Credit Score Model, including the ModelConfig dataclass for defining model architecture and hyperparameters, and the get_activation() factory function for creating activation layers.
ModelConfig
Dataclass that encapsulates all model architecture and training parameters.

Parameters
Required Parameters

- input_size: Number of input features. Must match the dimensionality of the preprocessed input data. Example: for a dataset with 20 features after preprocessing, set input_size=20.
- hidden_layers: List of integers defining the number of neurons in each hidden layer. Important: length must match the length of activation_functions. Example: [128, 64, 32] creates a 3-layer architecture with decreasing width.
- activation_functions: List of activation function names, one per hidden layer; length must match hidden_layers. Supported values:
  - "relu": Rectified Linear Unit
  - "leaky_relu": Leaky ReLU (negative_slope=0.1)
  - "gelu": Gaussian Error Linear Unit
  - "sigmoid": Sigmoid activation
  - "softmax": Softmax (dim=1)
  - "tanh": Hyperbolic tangent

Optional Parameters
- Output size: Number of output neurons. For binary classification, use 1 (outputs logits before sigmoid).
- Dropout rate: Dropout probability applied after each hidden layer activation. Range: [0.0, 1.0]. Higher values provide stronger regularization but may underfit.
- Learning rate: Learning rate for the optimizer during training.
- Epochs: Number of training epochs.
- Batch size: Number of samples per training batch.
- Checkpoint directory: Directory path for saving model checkpoints during training.
Usage Examples
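A minimal sketch of constructing a configuration. The three required field names (input_size, hidden_layers, activation_functions) are confirmed by the docs above; the optional field names and default values shown here are illustrative assumptions, not the module's actual definitions:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelConfig:
    # Required fields -- names confirmed by the parameter docs above
    input_size: int
    hidden_layers: List[int]
    activation_functions: List[str]
    # Optional fields -- names and defaults below are assumptions for illustration
    output_size: int = 1
    dropout_rate: float = 0.2
    learning_rate: float = 0.001
    epochs: int = 50
    batch_size: int = 64
    checkpoint_dir: str = "checkpoints"


# 20 input features, three hidden layers with matching activations
config = ModelConfig(
    input_size=20,
    hidden_layers=[128, 64, 32],
    activation_functions=["relu", "relu", "tanh"],
)
assert len(config.hidden_layers) == len(config.activation_functions)
```

Note that hidden_layers and activation_functions are created with equal lengths, satisfying the validation described below.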
get_activation()
Factory function that returns PyTorch activation modules based on string identifiers.

Parameters
- function_name: Name of the activation function. Must be one of:
  - "relu": returns nn.ReLU()
  - "leaky_relu": returns nn.LeakyReLU(negative_slope=0.1)
  - "gelu": returns nn.GELU()
  - "sigmoid": returns nn.Sigmoid()
  - "softmax": returns nn.Softmax(dim=1)
  - "tanh": returns nn.Tanh()
Returns
PyTorch activation module instance
Raises
Raised when function_name is not recognized. The error is also logged via the module logger.

Usage Example
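A sketch of the factory pattern and its usage, assuming the mapping documented above (the real implementation in model.py also logs the error via the module logger, which is omitted here):

```python
import torch.nn as nn


def get_activation(function_name: str) -> nn.Module:
    """Return a fresh activation module for the given name (sketch)."""
    factories = {
        "relu": nn.ReLU,
        "leaky_relu": lambda: nn.LeakyReLU(negative_slope=0.1),
        "gelu": nn.GELU,
        "sigmoid": nn.Sigmoid,
        "softmax": lambda: nn.Softmax(dim=1),
        "tanh": nn.Tanh,
    }
    if function_name not in factories:
        # Error message wording is illustrative
        raise ValueError(f"Unsupported activation function: {function_name}")
    return factories[function_name]()


layer_act = get_activation("leaky_relu")
assert isinstance(layer_act, nn.LeakyReLU)
assert layer_act.negative_slope == 0.1
```

Storing constructors (not instances) in the dict means each call returns a new module, so the same name can safely be used for several layers.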
Activation Function Details
ReLU (Rectified Linear Unit)
- Formula: f(x) = max(0, x)
- Best for: Deep networks, general-purpose hidden layers
- Weight initialization: He/Kaiming
Leaky ReLU
- Formula: f(x) = x if x > 0 else 0.1 * x
- Best for: Networks with dying ReLU problems
- Weight initialization: He/Kaiming
GELU (Gaussian Error Linear Unit)
- Best for: Transformer-style architectures, smooth gradients
- Weight initialization: He/Kaiming (using ReLU variant)
Sigmoid
- Formula: f(x) = 1 / (1 + exp(-x))
- Best for: Binary classification output, gates in LSTM-style networks
- Weight initialization: Xavier/Glorot
- Note: Susceptible to vanishing gradients in deep networks
Softmax
- Formula: f(x_i) = exp(x_i) / sum_j exp(x_j)
- Best for: Multi-class classification output layer
- Weight initialization: Xavier/Glorot
- Note: Applied across dimension 1, i.e. the class dimension for inputs of shape (batch_size, num_classes), not the batch dimension
Tanh (Hyperbolic Tangent)
- Formula: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
- Best for: Hidden layers in RNNs, zero-centered outputs
- Weight initialization: Xavier/Glorot
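As a quick sanity check, the closed-form formulas above can be evaluated in plain Python. These scalar helpers are illustrative only, not part of the module:

```python
import math


def relu(x):
    # f(x) = max(0, x)
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.1):
    # f(x) = x if x > 0 else 0.1 * x
    return x if x > 0 else negative_slope * x


def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


assert relu(-2.0) == 0.0 and relu(3.0) == 3.0
assert leaky_relu(-2.0) == -0.2        # negative inputs are scaled by 0.1
assert sigmoid(0.0) == 0.5             # sigmoid is centered at 0.5
assert abs(tanh(1.0) - math.tanh(1.0)) < 1e-12
```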
Configuration Validation
The CreditScoreModel constructor validates that len(hidden_layers) == len(activation_functions). Mismatched lengths raise a ValueError at model initialization.

Example Error
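A minimal sketch of the length check described above. The real validation lives in the CreditScoreModel constructor; the helper name and error message here are illustrative:

```python
from typing import List


def validate_config(hidden_layers: List[int],
                    activation_functions: List[str]) -> None:
    """Mirror of the constructor's consistency check (wording is assumed)."""
    if len(hidden_layers) != len(activation_functions):
        raise ValueError(
            f"Expected {len(hidden_layers)} activation functions for "
            f"{len(hidden_layers)} hidden layers, "
            f"got {len(activation_functions)}"
        )


# Matching lengths pass silently
validate_config([128, 64, 32], ["relu", "relu", "tanh"])

# Mismatched lengths raise a ValueError at initialization time
try:
    validate_config([128, 64, 32], ["relu", "relu"])
except ValueError as err:
    print(err)
```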
Best Practices
- Start simple: Begin with 2-3 hidden layers and ReLU activations
- Layer size: Typically decrease layer sizes progressively (e.g., 128 → 64 → 32)
- Dropout: Start with 0.2-0.3 for moderate regularization
- Learning rate: Use 0.001 as baseline, decrease if training is unstable
- Batch size: Larger batches (64-128) provide more stable gradients
- Activation mixing: ReLU for most layers, consider Tanh for final hidden layer
Source Reference
Implementation: python-projects/credit-score/model/model.py:19-55