Overview

This module provides the configuration infrastructure for the Credit Score Model, including the ModelConfig dataclass for defining model architecture and hyperparameters, and the get_activation() factory function for creating activation layers.

ModelConfig

Dataclass that encapsulates all model architecture and training parameters.
from dataclasses import dataclass
from typing import Literal

@dataclass
class ModelConfig:
    input_size: int
    hidden_layers: list[int]
    activation_functions: list[Literal["relu", "leaky_relu", "gelu", "sigmoid", "softmax", "tanh"]]
    output_size: int = 1
    dropout_rate: float = 0.2
    learning_rate: float = 0.001
    epochs: int = 100
    batch_size: int = 32
    checkpoint_path: str = "./model/checkpoint"

Parameters

Required Parameters

input_size
int
required
Number of input features. Must match the dimensionality of the preprocessed input data.
Example: For a dataset with 20 features after preprocessing, set input_size=20.
hidden_layers
list[int]
required
List of integers defining the number of neurons in each hidden layer.
Important: Length must match the length of activation_functions.
Example: [128, 64, 32] creates a 3-layer architecture with decreasing width.
activation_functions
list[str]
required
List of activation function names, one per hidden layer. Supported values:
  • "relu": Rectified Linear Unit
  • "leaky_relu": Leaky ReLU (negative_slope=0.1)
  • "gelu": Gaussian Error Linear Unit
  • "sigmoid": Sigmoid activation
  • "softmax": Softmax (dim=1)
  • "tanh": Hyperbolic tangent
Important: Length must match the length of hidden_layers.

Optional Parameters

output_size
int
default:"1"
Number of output neurons. For binary classification, use 1 (outputs logits before sigmoid).
dropout_rate
float
default:"0.2"
Dropout probability applied after each hidden layer activation. Range: [0.0, 1.0]
Higher values provide stronger regularization but may cause underfitting.
learning_rate
float
default:"0.001"
Learning rate for the optimizer during training.
epochs
int
default:"100"
Number of training epochs.
batch_size
int
default:"32"
Number of samples per training batch.
checkpoint_path
str
default:"./model/checkpoint"
Directory path for saving model checkpoints during training.

Usage Examples

from model.model import ModelConfig

# Simple 2-layer network
config = ModelConfig(
    input_size=20,
    hidden_layers=[64, 32],
    activation_functions=["relu", "relu"]
)

get_activation()

Factory function that returns PyTorch activation modules based on string identifiers.
def get_activation(function_name: str) -> nn.Module

Parameters

function_name
str
required
Name of the activation function. Must be one of:
  • "relu": Returns nn.ReLU()
  • "leaky_relu": Returns nn.LeakyReLU(negative_slope=0.1)
  • "gelu": Returns nn.GELU()
  • "sigmoid": Returns nn.Sigmoid()
  • "softmax": Returns nn.Softmax(dim=1)
  • "tanh": Returns nn.Tanh()

Returns

activation
nn.Module
PyTorch activation module instance

Raises

ValueError
Exception
Raised when function_name is not recognized. Error is also logged via the module logger.

Usage Example

import torch.nn as nn
from model.model import get_activation

# Create activation layers
relu = get_activation("relu")
leaky_relu = get_activation("leaky_relu")
gelu = get_activation("gelu")

# Use in custom model
class CustomLayer(nn.Module):
    def __init__(self, in_features, out_features, activation="relu"):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.activation = get_activation(activation)
    
    def forward(self, x):
        return self.activation(self.linear(x))
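Because hidden_layers and activation_functions are parallel lists, they are naturally consumed in a loop when assembling the network. The sketch below illustrates that pattern; it is not the CreditScoreModel implementation, and get_activation is inlined here as a minimal stand-in (mirroring the documented mapping) so the snippet is self-contained:

```python
import torch
import torch.nn as nn

def get_activation(function_name: str) -> nn.Module:
    # Minimal stand-in for model.model.get_activation, following the
    # documented name -> module mapping
    activations = {
        "relu": nn.ReLU(),
        "leaky_relu": nn.LeakyReLU(negative_slope=0.1),
        "gelu": nn.GELU(),
        "sigmoid": nn.Sigmoid(),
        "softmax": nn.Softmax(dim=1),
        "tanh": nn.Tanh(),
    }
    if function_name not in activations:
        raise ValueError(f"Unknown activation: {function_name}")
    return activations[function_name]

def build_hidden_stack(input_size, hidden_layers, activation_functions,
                       dropout_rate=0.2):
    # One Linear -> activation -> Dropout block per hidden layer
    layers, in_features = [], input_size
    for width, act in zip(hidden_layers, activation_functions):
        layers += [nn.Linear(in_features, width),
                   get_activation(act),
                   nn.Dropout(dropout_rate)]
        in_features = width
    return nn.Sequential(*layers)

stack = build_hidden_stack(20, [64, 32], ["relu", "relu"])
out = stack(torch.randn(8, 20))  # batch of 8 samples, 20 features each
```

The output width of the stack equals the last entry of hidden_layers; a final nn.Linear to output_size would complete the network.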

Activation Function Details

ReLU (Rectified Linear Unit)

  • Formula: f(x) = max(0, x)
  • Best for: Deep networks, general-purpose hidden layers
  • Weight initialization: He/Kaiming

Leaky ReLU

  • Formula: f(x) = x if x > 0 else 0.1 * x
  • Best for: Networks with dying ReLU problems
  • Weight initialization: He/Kaiming

GELU (Gaussian Error Linear Unit)

  • Best for: Transformer-style architectures, smooth gradients
  • Weight initialization: He/Kaiming (using ReLU variant)

Sigmoid

  • Formula: f(x) = 1 / (1 + exp(-x))
  • Best for: Binary classification output, gates in LSTM-style networks
  • Weight initialization: Xavier/Glorot
  • Note: Susceptible to vanishing gradients in deep networks

Softmax

  • Formula: f(x_i) = exp(x_i) / sum(exp(x_j))
  • Best for: Multi-class classification output layer
  • Weight initialization: Xavier/Glorot
  • Note: Applied across dimension 1, i.e. the class dimension of a (batch, classes) tensor

Tanh (Hyperbolic Tangent)

  • Formula: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
  • Best for: Hidden layers in RNNs, zero-centered outputs
  • Weight initialization: Xavier/Glorot
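The formulas listed above can be checked numerically with pure-Python reference implementations (illustrative only; the model itself uses the torch.nn modules):

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)), squashes to (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), zero-centered in (-1, 1)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def softmax(xs):
    # f(x_i) = exp(x_i) / sum(exp(x_j)); outputs sum to 1
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relu(x):
    # f(x) = max(0, x)
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.1):
    # f(x) = x if x > 0 else negative_slope * x
    return x if x > 0 else negative_slope * x

assert sigmoid(0.0) == 0.5
assert abs(sum(softmax([1.0, 2.0, 3.0])) - 1.0) < 1e-12
```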

Configuration Validation

The CreditScoreModel constructor validates that len(hidden_layers) == len(activation_functions). Mismatched lengths will raise a ValueError at model initialization.

Example Error

# Mismatched lengths: 3 hidden layers but only 2 activation functions
config = ModelConfig(
    input_size=20,
    hidden_layers=[128, 64, 32],
    activation_functions=["relu", "relu"]  # Only 2 functions for 3 layers!
)
model = CreditScoreModel(config)  # ValueError: "La longitud de las capas ocultas debe ser igual..." ("The length of the hidden layers must equal...")
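The same length check can be run up front, before constructing the model. The helper below is a hypothetical guard that mirrors the documented validation, not part of the module's API:

```python
def validate_layer_config(hidden_layers, activation_functions):
    # Mirrors the documented check: one activation function per hidden layer
    if len(hidden_layers) != len(activation_functions):
        raise ValueError(
            f"hidden_layers has {len(hidden_layers)} entries but "
            f"activation_functions has {len(activation_functions)}; "
            "the lists must be the same length"
        )

validate_layer_config([128, 64, 32], ["relu", "relu", "tanh"])  # passes silently

try:
    validate_layer_config([128, 64, 32], ["relu", "relu"])
except ValueError as e:
    print(f"Invalid config: {e}")
```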

Best Practices

  1. Start simple: Begin with 2-3 hidden layers and ReLU activations
  2. Layer size: Typically decrease layer sizes progressively (e.g., 128 → 64 → 32)
  3. Dropout: Start with 0.2-0.3 for moderate regularization
  4. Learning rate: Use 0.001 as baseline, decrease if training is unstable
  5. Batch size: Larger batches (64-128) provide more stable gradients
  6. Activation mixing: ReLU for most layers, consider Tanh for final hidden layer

Source Reference

Implementation: python-projects/credit-score/model/model.py:19-55