Configuration File Structure
All model configurations are stored in config/models-configs/ as YAML files. Here’s the complete structure:
config/models-configs/model_config_001.yaml
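The file’s contents aren’t reproduced here; a sketch of what the structure might look like, using the parameters documented on this page (all values are illustrative):

```yaml
# Illustrative structure only; field names follow the parameters below.
hidden_layers: [128, 64, 32]                  # neurons per hidden layer
activation_functions: [relu, relu, relu]      # one entry per hidden layer
dropout_rate: 0.3
learning_rate: 0.0005
epochs: 150
batch_size: 64
```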
Configuration Parameters
Hidden Layers
Defines the architecture of the neural network by specifying the number of neurons in each hidden layer. The number of hidden layers and their sizes directly impact model capacity and training time.
- Small Model: fast training, lower memory usage, good for small datasets
- Medium Model: more capacity at a moderate cost, suited to mid-sized datasets
- Large Model: highest capacity, but slower training and higher memory usage
Activation Functions
Specifies the non-linear activation function for each hidden layer. The list must match the length of hidden_layers.
ReLU
- Best for: Most general cases
- Pros: Fast, effective, no vanishing gradient
- Cons: Can cause “dying ReLU” problem
Leaky ReLU
- Best for: When ReLU causes dying neurons
- Pros: Solves dying ReLU, allows small negative values
- Cons: Slightly more computation
GELU
- Best for: Modern architectures (Transformers)
- Pros: Smooth, probabilistic approach
- Cons: More computationally expensive
Tanh
- Best for: Normalized outputs between -1 and 1
- Pros: Zero-centered outputs
- Cons: Vanishing gradient for large values
Sigmoid
- Best for: Binary outputs (not hidden layers)
- Pros: Outputs between 0 and 1
- Cons: Severe vanishing gradient
Softmax
- Best for: Multi-class classification outputs
- Pros: Probability distribution
- Cons: Not recommended for hidden layers
model/model.py:37-54
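A minimal sketch of how configured activation names might be mapped to PyTorch modules. The ACTIVATIONS table and build_activation helper below are hypothetical, not the project’s API; the real mapping lives in model/model.py:37-54.

```python
import torch.nn as nn

# Hypothetical lookup table from config names to PyTorch modules.
ACTIVATIONS = {
    "relu": nn.ReLU,
    "leaky_relu": nn.LeakyReLU,
    "gelu": nn.GELU,
    "tanh": nn.Tanh,
    "sigmoid": nn.Sigmoid,
    "softmax": lambda: nn.Softmax(dim=1),
}

def build_activation(name: str) -> nn.Module:
    """Return the activation module for a configured name."""
    try:
        return ACTIVATIONS[name]()
    except KeyError:
        raise ValueError(f"Unknown activation function: {name}")
```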
Dropout Rate
Regularization technique that randomly drops neurons during training to prevent overfitting.

- 0.2 - 0.3: Light regularization, good for larger datasets
- 0.4 - 0.5: Moderate regularization, standard for most cases
- 0.5 - 0.7: Heavy regularization, for small datasets or complex models
Learning Rate
Controls how much the model weights are updated during training. One of the most critical hyperparameters.

| Learning Rate | Use Case | Training Speed | Stability |
|---|---|---|---|
| 0.0001 | Fine-tuning | Slow | Very stable |
| 0.0005 | General purpose | Moderate | Stable |
| 0.001 | Quick experiments | Fast | May oscillate |
| 0.01 | Not recommended | Very fast | Often unstable |
Choosing the right learning rate
Too Low (< 0.0001):
- Training is very slow
- May get stuck in local minima
- Requires many epochs to converge

Optimal (0.0001 - 0.001):
- Steady decrease in loss
- Good convergence
- Balanced training speed

Too High (> 0.001):
- Loss oscillates or diverges
- Model doesn't learn
- Training becomes unstable
training/training.py:125 (AdamW optimizer)
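The optimizer referenced above is AdamW; a minimal sketch of how the configured learning rate might be wired into it (the model, tensor shapes, and values here are illustrative, not the project’s):

```python
import torch

# Sketch of the optimizer setup; the project's version is at
# training/training.py:125. Model and data are placeholders.
model = torch.nn.Linear(10, 1)
learning_rate = 0.0005  # taken from the YAML config in practice

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# One training step: forward pass, loss, backward pass, weight update.
x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```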
Epochs
Number of complete passes through the training dataset.

- Too Few (< 50): Model may underfit, not learning patterns fully
- Optimal (100-200): Allows proper convergence without excessive time
- Too Many (> 300): Risk of overfitting, longer training time
Monitor the training loss in MLflow. If it plateaus early, you can stop training and reduce epochs for future runs.
Batch Size
Number of samples processed before updating model weights.

- Small (16-32)
- Medium (64-128)
- Large (256+)

Small batches, advantages:
- Lower memory usage
- More frequent weight updates
- Better for small datasets

Small batches, disadvantages:
- Noisier gradient estimates
- Slower training (more iterations)
training/training.py:102-104
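A sketch of how batch_size typically feeds a PyTorch DataLoader. The dataset and shapes are illustrative; the project’s version is at training/training.py:102-104.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 256 samples of 10 features each.
batch_size = 64  # taken from the YAML config in practice
dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# 256 samples / 64 per batch -> 4 weight updates per epoch.
num_batches = len(loader)
```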
Example Configurations
Configuration 001: Balanced Model
config/models-configs/model_config_001.yaml
Configuration 002: Deep Model with High Regularization
config/models-configs/model_config_002.yaml
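The file’s contents aren’t shown here; a plausible sketch for a deeper, heavily regularized model using the parameters documented above (all values are illustrative):

```yaml
# Illustrative only; see the actual file for real values.
hidden_layers: [256, 128, 64, 32]
activation_functions: [leaky_relu, leaky_relu, leaky_relu, leaky_relu]
dropout_rate: 0.6        # heavy regularization
learning_rate: 0.0001
epochs: 200
batch_size: 64
```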
Creating Custom Configurations
Validate Configuration
Ensure:
- Length of activation_functions matches hidden_layers
- All values are positive numbers
- Dropout rate is between 0 and 1
- Learning rate is reasonable (typically 0.0001 - 0.001)
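The checklist above can be sketched as a small validation helper (illustrative; the project’s own validation code may differ):

```python
def validate_config(cfg: dict) -> None:
    """Check a model config dict against the rules above.

    Illustrative helper, not the project's validator.
    """
    hidden = cfg["hidden_layers"]
    acts = cfg["activation_functions"]
    if len(hidden) != len(acts):
        raise ValueError(
            "hidden_layers and activation_functions must have the same length"
        )
    if any(n <= 0 for n in hidden):
        raise ValueError("all hidden layer sizes must be positive")
    if not 0 <= cfg["dropout_rate"] < 1:
        raise ValueError("dropout_rate must be between 0 and 1")
    if not 0.0001 <= cfg["learning_rate"] <= 0.001:
        raise ValueError("learning_rate outside typical range 0.0001 - 0.001")
```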
Model Architecture Details
The configuration is translated into a PyTorch neural network with these components:

model/model.py:72-84
Each layer includes:
- Linear Transformation: Matrix multiplication with learned weights
- Batch Normalization: Normalizes activations (mean=0, std=1)
- Activation Function: Introduces non-linearity
- Dropout: Random neuron deactivation during training
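The four components above can be sketched as a reusable PyTorch block. The make_block helper and the hard-coded ReLU are assumptions for illustration; the project’s implementation is at model/model.py:72-84.

```python
import torch.nn as nn

# Hypothetical per-layer block: Linear -> BatchNorm -> activation -> Dropout.
def make_block(in_features: int, out_features: int, dropout: float) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_features, out_features),  # learned affine transform
        nn.BatchNorm1d(out_features),          # normalize activations
        nn.ReLU(),                             # non-linearity (configurable)
        nn.Dropout(dropout),                   # random deactivation in training
    )

# Example: hidden_layers [64, 32] with 10 input features, 1 output.
model = nn.Sequential(
    make_block(10, 64, 0.3),
    make_block(64, 32, 0.3),
    nn.Linear(32, 1),
)
```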
Configuration Best Practices
Start Simple, Then Scale
Begin with a small model (2 layers, 64-32 neurons) and increase complexity only if needed. Simple models:
- Train faster
- Are easier to debug
- Often generalize better
Match Architecture to Data Size
- Small dataset (< 1,000 samples): Use 1-2 hidden layers
- Medium dataset (1,000 - 10,000 samples): Use 2-3 hidden layers
- Large dataset (> 10,000 samples): Use 3-4 hidden layers
Use Consistent Activation Functions
For most cases, use the same activation function across all layers. ReLU or Leaky ReLU are safe defaults.
Version Your Configurations
Use sequential naming:
- model_config_001.yaml: Baseline
- model_config_002.yaml: Increased capacity
- model_config_003.yaml: Different activation functions
Hyperparameter Tuning Tips
Tune One Parameter at a Time
Change only one parameter between experiments:
- First: Learning rate
- Second: Architecture (layers/neurons)
- Third: Regularization (dropout)
- Fourth: Training duration (epochs)
Troubleshooting
Validation error: Length mismatch
Error:
ValueError: La longitud de las capas ocultas debe ser igual a la longitud de las funciones de activación
(English: "The length of the hidden layers must equal the length of the activation functions.")

Solution: Ensure your configuration has the same number of items in hidden_layers and activation_functions.

Model trains too slowly
Try:
- Increase learning rate (e.g., 0.0005 → 0.001)
- Increase batch size (e.g., 64 → 128)
- Reduce model size (fewer layers or neurons)
- Reduce number of epochs
Model doesn't improve
Check:
- Learning rate might be too high (try 0.0001)
- Model might be too simple (add layers)
- Data preprocessing (verify in MLflow)
Next Steps
Train Models
Use your configuration to train a model
MLflow Tracking
Compare different configurations in MLflow
Running Inference
Deploy your best model for predictions
API Reference
Explore the complete API documentation
