This guide explains how to configure your credit score prediction models using YAML files. The configuration system allows you to experiment with different architectures without modifying code.

Configuration File Structure

All model configurations are stored in config/models-configs/ as YAML files. Here’s the complete structure:
hidden_layers:
  - 128
  - 64
  - 32
activation_functions:
  - relu
  - relu
  - relu
dropout_rate: 0.3
learning_rate: 0.0005
epochs: 150
batch_size: 64
Reference: config/models-configs/model_config_001.yaml
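Once parsed (the project presumably loads these files with a YAML library such as PyYAML; the exact loader isn't shown in this guide), the configuration is a plain mapping. A minimal sketch of what the training code then sees — the keys come from the file above, the loading mechanism is an assumption:

```python
from types import SimpleNamespace

# The parsed contents of model_config_001.yaml, written here as a dict literal
# so the sketch is self-contained (only the keys and values come from the file).
raw = {
    "hidden_layers": [128, 64, 32],
    "activation_functions": ["relu", "relu", "relu"],
    "dropout_rate": 0.3,
    "learning_rate": 0.0005,
    "epochs": 150,
    "batch_size": 64,
}

config = SimpleNamespace(**raw)  # enables attribute access: config.hidden_layers
print(config.hidden_layers)      # [128, 64, 32]
print(config.batch_size)         # 64
```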

Configuration Parameters

Hidden Layers

Defines the architecture of the neural network by specifying the number of neurons in each hidden layer.
hidden_layers:
  - 128  # First hidden layer with 128 neurons
  - 64   # Second hidden layer with 64 neurons
  - 32   # Third hidden layer with 32 neurons
The number of hidden layers and their sizes directly impact model capacity and training time.
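"Capacity" here is largely weight count. A quick back-of-the-envelope calculation — the input feature count of 20 is an illustrative assumption, not a value taken from this project:

```python
def count_linear_params(input_dim, hidden_layers, output_dim=1):
    """Count weights + biases across the fully connected layers."""
    dims = [input_dim] + list(hidden_layers) + [output_dim]
    return sum(i * o + o for i, o in zip(dims, dims[1:]))

# Hypothetical 20 input features, single prediction output
print(count_linear_params(20, [128, 64, 32]))  # 13057
print(count_linear_params(20, [64, 32]))       # 3457 — far smaller and faster
```

Roughly a 4x difference in parameters between the two architectures, which is why the smaller one trains faster and uses less memory.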
Common Architectures:
A compact two-layer model:
hidden_layers:
  - 64
  - 32
  • Fast training
  • Lower memory usage
  • Good for small datasets

Activation Functions

Specifies the non-linear activation function for each hidden layer. The list must match the length of hidden_layers.
activation_functions:
  - relu        # For first hidden layer
  - relu        # For second hidden layer
  - relu        # For third hidden layer
Supported Activation Functions:

ReLU

- relu
  • Best for: Most general cases
  • Pros: Fast, effective, no vanishing gradient
  • Cons: Can cause “dying ReLU” problem

Leaky ReLU

- leaky_relu
  • Best for: When ReLU causes dying neurons
  • Pros: Solves dying ReLU, allows small negative values
  • Cons: Slightly more computation

GELU

- gelu
  • Best for: Modern architectures (Transformers)
  • Pros: Smooth, probabilistic approach
  • Cons: More computationally expensive

Tanh

- tanh
  • Best for: Normalized outputs between -1 and 1
  • Pros: Zero-centered outputs
  • Cons: Vanishing gradient for large values

Sigmoid

- sigmoid
  • Best for: Binary outputs (not hidden layers)
  • Pros: Outputs between 0 and 1
  • Cons: Severe vanishing gradient

Softmax

- softmax
  • Best for: Multi-class classification outputs
  • Pros: Probability distribution
  • Cons: Not recommended for hidden layers
Reference: model/model.py:37-54
The number of activation functions must match the number of hidden layers, or training will fail with a validation error.
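The name-to-function mapping lives in model/model.py (the real model uses the PyTorch modules nn.ReLU, nn.LeakyReLU, nn.GELU, nn.Tanh, nn.Sigmoid). The numerical behaviour can be sketched in plain Python for intuition:

```python
import math

# Plain-Python sketches of the supported activations; illustration only —
# the project applies the PyTorch equivalents.
def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def tanh(x):
    return math.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gelu(x):  # common tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))

print(relu(-2.0), relu(2.0))   # 0.0 2.0 — negatives zeroed ("dying ReLU" risk)
print(leaky_relu(-2.0))        # -0.02 — small negative slope instead of zero
```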

Dropout Rate

Regularization technique that randomly drops neurons during training to prevent overfitting.
dropout_rate: 0.3  # Drop 30% of neurons during training
Recommended Values:
  • 0.2 - 0.3: Light regularization, good for larger datasets
  • 0.4 - 0.5: Moderate regularization, standard for most cases
  • 0.5 - 0.7: Heavy regularization, for small datasets or complex models
If your model overfits (high training accuracy, low test accuracy), increase the dropout rate.
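What "dropping 30% of neurons" means mechanically can be sketched with inverted dropout, the scheme PyTorch's nn.Dropout also uses — this is an illustration, not the project's code:

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout sketch: zero each value with probability `rate`,
    scale survivors by 1/(1-rate) so the expected activation is unchanged."""
    if not training:
        return list(values)  # dropout is disabled at inference time
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

random.seed(0)
out = dropout([1.0] * 10, rate=0.3)
print(out)  # roughly 30% zeros; surviving values scaled to ~1.43
```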

Learning Rate

Controls how much the model weights are updated during training. One of the most critical hyperparameters.
learning_rate: 0.0005  # Conservative learning rate
Guidelines:
| Learning Rate | Use Case | Training Speed | Stability |
|---|---|---|---|
| 0.0001 | Fine-tuning | Slow | Very stable |
| 0.0005 | General purpose | Moderate | Stable |
| 0.001 | Quick experiments | Fast | May oscillate |
| 0.01 | Not recommended | Very fast | Often unstable |
Too Low (< 0.0001):
  • Training is very slow
  • May get stuck in local minima
  • Requires many epochs to converge
Optimal (0.0001 - 0.001):
  • Steady decrease in loss
  • Good convergence
  • Balanced training speed
Too High (> 0.01):
  • Loss oscillates or diverges
  • Model doesn’t learn
  • Training becomes unstable
Reference: training/training.py:125 (AdamW optimizer)
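The three regimes above can be demonstrated with gradient descent on a toy quadratic. The stability thresholds depend entirely on the loss surface, so the specific numbers below are only qualitative — this is not the project's optimizer:

```python
def descend(lr, steps=200, x=5.0):
    """Gradient descent on f(x) = x**2 (gradient 2x) — illustration only."""
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(descend(0.0001))  # ~4.8  — barely moved: too low for this surface
print(descend(0.01))    # ~0.09 — steady convergence toward the minimum
print(descend(1.5))     # huge  — step factor |1 - 2*1.5| > 1, so it diverges
```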

Epochs

Number of complete passes through the training dataset.
epochs: 150  # Train for 150 complete passes
Considerations:
  • Too Few (< 50): Model may underfit, not learning patterns fully
  • Optimal (100-200): Allows proper convergence without excessive time
  • Too Many (> 300): Risk of overfitting, longer training time
Monitor the training loss in MLflow. If it plateaus early, you can stop training and reduce epochs for future runs.
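The "stop when the loss plateaus" advice can be automated with patience-based early stopping. A hedged sketch — the project's training loop may or may not implement this:

```python
def early_stop_epoch(losses, patience=10, min_delta=1e-4):
    """Return the epoch at which training would stop: the first epoch after
    `patience` consecutive epochs with no improvement above `min_delta`."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(losses) - 1  # never triggered: ran all epochs

# Synthetic loss curve that plateaus at 0.4 from epoch 30 onward
losses = [max(0.4, 1.0 - 0.02 * e) for e in range(150)]
print(early_stop_epoch(losses))  # 40 — stops 10 epochs after the plateau begins
```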

Batch Size

Number of samples processed before updating model weights.
batch_size: 64  # Process 64 samples per batch
Impact of a smaller batch size (e.g., 32 instead of 128):
Advantages:
  • Lower memory usage
  • More frequent weight updates
  • Better for small datasets
Disadvantages:
  • Noisier gradient estimates
  • Slower training (more iterations per epoch)
Reference: training/training.py:102-104
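Batch size and epochs together determine the total number of weight updates. Assuming an illustrative dataset size (not taken from this project):

```python
import math

def updates_per_run(n_samples, batch_size, epochs):
    """Total weight updates = batches per epoch x epochs (last batch may be partial)."""
    return math.ceil(n_samples / batch_size) * epochs

# Hypothetical 10,000-sample training set
print(updates_per_run(10_000, 64, 150))   # 157 batches/epoch -> 23550 updates
print(updates_per_run(10_000, 128, 150))  # 79 batches/epoch  -> 11850 updates
```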

Example Configurations

Configuration 001: Balanced Model

hidden_layers:
  - 128
  - 64
  - 32
activation_functions:
  - relu
  - relu
  - relu
dropout_rate: 0.3
learning_rate: 0.0005
epochs: 150
batch_size: 64
Use Case: General purpose, good starting point for most datasets.
Reference: config/models-configs/model_config_001.yaml

Configuration 002: Deep Model with High Regularization

hidden_layers:
  - 256
  - 128
  - 64
  - 32
activation_functions:
  - leaky_relu
  - leaky_relu
  - leaky_relu
  - leaky_relu
dropout_rate: 0.4
learning_rate: 0.0001
epochs: 200
batch_size: 128
Use Case: Complex patterns, when you need more model capacity and have sufficient training data.
Reference: config/models-configs/model_config_002.yaml

Creating Custom Configurations

Step 1: Create New YAML File

Navigate to the configuration directory and create a new file:
cd config/models-configs
touch model_config_003.yaml
Step 2: Define Architecture

Edit the file with your desired parameters:
hidden_layers:
  - 256
  - 128
  - 64
activation_functions:
  - gelu
  - gelu
  - gelu
dropout_rate: 0.35
learning_rate: 0.0003
epochs: 175
batch_size: 96
Step 3: Validate Configuration

Ensure:
  • Length of activation_functions matches hidden_layers
  • All values are positive numbers
  • Dropout rate is between 0 and 1
  • Learning rate is reasonable (typically 0.0001 - 0.001)
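These checks can be scripted before launching a run. A hedged sketch — the field names come from the config files above, but the function itself is not part of the project:

```python
def validate_config(cfg):
    """Raise ValueError if the configuration violates the rules above."""
    if len(cfg["hidden_layers"]) != len(cfg["activation_functions"]):
        raise ValueError("hidden_layers and activation_functions must have equal length")
    if any(n <= 0 for n in cfg["hidden_layers"]):
        raise ValueError("layer sizes must be positive")
    if not 0.0 <= cfg["dropout_rate"] < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    if cfg["learning_rate"] <= 0 or cfg["epochs"] <= 0 or cfg["batch_size"] <= 0:
        raise ValueError("learning_rate, epochs and batch_size must be positive")

# The configuration drafted in Step 2 passes silently:
validate_config({
    "hidden_layers": [256, 128, 64],
    "activation_functions": ["gelu", "gelu", "gelu"],
    "dropout_rate": 0.35,
    "learning_rate": 0.0003,
    "epochs": 175,
    "batch_size": 96,
})
```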
Step 4: Train Model

Run training with your new configuration:
uv run training/training.py --config config/models-configs/model_config_003.yaml

Model Architecture Details

The configuration is translated into a PyTorch neural network with these components:
for hidden_dim, act_fn in zip(config.hidden_layers, config.activation_functions):
    layers.append(nn.Linear(in_dim, hidden_dim))      # Fully connected layer
    layers.append(nn.BatchNorm1d(hidden_dim))         # Batch normalization
    layers.append(get_activation(act_fn))             # Activation function
    layers.append(nn.Dropout(config.dropout_rate))    # Dropout regularization
Reference: model/model.py:72-84
Each layer includes:
  1. Linear Transformation: Matrix multiplication with learned weights
  2. Batch Normalization: Normalizes activations (mean=0, std=1)
  3. Activation Function: Introduces non-linearity
  4. Dropout: Random neuron deactivation during training
The model automatically uses optimal weight initialization based on the activation function (He initialization for ReLU variants, Xavier for others).
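The initialization choice follows the standard formulas; a sketch of the standard deviations involved (the exact implementation in model/model.py may differ in detail, and the 20 input features are an illustrative assumption):

```python
import math

def he_std(fan_in):
    """He (Kaiming) normal init, suited to ReLU-family activations."""
    return math.sqrt(2.0 / fan_in)

def xavier_std(fan_in, fan_out):
    """Xavier (Glorot) normal init, suited to tanh/sigmoid-like activations."""
    return math.sqrt(2.0 / (fan_in + fan_out))

# First hidden layer of the 128-64-32 architecture with hypothetical 20 inputs
print(he_std(20))           # ~0.316
print(xavier_std(20, 128))  # ~0.116
```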

Configuration Best Practices

Begin with a small model (2 layers, 64-32 neurons) and increase complexity only if needed. Simple models:
  • Train faster
  • Are easier to debug
  • Often generalize better
Match network depth to dataset size:
  • Small dataset (< 1,000 samples): Use 1-2 hidden layers
  • Medium dataset (1,000 - 10,000 samples): Use 2-3 hidden layers
  • Large dataset (> 10,000 samples): Use 3-4 hidden layers
For most cases, use the same activation function across all layers. ReLU or Leaky ReLU are safe defaults.
Use sequential naming:
  • model_config_001.yaml: Baseline
  • model_config_002.yaml: Increased capacity
  • model_config_003.yaml: Different activation functions
This makes it easy to track experiments in MLflow.

Hyperparameter Tuning Tips

Step 1: Establish Baseline

Start with model_config_001.yaml and record its performance in MLflow.
Step 2: Tune One Parameter at a Time

Change only one parameter between experiments:
  • First: Learning rate
  • Second: Architecture (layers/neurons)
  • Third: Regularization (dropout)
  • Fourth: Training duration (epochs)
Step 3: Compare in MLflow

Use the MLflow UI to compare runs side-by-side and identify improvements.
Step 4: Iterate

Keep the best configuration and continue tuning other parameters.

Troubleshooting

Error: ValueError: La longitud de las capas ocultas debe ser igual a la longitud de las funciones de activación (i.e., "The length of the hidden layers must equal the length of the activation functions")
Solution: Ensure your configuration has the same number of items in hidden_layers and activation_functions:
# ✅ Correct
hidden_layers: [128, 64, 32]
activation_functions: [relu, relu, relu]

# ❌ Incorrect
hidden_layers: [128, 64, 32]
activation_functions: [relu, relu]  # Missing one!
If training is too slow, try:
  • Increase learning rate (e.g., 0.0005 → 0.001)
  • Increase batch size (e.g., 64 → 128)
  • Reduce model size (fewer layers or neurons)
  • Reduce number of epochs
If the model is not learning (loss stays flat or accuracy is poor), check:
  • Learning rate might be too high (try 0.0001)
  • Model might be too simple (add layers)
  • Data preprocessing (verify in MLflow)

Next Steps

Train Models

Use your configuration to train a model

MLflow Tracking

Compare different configurations in MLflow

Running Inference

Deploy your best model for predictions

API Reference

Explore the complete API documentation
