Configuration File Structure
All model configurations are stored in config/models-configs/ as YAML files. Here’s the complete structure:
config/models-configs/model_config_001.yaml
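The file’s contents aren’t reproduced here; a sketch of what the structure might look like, using the parameters documented on this page (all values are illustrative):

```yaml
# Illustrative structure only; field names follow the parameters below.
hidden_layers: [128, 64, 32]                  # neurons per hidden layer
activation_functions: [relu, relu, relu]      # one entry per hidden layer
dropout_rate: 0.3
learning_rate: 0.0005
epochs: 150
batch_size: 64
```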
Configuration Parameters
Hidden Layers
Defines the architecture of the neural network by specifying the number of neurons in each hidden layer. The number of hidden layers and their sizes directly impact model capacity and training time.
- Small Model: fast training, lower memory usage, good for small datasets
- Medium Model: more capacity at a moderate cost, suited to mid-sized datasets
- Large Model: highest capacity, but slower training and higher memory usage
Activation Functions
Specifies the non-linear activation function for each hidden layer. The list must match the length of hidden_layers.
ReLU
- Best for: Most general cases
- Pros: Fast, effective, no vanishing gradient
- Cons: Can cause “dying ReLU” problem
Leaky ReLU
- Best for: When ReLU causes dying neurons
- Pros: Solves dying ReLU, allows small negative values
- Cons: Slightly more computation
GELU
- Best for: Modern architectures (Transformers)
- Pros: Smooth, probabilistic approach
- Cons: More computationally expensive
Tanh
- Best for: Normalized outputs between -1 and 1
- Pros: Zero-centered outputs
- Cons: Vanishing gradient for large values
Sigmoid
- Best for: Binary outputs (not hidden layers)
- Pros: Outputs between 0 and 1
- Cons: Severe vanishing gradient
Softmax
- Best for: Multi-class classification outputs
- Pros: Probability distribution
- Cons: Not recommended for hidden layers
model/model.py:37-54
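A minimal sketch of how configured activation names might be mapped to PyTorch modules. The ACTIVATIONS table and build_activation helper below are hypothetical, not the project’s API; the real mapping lives in model/model.py:37-54.

```python
import torch.nn as nn

# Hypothetical lookup table from config names to PyTorch modules.
ACTIVATIONS = {
    "relu": nn.ReLU,
    "leaky_relu": nn.LeakyReLU,
    "gelu": nn.GELU,
    "tanh": nn.Tanh,
    "sigmoid": nn.Sigmoid,
    "softmax": lambda: nn.Softmax(dim=1),
}

def build_activation(name: str) -> nn.Module:
    """Return the activation module for a configured name."""
    try:
        return ACTIVATIONS[name]()
    except KeyError:
        raise ValueError(f"Unknown activation function: {name}")
```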
Dropout Rate
Regularization technique that randomly drops neurons during training to prevent overfitting.

- 0.2 - 0.3: Light regularization, good for larger datasets
- 0.4 - 0.5: Moderate regularization, standard for most cases
- 0.5 - 0.7: Heavy regularization, for small datasets or complex models
Learning Rate
Controls how much the model weights are updated during training. One of the most critical hyperparameters.

| Learning Rate | Use Case | Training Speed | Stability |
|---|---|---|---|
| 0.0001 | Fine-tuning | Slow | Very stable |
| 0.0005 | General purpose | Moderate | Stable |
| 0.001 | Quick experiments | Fast | May oscillate |
| 0.01 | Not recommended | Very fast | Often unstable |
Choosing the right learning rate
Too Low (< 0.0001):
- Training is very slow
- May get stuck in local minima
- Requires many epochs to converge

Optimal (0.0001 - 0.001):
- Steady decrease in loss
- Good convergence
- Balanced training speed

Too High (> 0.001):
- Loss oscillates or diverges
- Model doesn't learn
- Training becomes unstable
training/training.py:125 (AdamW optimizer)
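The optimizer referenced above is AdamW; a minimal sketch of how the configured learning rate might be wired into it (the model, tensor shapes, and values here are illustrative, not the project’s):

```python
import torch

# Sketch of the optimizer setup; the project's version is at
# training/training.py:125. Model and data are placeholders.
model = torch.nn.Linear(10, 1)
learning_rate = 0.0005  # taken from the YAML config in practice

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# One training step: forward pass, loss, backward pass, weight update.
x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```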
Epochs
Number of complete passes through the training dataset.

- Too Few (< 50): Model may underfit, not learning patterns fully
- Optimal (100-200): Allows proper convergence without excessive time
- Too Many (> 300): Risk of overfitting, longer training time
Monitor the training loss in MLflow. If it plateaus early, you can stop training and reduce epochs for future runs.
Batch Size
Number of samples processed before updating model weights.

- Small (16-32)
- Medium (64-128)
- Large (256+)

Small batches, advantages:
- Lower memory usage
- More frequent weight updates
- Better for small datasets

Small batches, disadvantages:
- Noisier gradient estimates
- Slower training (more iterations)
training/training.py:102-104
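A sketch of how batch_size typically feeds a PyTorch DataLoader. The dataset and shapes are illustrative; the project’s version is at training/training.py:102-104.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 256 samples of 10 features each.
batch_size = 64  # taken from the YAML config in practice
dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# 256 samples / 64 per batch -> 4 weight updates per epoch.
num_batches = len(loader)
```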
Example Configurations
Configuration 001: Balanced Model
config/models-configs/model_config_001.yaml
Configuration 002: Deep Model with High Regularization
config/models-configs/model_config_002.yaml
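The file’s contents aren’t shown here; a plausible sketch for a deeper, heavily regularized model using the parameters documented above (all values are illustrative):

```yaml
# Illustrative only; see the actual file for real values.
hidden_layers: [256, 128, 64, 32]
activation_functions: [leaky_relu, leaky_relu, leaky_relu, leaky_relu]
dropout_rate: 0.6        # heavy regularization
learning_rate: 0.0001
epochs: 200
batch_size: 64
```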
Creating Custom Configurations
Validate Configuration
Ensure:
- Length of activation_functions matches hidden_layers
- All values are positive numbers
- Dropout rate is between 0 and 1
- Learning rate is reasonable (typically 0.0001 - 0.001)
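The checklist above can be sketched as a small validation helper (illustrative; the project’s own validation code may differ):

```python
def validate_config(cfg: dict) -> None:
    """Check a model config dict against the rules above.

    Illustrative helper, not the project's validator.
    """
    hidden = cfg["hidden_layers"]
    acts = cfg["activation_functions"]
    if len(hidden) != len(acts):
        raise ValueError(
            "hidden_layers and activation_functions must have the same length"
        )
    if any(n <= 0 for n in hidden):
        raise ValueError("all hidden layer sizes must be positive")
    if not 0 <= cfg["dropout_rate"] < 1:
        raise ValueError("dropout_rate must be between 0 and 1")
    if not 0.0001 <= cfg["learning_rate"] <= 0.001:
        raise ValueError("learning_rate outside typical range 0.0001 - 0.001")
```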
Model Architecture Details
The configuration is translated into a PyTorch neural network with these components:

model/model.py:72-84
Each layer includes:
- Linear Transformation: Matrix multiplication with learned weights
- Batch Normalization: Normalizes activations (mean=0, std=1)
- Activation Function: Introduces non-linearity
- Dropout: Random neuron deactivation during training
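The four components above can be sketched as a reusable PyTorch block. The make_block helper and the hard-coded ReLU are assumptions for illustration; the project’s implementation is at model/model.py:72-84.

```python
import torch.nn as nn

# Hypothetical per-layer block: Linear -> BatchNorm -> activation -> Dropout.
def make_block(in_features: int, out_features: int, dropout: float) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_features, out_features),  # learned affine transform
        nn.BatchNorm1d(out_features),          # normalize activations
        nn.ReLU(),                             # non-linearity (configurable)
        nn.Dropout(dropout),                   # random deactivation in training
    )

# Example: hidden_layers [64, 32] with 10 input features, 1 output.
model = nn.Sequential(
    make_block(10, 64, 0.3),
    make_block(64, 32, 0.3),
    nn.Linear(32, 1),
)
```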
Configuration Best Practices
Start Simple, Then Scale
Begin with a small model (2 layers, 64-32 neurons) and increase complexity only if needed. Simple models:
- Train faster
- Are easier to debug
- Often generalize better
Match Architecture to Data Size
- Small dataset (< 1,000 samples): Use 1-2 hidden layers
- Medium dataset (1,000 - 10,000 samples): Use 2-3 hidden layers
- Large dataset (> 10,000 samples): Use 3-4 hidden layers
Use Consistent Activation Functions
For most cases, use the same activation function across all layers. ReLU or Leaky ReLU are safe defaults.
Version Your Configurations
Use sequential naming:
- model_config_001.yaml: Baseline
- model_config_002.yaml: Increased capacity
- model_config_003.yaml: Different activation functions
Hyperparameter Tuning Tips
Tune One Parameter at a Time
Change only one parameter between experiments:
- First: Learning rate
- Second: Architecture (layers/neurons)
- Third: Regularization (dropout)
- Fourth: Training duration (epochs)
Troubleshooting
Validation error: Length mismatch
Error:
ValueError: La longitud de las capas ocultas debe ser igual a la longitud de las funciones de activación
(English: "The length of the hidden layers must equal the length of the activation functions.")

Solution: Ensure your configuration has the same number of items in hidden_layers and activation_functions.

Model trains too slowly
Try:
- Increase learning rate (e.g., 0.0005 → 0.001)
- Increase batch size (e.g., 64 → 128)
- Reduce model size (fewer layers or neurons)
- Reduce number of epochs
Model doesn't improve
Check:
- Learning rate might be too high (try 0.0001)
- Model might be too simple (add layers)
- Data preprocessing (verify in MLflow)
Next Steps
Train Models
Use your configuration to train a model
MLflow Tracking
Compare different configurations in MLflow
Running Inference
Deploy your best model for predictions
API Reference
Explore the complete API documentation
