Overview
Training configurations are defined in YAML files that specify model architecture, hyperparameters, and training settings. These configurations enable reproducible experiments and easy hyperparameter tuning.

Configuration Structure
Complete Example
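The complete example is not reproduced on this page; a representative configuration is sketched below. Only hidden_layers and activation_functions appear verbatim in the parameter documentation that follows; the remaining key names (dropout_rate, learning_rate, epochs, batch_size) are assumptions chosen to match the documented ranges.

```yaml
# Illustrative training configuration. Key names other than
# hidden_layers and activation_functions are assumptions.
hidden_layers: [128, 64, 32]
activation_functions: [relu, relu, relu]
dropout_rate: 0.3
learning_rate: 0.001
epochs: 150
batch_size: 64
```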
Configuration Parameters
Network Architecture
hidden_layers

List of integers defining the number of neurons in each hidden layer.

Constraints:
- Length must match activation_functions length
- Each value must be positive
- Typically a decreasing sequence (e.g., [128, 64, 32])
activation_functions

List of activation function names for each hidden layer.

Allowed Values:
- relu - Rectified Linear Unit (recommended for most cases)
- leaky_relu - Leaky ReLU with negative_slope=0.1
- gelu - Gaussian Error Linear Unit
- sigmoid - Sigmoid function
- tanh - Hyperbolic tangent
- softmax - Softmax (use for multi-class intermediate layers)

Constraints:
- Length must match hidden_layers length
- One function per hidden layer
The number of hidden layers and activation functions must be equal. A validation error will be raised if they don’t match.
Regularization
Dropout probability applied after each hidden layer to prevent overfitting.

Range: 0.0 to 1.0

Recommendations:
- Small models: 0.2 - 0.3
- Medium models: 0.3 - 0.5
- Large models: 0.5 - 0.7
Dropout is only active during training. It’s automatically disabled during evaluation and inference.
Optimization
Learning rate for the AdamW optimizer.

Typical Range: 0.0001 to 0.01

Recommendations:
- Start with: 0.001 (default)
- Large datasets: 0.0005 - 0.001
- Small datasets: 0.001 - 0.005
The training pipeline uses the AdamW optimizer, which includes weight decay regularization for better generalization.
Training Duration
Number of complete passes through the training dataset.

Typical Range: 50 to 500

Recommendations:
- Quick experiments: 50-100
- Standard training: 100-200
- Full training: 150-300
Batch Processing
Number of samples processed before updating model weights.

Typical Range: 16 to 512

Recommendations:
- Limited memory: 16-32
- Standard: 32-64
- Large memory: 64-128
- Very large datasets: 128-512
Trade-offs:
- Smaller batches: more updates, noisier gradients, better generalization
- Larger batches: fewer updates, smoother gradients, faster training
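The update-count side of this trade-off is simple arithmetic: the number of weight updates per epoch is the dataset size divided by the batch size. A quick sketch (the dataset size is illustrative):

```python
dataset_size = 10_000  # illustrative number of training samples

for batch_size in (16, 64, 512):
    # Each epoch performs one weight update per batch.
    updates_per_epoch = dataset_size // batch_size
    print(f"batch_size={batch_size:4d} -> {updates_per_epoch} updates per epoch")
```

A batch size of 16 yields 625 updates per epoch versus 19 at 512, which is why small batches converge in fewer epochs but with noisier gradients.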
ModelConfig Dataclass
The YAML configuration is parsed and mapped to the ModelConfig dataclass:
model/model.py:20
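A hypothetical reconstruction of that dataclass is sketched below. Field names mirror the parameters documented on this page, but the actual definition in model/model.py may differ (in particular, checkpoint_dir is an assumed name for the checkpoint directory field).

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    # Architecture: one activation name per hidden layer (lengths must match).
    hidden_layers: list[int]            # e.g. [128, 64, 32]
    activation_functions: list[str]     # e.g. ["relu", "relu", "relu"]

    # Regularization and optimization (defaults follow the documented recommendations).
    dropout_rate: float = 0.3           # probability of zeroing activations during training
    learning_rate: float = 0.001        # AdamW learning rate
    epochs: int = 100                   # complete passes over the training data
    batch_size: int = 32                # samples per weight update

    # Auto-computed fields (set after preprocessing, not from YAML).
    input_size: int = 0                 # X_train.shape[1]
    output_size: int = 1                # fixed at 1 for binary classification

    # Assumed name; not currently used by the training loop.
    checkpoint_dir: str = "checkpoints"
```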
Auto-Computed Fields
Number of input features - automatically determined from preprocessed data. Computed as X_train.shape[1].

Number of output neurons - fixed at 1 for binary classification.
Directory for saving model checkpoints (not currently used in training loop).
Configuration Examples
Small Model (Fast Training)
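The original example block is missing here; a sketch consistent with the small-model recommendations above (dropout 0.2-0.3, 50-100 epochs). Key names other than hidden_layers and activation_functions are assumed.

```yaml
hidden_layers: [64, 32]
activation_functions: [relu, relu]
dropout_rate: 0.2
learning_rate: 0.001
epochs: 75
batch_size: 32
```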
Medium Model (Balanced)
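A sketch consistent with the medium-model recommendations above (dropout 0.3-0.5, standard training duration); the same key-name caveat applies.

```yaml
hidden_layers: [128, 64, 32]
activation_functions: [relu, relu, relu]
dropout_rate: 0.4
learning_rate: 0.001
epochs: 150
batch_size: 64
```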
Large Model (Maximum Capacity)
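A sketch consistent with the large-model recommendations above (dropout 0.5-0.7, lower learning rate for large datasets, larger batches); the same key-name caveat applies.

```yaml
hidden_layers: [256, 128, 64, 32]
activation_functions: [relu, relu, relu, relu]
dropout_rate: 0.5
learning_rate: 0.0005
epochs: 300
batch_size: 128
```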
Alternative Activation Functions
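A sketch mixing the other documented activation names (gelu, leaky_relu) in place of relu; the same key-name caveat applies.

```yaml
hidden_layers: [128, 64, 32]
activation_functions: [gelu, gelu, leaky_relu]
dropout_rate: 0.3
learning_rate: 0.001
epochs: 150
batch_size: 64
```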
Usage in Training
Configurations are loaded and applied during training initialization:

training/training.py:108
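A minimal sketch of that loading step, assuming PyYAML is the parser; the function name load_config and the exact flow are illustrative, not the actual code at training/training.py:108.

```python
import io
import yaml  # PyYAML; assumed to be the YAML parser used by the pipeline

def load_config(stream) -> dict:
    """Parse a YAML training configuration into a plain dict."""
    return yaml.safe_load(stream)

# In practice the stream would be an open config file; a string works for illustration.
example = io.StringIO("""
hidden_layers: [128, 64, 32]
activation_functions: [relu, relu, relu]
learning_rate: 0.001
""")
config = load_config(example)
```

The resulting dict can then be unpacked into the ModelConfig dataclass (e.g. `ModelConfig(**config)`).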
Best Practices
Naming Convention

Number configuration files sequentially to enable:
- Easy version tracking
- Sequential experiment numbering
- Automatic weight file naming (e.g., model_weights_001.pth)
Hyperparameter Tuning Strategy
1. Start with a baseline configuration
2. Tune learning rate first
   - Try: [0.0001, 0.0005, 0.001, 0.005]
   - Monitor training loss convergence
3. Adjust network depth
   - Add/remove layers
   - Ensure gradual size reduction
4. Optimize regularization
   - Increase dropout if overfitting
   - Decrease dropout if underfitting
5. Fine-tune batch size
   - Balance speed vs. stability
   - Consider GPU memory constraints
Validation
The training pipeline validates configuration parameters:

model/model.py:66
Invalid configurations will raise errors during model initialization, before training begins.
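A hypothetical sketch of checks like those described above (matching list lengths, positive layer sizes, dropout in range); the actual validation at model/model.py:66 may differ.

```python
def validate_config(hidden_layers, activation_functions, dropout_rate):
    """Raise ValueError before training begins if the configuration is invalid."""
    # The two lists must pair up: one activation per hidden layer.
    if len(hidden_layers) != len(activation_functions):
        raise ValueError(
            f"hidden_layers has {len(hidden_layers)} entries but "
            f"activation_functions has {len(activation_functions)}"
        )
    # Every layer must have a positive number of neurons.
    if any(n <= 0 for n in hidden_layers):
        raise ValueError("all hidden layer sizes must be positive")
    # Dropout is a probability.
    if not 0.0 <= dropout_rate <= 1.0:
        raise ValueError("dropout_rate must be between 0.0 and 1.0")
```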
MLflow Parameter Logging
All configuration parameters are automatically logged to MLflow, enabling:
- Experiment comparison
- Hyperparameter analysis
- Reproducible training runs
- Configuration versioning
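One common pattern for this, sketched under the assumption that the config is a dataclass: flatten it to a dict and pass it to MLflow's `log_params`, which records every key/value pair on the active run. The TrainConfig fields here are illustrative.

```python
from dataclasses import dataclass, asdict

@dataclass
class TrainConfig:
    # Illustrative fields mirroring the documented parameters.
    hidden_layers: tuple = (128, 64, 32)
    learning_rate: float = 0.001
    batch_size: int = 32

# asdict() flattens the dataclass into a plain dict of parameters.
params = asdict(TrainConfig())

# import mlflow
# mlflow.log_params(params)  # logs every key/value pair to the active MLflow run
```

Logged parameters then appear in the MLflow UI, where runs can be filtered and compared by any configuration value.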
