Prerequisites
Before training models, ensure you have:
- Python 3.10+ installed
- UV package manager (recommended) or pip
- MLflow running for experiment tracking
- Access to the training dataset
Setup Environment
Install Dependencies
Install all required packages using UV or pip.
UV provides faster, deterministic installations compared to traditional pip.
Training Process
Basic Training Command
The training script uses YAML configuration files to define the model architecture and hyperparameters. See the Model Configuration guide for details on creating custom configurations.
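As a hedged illustration only, a configuration file might look like the following; the key names and values here are assumptions, not the project's actual schema (see the Model Configuration guide for the real one):

```yaml
# Illustrative config sketch -- keys are assumptions, not the real schema
model:
  hidden_layers: [64, 32]   # width of each hidden layer
  dropout: 0.3              # dropout rate between layers
training:
  learning_rate: 0.001
  epochs: 50
  batch_size: 64
```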
Training Script Workflow
The training process (training/training.py:51-237) follows these steps:
Load Configuration
The script reads hyperparameters from the specified YAML file.
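A minimal sketch of this step, assuming the config follows the illustrative schema above (the real script reads the file named on the command line rather than an inline string):

```python
import yaml  # PyYAML

# Hypothetical sketch of the configuration-loading step. The key names
# below are illustrative assumptions, not the script's actual schema.
config_text = """
model:
  hidden_layers: [64, 32]
  dropout: 0.3
training:
  learning_rate: 0.001
  epochs: 50
"""
config = yaml.safe_load(config_text)
learning_rate = config["training"]["learning_rate"]
```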
Reference: training/training.py:40-48

Initialize MLflow Experiment
All training runs are tracked under the “Credit Score Training” experiment.
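A sketch of how the script might register runs, assuming the MLflow server from the Prerequisites section is running at the address used throughout this guide (the logged parameter is a stand-in):

```python
import mlflow

# Hypothetical sketch: point MLflow at the local tracking server and
# select the experiment named above. Requires the MLflow UI to be
# running at this address.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("Credit Score Training")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)  # illustrative parameter
```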
Reference: training/training.py:65-70

Load and Preprocess Data
The training data is loaded and preprocessed automatically. The preprocessor is saved to processing/preprocessor.joblib for use during inference.
Reference: training/training.py:85-93

Create PyTorch DataLoaders
Data is converted to PyTorch tensors and loaded in batches.
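A minimal sketch of this step; the sample count, feature count, and batch size are illustrative stand-ins for the preprocessed dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical sketch: wrap preprocessed feature and label arrays in a
# TensorDataset and batch them. Sizes here are illustrative.
X = torch.randn(256, 10)                   # 256 samples, 10 features
y = torch.randint(0, 2, (256, 1)).float()  # binary labels

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)
```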
Reference: training/training.py:96-104

Initialize Model
The neural network is built based on the configuration parameters.
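A hedged sketch of building an MLP from config values; the helper name, parameters, and layer sizes are assumptions rather than the script's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: assemble an MLP from configuration values. The
# parameter names mirror the assumed config schema, not the real one.
def build_model(input_dim, hidden_layers, dropout):
    layers = []
    for width in hidden_layers:
        layers += [nn.Linear(input_dim, width), nn.ReLU(), nn.Dropout(dropout)]
        input_dim = width
    layers.append(nn.Linear(input_dim, 1))  # one logit for binary classification
    return nn.Sequential(*layers)

model = build_model(input_dim=10, hidden_layers=[64, 32], dropout=0.3)
```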
Reference: training/training.py:107-119

Training Loop
The model is trained using binary cross-entropy loss and the AdamW optimizer. Metrics are logged to MLflow at each epoch.
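A sketch of a single optimization step with binary cross-entropy and AdamW, as described above; the model, data, and learning rate are stand-ins, and the real loop repeats this over many epochs:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and data; the real ones come from the
# previous steps.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.BCEWithLogitsLoss()  # BCE applied to raw logits
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

X = torch.randn(64, 10)
y = torch.randint(0, 2, (64, 1)).float()

model.train()
optimizer.zero_grad()
loss = criterion(model(X), y)
loss.backward()
optimizer.step()
# The real script would log per-epoch metrics here, e.g.:
# mlflow.log_metric("train_loss", loss.item(), step=epoch)
```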
Reference: training/training.py:122-158

Evaluation
The model is evaluated on the test set, and multiple metrics are computed and logged:
- Accuracy: Overall classification accuracy
- ROC AUC: Area under the ROC curve
- Precision: Positive predictive value
- Recall: Sensitivity
- F1 Score: Harmonic mean of precision and recall
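The five metrics above can be computed with scikit-learn; the labels and probabilities below are made-up illustrative values, not results from the model:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical sketch: compute the metrics listed above from true
# labels, predicted probabilities, and thresholded predictions.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_prob = [0.2, 0.8, 0.35, 0.3, 0.9, 0.4, 0.7, 0.55]
y_pred = [int(p >= 0.5) for p in y_prob]  # 0.5 decision threshold

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_prob),   # ranking metric, uses probabilities
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
```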
Reference: training/training.py:161-183

Training Multiple Configurations
You can train multiple models with different hyperparameters, either in parallel or sequentially.

Monitoring Training Progress
Once training starts, you can monitor it through MLflow:
- Navigate to http://127.0.0.1:5000
- Select the “Credit Score Training” experiment
- View real-time metrics:
  - Training loss curve
  - Training accuracy progression
  - Test metrics upon completion
- Compare different runs side-by-side
- Download artifacts (model weights, visualizations)
Understanding Training Output
During training, you’ll see per-epoch console output reporting the training loss and accuracy, mirroring the metrics logged to MLflow.

Troubleshooting
Training loss is not decreasing
- Check learning rate: Try reducing it (e.g., from 0.001 to 0.0001)
- Verify data preprocessing: Ensure features are properly normalized
- Increase model capacity: Add more hidden layers or neurons
- Check for NaN values: Look at MLflow metrics for anomalies
Model overfits training data
- Increase dropout rate: Try 0.4 or 0.5 instead of 0.3
- Add regularization: Increase the weight_decay parameter of the AdamW optimizer
- Reduce model complexity: Use fewer layers or neurons
- Get more training data: If possible, expand the dataset
MLflow connection errors
- Verify MLflow is running: Check http://127.0.0.1:5000
- Check port availability: Ensure port 5000 is not already in use by another process
- Restart MLflow: Stop and restart the MLflow UI
Next Steps
Model Configuration
Learn how to customize model architecture and hyperparameters
Running Inference
Use your trained model to make predictions
MLflow Tracking
Deep dive into experiment tracking and visualization
Deployment
Deploy your model to production with Docker
