Hardware
GPU
NVIDIA A100 MIG partition — 9.8 GB VRAM
RAM
251 GB system memory
CPU
Intel Xeon Gold
Model Configuration
All four ensemble checkpoints share identical architecture hyperparameters.
Architecture Summary
The `SuperEnhancedTemporalModel` is a five-stage sequence model:
- Input projection — Linear → LayerNorm → ReLU → Dropout(0.2). Projects 1280-dim CNN features to 768-dim hidden space.
- 4-layer bidirectional LSTM — processes the projected sequence, yielding a 1536-dim output (768 × 2 directions).
- Multi-head self-attention — 12 heads over the LSTM output with residual connection and LayerNorm.
- Attention pooling — learned scalar weights over the sequence, producing a single 1536-dim vector.
- Classifier head — four Linear layers (1536 → 768 → 512 → 256 → 4) with LayerNorm and Dropout between each.
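As a sketch, the five stages above might be wired together as follows. This is a minimal reconstruction from the description, not the project's real code: layer names, the exact ordering inside each stage, and the dropout placement are assumptions.

```python
import torch
import torch.nn as nn

class TemporalModelSketch(nn.Module):
    """Sketch of the SuperEnhancedTemporalModel stages described above."""

    def __init__(self, in_dim=1280, hidden=768, heads=12, num_classes=4, p=0.2):
        super().__init__()
        # Stage 1: input projection, 1280-dim CNN features -> 768-dim hidden
        self.proj = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(), nn.Dropout(p)
        )
        # Stage 2: 4-layer bidirectional LSTM -> 1536-dim outputs (768 x 2)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=4,
                            batch_first=True, bidirectional=True)
        # Stage 3: 12-head self-attention with residual connection + LayerNorm
        self.attn = nn.MultiheadAttention(hidden * 2, heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden * 2)
        # Stage 4: attention pooling, one learned scalar weight per timestep
        self.pool = nn.Linear(hidden * 2, 1)
        # Stage 5: classifier head, 1536 -> 768 -> 512 -> 256 -> 4
        dims = [hidden * 2, 768, 512, 256]
        layers = []
        for a, b in zip(dims, dims[1:]):
            layers += [nn.Linear(a, b), nn.LayerNorm(b), nn.ReLU(), nn.Dropout(p)]
        layers.append(nn.Linear(dims[-1], num_classes))
        self.head = nn.Sequential(*layers)

    def forward(self, x):                       # x: (batch, time, 1280)
        h = self.proj(x)                        # (batch, time, 768)
        h, _ = self.lstm(h)                     # (batch, time, 1536)
        a, _ = self.attn(h, h, h)
        h = self.norm(h + a)                    # residual + LayerNorm
        w = torch.softmax(self.pool(h), dim=1)  # (batch, time, 1) weights
        pooled = (w * h).sum(dim=1)             # (batch, 1536)
        return self.head(pooled)                # (batch, 4) logits
```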
Setting Up the Trainer
Creating Dataloaders
create_dataloaders() does the following:
- Loads `train_features_multiscale.h5`, `val_features_multiscale.h5`, and `test_features_multiscale.h5`.
- Constructs a `WeightedRandomSampler` for the training split based on inverse class frequency.
- Applies training-time augmentation (temporal subsampling, shift, and noise) only to the training dataset. See Optimization for augmentation details.
- Uses a custom `collate_features` function that zero-pads variable-length sequences within a batch.
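The sampler construction and padding collate function described above could look like this minimal sketch. `make_sampler` is a hypothetical helper name; only `collate_features` comes from the source, and its exact return signature is an assumption.

```python
from collections import Counter

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import WeightedRandomSampler

def make_sampler(labels):
    # Inverse-class-frequency weights: rare classes are drawn more often.
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return WeightedRandomSampler(weights, num_samples=len(labels),
                                 replacement=True)

def collate_features(batch):
    # batch: list of (features[T_i, D], label) pairs with varying T_i.
    feats, labels = zip(*batch)
    lengths = torch.tensor([f.shape[0] for f in feats])
    padded = pad_sequence(feats, batch_first=True)  # zero-pads to max T_i
    return padded, lengths, torch.tensor(labels)
```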
Training a Single Model
Training Loop
Forward pass
Features and packed sequence lengths are fed to the model. Lengths allow the LSTM to ignore padding via `pack_padded_sequence`.
Loss computation
`FocalLoss(gamma=2.0, smoothing=0.1)` is evaluated against integer labels. Per-class alpha weights computed from the training set are passed at construction time.
Validation
Full validation is run after each epoch. Accuracy, weighted F1, and per-class F1 are computed.
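A focal loss with label smoothing and per-class alpha weights, consistent with the settings above, can be sketched as follows. The project's actual `FocalLoss` class may differ in detail; this is a functional reconstruction from the stated parameters.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, smoothing=0.1, alpha=None):
    """Sketch: focal loss with label smoothing and optional class weights."""
    n_classes = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)
    # Smoothed target distribution: 1 - smoothing on the true class,
    # smoothing spread uniformly over the remaining classes.
    with torch.no_grad():
        true = torch.full_like(log_p, smoothing / (n_classes - 1))
        true.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    # Focal factor (1 - p_t)^gamma down-weights well-classified examples.
    p_t = log_p.exp().gather(1, targets.unsqueeze(1)).squeeze(1)
    loss = -(true * log_p).sum(dim=1) * (1.0 - p_t) ** gamma
    if alpha is not None:               # per-class weights, shape (n_classes,)
        loss = loss * alpha[targets]
    return loss.mean()
```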
Ensemble Training
Four independent models are trained with different random seeds (42 + i for model i) to encourage diversity and produce the final ensemble.
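The seeding scheme can be sketched as a loop. Here `train_single_model` is stubbed so the snippet runs standalone; the real trainer's signature is an assumption.

```python
import torch

def train_single_model(save_path):
    # Stand-in for the trainer described above.
    return save_path

checkpoints = []
for i in range(4):
    torch.manual_seed(42 + i)  # seeds 42, 43, 44, 45: one per ensemble member
    checkpoints.append(train_single_model(f"best_ensemble_model_{i + 1}.pt"))
```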
Ensemble Validation Accuracy
| Checkpoint | Best Epoch | Best Val Acc |
|---|---|---|
| `best_ensemble_model_1.pt` | 52 | 72.7% |
| `best_ensemble_model_2.pt` | 43 | 92.1% |
| `best_ensemble_model_3.pt` | 42 | 91.9% |
| `best_ensemble_model_4.pt` | 40 | 91.9% |
Model 1 converged to a lower accuracy than the others, likely due to the random seed placing it in a poor loss basin. The ensemble averages softmax probabilities across all four models, which smooths this effect.
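The probability-averaging step can be sketched as below. The helper's name and signature are assumptions; it only illustrates the softmax-averaging described above.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, features, lengths):
    # Average per-class softmax probabilities over all checkpoints,
    # then take the argmax as the ensemble prediction.
    probs = [torch.softmax(m(features, lengths), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```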
Checkpoint Strategy
Best model checkpoints are saved whenever validation accuracy improves. To resume an interrupted run, pass `resume_from` to `train_single_model()`.
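A minimal sketch of the save/resume logic follows. The checkpoint dict keys and helper names are assumptions about the real format.

```python
import torch

def save_if_best(model, optimizer, epoch, val_acc, best_acc, path):
    # Persist full training state only when validation accuracy improves.
    if val_acc > best_acc:
        torch.save({"epoch": epoch,
                    "model_state": model.state_dict(),
                    "optimizer_state": optimizer.state_dict(),
                    "val_acc": val_acc}, path)
        return val_acc
    return best_acc

def load_for_resume(model, optimizer, path):
    # Restore model/optimizer state; return the epoch to resume from.
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1
```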