Overview
TRIFID provides pre-trained machine learning models that predict the functional relevance of splice isoforms. These models have been trained on large-scale proteomics data and validated across multiple species.
Available Models
Human Model (v1)
Training Data: GENCODE Release 27 (GRCh38.p10)
Features: 45 predictive features
Release Date: March 10, 2021
Download: TRIFID v1 Model (pickle format)
Training Set: GENCODE 27 training data
Species Applicability: Homo sapiens (human)
Enhanced Models (v2)
Training Data: Extended proteomics evidence with additional features
Features: 47 predictive features (2 additional features)
Release Date: September 2022
Species Applicability:
- Homo sapiens (Human-specific model)
- Mus musculus (Mouse-specific model)
- Other vertebrates (Vertebrates model)
- Invertebrates (Invertebrates model)
The v2 models include two additional features that improve prediction accuracy, particularly for minor isoforms.
Model Architecture
TRIFID uses a gradient boosting machine learning approach that combines:
- Accuracy: High-confidence predictions validated against proteomics data
- Interpretability: SHAP values for feature importance and local predictions
- Reproducibility: Complete training pipeline and configuration files available
Model Input Features
The v2 models use 47 predictive features (the v1 model uses 45), grouped into several categories:
- Structural features: Protein domain integrity (Pfam), sequence length
- Conservation features: PhyloCSF scores, ALT-Corsair evolutionary age
- Expression features: Splice junction coverage (QSplice), tissue specificity
- Annotation features: APPRIS principal isoform scores, CDS completeness
Model Output
TRIFID Score: Probability (0-1) representing functional relevance
- 0.0-0.3: Low functional probability (likely neutral evolution)
- 0.3-0.7: Uncertain functional relevance
- 0.7-1.0: High functional probability (likely under purifying selection)
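The score bands above can be turned into a small helper for labelling predictions. The following is a minimal sketch and not part of the TRIFID package: the function name `interpret_trifid_score` and the label strings are hypothetical. Because the listed ranges share their endpoints, assigning exactly 0.3 to the uncertain band and exactly 0.7 to the high band is an assumption.

```python
def interpret_trifid_score(score: float) -> str:
    """Map a TRIFID score to one of the three bands listed above.

    Boundary handling is an assumption: 0.3 counts as "uncertain"
    and 0.7 counts as "high".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"TRIFID score must be within [0, 1], got {score!r}")
    if score < 0.3:
        return "low"        # likely neutral evolution
    if score < 0.7:
        return "uncertain"  # functional relevance unclear
    return "high"           # likely under purifying selection
```

Applied to a table of predictions, a helper like this makes it easy to separate the uncertain middle band, which may deserve manual review, from the confident calls at either end.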
Using the Model
Loading a Pre-trained Model
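The v1 model is distributed in pickle format, so it can be read with Python's standard `pickle` module. A minimal sketch follows; the helper name `load_trifid_model` is a placeholder and not a name taken from the TRIFID repository. Unpickling generally requires that the library which defined the model class is importable in the current environment.

```python
import pickle
from pathlib import Path


def load_trifid_model(path):
    """Load a pickled model object from disk.

    Only unpickle files from a trusted source: pickle can execute
    arbitrary code while loading.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"Model file not found: {model_path}")
    with model_path.open("rb") as handle:
        return pickle.load(handle)
```

A call would look like `model = load_trifid_model("path/to/downloaded_model.pkl")`, where the path is wherever the downloaded file was saved. What methods the loaded object exposes depends on how the model was serialized, which this page does not specify.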
Example: Querying FGFR1 Isoforms
| Transcript ID | Gene Name | APPRIS Label | Length | TRIFID Score |
|---|---|---|---|---|
| ENST00000447712 | FGFR1 | PRINCIPAL:3 | 822 | 0.87 |
| ENST00000356207 | FGFR1 | MINOR | 733 | 0.60 |
| ENST00000397103 | FGFR1 | MINOR | 733 | 0.01 |
| ENST00000619564 | FGFR1 | MINOR | 228 | 0.00 |
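A lookup like the one in the table can be sketched with the standard library alone. The snippet below is illustrative: it embeds the four rows shown above as tab-separated text, and the column names (`transcript_id`, `gene_name`, `appris`, `length`, `trifid_score`) and the function `query_gene` are assumptions, not the layout of the actual TRIFID prediction files.

```python
import csv
import io

# The four FGFR1 rows from the table above, as tab-separated text.
# Column names are hypothetical.
PREDICTIONS_TSV = (
    "transcript_id\tgene_name\tappris\tlength\ttrifid_score\n"
    "ENST00000447712\tFGFR1\tPRINCIPAL:3\t822\t0.87\n"
    "ENST00000356207\tFGFR1\tMINOR\t733\t0.60\n"
    "ENST00000397103\tFGFR1\tMINOR\t733\t0.01\n"
    "ENST00000619564\tFGFR1\tMINOR\t228\t0.00\n"
)


def query_gene(tsv_text, gene_name):
    """Return a gene's isoforms sorted from highest to lowest TRIFID score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row["gene_name"] == gene_name]
    for row in rows:
        # csv yields strings, so convert the numeric columns explicitly.
        row["length"] = int(row["length"])
        row["trifid_score"] = float(row["trifid_score"])
    return sorted(rows, key=lambda row: row["trifid_score"], reverse=True)


for isoform in query_gene(PREDICTIONS_TSV, "FGFR1"):
    print(isoform["transcript_id"], isoform["appris"], isoform["trifid_score"])
```

Note that the two 733-residue MINOR isoforms receive very different scores (0.60 and 0.01), so sorting by score rather than by length or APPRIS label is what separates them.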
Interpreting Predictions with SHAP
TRIFID includes SHAP (SHapley Additive exPlanations) values for model interpretability.
Model Training Pipeline
To train TRIFID on custom data or reproduce the training process, follow these steps:
1. Prepare the Dataset
2. Train the Model
- Loads the training set (proteomics-validated isoforms)
- Performs hyperparameter optimization
- Trains the gradient boosting model
- Saves the model in pickle format
3. Generate Predictions
Model Configuration
The training pipeline is controlled by configuration files in the config/ directory:
- config.yaml: File paths and pipeline parameters
- features.yaml: Feature definitions, categories, and species support
Model Performance
Validation of the TRIFID model shows:
- High concordance with proteomics detection
- Predicted functional isoforms show measurable cross-species conservation
- Exons from high-scoring isoforms are under purifying selection
- Low-scoring isoforms show evidence of neutral evolution
Tutorials and Examples
Comprehensive tutorials are available as Jupyter notebooks:
- Tutorial Notebook: End-to-end TRIFID workflow
- Figures Notebook: Reproduce publication figures
Model Applicability
While TRIFID was developed for the human genome, the models can be applied to the following groups of species:
Human (Homo sapiens)
Use the Human-specific model (v1 or v2) for GRCh37 or GRCh38 assemblies with GENCODE or RefSeq annotations.
Mouse (Mus musculus)
Use the Mouse-specific model (v2) for GRCm38 or GRCm39 assemblies with GENCODE annotations.
Other Vertebrates
Use the Vertebrates model (v2) for rat, zebrafish, chicken, chimpanzee, pig, cow, and macaque.
Invertebrates
Use the Invertebrates model (v2) for fruit fly and worm. Note that prediction accuracy may be lower due to evolutionary distance.
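The species guidance in this section amounts to a lookup from organism to model. A small sketch of that lookup follows; the function name `select_model`, the dictionary `MODEL_BY_SPECIES`, and the normalized species keys are hypothetical, and only the species named in this section are included.

```python
# Recommended model per species, as listed in this section.
# Keys are lowercase with spaces and hyphens removed.
MODEL_BY_SPECIES = {
    "human": "Human-specific model (v1 or v2)",
    "mouse": "Mouse-specific model (v2)",
}
for name in ("rat", "zebrafish", "chicken", "chimpanzee", "pig", "cow", "macaque"):
    MODEL_BY_SPECIES[name] = "Vertebrates model (v2)"
for name in ("fruitfly", "worm"):
    MODEL_BY_SPECIES[name] = "Invertebrates model (v2)"


def select_model(species):
    """Return the recommended TRIFID model for a common species name."""
    key = species.strip().lower().replace(" ", "").replace("-", "")
    if key not in MODEL_BY_SPECIES:
        raise ValueError(f"No TRIFID model recommendation for species: {species!r}")
    return MODEL_BY_SPECIES[key]
```

Raising an error for unlisted species, rather than silently falling back to the Vertebrates or Invertebrates model, is a deliberate choice in this sketch: the page gives no guidance for organisms outside the lists above.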