What is TRIFID?
TRIFID is a machine learning-based model that aims to predict the functional importance of every splice isoform in the genome. The method has been designed to be accurate, interpretable, and reproducible.
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. While peptide evidence strongly supports a main protein isoform for most coding genes, reliable proteomics experiments have found little evidence of alternatively spliced proteins. TRIFID was developed to bridge the gap between transcriptomic data and functional protein biology by predicting which splice isoforms are likely to be biologically important.
Key Features
High Accuracy
Trained on large-scale proteomics data from mass spectrometry experiments to distinguish functionally important isoforms with high confidence.
Cross-Species Conservation
Isoforms predicted as functionally important show measurable cross-species conservation and significantly fewer broken functional domains.
Interpretable Results
Uses SHAP (SHapley Additive exPlanations) values to provide transparent, feature-level interpretation of predictions.
Multiple Species Support
Predictions are available for human, mouse, rat, zebrafish, chicken, chimpanzee, pig, cow, macaque, fruit fly, and worm genomes.
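To make the idea behind SHAP-based interpretation concrete, the sketch below computes exact Shapley values for a toy three-feature scoring function using only the Python standard library. The feature names, the toy model, and the all-zero baseline are illustrative assumptions; they are not TRIFID's features or model. The SHAP library computes the same kind of attribution efficiently for tree ensembles, whereas this brute-force version is only practical for a handful of features.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical scoring function for illustration, not the TRIFID model."""
    score = 0.2 * features["conservation"] + 0.3 * features["domain_integrity"]
    # Interaction term: expression only contributes when the domain is intact.
    score += 0.5 * features["expression"] * features["domain_integrity"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every possible ordering in which features are switched from the
    baseline value to the instance value."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Made-up isoform and reference point (assumptions for the example).
instance = {"conservation": 1.0, "domain_integrity": 1.0, "expression": 1.0}
baseline = {"conservation": 0.0, "domain_integrity": 0.0, "expression": 0.0}

contributions = shapley_values(toy_model, instance, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
```

The contributions always sum to the difference between the model output for the instance and for the baseline, which is the property that makes per-feature explanations additive.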
Scientific Background
TRIFID addresses a fundamental question in genomics: which alternative splice isoforms produce functional proteins?
The Problem
- Alternative splicing generates diverse transcript variants
- Most alternative exons appear to be evolving neutrally
- Limited proteomics evidence for alternative protein isoforms
- Need to distinguish functional isoforms from transcriptional noise
The Solution
TRIFID uses 45+ predictive features across five major categories:
Annotation Features
- GENCODE/Ensembl/RefSeq annotations
- Transcript support level (TSL)
- CCDS consensus coding sequences
- Start/end region confirmation status
Evolution Features
- APPRIS conservation scores (Firestar, Matador3D, Corsair, SPADE, THUMP)
- PhyloCSF evolutionary scores
- ALT-Corsair cross-species conservation
- Protein domain conservation
Expression Features
- RNA-seq junction coverage (QSplice)
- Splice junction support across 32 tissues
- Gene-level expression normalization
Splicing Features
- Splice junction integrity
- Number of coding exons
- Alternative splicing event types
Structure Features
- Pfam domain effects (QPfam)
- Domain integrity and coverage
- Functional residue preservation
- Signal peptide predictions
How TRIFID Works
TRIFID employs a Random Forest classifier trained on high-confidence proteomics data:
Feature Extraction
Extract 45+ genomic, transcriptomic, and proteomic features for each splice isoform
Model Training
Train on isoforms detected in large-scale mass spectrometry experiments (Kim et al., 2014)
Prediction Interpretation
- TRIFID Score (0-1): Raw probability of functional importance
- Normalized Score: Gene-level normalized score highlighting the principal isoform
- High scores (>0.7): Likely functionally important, under purifying selection
- Low scores (<0.3): Likely evolving neutrally, consistent with transcriptional noise
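The thresholds above can be applied programmatically. The sketch below labels raw scores using the 0.7 and 0.3 cut-offs from the list and picks the highest-scoring isoform per gene as a simple stand-in for the principal isoform. The record layout, gene and transcript identifiers, and scores are made up for illustration, the "intermediate" label for scores between the two cut-offs is a name introduced here, and TRIFID's actual gene-level normalization is not reproduced.

```python
HIGH_CUTOFF = 0.7  # above this: likely functionally important
LOW_CUTOFF = 0.3   # below this: likely evolving neutrally


def interpret_score(score):
    """Map a raw TRIFID score (0-1) to a coarse label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score > HIGH_CUTOFF:
        return "likely functional"
    if score < LOW_CUTOFF:
        return "likely neutral"
    return "intermediate"  # label introduced for this example


def top_isoform_per_gene(records):
    """Return the highest-scoring transcript for each gene."""
    best = {}
    for gene, transcript, score in records:
        if gene not in best or score > best[gene][1]:
            best[gene] = (transcript, score)
    return {gene: transcript for gene, (transcript, _) in best.items()}


# Hypothetical (gene, transcript, raw score) records.
records = [
    ("GENE_A", "tx1", 0.92),
    ("GENE_A", "tx2", 0.12),
    ("GENE_A", "tx3", 0.45),
    ("GENE_B", "tx4", 0.81),
]

labels = {transcript: interpret_score(score) for _, transcript, score in records}
principal = top_isoform_per_gene(records)

for gene, transcript, score in records:
    marker = "*" if principal[gene] == transcript else " "
    print(f"{marker} {gene} {transcript} {score:.2f} {labels[transcript]}")
```

Scores between the two cut-offs are kept in a separate bucket here because the list above does not assign them an interpretation.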
Use Cases
Variant Interpretation
Prioritize variants affecting functionally important splice isoforms in disease studies
Genome Annotation
Identify principal isoforms for genome annotation projects
Comparative Genomics
Study isoform evolution across species
Therapeutic Targeting
Select biologically relevant isoforms for drug development
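As an illustration of the variant interpretation use case, the sketch below ranks variants by the highest TRIFID score among the isoforms they overlap. The variant identifiers, transcript identifiers, scores, and the choice of the maximum score as the ranking key are all hypothetical assumptions made for this example and are not part of TRIFID itself.

```python
def prioritize_variants(variant_to_transcripts, trifid_scores):
    """Rank variants by the best TRIFID score among affected isoforms.

    Variants that overlap no scored isoform are left out of the ranking.
    """
    ranked = []
    for variant, transcripts in variant_to_transcripts.items():
        scores = [trifid_scores[t] for t in transcripts if t in trifid_scores]
        if scores:
            ranked.append((variant, max(scores)))
    # Highest-scoring (most likely functionally relevant) variants first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical inputs: which isoforms each variant affects, and their scores.
variant_to_transcripts = {
    "var1": ["tx2"],
    "var2": ["tx1", "tx3"],
    "var3": ["tx9"],  # no score available for this isoform
}
trifid_scores = {"tx1": 0.92, "tx2": 0.12, "tx3": 0.45}

ranking = prioritize_variants(variant_to_transcripts, trifid_scores)
for variant, score in ranking:
    print(f"{variant}: {score:.2f}")
```

Using the maximum is one possible design choice; a study focused on a specific tissue or isoform might instead filter to the relevant transcripts before ranking.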
Published Research
TRIFID is described in the peer-reviewed manuscript:
Assessing the functional relevance of splice isoforms
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML.
NAR Genomics and Bioinformatics, Volume 3, Issue 2, June 2021, lqab044
https://doi.org/10.1093/nargab/lqab044
If you use TRIFID in your research, please cite this publication.
Data Availability
Pre-computed TRIFID predictions are available for multiple genome assemblies and species.
Next Steps
Installation
Install TRIFID and download pre-computed predictions
Quick Start
Load predictions and analyze your first gene
GitHub Repository
View source code and contribute