
What is TRIFID?

TRIFID is a machine learning-based model that aims to predict the functional importance of every splice isoform in the genome. It has been designed to be accurate, interpretable, and reproducible. Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many of them go on to produce functionally relevant protein isoforms. While peptide evidence strongly supports a main protein isoform for most coding genes, reliable proteomics experiments have found little evidence of alternatively spliced proteins.
TRIFID was developed to bridge the gap between transcriptomic data and functional protein biology by predicting which splice isoforms are likely to be biologically important.

Key Features

High Accuracy

Trained on large-scale proteomics data from mass spectrometry experiments, distinguishing functionally important isoforms with high confidence.

Cross-Species Conservation

Isoforms predicted as functionally important show measurable cross-species conservation and significantly fewer broken functional domains.

Interpretable Results

Uses SHAP (SHapley Additive exPlanations) values to provide transparent, feature-level interpretation of predictions.

Multiple Species Support

Predictions are available for the human, mouse, rat, zebrafish, chicken, chimpanzee, pig, cow, macaque, fruit fly, and worm genomes.
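SHAP values are Shapley values: each feature receives a share of the prediction, and the shares add up to the difference between the prediction and a baseline prediction. The sketch below computes exact Shapley values by enumerating feature coalitions for a hypothetical three-feature toy model. It is not the shap library and not TRIFID's model; it assumes a single all-zero baseline input, whereas real SHAP explainers typically average over a background dataset.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical three-feature scoring function (not TRIFID's model):
    # feature 0 acts alone, features 1 and 2 only matter together.
    return 0.1 + 0.4 * x[0] + 0.3 * x[1] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition are set to baseline."""
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # [0.4, 0.15, 0.15]
# Additivity: baseline output plus all contributions equals the prediction.
print(round(toy_model(baseline) + sum(phi), 3), round(toy_model(x), 3))  # 0.8 0.8
```

The interaction term 0.3 is split evenly between features 1 and 2, reflecting the symmetry property of Shapley values, while feature 0 receives exactly its own 0.4 effect.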

Scientific Background

TRIFID addresses a fundamental question in genomics: which alternative splice isoforms produce functional proteins?

The Problem

  • Alternative splicing generates diverse transcript variants
  • Most alternative exons appear to be evolving neutrally
  • Limited proteomics evidence for alternative protein isoforms
  • Need to distinguish functional isoforms from transcriptional noise

The Solution

TRIFID uses 45+ predictive features, grouped into five major categories:
  • Annotation: GENCODE/Ensembl/RefSeq annotations, transcript support level (TSL), CCDS consensus coding sequences, and start/end region confirmation status
  • Conservation and APPRIS modules: APPRIS module scores (Firestar, Matador3D, Corsair, SPADE, THUMP), PhyloCSF evolutionary scores, ALT-Corsair cross-species conservation, and protein domain conservation
  • Expression: RNA-seq junction coverage (QSplice), splice junction support across 32 tissues, and gene-level expression normalization
  • Splicing and structure: splice junction integrity, number of coding exons, and alternative splicing event types
  • Protein function: Pfam domain effects (QPfam), domain integrity and coverage, functional residue preservation, and signal peptide predictions

How TRIFID Works

TRIFID employs a Random Forest classifier trained on high-confidence proteomics data:

Step 1: Feature Extraction

Extract 45+ genomic, transcriptomic, and proteomic features for each splice isoform.

Step 2: Model Training

Train on isoforms detected in large-scale mass spectrometry experiments (Kim et al., 2014).

Step 3: Prediction

Classify each isoform with a probability score (0-1) indicating its functional importance.

Step 4: Score Normalization

Normalize scores within each gene to identify the most important isoform(s).
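The Prediction and Score Normalization steps can be sketched as follows. A random forest's probability score is commonly the mean of the class probabilities output by its individual trees (this is how scikit-learn's RandomForestClassifier.predict_proba combines trees). The three "trees" here are hypothetical hand-written stumps over made-up feature names, not TRIFID's trained model. The per-gene normalization assumes each score is divided by the highest score in its gene, so the top isoform gets 1.0; the exact TRIFID normalization formula is not given in this section.

```python
from collections import defaultdict

# Hypothetical decision stumps standing in for trained trees. Each returns the
# probability that an isoform is functionally important. Feature names are
# illustrative only, not TRIFID's actual feature identifiers.
TREES = [
    lambda f: 0.9 if f["phylocsf"] > 0 else 0.2,
    lambda f: 0.8 if f["pfam_intact"] else 0.1,
    lambda f: 0.7 if f["junction_support"] >= 0.5 else 0.3,
]


def forest_score(features):
    """Random-forest style probability: the mean of the per-tree probabilities."""
    return sum(tree(features) for tree in TREES) / len(TREES)


def normalize_by_gene(scores):
    """Divide each isoform's score by the top score in its gene (assumed scheme)."""
    best = defaultdict(float)
    for (gene, _isoform), score in scores.items():
        best[gene] = max(best[gene], score)
    return {
        key: (score / best[key[0]] if best[key[0]] > 0 else 0.0)
        for key, score in scores.items()
    }


# Hypothetical isoforms keyed by (gene, isoform).
isoforms = {
    ("GENE1", "iso-1"): {"phylocsf": 2.5, "pfam_intact": True, "junction_support": 0.9},
    ("GENE1", "iso-2"): {"phylocsf": -1.0, "pfam_intact": False, "junction_support": 0.2},
    ("GENE2", "iso-3"): {"phylocsf": 0.4, "pfam_intact": False, "junction_support": 0.6},
}
raw = {key: forest_score(f) for key, f in isoforms.items()}
norm = normalize_by_gene(raw)
for key in isoforms:
    print(key, round(raw[key], 3), round(norm[key], 3))
```

Note that normalization is relative: GENE2's only isoform reaches a normalized score of 1.0 even though its raw score is modest, which is why the raw score is still needed to judge absolute confidence.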

Prediction Interpretation

  • TRIFID Score (0-1): raw probability that the isoform is functionally important
  • Normalized Score: gene-level normalized score that highlights the principal isoform
  • High scores (>0.7): likely functionally important and under purifying selection
  • Low scores (<0.3): likely evolving neutrally; possibly transcriptional noise
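These score bands can be expressed as a small helper, applied here to the raw TRIFID score. The 0.7 and 0.3 cut-offs come from the list above; the text does not characterize scores in between, so the sketch labels them "intermediate" (a hypothetical label). Scores of exactly 0.7 or 0.3 fall into that middle band because both cut-offs are strict inequalities.

```python
def interpret(score, high=0.7, low=0.3):
    """Map a TRIFID score in [0, 1] onto the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > high:
        return "likely functionally important"
    if score < low:
        return "likely evolving neutrally"
    # Hypothetical label for the band the documentation does not describe.
    return "intermediate"


for s in (0.95, 0.5, 0.1):
    print(s, interpret(s))
```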

Use Cases

Variant Interpretation

Prioritize variants affecting functionally important splice isoforms in disease studies

Genome Annotation

Identify principal isoforms for genome annotation projects

Comparative Genomics

Study isoform evolution across species

Therapeutic Targeting

Select biologically relevant isoforms for drug development

Published Research

TRIFID is described in the peer-reviewed manuscript:
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genomics and Bioinformatics, Volume 3, Issue 2, June 2021, lqab044.
https://doi.org/10.1093/nargab/lqab044
If you use TRIFID in your research, please cite this publication.

Data Availability

Pre-computed TRIFID predictions are available for multiple genome assemblies and species:

Species     Assembly     Database   Version(s)
Human       GRCh38       GENCODE    27, 37, 42
Human       GRCh38/37    RefSeq     105, 110
Mouse       GRCm38/39    GENCODE    25, 31
Rat         mRatBN7.2    Ensembl    105
Zebrafish   GRCz11       Ensembl    104

See the full list of available predictions in the Installation section.
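Once a predictions file is downloaded, picking the top-scoring isoform per gene (for example, for genome annotation) takes only a few lines. This sketch assumes a tab-separated file whose header includes gene_name, transcript_id, trifid_score, and norm_trifid_score; those column names and the sample rows are assumptions for illustration, so check the header of the file you actually download.

```python
import csv
import io

# Hypothetical excerpt of a predictions file. The tab-separated layout and the
# column names are assumptions; the gene and transcript IDs are placeholders.
SAMPLE = """gene_name\ttranscript_id\ttrifid_score\tnorm_trifid_score
GENEA\tTX0001\t0.91\t1.00
GENEA\tTX0002\t0.12\t0.13
GENEB\tTX0003\t0.45\t1.00
"""


def principal_isoforms(handle):
    """Return {gene_name: (transcript_id, trifid_score)} for the top-scoring isoform."""
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        score = float(row["trifid_score"])
        gene = row["gene_name"]
        # Keep the first isoform seen with the highest raw score for each gene.
        if gene not in best or score > best[gene][1]:
            best[gene] = (row["transcript_id"], score)
    return best


print(principal_isoforms(io.StringIO(SAMPLE)))
```

For a real download, pass an open text handle instead of the StringIO object, e.g. gzip.open(path, "rt") if the file is gzip-compressed.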

Next Steps

Installation

Install TRIFID and download pre-computed predictions

Quick Start

Load predictions and analyze your first gene

GitHub Repository

View source code and contribute
