Introduction to TRIFID

What is TRIFID?

TRIFID is a machine learning-based model that aims to predict the functional importance of every splice isoform in the genome. It was designed to be accurate, interpretable, and reproducible. Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many of these go on to produce functionally relevant protein isoforms. While peptide evidence strongly supports a main protein isoform for most coding genes, reliable proteomics experiments have found little evidence of alternatively spliced proteins.

TRIFID was developed to bridge the gap between transcriptomic data and functional protein biology by predicting which splice isoforms are likely to be biologically important.

Key Features

High Accuracy

Trained on isoforms detected in large-scale mass spectrometry proteomics experiments, TRIFID distinguishes functionally important isoforms with high confidence.

Cross-Species Conservation

Isoforms predicted as functionally important show measurable cross-species conservation and significantly fewer broken functional domains.

Interpretable Results

Uses SHAP (SHapley Additive exPlanations) values to provide transparent, feature-level interpretation of predictions.
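To make additive, feature-level attribution concrete, the sketch below computes exact Shapley values for a toy scoring function with three features. It is a simplified illustration, not TRIFID's code or the `shap` library's TreeExplainer: the toy model, its feature names, and the baseline-replacement value function (features outside a coalition take a reference value) are all assumptions. The property it demonstrates is that the baseline prediction plus the sum of the attributions equals the model's output for the isoform.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values with a baseline-replacement value function.

    Features outside a coalition are set to their baseline value.
    Cost is exponential in the number of features: toy examples only.
    """
    names = list(instance)
    n = len(names)

    def value(coalition) -> float:
        # Coalition members keep the instance value; the rest use the baseline.
        x = {name: (instance[name] if name in coalition else baseline[name]) for name in names}
        return model(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = set(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Weighted marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy model: weighted sum plus one interaction term.
def toy_model(x: Dict[str, float]) -> float:
    return (
        0.1
        + 0.4 * x["conservation"]
        + 0.3 * x["domain_integrity"]
        + 0.2 * x["conservation"] * x["junction_support"]
    )


isoform = {"conservation": 1.0, "domain_integrity": 0.5, "junction_support": 1.0}
reference = {"conservation": 0.0, "domain_integrity": 0.0, "junction_support": 0.0}

phi = shapley_values(toy_model, isoform, reference)
base = toy_model(reference)
print(phi)
print(base + sum(phi.values()), toy_model(isoform))
```

Because the attributions sum to the difference between the isoform's score and the baseline score, each value can be read as that feature's contribution to the prediction; here the interaction term is split evenly between the two features involved.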

Multiple Species Support

Predictions are available for the human, mouse, rat, zebrafish, chicken, chimpanzee, pig, cow, macaque, fruit fly, and worm genomes.

Scientific Background

TRIFID addresses a fundamental question in genomics: which alternative splice isoforms produce functional proteins?

The Problem

Alternative splicing generates diverse transcript variants
Most alternative exons appear to be evolving neutrally
Limited proteomics evidence for alternative protein isoforms
Need to distinguish functional isoforms from transcriptional noise

The Solution

TRIFID uses 45+ predictive features across five major categories:

Annotation Features

GENCODE/Ensembl/RefSeq annotations
Transcript support level (TSL)
CCDS consensus coding sequences
Start/end region confirmation status

Evolution Features

APPRIS module scores (Firestar, Matador3D, Corsair, SPADE, THUMP)
PhyloCSF evolutionary scores
ALT-Corsair cross-species conservation
Protein domain conservation

Expression Features

RNA-seq junction coverage (QSplice)
Splice junction support across 32 tissues
Gene-level expression normalization

Splicing Features

Splice junction integrity
Number of coding exons
Alternative splicing event types

Structure Features

Pfam domain effects (QPfam)
Domain integrity and coverage
Functional residue preservation
Signal peptide predictions

How TRIFID Works

TRIFID employs a Random Forest classifier trained on high-confidence proteomics data:

Feature Extraction

Extract 45+ genomic, transcriptomic, and proteomic features for each splice isoform

Model Training

Train on isoforms detected in large-scale mass spectrometry experiments (Kim et al., 2014)

Prediction

Classify isoforms with a probability score (0-1) indicating functional importance

Score Normalization

Normalize scores within each gene to identify the most important isoform(s)
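The score normalization step above can be sketched as follows. This assumes the gene-level score is each isoform's raw TRIFID score divided by the highest raw score in its gene, so the top-scoring isoform(s) receive 1.0; check the TRIFID source for the exact formula. The gene and transcript identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_by_gene(
    rows: List[Tuple[str, str, float]], tol: float = 1e-9
) -> Tuple[Dict[str, float], Dict[str, List[str]]]:
    """Scale raw scores by the per-gene maximum (assumed normalization).

    rows: (gene_id, transcript_id, raw_score) tuples with scores in [0, 1].
    Returns normalized scores per transcript and the top isoform(s) per gene.
    """
    gene_max: Dict[str, float] = defaultdict(float)
    for gene, _, score in rows:
        gene_max[gene] = max(gene_max[gene], score)

    normalized: Dict[str, float] = {}
    top: Dict[str, List[str]] = defaultdict(list)
    for gene, transcript, score in rows:
        m = gene_max[gene]
        # Guard against genes where every isoform scores 0.
        norm = score / m if m > 0 else 0.0
        normalized[transcript] = norm
        # Ties at the maximum are all reported as top isoforms.
        if m > 0 and abs(norm - 1.0) <= tol:
            top[gene].append(transcript)
    return normalized, dict(top)


rows = [
    ("GENE_A", "TX_A1", 0.80),
    ("GENE_A", "TX_A2", 0.20),
    ("GENE_B", "TX_B1", 0.35),
    ("GENE_B", "TX_B2", 0.35),
]
norm, top = normalize_by_gene(rows)
print(norm)
print(top)
```

Reporting every isoform tied at the maximum, rather than picking one, matches the "most important isoform(s)" wording: some genes may have more than one equally supported isoform.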

Prediction Interpretation

TRIFID Score (0-1): Raw probability of functional importance
Normalized Score: Gene-level normalized score highlighting the principal isoform
High scores (>0.7): Likely functionally important and under purifying selection
Low scores (<0.3): Likely evolving neutrally; may represent transcriptional noise
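A simple way to apply these cut-offs when filtering predictions is sketched below. The 0.7 and 0.3 thresholds come from this page; the label strings, the "intermediate" bucket for scores in between, and treating both boundaries as exclusive (as the > and < signs suggest) are assumptions.

```python
def interpret_score(score: float, high: float = 0.7, low: float = 0.3) -> str:
    """Bucket a raw TRIFID score (0-1) using the thresholds described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"TRIFID scores lie in [0, 1], got {score}")
    if score > high:
        return "likely functional"
    if score < low:
        return "likely neutral"
    # Scores between the two cut-offs are not confidently assigned either way.
    return "intermediate"


for s in (0.92, 0.5, 0.1):
    print(s, interpret_score(s))
```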

Use Cases

Variant Interpretation

Prioritize variants affecting functionally important splice isoforms in disease studies
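As an illustration, the sketch below ranks variants by the highest TRIFID score among the isoforms each one affects. The variant IDs, transcript IDs, scores, and the max-over-affected-isoforms ranking rule are hypothetical; in practice the variant-to-transcript mapping would come from an annotation tool such as Ensembl VEP.

```python
from typing import Dict, List, Tuple


def prioritize_variants(
    variant_to_transcripts: Dict[str, List[str]],
    trifid: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank variants by the best TRIFID score among the isoforms they affect.

    Transcripts without a prediction are treated as score 0.0 (an assumption).
    """
    ranked = []
    for variant, transcripts in variant_to_transcripts.items():
        best = max((trifid.get(t, 0.0) for t in transcripts), default=0.0)
        ranked.append((variant, best))
    # Highest score first; ties broken by variant ID for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


scores = {"TX_A1": 0.95, "TX_A2": 0.12, "TX_B1": 0.40}
hits = {
    "var1": ["TX_A2"],
    "var2": ["TX_A1", "TX_A2"],
    "var3": ["TX_B1"],
    "var4": ["TX_UNKNOWN"],
}
for variant, score in prioritize_variants(hits, scores):
    print(variant, score)
```

Taking the maximum means a variant is ranked by the most functionally important isoform it touches, so a hit on a likely principal isoform is not diluted by hits on low-scoring alternatives.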

Genome Annotation

Identify principal isoforms for genome annotation projects

Comparative Genomics

Study isoform evolution across species

Therapeutic Targeting

Select biologically relevant isoforms for drug development

Published Research

TRIFID is described in the peer-reviewed manuscript: Assessing the functional relevance of splice isoforms
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML.
NAR Genomics and Bioinformatics, Volume 3, Issue 2, June 2021, lqab044
https://doi.org/10.1093/nargab/lqab044

If you use TRIFID in your research, please cite this publication.

Data Availability

Pre-computed TRIFID predictions are available for multiple genome assemblies and species:

Species	Assembly	Annotation	Release
Human	GRCh38	GENCODE	27, 37, 42
Human	GRCh38/37	RefSeq	105, 110
Mouse	GRCm38/39	GENCODE	25, 31
Rat	mRatBN7.2	Ensembl	105
Zebrafish	GRCz11	Ensembl	104

See the full list of available predictions in the Installation section.
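Pre-computed predictions are distributed as tabular files. The loader below is a minimal sketch that assumes a tab-separated file with a header row containing `gene_name`, `transcript_id`, `trifid_score`, and `norm_trifid_score` columns; these column names are assumptions, so check the header of the file you download and adjust them.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_predictions(handle: TextIO) -> Dict[str, List[dict]]:
    """Group TRIFID predictions by gene from a tab-separated text stream.

    Column names are assumptions; adjust them to the downloaded file's header.
    """
    by_gene: Dict[str, List[dict]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        record = {
            "transcript_id": row["transcript_id"],
            "trifid_score": float(row["trifid_score"]),
            "norm_trifid_score": float(row["norm_trifid_score"]),
        }
        by_gene.setdefault(row["gene_name"], []).append(record)
    # Sort each gene's isoforms from highest to lowest raw score.
    for records in by_gene.values():
        records.sort(key=lambda r: r["trifid_score"], reverse=True)
    return by_gene


# Tiny in-memory example standing in for a downloaded file (values are made up).
sample = io.StringIO(
    "gene_name\ttranscript_id\ttrifid_score\tnorm_trifid_score\n"
    "GENE_A\tTX_A2\t0.20\t0.25\n"
    "GENE_A\tTX_A1\t0.80\t1.00\n"
)
predictions = load_predictions(sample)
print(predictions["GENE_A"][0]["transcript_id"])
```

For a file on disk, pass `open(path, newline="")`; if the download is gzip-compressed, `gzip.open(path, "rt", newline="")` yields a compatible text stream.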

Next Steps

Installation

Install TRIFID and download pre-computed predictions

Quick Start

Load predictions and analyze your first gene

GitHub Repository

View source code and contribute
