NIPAL3: NIPA-Like Domain Containing 3
ENSG00000001461 (Ensembl) - Q6P499 (NPAL3_HUMAN) (UniProt)
Overview
This case study demonstrates the Pfam effects module of TRIFID, which quantifies the impact of alternative splicing on protein domain integrity. NIPAL3 is a clear example: its single Pfam domain is intact in some isoforms and progressively truncated in others, and TRIFID's functionality predictions drop accordingly.
What is Pfam Effects?
The Pfam effects module:
- Quantifies the impact of alternative splicing on Pfam protein domains
- Identifies domains that are damaged, lost, or intact
- Calculates residue-level changes in domain coverage
- Provides domain integrity scores as TRIFID features
Pfam Effects Methodology
The module draws on four data sources:
- APPRIS annotations: Principal isoform labels
- Protein sequences: FASTA format from APPRIS
- SPADE scores: Domain annotations from APPRIS
- Pfam database: Domain definitions
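The protein sequences arrive as a gzip-compressed FASTA file. A minimal sketch for loading them with the standard library, assuming the transcript identifier is the first `|`- or whitespace-separated token of each header line (check this against your APPRIS release); `parse_fasta` and `read_fasta_gz` are hypothetical helpers, not part of TRIFID.

```python
import gzip
import re
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {identifier: sequence}."""
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            # Assumption: the ID is the first token before '|' or whitespace
            current_id = re.split(r"[|\s]", line[1:], maxsplit=1)[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def read_fasta_gz(path: str) -> Dict[str, str]:
    """Read a gzip-compressed FASTA file such as appris_data.transl.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_fasta(handle)
```

Usage: `read_fasta_gz("data/external/appris/GRCh38/g27/appris_data.transl.fa.gz")`. Splitting parsing from file opening keeps `parse_fasta` testable on plain lists of strings.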
Running Pfam Effects
python -m trifid.preprocessing.pfam_effects \
--appris data/external/appris/GRCh38/g27/appris_data.appris.txt \
--jobs 10 \
--seqs data/external/appris/GRCh38/g27/appris_data.transl.fa.gz \
--spade data/external/appris/GRCh38/g27/appris_method.spade.gtf.gz \
--outdir data/external/pfam_effects/GRCh38/g27
Output Files
qpfam.tsv.gz: Transcript-level Pfam domain effects with scores:
pfam_score: Overall conservation of Pfam domain residues relative to the reference isoform (0-1)
pfam_domains_impact_score: Proportion of domains intact (0-1)
perc_Damaged_State: Percentage of domains damaged
perc_Lost_State: Percentage of domains lost
Lost_residues_pfam: Count of lost domain residues
Gain_residues_pfam: Count of gained domain residues
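Because the three domain states (defined under Domain States) are exhaustive, pfam_domains_impact_score should equal 1 minus the damaged and lost shares. A minimal pandas sketch for sanity-checking a qpfam.tsv.gz file on that basis; `check_qpfam` is a hypothetical helper, and the automatic 0-1 versus 0-100 scaling is an assumption, since the column names say "perc" while the NIPAL3 table reports fractions.

```python
import pandas as pd

STATE_COLS = ["perc_Damaged_State", "perc_Lost_State"]


def check_qpfam(qpfam: pd.DataFrame) -> pd.DataFrame:
    """Derive the intact share of domains and flag disrupted transcripts."""
    out = qpfam.copy()
    # Assumption: state columns are fractions in [0, 1], as in the NIPAL3
    # table; values above 1 are treated as 0-100 percentages instead.
    scale = 100.0 if out[STATE_COLS].to_numpy().max() > 1 else 1.0
    damaged = out["perc_Damaged_State"] / scale
    lost = out["perc_Lost_State"] / scale
    # Intact, Damaged and Lost are exhaustive, so the three shares sum to 1
    out["implied_intact"] = 1 - damaged - lost
    out["consistent"] = (out["implied_intact"] - out["pfam_domains_impact_score"]).abs() < 1e-6
    out["any_domain_disrupted"] = (damaged + lost) > 0
    return out


# Real output (pandas infers gzip compression from the .gz suffix):
# qpfam = check_qpfam(pd.read_csv("qpfam.tsv.gz", sep="\t"))
```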
NIPAL3 Domain Architecture
NIPAL3 contains one Pfam domain:
- Mg_trans_NIPA (PF05653)
- Function: Magnesium transporter
- Location: Spans most of the protein core
NIPAL3 Pfam Effects Analysis
Isoform Domain Integrity Scores
| Transcript ID | pfam_score | pfam_domains_impact_score | perc_Damaged_State | perc_Lost_State | Lost_residues_pfam | Gain_residues_pfam | pfam_effects_msa |
|---|---|---|---|---|---|---|---|
| ENST00000374399 | 1.00 | 1.00 | 0 | 0 | 0 | 0 | Reference |
| ENST00000339255 | 1.00 | 1.00 | 0 | 0 | 0 | 0 | Transcript |
| ENST00000003912 | 0.83 | 0 | 1.00 | 0 | 50 | 0 | Transcript |
| ENST00000358028 | 0.62 | 0 | 1.00 | 0 | 112 | 0 | Transcript |
| ENST00000432012 | 0.35 | 0 | 1.00 | 0 | 255 | 0 | Transcript |
Interpretation
Reference Isoform (ENST00000374399)
- pfam_score: 1.00 (perfect domain conservation)
- Status: Complete Mg_trans_NIPA domain
- Residues: Full domain intact
- Annotation: APPRIS PRINCIPAL
Full-Length Alternative (ENST00000339255)
- pfam_score: 1.00
- Status: Identical domain structure to reference
- Interpretation: Differs from the reference only outside the Pfam domain, possibly only in the UTRs
Truncated Isoforms
ENST00000003912:
- pfam_score: 0.83 (17% domain loss)
- Lost residues: 50 amino acids
- perc_Damaged_State: 1.00 (the single Pfam domain is classed as Damaged)
- Interpretation: Domain partially disrupted but core maintained
ENST00000358028:
- pfam_score: 0.62 (38% domain loss)
- Lost residues: 112 amino acids
- Interpretation: Significant domain truncation
ENST00000432012:
- pfam_score: 0.35 (65% domain loss)
- Lost residues: 255 amino acids
- Interpretation: Severely truncated domain, likely non-functional
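The labels above follow the score. A minimal sketch that reproduces them; `interpret_pfam_score` is hypothetical, and its 0.8 and 0.5 cut-offs are chosen only to match this case study, not taken from TRIFID.

```python
def interpret_pfam_score(score: float) -> str:
    """Map a pfam_score to the qualitative labels used for NIPAL3.

    The cut-offs are illustrative, not TRIFID parameters.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"pfam_score must be in [0, 1], got {score}")
    if score >= 1.0:
        return "domain intact"
    if score >= 0.8:
        return "partially disrupted, core maintained"
    if score >= 0.5:
        return "significant domain truncation"
    return "severely truncated, likely non-functional"


nipal3 = {
    "ENST00000374399": 1.00,
    "ENST00000003912": 0.83,
    "ENST00000358028": 0.62,
    "ENST00000432012": 0.35,
}
for transcript, score in nipal3.items():
    # 1 - score is the domain loss quoted in parentheses above
    print(f"{transcript}: {score:.2f} ({1 - score:.0%} loss) -> {interpret_pfam_score(score)}")
```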
Domain States
State Definitions
- Intact: Domain fully preserved (100% residues present)
- Damaged: Domain partially present (> 0% and < 100% residues)
- Lost: Domain completely absent (0% residues present)
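These definitions translate directly into code. A minimal sketch, assuming per-domain residue counts (present, total) are already known; the function names are hypothetical, not part of TRIFID's API. For a single-domain gene like NIPAL3, every truncated isoform lands in Damaged, which matches perc_Damaged_State = 1.00 in the table.

```python
from collections import Counter
from typing import Dict, List, Tuple

STATES = ("Intact", "Damaged", "Lost")


def domain_state(present: int, total: int) -> str:
    """Classify one domain from the number of its residues kept in the isoform."""
    if total <= 0 or not 0 <= present <= total:
        raise ValueError("need total > 0 and 0 <= present <= total")
    if present == total:
        return "Intact"   # 100% of residues present
    if present == 0:
        return "Lost"     # 0% of residues present
    return "Damaged"      # anything in between


def state_fractions(domains: List[Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of a transcript's domains in each state.

    domains: one (present, total) residue count per Pfam domain.
    """
    if not domains:
        raise ValueError("transcript has no Pfam domains")
    counts = Counter(domain_state(present, total) for present, total in domains)
    return {state: counts[state] / len(domains) for state in STATES}
```

For a hypothetical two-domain transcript with one complete and one missing domain, `state_fractions([(120, 120), (0, 80)])` returns `{'Intact': 0.5, 'Damaged': 0.0, 'Lost': 0.5}`.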
NIPAL3 Domain States
ENST00000374399 (Reference): [==========Mg_trans_NIPA==========] 100% Intact
ENST00000339255: [==========Mg_trans_NIPA==========] 100% Intact
ENST00000003912: [========Mg_trans_NIP ] 83% Damaged
ENST00000358028: [======Mg_tra ] 62% Damaged
ENST00000432012: [==Mg ] 35% Damaged
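Bars like these can be generated from pfam_score. A minimal sketch with a hypothetical `domain_bar` helper; it always trims residues from the right-hand end, which is only a visual simplification, since splicing can also remove internal segments.

```python
def domain_bar(label: str, fraction: float, width: int = 33) -> str:
    """Draw a domain as a fixed-width text bar keeping `fraction` of its length.

    Characters are always removed from the right-hand end (a simplification).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    full = label.center(width, "=")  # e.g. ==========Mg_trans_NIPA==========
    kept = round(fraction * width)   # number of bar characters to keep
    return "[" + full[:kept].ljust(width) + "]"


for transcript, score in [("ENST00000374399", 1.00), ("ENST00000003912", 0.83),
                          ("ENST00000358028", 0.62), ("ENST00000432012", 0.35)]:
    print(f"{transcript}: {domain_bar('Mg_trans_NIPA', score)} {score:.0%}")
```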
Multiple Sequence Alignment
The Pfam effects module uses MSA to visualize domain conservation:
Legend:
- Green regions: Mg_trans_NIPA domain (PF05653)
- Gaps: Alternative splicing deletions
- Alignments show progressive domain truncation
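How TRIFID turns the alignment into Lost_residues_pfam and Gain_residues_pfam is not spelled out here. One plausible approach, sketched under the assumption of a pairwise alignment between the reference and an alternative isoform (gaps written as `-`) and a 1-based domain span on the ungapped reference: count domain residues opposite an isoform gap as lost, and isoform residues inserted inside the span as gained. `domain_residue_changes` is a hypothetical name.

```python
from typing import Tuple


def domain_residue_changes(ref_aln: str, iso_aln: str,
                           dom_start: int, dom_end: int) -> Tuple[int, int]:
    """Return (lost, gained) residue counts inside a reference domain.

    ref_aln, iso_aln: aligned rows of equal length, gaps as '-'.
    dom_start, dom_end: 1-based inclusive span on the ungapped reference.
    """
    if len(ref_aln) != len(iso_aln):
        raise ValueError("alignment rows must have equal length")
    lost = gained = 0
    ref_pos = 0  # reference residues seen so far
    for ref_char, iso_char in zip(ref_aln, iso_aln):
        if ref_char != "-":
            ref_pos += 1
            # Domain residue with no counterpart in the isoform
            if dom_start <= ref_pos <= dom_end and iso_char == "-":
                lost += 1
        elif iso_char != "-" and dom_start <= ref_pos < dom_end:
            # Isoform residue inserted between two domain residues
            gained += 1
    return lost, gained
```

For example, with the reference `ABCDEFG-HIJ`, the isoform `AB--EFGXHIJ` and a domain spanning residues 3-8, the deleted `CD` counts as two lost residues and the inserted `X` as one gained residue.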
Pfam Features in TRIFID
The Pfam effects module contributes these features to TRIFID:
Primary Features
- pfam_score: Overall domain conservation (0-1)
- pfam_domains_impact_score: Proportion of intact domains (0-1)
Detailed Features
- perc_Damaged_State: % domains partially present
- perc_Lost_State: % domains completely absent
- Lost_residues_pfam: Absolute count of lost residues
- Gain_residues_pfam: Absolute count of gained residues (rare)
Impact on TRIFID Predictions
Pfam domain integrity is a strong predictor of isoform functionality:
import pandas as pd
# Load TRIFID predictions and Pfam scores
predictions = pd.read_csv('trifid_predictions.tsv.gz', sep='\t', compression='gzip')
pfam = pd.read_csv('qpfam.tsv.gz', sep='\t', compression='gzip')
# Merge data
data = pd.merge(predictions, pfam, on='transcript_id')
# Analyze correlation
correlation = data[['trifid_score', 'pfam_score']].corr()
print(f"Correlation between TRIFID and Pfam scores: {correlation.iloc[0,1]:.3f}")
# Typical output: ~0.6-0.7 (strong positive correlation)
Expected TRIFID Scores for NIPAL3
| Transcript | pfam_score | Expected TRIFID Score | Actual TRIFID Score |
|---|---|---|---|
| ENST00000374399 | 1.00 | > 0.7 (functional) | ~0.85 |
| ENST00000339255 | 1.00 | > 0.7 (functional) | ~0.82 |
| ENST00000003912 | 0.83 | 0.4-0.6 (ambiguous) | ~0.45 |
| ENST00000358028 | 0.62 | 0.2-0.4 (low) | ~0.25 |
| ENST00000432012 | 0.35 | < 0.2 (non-functional) | ~0.08 |
Domain integrity is necessary but not sufficient for functionality. TRIFID integrates domain scores with expression, conservation, and annotation evidence.
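A quick check of the pfam_score and TRIFID relationship on the five NIPAL3 isoforms, using the approximate (`~`) TRIFID scores from the table. This is illustrative only; five points say nothing general. Spearman's rank correlation is computed here as the Pearson correlation of average ranks, which handles the tie at pfam_score 1.00 and keeps the example dependent on pandas alone.

```python
import pandas as pd

# Values copied from the expected-score table above (TRIFID scores are approximate)
nipal3 = pd.DataFrame({
    "transcript_id": ["ENST00000374399", "ENST00000339255", "ENST00000003912",
                      "ENST00000358028", "ENST00000432012"],
    "pfam_score": [1.00, 1.00, 0.83, 0.62, 0.35],
    "trifid_score": [0.85, 0.82, 0.45, 0.25, 0.08],
})

# Spearman rho = Pearson correlation of ranks; ties receive their average rank
rho = nipal3["pfam_score"].rank().corr(nipal3["trifid_score"].rank())
print(f"Spearman rho across {len(nipal3)} NIPAL3 isoforms: {rho:.2f}")
```

A rank correlation suits this question (does a higher domain score go with a higher TRIFID score?) without assuming the relationship is linear.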
Running Pfam Effects on Your Data
Prerequisites
- Install Pfam scan tools
- Download Pfam database
- Prepare APPRIS annotations
Command Line
python -m trifid.preprocessing.pfam_effects \
--appris your_appris_annotations.txt \
--jobs 10 \
--seqs your_protein_sequences.fa.gz \
--spade your_spade_annotations.gtf.gz \
--outdir output/pfam_effects
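For batch runs, for example over several genome releases, the same command can be assembled from Python. A minimal sketch using only the flags shown above; `pfam_effects_command` and `run_pfam_effects` are hypothetical helpers, and `sys.executable` assumes TRIFID is installed in the current interpreter's environment.

```python
import subprocess
import sys
from typing import List


def pfam_effects_command(appris: str, seqs: str, spade: str, outdir: str,
                         jobs: int = 10) -> List[str]:
    """Build the pfam_effects command line as an argument list."""
    return [
        sys.executable, "-m", "trifid.preprocessing.pfam_effects",
        "--appris", appris,
        "--jobs", str(jobs),
        "--seqs", seqs,
        "--spade", spade,
        "--outdir", outdir,
    ]


def run_pfam_effects(appris: str, seqs: str, spade: str, outdir: str,
                     jobs: int = 10) -> None:
    """Run the module; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(pfam_effects_command(appris, seqs, spade, outdir, jobs), check=True)


# Example call with the placeholder file names used above:
# run_pfam_effects(appris="your_appris_annotations.txt",
#                  seqs="your_protein_sequences.fa.gz",
#                  spade="your_spade_annotations.gtf.gz",
#                  outdir="output/pfam_effects")
```

Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.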
Python API
from trifid.preprocessing.pfam_effects import calculate_pfam_effects
results = calculate_pfam_effects(
appris_file='appris_data.appris.txt',
sequences_file='appris_data.transl.fa.gz',
spade_file='appris_method.spade.gtf.gz',
output_dir='output/pfam_effects',
n_jobs=10
)
Pre-computed Pfam Effects Data
Pre-computed Pfam effects scores are available for:
- GENCODE 27 (Human, GRCh38)
- GENCODE 42 (Human, GRCh38)
- GENCODE M25 (Mouse, GRCm38)
- Multiple other species and genome versions
Download from the Data Availability page.
Biological Insights
Why Domain Integrity Matters
- Structural stability: Truncated domains often misfold
- Functional activity: Partial domains lose catalytic or binding activity
- Cellular quality control: Misfolded or truncated proteins are often targeted for degradation
- Evolutionary constraint: Functional domains show purifying selection
NIPAL3 Magnesium Transport
The Mg_trans_NIPA domain:
- Forms transmembrane helices
- Coordinates Mg²⁺ ions
- Requires specific residues for transport activity
- Truncation is expected to impair or abolish transport function
Isoforms with pfam_score < 0.8 likely cannot transport magnesium effectively.
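Applying that 0.8 cut-off to the NIPAL3 scores from the domain integrity table is a one-line filter. A minimal sketch; `TRANSPORT_CUTOFF` is just a name for the heuristic quoted above, not a TRIFID parameter.

```python
import pandas as pd

# NIPAL3 pfam_score values from the domain integrity table
nipal3 = pd.DataFrame({
    "transcript_id": ["ENST00000374399", "ENST00000339255", "ENST00000003912",
                      "ENST00000358028", "ENST00000432012"],
    "pfam_score": [1.00, 1.00, 0.83, 0.62, 0.35],
})

TRANSPORT_CUTOFF = 0.8  # heuristic from the text above

likely_impaired = nipal3.loc[nipal3["pfam_score"] < TRANSPORT_CUTOFF, "transcript_id"].tolist()
print(likely_impaired)  # ['ENST00000358028', 'ENST00000432012']
```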
Visualization Example
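import pandas as pd
import matplotlib.pyplot as plt
# Load data (pandas infers gzip compression from the .gz suffix)
pfam = pd.read_csv('qpfam.tsv.gz', sep='\t')
# Assumes a gene_name column; filter on gene_id == 'ENSG00000001461' if it is absent
nipal3 = pfam[pfam['gene_name'] == 'NIPAL3'].sort_values('pfam_score')
# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Plot 1: Pfam scores
ax1 = axes[0]
ax1.barh(nipal3['transcript_id'], nipal3['pfam_score'], color='steelblue')
ax1.set_xlabel('Pfam Score')
ax1.set_ylabel('Transcript ID')
ax1.set_title('NIPAL3 Domain Conservation')
ax1.axvline(x=0.5, color='red', linestyle='--', alpha=0.5)
# Plot 2: Domain states (.copy() avoids a SettingWithCopyWarning on the new column)
ax2 = axes[1]
states = nipal3[['transcript_id', 'perc_Damaged_State', 'perc_Lost_State']].copy()
# The NIPAL3 table reports fractions (0-1); values above 1 are treated as percentages (0-100)
total = 100 if states[['perc_Damaged_State', 'perc_Lost_State']].to_numpy().max() > 1 else 1
states['perc_Intact_State'] = total - states['perc_Damaged_State'] - states['perc_Lost_State']
# Reorder columns so the colours and legend labels line up: Intact, Damaged, Lost
states = states[['transcript_id', 'perc_Intact_State', 'perc_Damaged_State', 'perc_Lost_State']]
states.plot(x='transcript_id', kind='barh', stacked=True, ax=ax2,
            color=['green', 'orange', 'red'])
ax2.set_xlabel('Percentage of domains' if total == 100 else 'Fraction of domains')
ax2.set_ylabel('')
ax2.set_title('NIPAL3 Domain States')
ax2.legend(['Intact', 'Damaged', 'Lost'])
plt.tight_layout()
plt.savefig('nipal3_pfam_analysis.png', dpi=300)
plt.show()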
References
Next Steps