C1orf112: Chromosome 1 Open Reading Frame 112

ENSG00000000460 (Ensembl) - Q9NSG2 (CA112_HUMAN) (UniProt)

Overview

This case study demonstrates the QSplice module of TRIFID, which quantifies splice junction coverage from RNA-seq data. C1orf112 is a useful example of how RNA-seq evidence feeds into isoform functionality predictions, because its annotated isoforms range from well-supported coding transcripts to transcripts with little or no junction support.

What is QSplice?

QSplice is a TRIFID module that:

Quantifies splice junction coverage from STAR RNA-seq alignments
Maps unique reads to genome positions using collapsed coding splice junctions
Calculates coverage scores per transcript
Provides these scores to TRIFID’s machine learning model as predictive features

QSplice Methodology

Input Data

Genome annotation: GENCODE GFF3 file
RNA-seq samples: STAR SJ.out.tab files from E-MTAB-2836
- 32 human tissues
- 122 individuals
- Comprehensive tissue expression atlas
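Each STAR SJ.out.tab file is a headerless, tab-separated table with one row per junction and nine columns. The Unique Reads values in this case study correspond to column 7, the count of uniquely mapping reads crossing the junction. The following is a minimal sketch of reading that layout with the Python standard library; it is not QSplice's own parser, and the `Junction` and `parse_sj_out_tab` names are hypothetical.

```python
from dataclasses import dataclass

# STAR encodes strand in column 4 as 0 (undefined), 1 (+) or 2 (-)
STRAND = {"0": ".", "1": "+", "2": "-"}


@dataclass
class Junction:
    chrom: str
    start: int         # first base of the intron (1-based)
    end: int           # last base of the intron (1-based)
    strand: str
    motif: int         # 0 = non-canonical; 1-6 = canonical or semi-canonical motifs
    annotated: bool    # True if the junction is in the annotation given to STAR
    unique_reads: int  # uniquely mapping reads crossing the junction
    multi_reads: int   # multi-mapping reads crossing the junction
    max_overhang: int  # maximum spliced alignment overhang


def parse_sj_out_tab(lines):
    """Parse the nine tab-separated columns of a STAR SJ.out.tab file."""
    junctions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        junctions.append(Junction(
            chrom=fields[0],
            start=int(fields[1]),
            end=int(fields[2]),
            strand=STRAND[fields[3]],
            motif=int(fields[4]),
            annotated=fields[5] == "1",
            unique_reads=int(fields[6]),
            multi_reads=int(fields[7]),
            max_overhang=int(fields[8]),
        ))
    return junctions


# Usage (hypothetical path):
# with open("SJ.out.tab") as fh:
#     junctions = parse_sj_out_tab(fh)
```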

Running QSplice

python -m trifid.preprocessing.qsplice \
    --gff data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gff3.gz \
    --outdir data/external/qsplice/GRCh38/g27 \
    --samples out/E-MTAB-2836/GRCh38/STAR/g27 \
    --version g

Output Files

sj_maxp.emtab2836.mapped.tsv.gz: Splice junction-level scores
qsplice.emtab2836.g27.tsv.gz: Transcript-level scores (TRIFID input)
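The junction-level step can be pictured as keeping, for every junction, the largest unique-read count seen in any sample together with that sample's tissue, which is how this case study reports junction coverage (for example, 57 reads in testis for intron 5 of ENST00000472795). A minimal sketch, assuming junctions are keyed by (chromosome, start, end, strand) and each sample contributes one count per junction; `max_across_samples` is a hypothetical name, not part of TRIFID.

```python
def max_across_samples(samples):
    """Keep, for each junction, the highest unique-read count over all samples.

    samples: iterable of (tissue, counts) pairs, one per RNA-seq sample, where
             counts maps (chrom, start, end, strand) -> unique reads.
    Returns {junction: (unique_reads, tissue)}; on ties the first sample seen wins.
    """
    best = {}
    for tissue, counts in samples:
        for junction, reads in counts.items():
            if junction not in best or reads > best[junction][0]:
                best[junction] = (reads, tissue)
    return best
```

Several samples may share a tissue label (E-MTAB-2836 has multiple individuals per tissue); this sketch simply reports the tissue of the single sample with the maximum.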

C1orf112 Splice Junction Analysis

Splice Junction Scores for ENST00000472795

Chromosome	Type	Start	End	Strand	Gene ID	Gene Name	Transcript ID	CDS Coverage	Intron #	Unique Reads	Tissue	Gene Mean	Gene Mean CDS	RNA2sj	RNA2sj_cds
chr1	intron	169794906	169798856	+	ENSG00000000460	C1orf112	ENST00000472795	none	1	2	tonsil	67.37	73.78	0.0297	0.0271
chr1	intron	169798959	169800882	+	ENSG00000000460	C1orf112	ENST00000472795	none	2	69	testis	67.37	73.78	1.024	0.9352
chr1	intron	169800972	169802620	+	ENSG00000000460	C1orf112	ENST00000472795	full	3	74	testis	67.37	73.78	1.098	1.0029
chr1	intron	169802726	169803168	+	ENSG00000000460	C1orf112	ENST00000472795	full	4	77	testis	67.37	73.78	1.143	1.0436
chr1	intron	169803310	169804074	+	ENSG00000000460	C1orf112	ENST00000472795	full	5	57	testis	67.37	73.78	0.846	0.7725

Key Observations

Maximum coverage selection: For each junction, QSplice keeps the highest unique-read count observed across all tissues; for intron 5 that maximum comes from testis
Unique reads: Intron 5 has 57 unique reads in testis, the tissue with the highest coverage for this junction
Minimum bottleneck: This junction has the lowest coverage among the coding splice junctions (introns 3 to 5) of this isoform
Normalized score: RNA2sj = 57 / 67.37 = 0.846 (and RNA2sj_cds = 57 / 73.78 = 0.773)

QSplice identifies the “weakest link” in the splice junction chain, providing a conservative estimate of transcript expression.
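Using the ENST00000472795 junction table above, the weakest-link rule can be reproduced in a few lines. This sketch assumes a junction counts as coding when its CDS Coverage is anything other than `none`; the `bottleneck` function is hypothetical.

```python
# (intron, CDS coverage, max unique reads, tissue) from the ENST00000472795 table
junctions = [
    (1, "none", 2, "tonsil"),
    (2, "none", 69, "testis"),
    (3, "full", 74, "testis"),
    (4, "full", 77, "testis"),
    (5, "full", 57, "testis"),
]


def bottleneck(junctions):
    """Return the coding junction with the fewest unique reads, or None."""
    coding = [j for j in junctions if j[1] != "none"]
    if not coding:
        return None  # no coding junctions to score
    return min(coding, key=lambda j: j[2])


intron, _, reads, tissue = bottleneck(junctions)
print(intron, reads, tissue)  # 5 57 testis
```

Intron 1 has fewer reads (2), but it lies outside the CDS, so it does not set the minimum.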

Transcript-Level QSplice Scores

C1orf112 Isoform Comparison

Chromosome	Gene ID	Gene Name	Transcript ID	Intron #	Exons	CDS Exons	Unique Reads	Tissue	RNA2sj	RNA2sj_cds
chr1	ENSG00000000460	C1orf112	ENST00000286031	6	24	22	53	testis	0.787	0.718
chr1	ENSG00000000460	C1orf112	ENST00000359326	7	25	22	53	testis	0.787	0.718
chr1	ENSG00000000460	C1orf112	ENST00000413811	20	23	14	62	testis	0.920	0.840
chr1	ENSG00000000460	C1orf112	ENST00000459772	2	23	3	7	fallopian tube	0.104	0.095
chr1	ENSG00000000460	C1orf112	ENST00000466580	2	8	3	7	fallopian tube	0.104	0.095
chr1	ENSG00000000460	C1orf112	ENST00000472795	5	6	4	57	testis	0.846	0.773
chr1	ENSG00000000460	C1orf112	ENST00000481744	2	7	3	7	fallopian tube	0.104	0.095
chr1	ENSG00000000460	C1orf112	ENST00000496973	5	6	6	8	tonsil	0.119	0.108
chr1	ENSG00000000460	C1orf112	ENST00000498289	3	29	0	0	-	0	0

Interpretation

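ENST00000472795: An RNA2sj score of 0.846 means its weakest coding junction carries about 85% of the gene's average junction coverage, which indicates good expression support; only ENST00000413811 scores higher (0.920)
Tissue specificity: For the best-supported isoforms, the highest junction coverage comes from testis
Low-scoring isoforms: ENST00000459772, ENST00000466580, ENST00000481744 and ENST00000496973 score only about 0.1 (7 to 8 unique reads), suggesting limited functional relevance
Non-coding isoform: ENST00000498289 has no CDS exons and receives a score of 0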
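To compare isoforms programmatically, the transcript-level table can be ranked by RNA2sj and weakly supported isoforms flagged. This is a sketch over the values shown in the isoform comparison table; the 0.2 cutoff (`LOW_SUPPORT`) and the `rank_isoforms` helper are hypothetical, chosen only to separate the isoforms with 7 to 8 reads from the rest.

```python
# (transcript ID, unique reads, tissue, RNA2sj) from the isoform comparison table
isoforms = [
    ("ENST00000286031", 53, "testis", 0.787),
    ("ENST00000359326", 53, "testis", 0.787),
    ("ENST00000413811", 62, "testis", 0.920),
    ("ENST00000459772", 7, "fallopian tube", 0.104),
    ("ENST00000466580", 7, "fallopian tube", 0.104),
    ("ENST00000472795", 57, "testis", 0.846),
    ("ENST00000481744", 7, "fallopian tube", 0.104),
    ("ENST00000496973", 8, "tonsil", 0.119),
    ("ENST00000498289", 0, "-", 0.0),
]

# Hypothetical cutoff for illustration only; not a QSplice or TRIFID threshold
LOW_SUPPORT = 0.2


def rank_isoforms(rows, cutoff=LOW_SUPPORT):
    """Sort isoforms by RNA2sj (highest first) and list those below the cutoff."""
    ranked = sorted(rows, key=lambda row: row[3], reverse=True)  # stable for ties
    low = [row[0] for row in ranked if row[3] < cutoff]
    return ranked, low


ranked, low = rank_isoforms(isoforms)
for tid, reads, tissue, score in ranked:
    flag = "low support" if tid in low else ""
    print(f"{tid}  {score:.3f}  {reads:>3} reads  {tissue:<15} {flag}")
```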

Visual Representation

ENST00000472795 Exon Structure

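Exon 1    | Intron 1 | Exon 2    | Intron 2 | Exon 3    | Intron 3 | Exon 4    | Intron 4 | Exon 5    | Intron 5 | Exon 6
          |    2     |           |    69    |           |    74    |           |    77    |           |    57*   |
5'UTR     |          | 5'UTR     |          | CDS start |          |           |          |           |          | CDS end

* Intron 5 (57 reads) is the bottleneck: it has the lowest coverage of the coding junctions (introns 3 to 5), and this minimum determines the transcript-level score. Introns 1 and 2 lie outside the CDS and do not enter the minimum.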

QSplice Features in TRIFID

QSplice generates two main features used in TRIFID:

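RNA2sj: Unique reads at the transcript's bottleneck coding junction divided by the gene's average junction coverage over all splice junctions (67.37 for C1orf112)
RNA2sj_cds: The same read count divided by the gene's average coverage over CDS-spanning junctions only (73.78 for C1orf112)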
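Both features are plain ratios. A minimal sketch, assuming the numerator is the unique-read count at the transcript's bottleneck coding junction and the denominators are the gene means reported in the junction table (Gene Mean and Gene Mean CDS); `rna2sj` is a hypothetical helper name.

```python
def rna2sj(unique_reads, gene_mean, gene_mean_cds):
    """Return (RNA2sj, RNA2sj_cds) for a transcript's bottleneck junction."""
    if gene_mean <= 0 or gene_mean_cds <= 0:
        raise ValueError("gene means must be positive")
    return unique_reads / gene_mean, unique_reads / gene_mean_cds


# C1orf112 gene means from the junction table; 57 reads at the ENST00000472795 bottleneck
score, score_cds = rna2sj(57, 67.37, 73.78)
print(round(score, 3), round(score_cds, 3))  # 0.846 0.773
```

The same arithmetic reproduces the other rows of the isoform table, for example 53 reads giving 0.787 and 0.718.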

These features contribute to TRIFID predictions by providing:

Expression evidence for isoform existence
Tissue-specific functional context
Quantitative support beyond annotation

Running QSplice on Your Data

Using STAR SJ.out.tab Files

# With pre-computed STAR alignments
python -m trifid.preprocessing.qsplice \
    --gff your_annotation.gff3.gz \
    --outdir output/qsplice \
    --samples path/to/star/output \
    --version g

Using Custom Splice Junction File

# With custom SJ.out.tab
python -m trifid.preprocessing.qsplice \
    --gff your_annotation.gff3.gz \
    --outdir output/qsplice \
    --custom custom_SJ.out.tab \
    --version g
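If your junction counts come from a tool other than STAR, one option is to write them in STAR's nine-column SJ.out.tab layout before passing the file to `--custom`. This sketch assumes QSplice reads the standard STAR columns and relies mainly on the unique-read count; fields you may not have (motif, annotation flag, multi-mapping reads, overhang) are filled with placeholder values, and if QSplice filters on any of them they would need real values. `write_sj_out_tab` is a hypothetical helper.

```python
import io

# STAR's strand codes for column 4 of SJ.out.tab
STRAND_CODE = {".": "0", "+": "1", "-": "2"}


def write_sj_out_tab(junctions, handle):
    """Write junctions in STAR's nine-column SJ.out.tab layout.

    junctions: iterable of (chrom, intron_start, intron_end, strand, unique_reads),
               with 1-based, inclusive intron coordinates as STAR uses.
    """
    for chrom, start, end, strand, unique_reads in junctions:
        fields = [
            chrom, str(start), str(end), STRAND_CODE[strand],
            "0",                # intron motif: placeholder (0 = non-canonical/unknown)
            "1",                # annotated flag: placeholder, assumes annotated
            str(unique_reads),  # uniquely mapping reads
            "0",                # multi-mapping reads: placeholder
            "0",                # maximum spliced overhang: placeholder
        ]
        handle.write("\t".join(fields) + "\n")


# Demo with an in-memory buffer; use open("custom_SJ.out.tab", "w") for a real file
buf = io.StringIO()
write_sj_out_tab([("chr1", 169803310, 169804074, "+", 57)], buf)
print(buf.getvalue(), end="")
```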

Integration with TRIFID Predictions

QSplice scores are integrated into the full TRIFID feature set alongside:

APPRIS structural annotations
PhyloCSF conservation scores
Pfam domain effects
Transcript Support Levels (TSL)
GENCODE basic annotation

The Random Forest model learns the importance of RNA-seq evidence relative to the other features during training, so expression support is weighted automatically rather than set by hand.

Pre-computed QSplice Data

Pre-computed QSplice scores are available for:

GENCODE 27 (Human, GRCh38)
GENCODE 42 (Human, GRCh38)
GENCODE 25 (Mouse, GRCm38)
Various other genome versions

See the Data Availability page for download links.

Code Example: Analyzing QSplice Results

import pandas as pd
import matplotlib.pyplot as plt

# Load QSplice transcript-level results
qsplice = pd.read_csv(
    'data/external/qsplice/GRCh38/g27/qsplice.emtab2836.g27.tsv.gz',
    sep='\t',
    compression='gzip'
)

# Filter for the gene of interest and sort so the highest score plots at the top
# Column names (gene_name, transcript_id, RNA2sj) are assumed; check qsplice.columns for your file
gene_data = qsplice[qsplice['gene_name'] == 'C1orf112'].sort_values('RNA2sj')

# Plot RNA2sj scores, one horizontal bar per transcript
fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(gene_data['transcript_id'], gene_data['RNA2sj'])
ax.set_xlabel('RNA2sj Score')
ax.set_ylabel('Transcript ID')
ax.set_title('C1orf112 Splice Junction Coverage Scores')
# Dashed reference line at 0.5, used here for visual comparison
ax.axvline(x=0.5, color='r', linestyle='--', label='Functional threshold')
ax.legend()
plt.tight_layout()
plt.show()
