Output Files

Output Files Overview

PROTÉGÉ PD generates several output files during the primer design process. All of them are written automatically to the same directory as your input FASTA file, which is the host directory you mounted with --mount source=/your/path/,target=/root/.

Primary Output Files

1. protege_consensus.csv

Purpose: Main primer design results with all candidate primers and degeneracy calculations.
Location: Same directory as input file
Generated by: protege.py:326

phyloDF.to_csv('protege_consensus.csv',sep = ';', index = True)

File Structure

Column: position

string

Primer position in the consensus sequence as a range.
Format: start-end
Example: 1-21, 2-22, 3-23
Position numbers correspond to nucleotide positions in the aligned consensus sequence.
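For downstream filtering it is often convenient to split the range into integers. The following is a minimal sketch, assuming the start-end format described above; `parse_position` is a hypothetical helper and not part of PROTÉGÉ, and treating both ends as inclusive is inferred from the examples (1-21 for a 21-base primer).

```python
def parse_position(position: str) -> tuple[int, int]:
    """Split a 'start-end' range string into two integers."""
    start, end = position.split("-")
    return int(start), int(end)


start, end = parse_position("3-23")
# Assumption: both ends are inclusive, as the 21-base examples suggest.
primer_length = end - start + 1
print(start, end, primer_length)  # 3 23 21
```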

Column: degeneracies

integer

Number of degenerate primer variants for this position.
Calculation: Product of the degeneracies of all bases in the primer sequence.
Examples:

1 = No degeneracies (all standard nucleotides)
2 = One 2-fold degenerate position (R, Y, W, S, K, or M)
3 = One 3-fold degenerate position (B, D, H, or V)
4 = One 4-fold degenerate position (N) or two 2-fold degenerate positions
8 = Three 2-fold degenerate positions, or one 2-fold and one 4-fold position

Special case: 0 = Position contains gaps, primer not viable
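The product rule can be checked by hand with a few lines of Python. This is a sketch under stated assumptions, not the code PROTÉGÉ runs: `IUPAC_FOLD` and `degeneracy` are hypothetical names, and the fold values are the standard IUPAC ones.

```python
from math import prod

# Number of nucleotides each IUPAC code can stand for.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer encodes.

    Returns 0 when the primer contains a gap, mirroring the special case above.
    """
    primer = primer.upper()
    if "-" in primer:
        return 0
    return prod(IUPAC_FOLD[base] for base in primer)


print(degeneracy("ATGTCGGTTCGTGACGTGAAA"))  # 1
print(degeneracy("GTCGGTTCGTGACGTGAAACR"))  # 2
print(degeneracy("ATGCGRAAYWSKMNTGCAT"))    # 256 (six 2-fold codes and one N)
```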

Column: forwardPrimer

string

Forward primer sequence with IUPAC degenerate codes.
IUPAC Codes Used:

A, T, G, C - Standard nucleotides
R = A or G (puRine)
Y = C or T (pYrimidine)
W = A or T (Weak)
S = G or C (Strong)
K = G or T (Keto)
M = A or C (aMino)
N = Any nucleotide
- = Gap (primer not usable)

Example: ATGCGRAAYWSKMNTGCAT

Column: reversePrimer

string

Reverse complement of the forward primer.
Generated by: protege.py:310-311

frwd = Seq(frwdPrimer)
rvrsPrimer = frwd.reverse_complement()

Usage: Use this sequence for reverse PCR primer ordering
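To verify a reversePrimer value without installing Biopython, the same operation can be approximated with the standard library. This is a stand-in sketch for `Seq.reverse_complement()`, not PROTÉGÉ's own code; `COMPLEMENT` and `reverse_complement` are hypothetical names, and mapping the gap character to itself is an assumption.

```python
# Complement of every IUPAC nucleotide code; the gap character maps to itself.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN-",
    "TGCAYRSWMKVHDBN-",
)


def reverse_complement(primer: str) -> str:
    """Complement each base, then reverse the string."""
    return primer.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGTCGGTTCGTGACGTGAAA"))  # TTTCACGTCACGAACCGACAT
print(reverse_complement("GTCGGTTCGTGACGTGAAACR"))  # YGTTTCACGTCACGAACCGAC
```

Degenerate codes are complemented as sets, so R (A or G) becomes Y (C or T) and K becomes M.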

Example Content

;position;degeneracies;forwardPrimer;reversePrimer
0;1-21;1;ATGTCGGTTCGTGACGTGAAA;TTTCACGTCACGAACCGACAT
1;2-22;1;TGTCGGTTCGTGACGTGAAAC;GTTTCACGTCACGAACCGACA
2;3-23;2;GTCGGTTCGTGACGTGAAACR;YGTTTCACGTCACGAACCGAC
3;4-24;4;TCGGTTCGTGACGTGAAACRK;MYGTTTCACGTCACGAACCGA
4;5-25;0;CGGTTCGTGACGTGAAACRK-;-MYGTTTCACGTCACGAACCG

The first column (unnamed) is the row index. The semicolon (;) is used as the delimiter.
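The layout can be read with Python's standard csv module. The sketch below parses two rows copied from the example above from an in-memory string; in practice you would open protege_consensus.csv instead. Variable names are illustrative only.

```python
import csv
import io

# Two rows in the layout shown above; replace with open('protege_consensus.csv').
sample = io.StringIO(
    ";position;degeneracies;forwardPrimer;reversePrimer\n"
    "0;1-21;1;ATGTCGGTTCGTGACGTGAAA;TTTCACGTCACGAACCGACAT\n"
    "2;3-23;2;GTCGGTTCGTGACGTGAAACR;YGTTTCACGTCACGAACCGAC\n"
)

reader = csv.DictReader(sample, delimiter=";")
rows = list(reader)

# The unnamed index column is exposed under the empty-string key.
for row in rows:
    print(row[""], row["position"], int(row["degeneracies"]), row["forwardPrimer"])
```

Values are read as strings, so the degeneracies column needs an explicit int() conversion before numeric comparisons.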

2. sequences.csv

Purpose: Original sequence information with nucleotide and amino acid sequences.
Location: Same directory as input file
Generated by: protege.py:155

sequences.to_csv('sequences.csv',sep = ';', index = True)

File Structure

Column: id

string

Sequence identifier from FASTA header (everything after >).

Column: nuc_seq

string

Original nucleotide sequence from input file.

Column: nuc_lenght

integer

Length of nucleotide sequence in base pairs.

Note the spelling: “nuc_lenght” (not “length”) as in the source code.

Column: amino_seq

string

Translated amino acid sequence.
Generated by: protege.py:138

amino = nuc.translate()

Column: amino_lenght

integer

Length of amino acid sequence (nuc_lenght / 3).

Example Content

;id;nuc_seq;nuc_lenght;amino_seq;amino_lenght
0;strain_1_gyrB;ATGTCGGTTCGTGACGTGAAA...;180;MSVRDVKPVAEGIGA...;60
1;strain_2_gyrB;ATGTCGGTTCGTGACGTGAAA...;180;MSVRDVKPVAEGIGA...;60
2;strain_3_gyrB;ATGTCGGTTCGTGACGTGAAA...;180;MSVRDVKPVAEGIGA...;60

3. alSequences.csv

Purpose: Aligned amino acid sequences after MUSCLE alignment.
Location: Same directory as input file
Generated by: protege.py:205

alSequences.to_csv('alSequences.csv',sep = ';', index = True)

File Structure

Column: id

string

Sequence identifier (matches sequences.csv).

Column: al_amino_seq

string

Aligned amino acid sequence with gaps (-). All sequences in this column have the same length due to alignment.

Column: al_amino_lenght

integer

Length of aligned sequence (including gaps). This value is the same for all sequences in a run.

Example Content

;id;al_amino_seq;al_amino_lenght
0;strain_1_gyrB;MSVRDVKPVAEGIGA---LLAVA...;65
1;strain_2_gyrB;MSVRDVKPVAEGIGALLAVA...;65
2;strain_3_gyrB;MSVRDVKPVAEGIGARLLAVA...;65

Gaps in aligned sequences indicate insertions/deletions between sequences and affect primer design.

Intermediate Files

4. translated_seqs_pL.fas

Purpose: Amino acid sequences in FASTA format for MUSCLE alignment.
Location: Same directory as input file
Generated by: protege.py:157-161

f = open(translatedName, 'w')
for i in range(0,len(sequences)):
    f.write('>' + sequences.id[i] + '\n')
    f.write(sequences.amino_seq[i] + '\n')
f.close()

Example Content

>strain_1_gyrB
MSVRDVKPVAEGIGAGRAGVAGAKRGRAGAGVRARAR
>strain_2_gyrB
MSVRDVKPVAEGIGAGRAGVAGAKRGRAGAGVRARARK
>strain_3_gyrB
MSVRDVKPVAEGIGAGRAGVAGAKRGRAGAGVRARARQ

5. aligned_muscle_pl_*.fas

Purpose: MUSCLE-aligned amino acid sequences.
Filename pattern: aligned_muscle_pl_translated_seqs_pL.fas
Location: Same directory as input file
Generated by: protege.py:179 (MUSCLE alignment)

alnProc = subprocess.run(["muscle_lin", "-in", in_file, "-out", out_file])

Example Content

>strain_1_gyrB
MSVRDVKPVAEGIGAGRAGVAGAKRGRAGAGVRARAR-
>strain_2_gyrB
MSVRDVKPVAEGIGAGRAGVAGAKRGRAGAGVRARARK
>strain_3_gyrB
MSVRDVKPVAEGIGAGRAGVAGAKRGRAGAGVRARARQ

This file shows the protein-level alignment used for consensus calculations. All sequences have the same length after alignment, and gaps (-) represent insertions/deletions.
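To see which alignment columns contain gaps, the aligned FASTA file can be inspected with a small parser. This is a minimal sketch using only the standard library; `read_fasta` and `gap_columns` are hypothetical helpers, and the toy alignment is made up for illustration rather than taken from PROTÉGÉ output.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def gap_columns(records: dict[str, str]) -> list[int]:
    """Return 1-based alignment columns where at least one sequence has a gap."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences are not aligned; lengths found: {sorted(lengths)}")
    return [
        i + 1
        for i, column in enumerate(zip(*records.values()))
        if "-" in column
    ]


# Hypothetical toy alignment, not PROTÉGÉ output.
aligned = """>seq_a
MSV-DVK
>seq_b
MSVRDVK
>seq_c
MSVRDV-
"""
records = read_fasta(aligned)
print(gap_columns(records))  # [4, 7]
```

The length check doubles as a sanity test: an aligned file whose sequences differ in length was not produced by a completed alignment.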

Downloading Files from Web Interface

When running PROTÉGÉ with the web interface (Dash), you can download results directly from the browser.

Access Output Files

Access web interface

Open http://127.0.0.1:8050 in your browser after starting PROTÉGÉ.

Wait for processing

PROTÉGÉ displays processing progress. Wait for the completion message before continuing.

View results

Interactive visualization of primer candidates will appear.

Download files

Use the download buttons or links in the interface to save result files.

Access Files from Command Line

All output files are written to your mounted directory:

# List all output files
ls -lh /your/mounted/path/

# Expected files:
# - protege_consensus.csv (main results)
# - sequences.csv
# - alSequences.csv
# - translated_seqs_pL.fas
# - aligned_muscle_pl_*.fas

File Locations

All output files are created in the directory you mounted to /root/ in the Docker container.

Example:

# If you ran:
docker run --mount type=bind,source=/home/user/data/,target=/root/. ...

# Files are created in:
/home/user/data/protege_consensus.csv
/home/user/data/sequences.csv
/home/user/data/alSequences.csv
/home/user/data/translated_seqs_pL.fas
/home/user/data/aligned_muscle_pl_*.fas
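A quick way to confirm that a run produced everything is to check the mounted directory against the list of expected names. The sketch below assumes the five file names documented on this page; `EXPECTED_PATTERNS` and `missing_outputs` are hypothetical names.

```python
from pathlib import Path

# File names (or glob patterns) documented on this page.
EXPECTED_PATTERNS = [
    "protege_consensus.csv",
    "sequences.csv",
    "alSequences.csv",
    "translated_seqs_pL.fas",
    "aligned_muscle_pl_*.fas",
]


def missing_outputs(directory: str) -> list[str]:
    """Return the expected patterns that match no file in `directory`."""
    root = Path(directory)
    return [pattern for pattern in EXPECTED_PATTERNS if not any(root.glob(pattern))]


# Point this at your mounted directory, e.g. /home/user/data/
print(missing_outputs("."))
```

An empty list means every expected file is present; any pattern that is returned points to the pipeline stage that did not complete.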

Understanding Results

Selecting Best Primers

Sort by degeneracy

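# Sort viable primers by degeneracy (lower is better);
# skips the header row and gap-containing rows (degeneracies = 0)
awk -F';' 'NR > 1 && $3 > 0' protege_consensus.csv | sort -t';' -k3,3n | head -20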

Degeneracy guide:

Degeneracies = 1 (no degeneracy, best)
Degeneracies = 2-8 (low, good)
Degeneracies = 9-32 (moderate, acceptable)
Degeneracies > 32 (high, may be problematic)

Exclude gap-containing primers

# Show only primers without gaps (header row skipped);
# tests the forwardPrimer column, since the position column always contains '-'
awk -F';' 'NR > 1 && $4 !~ /-/' protege_consensus.csv

Primers containing gaps (degeneracies = 0) cannot be synthesized.

Filter by position

# Get primers lying entirely within a region (e.g., positions 100-200)
awk -F';' 'NR > 1 { split($2, p, "-"); if (p[1] >= 100 && p[2] <= 200) print }' protege_consensus.csv

Select primers from conserved gene regions if known.

Primer Quality Metrics

Excellent Primers

Degeneracies: 1
No gaps
From conserved regions
Standard nucleotides only

Good Primers

Degeneracies: 2-8
No gaps
Limited degenerate positions
Mostly standard nucleotides

Acceptable Primers

Degeneracies: 9-32
No gaps
Multiple degenerate positions
May require optimization

Problematic Primers

Degeneracies: >32 or 0
Contains gaps
Highly degenerate
Difficult to synthesize
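The tiers above can be applied programmatically to the degeneracies column. This is a sketch, not part of PROTÉGÉ: `classify_primer` is a hypothetical helper, it looks only at the degeneracy value (not at conservation or composition), and it assumes that a value of 8 counts as Good.

```python
def classify_primer(degeneracies: int) -> str:
    """Map a degeneracy value to the quality tiers described above."""
    if degeneracies <= 0:
        return "Problematic"  # 0 marks a primer that spans a gap
    if degeneracies == 1:
        return "Excellent"
    if degeneracies <= 8:
        return "Good"
    if degeneracies <= 32:
        return "Acceptable"
    return "Problematic"


for value in (0, 1, 4, 8, 16, 64):
    print(value, classify_primer(value))
```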

Post-Processing Analysis

Import into Excel/Spreadsheet

# Convert semicolon-delimited to comma-delimited
sed 's/;/,/g' protege_consensus.csv > protege_consensus_comma.csv

# Open in Excel, LibreOffice, or Google Sheets

Python Analysis

import pandas as pd

# Load results
df = pd.read_csv('protege_consensus.csv', sep=';', index_col=0)

# Filter primers with degeneracy ≤ 4
good_primers = df[df['degeneracies'] <= 4]
good_primers = good_primers[good_primers['degeneracies'] > 0]

print(f"Found {len(good_primers)} high-quality primers")
print(good_primers[['position', 'degeneracies', 'forwardPrimer']].head(10))

# Export filtered results
good_primers.to_csv('selected_primers.csv', sep=';')

R Analysis

# Load results
df <- read.csv('protege_consensus.csv', sep=';')

# Filter and sort
good_primers <- df[df$degeneracies > 0 & df$degeneracies <= 8, ]
good_primers <- good_primers[order(good_primers$degeneracies), ]

# View top candidates
head(good_primers, 20)

# Export
write.csv(good_primers, 'selected_primers.csv', row.names=FALSE)

Troubleshooting

No output files created

Possible causes:

PROTÉGÉ encountered an error during processing
Insufficient disk space
Permission issues in mounted directory

Solutions:

Check terminal output for error messages
Verify write permissions: ls -la /your/path/
Ensure adequate free space: df -h
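The permission and disk-space checks can also be scripted. The following sketch runs on the host, so it reports what the current host user can do, which may differ from the user inside the container; `check_output_dir` and the 100 MB default threshold are assumptions chosen for illustration.

```python
import os
import shutil


def check_output_dir(path: str, min_free_mb: int = 100) -> list[str]:
    """Return human-readable problems found with an output directory."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free on the volume holding {path}")
    return problems


# Point this at your mounted directory, e.g. /your/path/
print(check_output_dir(".") or "no problems found")
```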

All primers have high degeneracy

Possible causes:

Input sequences too divergent
Consensus threshold too low
Wrong gene region selected

Solutions:

Increase consensus threshold: -c 95
Use more closely related sequences
Select more conserved genes
Check sequence quality and alignment

Most primers contain gaps

Possible causes:

Sequences have many insertions/deletions
Gap consensus enabled with variable sequences

Solutions:

Use -g flag to exclude gap positions
Trim sequences to conserved regions
Remove outlier sequences with many indels

CSV files won't open correctly

Possible causes:

Semicolon delimiter not recognized
Regional settings expect different delimiter

Solutions:

Convert to comma: sed 's/;/,/g' file.csv > file_comma.csv
Use Excel import wizard and specify ; delimiter
Open in Python/R with sep=';' parameter
