Output Files Overview
PROTÉGÉ PD generates multiple output files during the primer design process. All files are written to the same directory as your input FASTA file: the mounted directory specified with --mount source=/your/path/,target=/root/.
Primary Output Files
1. protege_consensus.csv
Purpose: Main primer design results, with all candidate primers and their degeneracy calculations.
Location: Same directory as input file
Generated by: protege.py:326
File Structure
- Primer position in the consensus sequence, given as a range in the format start-end (for example: 1-21, 2-22, 3-23). Position numbers correspond to nucleotide positions in the aligned consensus sequence.
- Number of degenerate primer variants for this position, calculated as the product of the degeneracies of all positions in the primer sequence. Examples:
  1 = No degeneracies (all standard nucleotides)
  2 = One 2-fold degenerate position (e.g., R, Y, W, S, K, M)
  4 = One 4-fold degenerate position (N), or two 2-fold degenerate positions
  8 = Three 2-fold degenerate positions, or one 2-fold and one 4-fold position
  0 = The primer contains gaps and is not viable
- Forward primer sequence with IUPAC degenerate codes. IUPAC codes used:
  A, T, G, C = Standard nucleotides
  R = A or G (puRine)
  Y = C or T (pYrimidine)
  W = A or T (Weak)
  S = G or C (Strong)
  K = G or T (Keto)
  M = A or C (aMino)
  N = Any nucleotide
  Dash (-) = Gap (primer not usable)
  Example: ATGCGRAAYWSKMNTGCAT
- Reverse complement of the forward primer.
  Usage: Use this sequence when ordering the reverse PCR primer for this position.
  Generated by: protege.py:310-311
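The degeneracy and reverse-complement columns follow directly from the IUPAC codes listed above. The Python sketch below reproduces those stated rules: the degeneracy is the product of each position's fold, a gap counts as 0, and each code is replaced by its IUPAC complement. It is an illustration only, not the code in protege.py. The helper names degeneracy and reverse_complement are hypothetical. Codes outside the list above (B, D, H, V) are deliberately not handled and raise a KeyError.

```python
from math import prod

# How many nucleotides each IUPAC code stands for (codes listed above).
# A gap ("-") counts as 0, so any gap makes the whole product 0.
FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "K": 2, "M": 2,
    "N": 4,
    "-": 0,
}

# IUPAC complement of each code; W, S, N and the gap map to themselves.
COMPLEMENT = str.maketrans("ACGTRYWSKMN-", "TGCAYRWSMKN-")


def degeneracy(primer: str) -> int:
    """Product of per-position degeneracies (0 if the primer contains a gap)."""
    return prod(FOLD[base] for base in primer.upper())


def reverse_complement(primer: str) -> str:
    """Complement every IUPAC code, then reverse the sequence."""
    return primer.upper().translate(COMPLEMENT)[::-1]


primer = "ATGCGRAAYWSKMNTGCAT"
print(degeneracy(primer))          # 256
print(reverse_complement(primer))  # ATGCANKMSWRTTYCGCAT
```

The example primer has six 2-fold positions (R, Y, W, S, K, M) and one N, so its degeneracy is 2⁶ × 4 = 256.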
The first column (unnamed) is the row index, and a semicolon (;) is used as the delimiter.
2. sequences.csv
Purpose: Original sequence information, with nucleotide and amino acid sequences.
Location: Same directory as input file
Generated by: protege.py:155
File Structure
- Sequence identifier from the FASTA header (everything after >).
- Original nucleotide sequence from the input file.
- Length of the nucleotide sequence in base pairs. Note the spelling: “nuc_lenght” (not “length”), as in the source code.
- Translated amino acid sequence. Generated by: protege.py:138
- Length of the amino acid sequence (nuc_lenght / 3).
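The relation between the two length columns comes from translation: each codon of three nucleotides becomes one amino acid. Below is a minimal sketch using the standard genetic code. The translate helper is hypothetical and may differ from the translation at protege.py:138. Possible differences include how stop codons are written (here as *) and how an incomplete trailing codon is handled (here it is ignored).

```python
# Standard genetic code; codons are ordered by first, second, third base in "TCAG" order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped.
    Codons containing non-ACGT characters become 'X' (a hypothetical choice)."""
    seq = nucleotides.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


seq = "ATGGCCAAATAA"
protein = translate(seq)
print(protein, len(seq), len(protein))  # MAK* 12 4
```

For a sequence whose length is a multiple of three, the amino acid length is exactly nuc_lenght / 3. Any extra one or two nucleotides do not form a full codon.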
3. alSequences.csv
Purpose: Aligned amino acid sequences after MUSCLE alignment.
Location: Same directory as input file
Generated by: protege.py:205
File Structure
- Sequence identifier (matches sequences.csv).
- Aligned amino acid sequence with gaps (-). All sequences in this column have the same length due to the alignment.
- Length of the aligned sequence, including gaps. This value is the same for all sequences in a run.
Gaps in aligned sequences indicate insertions/deletions between sequences and affect primer design.
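A quick sanity check on an alignment confirms two things: every sequence has the same length, and it shows which columns contain a gap. Those gap columns are presumably the ones that lead to gap-containing primers with a degeneracy of 0. The alignment_summary function and the toy sequences below are hypothetical, for illustration only. Column indices are 0-based.

```python
def alignment_summary(aligned: dict[str, str]) -> tuple[int, list[int]]:
    """Return the shared alignment length and the 0-based columns containing a gap."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        # An alignment must give every sequence the same length.
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    length = lengths.pop()
    gap_columns = [
        col for col in range(length)
        if any(seq[col] == "-" for seq in aligned.values())
    ]
    return length, gap_columns


aligned = {
    "seq1": "MKT-LV",
    "seq2": "MKTALV",
    "seq3": "MK--LV",
}
print(alignment_summary(aligned))  # (6, [2, 3])
```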
Intermediate Files
4. translated_seqs_pL.fas
Purpose: Amino acid sequences in FASTA format, used as input for the MUSCLE alignment.
Location: Same directory as input file
Generated by: protege.py:157-161
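FASTA files such as translated_seqs_pL.fas and the MUSCLE output can be loaded with a few lines of standard-library Python. The parse_fasta and read_fasta helpers below are hypothetical and not part of PROTÉGÉ. They treat everything after > as the identifier, matching sequences.csv, and join sequences that are wrapped over several lines.

```python
from __future__ import annotations

from pathlib import Path


def parse_fasta(text: str) -> dict[str, str]:
    """Map each identifier (text after '>') to its sequence, joining wrapped lines."""
    records: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = []  # a repeated identifier replaces the earlier record
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def read_fasta(path: str | Path) -> dict[str, str]:
    """Read and parse a FASTA file from disk."""
    return parse_fasta(Path(path).read_text())


example = """>seq1
MKT-
LV
>seq2
MKTALV
"""
print(parse_fasta(example))  # {'seq1': 'MKT-LV', 'seq2': 'MKTALV'}
```

Run from your mounted directory, read_fasta("aligned_muscle_pl_translated_seqs_pL.fas") returns the aligned sequences keyed by identifier, with gaps included.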
5. aligned_muscle_pl_*.fas
Purpose: MUSCLE-aligned amino acid sequences.
Filename pattern: aligned_muscle_pl_*.fas, e.g., aligned_muscle_pl_translated_seqs_pL.fas
Location: Same directory as input file
Generated by: protege.py:179 (MUSCLE alignment)
This file shows the protein-level alignment used for consensus calculations. Gaps (---) represent insertions/deletions.
Downloading Files from Web Interface
When running PROTÉGÉ with the web interface (Dash), you can download results directly from the browser.
Access Output Files
Access web interface
Open http://127.0.0.1:8050 in your browser after starting PROTÉGÉ.
Access Files from Command Line
All output files are written to your mounted directory.
File Locations
All output files are created in the host directory that you mounted to /root/ in the Docker container.
Understanding Results
Selecting Best Primers
Sort by degeneracy
- Degeneracies = 1 (no degeneracy)
- Degeneracies = 2-4 (low degeneracy, good)
- Degeneracies = 8-32 (moderate, acceptable)
- Degeneracies > 32 (high, may be problematic)
Exclude gap-containing primers
Filter by position
Primer Quality Metrics
Excellent Primers
- Degeneracies: 1
- No gaps
- From conserved regions
- Standard nucleotides only
Good Primers
- Degeneracies: 2-8
- No gaps
- Limited degenerate positions
- Mostly standard nucleotides
Acceptable Primers
- Degeneracies: 8-32
- No gaps
- Multiple degenerate positions
- May require optimization
Problematic Primers
- Degeneracies: >32 or 0
- Contains gaps
- Highly degenerate
- Difficult to synthesize
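The tiers above can be written as a small lookup for tagging or filtering result tables. In this sketch, primer_quality is a hypothetical name. A degeneracy of 8 appears in both the Good and Acceptable ranges, and this sketch counts it as Good.

```python
def primer_quality(degeneracy: int) -> str:
    """Classify a primer by its degeneracy, following the quality tiers above."""
    if degeneracy < 0:
        raise ValueError("degeneracy cannot be negative")
    if degeneracy == 0 or degeneracy > 32:
        # 0 means the primer contains gaps; more than 32 is highly degenerate
        return "problematic"
    if degeneracy == 1:
        return "excellent"
    if degeneracy <= 8:
        # 8 is listed under both Good and Acceptable; treated as good here
        return "good"
    return "acceptable"  # 9 to 32


for value in (0, 1, 4, 8, 16, 64):
    print(value, primer_quality(value))
```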
Post-Processing Analysis
Import into Excel/Spreadsheet
Python Analysis
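Below is a minimal standard-library sketch for filtering protege_consensus.csv, which is semicolon-delimited. This page does not spell out the column names, so DEGENERACY_COLUMN and SEQUENCE_COLUMN are hypothetical placeholders. Check the header row of your file and adjust them. With pandas, the equivalent load is pd.read_csv("protege_consensus.csv", sep=";", index_col=0).

```python
import csv
from collections.abc import Iterable

# Hypothetical header names: replace them with the names in your file's header row.
DEGENERACY_COLUMN = "Degeneracies"
SEQUENCE_COLUMN = "Primer"


def best_primers(lines: Iterable[str], max_degeneracy: int = 8) -> list[dict[str, str]]:
    """Keep gap-free rows with 1 <= degeneracy <= max_degeneracy, lowest degeneracy first."""
    reader = csv.DictReader(lines, delimiter=";")
    kept = []
    for row in reader:
        # float() first in case the value was written as, e.g., "4.0"
        degeneracy = int(float(row[DEGENERACY_COLUMN]))
        if 1 <= degeneracy <= max_degeneracy and "-" not in row[SEQUENCE_COLUMN]:
            kept.append((degeneracy, row))
    kept.sort(key=lambda pair: pair[0])  # stable: ties keep file order
    return [row for _, row in kept]


# Usage, run from your mounted directory:
# with open("protege_consensus.csv", newline="") as handle:
#     for row in best_primers(handle):
#         print(row[SEQUENCE_COLUMN], row[DEGENERACY_COLUMN])
```

The unnamed index column is read under an empty-string key and is otherwise ignored.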
R Analysis
Troubleshooting
No output files created
Possible causes:
- PROTÉGÉ encountered an error during processing
- Insufficient disk space
- Permission issues in mounted directory
Solutions:
- Check the terminal output for error messages
- Verify write permissions: ls -la /your/path/
- Ensure adequate free space: df -h
All primers have high degeneracy
Possible causes:
- Input sequences too divergent
- Consensus threshold too low
- Wrong gene region selected
Solutions:
- Increase the consensus threshold: -c 95
- Use more closely related sequences
- Select more conserved genes
- Check sequence quality and alignment
Most primers contain gaps
Possible causes:
- Sequences have many insertions/deletions
- Gap consensus enabled with variable sequences
Solutions:
- Use the -g flag to exclude gap positions
- Trim sequences to conserved regions
- Remove outlier sequences with many indels
CSV files won't open correctly
Possible causes:
- Semicolon delimiter not recognized
- Regional settings expect different delimiter
Solutions:
- Convert to comma delimiters: sed 's/;/,/g' file.csv > file_comma.csv
- Use the Excel import wizard and specify ; as the delimiter
- Open in Python/R with the sep=';' parameter
See Also
- Command Reference - Adjust parameters for better results
- Input Files - Improve input quality
- Running PROTÉGÉ - Docker configuration