Synopsis
Description
PROTÉGÉ (PROTEin coding GEne for phylogenetic tag and identification) is a tool for designing and visualizing primers. Based on the Phylotag approach (Caro-Quintero, 2015), PROTÉGÉ designs degenerate primers from protein-coding gene sequences for phylogenetic marker development.
Required Parameters
Path to the FASTA file containing protein-coding gene sequences.
Details:
- Must be nucleotide sequences (DNA)
- Sequences must be in-frame (coding sequences)
- File must be in FASTA format
- Path is relative to the container’s working directory (/root/)
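The input requirements above can be checked before a run. The following is a minimal sketch, not part of PROTÉGÉ: the function names are hypothetical, the allowed symbol set (A, C, G, T, N and `-` as a gap) is an assumption, and "in-frame" is only approximated by testing that each sequence length is a multiple of 3.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header] += line.upper()
    return records


# Assumption: these are the only symbols expected in a DNA input file.
ALLOWED = set("ACGTN-")


def check_records(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, seq in records.items():
        bad = set(seq) - ALLOWED
        if bad:
            problems.append(f"{name}: non-DNA symbols {sorted(bad)}")
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
    return problems


example = ">seq1\nATGGCTAAA\n>seq2\nATGGCTAA\n"
for problem in check_records(read_fasta(example)):
    print(problem)  # seq2: length 8 is not a multiple of 3
```

A length check cannot detect a frame shift inside a sequence, so this only catches the most obvious input problems.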
Optional Parameters
Consensus percentage threshold for primer design.
Details:
- Range: 0-100
- Higher values = more stringent consensus (fewer degenerate bases)
- Lower values = more permissive (more degenerate bases)
- Determines when a position requires a degenerate nucleotide
- If a single nucleotide appears ≥ consensus%, it’s used as-is
- If no single nucleotide reaches threshold, degenerate codes are used
- Accumulates nucleotides until combined percentage exceeds threshold
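The threshold rule described above can be sketched as follows. This is an illustration under stated assumptions, not PROTÉGÉ's actual code: `consensus_base` is a hypothetical name, the column is assumed to be non-empty and to contain only A, C, G and T, nucleotides are assumed to be added from most to least frequent (ties broken alphabetically), and the threshold is treated as reached when the combined percentage is greater than or equal to it.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each one represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_base(column, threshold):
    """Return the base or IUPAC code for one alignment column.

    column: string with one nucleotide per sequence (A/C/G/T only, non-empty).
    threshold: consensus percentage, 0-100.
    """
    counts = Counter(column)
    total = len(column)
    chosen, covered = set(), 0
    # Assumption: accumulate from most to least frequent, ties alphabetical.
    for base, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        chosen.add(base)
        covered += n
        if 100 * covered >= threshold * total:
            break
    return IUPAC[frozenset(chosen)]


print(consensus_base("AAAAAAAAAG", 90))  # A  (A alone reaches 90%)
print(consensus_base("AAAAAAAAAG", 95))  # R  (A + G needed to reach 95%)
print(consensus_base("AAAAACCCGT", 80))  # M  (A + C together reach 80%)
```

The same column yields a plain base at 90% and a degenerate code at 95%, which is the behavior the list above describes for the threshold.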
Flag to exclude positions with gaps from consensus calculation.
Details:
- Default behavior (flag NOT used): Gaps are considered in consensus
- With flag (-g): Positions with any gaps are automatically assigned the gap character
- Useful when alignment has insertions/deletions that should be avoided
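To see which alignment positions the -g behavior would affect, the gap-containing columns can be marked. This is a small sketch with hypothetical function names; it assumes the sequences are aligned to equal length and that the gap character is `-`, which this page does not state.

```python
def gap_columns(sequences, gap_char="-"):
    """Return one flag per alignment column: True if any sequence has a gap there."""
    return [gap_char in column for column in zip(*sequences)]


def gap_track(sequences, gap_char="-"):
    """Render the flags as a string: gap_char for gapped columns, '.' otherwise."""
    return "".join(gap_char if has_gap else "." for has_gap in gap_columns(sequences, gap_char))


alignment = [
    "ATG-CT",
    "ATGACT",
    "AT-ACT",
]
print(gap_track(alignment))  # ..--..
```

Columns 3 and 4 each contain at least one gap, so under the rule described above they would be assigned the gap character rather than a consensus base.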
Primer length specified in number of codons (multiply by 3 for nucleotide length).
Details:
- Specifies primer length in codon units
- Actual nucleotide length = codon value × 3
- Default: 7 codons = 21 nucleotides
- Typical range: 5-12 codons (15-36 nucleotides)
- Shorter (5-7 codons): Less specific, more degenerate options
- Medium (7-9 codons): Balanced specificity and options
- Longer (10-12 codons): More specific, fewer options
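The codon-to-nucleotide arithmetic, and its effect on how many windows fit in an alignment, can be sketched as below. The function names are hypothetical, and the assumption that candidate windows start on codon boundaries is made only for illustration; this page does not say how PROTÉGÉ enumerates candidates.

```python
def primer_nt_length(codons):
    """Nucleotide length of a primer given its length in codons."""
    return codons * 3


def in_frame_windows(alignment_length_nt, codons):
    """0-based start offsets (in nt) of windows that begin on a codon boundary."""
    size = primer_nt_length(codons)
    return list(range(0, alignment_length_nt - size + 1, 3))


print(primer_nt_length(7))       # 21
print(in_frame_windows(30, 7))   # [0, 3, 6, 9]
print(in_frame_windows(30, 5))   # [0, 3, 6, 9, 12, 15]
```

On a 30 nt (10 codon) alignment, a 7-codon primer fits in 4 in-frame positions and a 5-codon primer in 6, which illustrates why shorter primers leave more options.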
Enable verbose output for detailed processing information.
Details:
- Prints detailed alignment position analysis
- Shows nucleotide frequencies at each position
- Displays consensus decision-making process
- Useful for troubleshooting and understanding results
Verbose mode generates significant terminal output; redirect it to a file for easier analysis.
Display help message and exit.
Complete Examples
Example 1: Basic Run with Defaults
- Consensus: 90%
- Gap consensus: enabled
- Primer length: 7 codons (21 nt)
- Verbose: disabled
Example 2: High-Stringency Primers
- Consensus: 95% (very stringent)
- Primer length: 8 codons (24 nt)
- Best for highly conserved genes
Example 3: Permissive with Gap Exclusion
- Consensus: 80% (permissive)
- Excludes gap positions
- Verbose output enabled
- Best for variable genes with indels
Example 4: Long Primers for Specificity
- Primer length: 12 codons (36 nt)
- Consensus: 92%
- Best for highly specific amplification
Example 5: Full Diagnostic Run
- Consensus: 85% (balanced)
- Primer length: 9 codons (27 nt)
- Verbose diagnostics enabled
- Best for development and troubleshooting
Parameter Interactions
Consensus Percentage and Degeneracy
Higher consensus (-c 95):
- Less degenerate primers (fewer IUPAC codes)
- More positions may fail to meet threshold
- May require more sequences to achieve consensus
Lower consensus (-c 80):
- More degenerate primers (more IUPAC codes)
- More positions meet threshold
- Works with more variable sequences
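One way to compare primers produced at different thresholds is to count how many distinct sequences a degenerate primer represents, which is the product of the number of bases behind each IUPAC code. The sketch below uses the standard IUPAC table; the `degeneracy` function name and the example primers are made up for illustration and are not PROTÉGÉ output.

```python
from math import prod

# Number of bases represented by each standard IUPAC nucleotide code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct non-degenerate sequences a primer stands for."""
    return prod(IUPAC_SIZE[code] for code in primer.upper())


print(degeneracy("ATGGCTTACAAA"))  # 1  (no degenerate positions)
print(degeneracy("ATGGCNTAYAAR"))  # 16 (N=4, Y=2, R=2)
```

A primer with more IUPAC codes, as expected at a lower consensus threshold, has a higher degeneracy value.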
Primer Length and Specificity
Shorter primers (-d 5):
- More primer candidates
- Lower specificity
- Higher degeneracy tolerance
Longer primers (-d 12):
- Fewer primer candidates
- Higher specificity
- Lower degeneracy tolerance
- Better for complex genomes
Gap Consensus Impact
With -g flag:
- Positions with gaps are excluded
- May reduce primer candidates
- Better for sequences with indels
Without -g flag:
- Gaps considered in consensus
- More primer candidates
- May include less reliable positions
Exit Codes
- 0: Successful completion
- Non-zero: Error occurred (check error messages)
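When PROTÉGÉ is driven from a script, the exit code can be checked and the output written to a file in one step. The helper below is a generic sketch built on Python's standard `subprocess` module; `run_and_report` is a hypothetical name, and the command line to run is passed in by the caller because the exact invocation is not shown in this section.

```python
import subprocess
import sys


def run_and_report(argv, log_path):
    """Run a command, write its standard output to log_path, return the exit code.

    argv: the command and its arguments as a list of strings (supplied by the caller).
    log_path: file that receives standard output, e.g. verbose diagnostics.
    """
    with open(log_path, "w") as log:
        completed = subprocess.run(
            argv, stdout=log, stderr=subprocess.PIPE, text=True
        )
    if completed.returncode != 0:
        # Non-zero means an error occurred; show the captured error messages.
        print(
            f"command failed with exit code {completed.returncode}: "
            f"{completed.stderr.strip()}",
            file=sys.stderr,
        )
    return completed.returncode
```

A return value of 0 corresponds to successful completion; any other value should be followed up by reading the printed error messages.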
See Also
- Running PROTÉGÉ PD - Docker container usage
- Input Files - FASTA format requirements
- Output Files - Understanding results