
Synopsis

protege-pd [-h] -s SEQ [-c CONS] [-g] [-d COD] [-v]

Description

PROTÉGÉ (PROTEin coding GEne for phylogenetic tag and identification) is a tool for designing and visualizing primers. Based on the Phylotag approach (Caro-Quintero, 2015), it designs degenerate primers from protein-coding gene sequences for phylogenetic marker development.

Required Parameters

-s, --seq
string
required
Path to the FASTA file containing protein-coding gene sequences.
Details:
  • Must be nucleotide sequences (DNA)
  • Sequences must be in-frame (coding sequences)
  • File must be in FASTA format
  • Path is relative to the container’s working directory (/root/)
Example:
protege-pd -s gyrB_genes.fasta
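The input requirements above can be checked before running the tool. The following is a minimal sketch, not part of PROTÉGÉ itself: `read_fasta` and `check_records` are hypothetical helper names, and the accepted alphabet (A, C, G, T, N and the gap character) is an assumption, so IUPAC ambiguity codes in the input would be reported as problems.

```python
from typing import Dict, List

# Assumption: plain bases plus N and alignment gaps are acceptable input.
DNA_ALPHABET = set("ACGTN-")

def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def check_records(records: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for name, seq in records.items():
        bad = set(seq) - DNA_ALPHABET
        if bad:
            problems.append(f"{name}: non-DNA characters {sorted(bad)}")
        # In-frame coding sequences have a length divisible by 3 (gaps ignored).
        if len(seq.replace("-", "")) % 3 != 0:
            problems.append(f"{name}: length is not a multiple of 3")
    return problems
```

A length divisible by 3 is a necessary condition for an in-frame sequence, not proof of one, so treat a clean result as a first filter only.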

Optional Parameters

-c, --consensus
float
default:"90"
Consensus percentage threshold for primer design.
Details:
  • Range: 0-100
  • Higher values = more stringent consensus (fewer degenerate bases)
  • Lower values = more permissive (more degenerate bases)
  • Determines when a position requires a degenerate nucleotide
How it works:
  • If a single nucleotide appears ≥ consensus%, it’s used as-is
  • If no single nucleotide reaches threshold, degenerate codes are used
  • Accumulates nucleotides until combined percentage exceeds threshold
Example:
# More stringent - 95% consensus
protege-pd -s genes.fasta -c 95

# More permissive - 80% consensus
protege-pd -s genes.fasta -c 80
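The accumulation rule described under "How it works" can be sketched as follows. This is an illustrative reading of that description, not the tool's actual code: `consensus_base` is a hypothetical name, the column is assumed to be non-empty and to contain only A, C, G and T, and the same "greater than or equal" comparison is assumed for both the single-base and the accumulated case.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each one represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def consensus_base(column: str, threshold: float = 90.0) -> str:
    """Pick a single base or an IUPAC code for one alignment column."""
    counts = Counter(column)
    total = len(column)
    chosen = set()
    covered = 0
    # Add bases from most to least frequent until the threshold is reached.
    for base, count in counts.most_common():
        chosen.add(base)
        covered += count
        if 100.0 * covered / total >= threshold:
            break
    return IUPAC[frozenset(chosen)]
```

For a column with eight A and two G, a threshold of 90 gives `R` (A or G), while a threshold of 80 gives plain `A`, which matches the stringent versus permissive behavior described above.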
-g, --nogapconsensus
flag
default:"true"
Flag to exclude positions with gaps from the consensus calculation.
Details:
  • Default behavior (flag NOT used): Gaps are considered in the consensus
  • With flag (-g): Any position containing a gap is automatically assigned the gap character
  • Useful when alignment has insertions/deletions that should be avoided
Example:
# Exclude positions with gaps from consensus
protege-pd -s genes.fasta -g

# Default: gaps considered in consensus calculation
protege-pd -s genes.fasta
Using -g may result in more gaps in the consensus sequence, potentially reducing the number of viable primer candidates.
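The difference between the two gap modes can be illustrated with a simplified sketch. `column_symbol` is a hypothetical helper, and for brevity it picks the most common symbol rather than applying the consensus threshold; how the tool weighs gaps when -g is not set is an assumption here.

```python
from collections import Counter

GAP = "-"

def column_symbol(column: str, no_gap_consensus: bool = False) -> str:
    """Return the gap character or the most common symbol for one column."""
    if no_gap_consensus and GAP in column:
        # Behavior described for -g: any gap forces a gap in the consensus.
        return GAP
    # Assumption: without -g, gaps compete with bases like any other symbol.
    return Counter(column).most_common(1)[0][0]
```

With the column `AAA-`, the sketch returns `-` when `no_gap_consensus` is true and `A` otherwise.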
-d, --codon
integer
default:"7"
Primer length in codons (multiply by 3 for the nucleotide length).
Details:
  • Actual nucleotide length = codon value × 3
  • Default: 7 codons = 21 nucleotides
  • Typical range: 5-12 codons (15-36 nucleotides)
Primer length guidelines:
  • Shorter (5-7 codons): Less specific, more candidate primers
  • Medium (7-9 codons): Balanced specificity and candidate count
  • Longer (10-12 codons): More specific, fewer candidate primers
Example:
# Short primers: 5 codons = 15 nucleotides
protege-pd -s genes.fasta -d 5

# Default: 7 codons = 21 nucleotides
protege-pd -s genes.fasta -d 7

# Long primers: 10 codons = 30 nucleotides
protege-pd -s genes.fasta -d 10
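The codon-to-nucleotide arithmetic, and its effect on the number of candidate windows, can be sketched as below. Both function names are hypothetical, and stepping one codon at a time along the consensus is an assumption about how candidates are enumerated, not documented behavior.

```python
from typing import Iterator, Tuple

def primer_length_nt(codons: int = 7) -> int:
    """Convert a -d value (codons) to a primer length in nucleotides."""
    return codons * 3

def in_frame_windows(consensus: str, codons: int = 7) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for each codon-aligned window of the given length."""
    size = primer_length_nt(codons)
    # Assumption: candidate windows start on codon boundaries (step of 3).
    for start in range(0, len(consensus) - size + 1, 3):
        yield start, consensus[start:start + size]
```

Under this assumption a 30-nucleotide consensus yields four 21-nucleotide windows at `-d 7` but only one window at `-d 10`, which is why longer primers leave fewer candidates.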
-v, --verbose
flag
default:"false"
Enable verbose output for detailed processing information.
Details:
  • Prints detailed alignment position analysis
  • Shows nucleotide frequencies at each position
  • Displays consensus decision-making process
  • Useful for troubleshooting and understanding results
Example:
# Enable verbose output
protege-pd -s genes.fasta -v
Verbose mode generates significant terminal output. Redirect to a file for easier analysis:
protege-pd -s genes.fasta -v > protege_log.txt 2>&1
-h, --help
flag
Display help message and exit.
Example:
protege-pd --help

Complete Examples

Example 1: Basic Run with Defaults

protege-pd -s mysequences.fasta
Configuration:
  • Consensus: 90%
  • Gaps: considered in consensus (-g not set)
  • Primer length: 7 codons (21 nt)
  • Verbose: disabled

Example 2: High-Stringency Primers

protege-pd -s conserved_genes.fasta -c 95 -d 8
Configuration:
  • Consensus: 95% (very stringent)
  • Primer length: 8 codons (24 nt)
  • Best for highly conserved genes

Example 3: Permissive with Gap Exclusion

protege-pd -s variable_genes.fasta -c 80 -g -v
Configuration:
  • Consensus: 80% (permissive)
  • Excludes gap positions
  • Verbose output enabled
  • Best for variable genes with indels

Example 4: Long Primers for Specificity

protege-pd -s target_gene.fasta -d 12 -c 92
Configuration:
  • Primer length: 12 codons (36 nt)
  • Consensus: 92%
  • Best for highly specific amplification

Example 5: Full Diagnostic Run

protege-pd -s diagnostic_set.fasta -c 85 -d 9 -v
Configuration:
  • Consensus: 85% (balanced)
  • Primer length: 9 codons (27 nt)
  • Verbose diagnostics enabled
  • Best for development and troubleshooting

Parameter Interactions

Higher consensus (-c 95):
  • Less degenerate primers (fewer IUPAC codes)
  • More positions may fail to meet threshold
  • May require more sequences to achieve consensus
Lower consensus (-c 80):
  • More degenerate primers (more IUPAC codes)
  • More positions meet threshold
  • Works with more variable sequences
Shorter primers (-d 5):
  • More primer candidates
  • Lower specificity
  • Higher degeneracy tolerance
Longer primers (-d 12):
  • Fewer primer candidates
  • Higher specificity
  • Lower degeneracy tolerance
  • Better for complex genomes
With -g flag:
  • Positions with gaps are excluded
  • May reduce primer candidates
  • Better for sequences with indels
Without -g flag (default):
  • Gaps considered in consensus
  • More primer candidates
  • May include less reliable positions
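To compare the primers produced by different settings, it can help to quantify how degenerate each one is. The sketch below counts the distinct sequences a primer represents by multiplying the number of bases behind each standard IUPAC code. `degeneracy` is a hypothetical helper and is not claimed to be a value the tool reports.

```python
# Number of bases each standard IUPAC nucleotide code stands for.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    fold = 1
    for code in primer.upper():
        fold *= IUPAC_SIZE[code]
    return fold
```

For example, `ATGRYN` represents 1 × 1 × 1 × 2 × 2 × 4 = 16 sequences, whereas a primer with no degenerate codes has a degeneracy of 1.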

Exit Codes

  • 0: Successful completion
  • Non-zero: Error occurred (check error messages)
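When protege-pd is called from a script, the exit code can be checked programmatically. The following is a minimal sketch that uses only the flags documented above; `build_command` and `run_protege` are hypothetical wrapper names, and the sketch assumes `protege-pd` is on the PATH (otherwise `subprocess.run` raises `FileNotFoundError`).

```python
import subprocess
from typing import List

def build_command(seq: str, consensus: float = 90, codons: int = 7,
                  no_gap: bool = False, verbose: bool = False) -> List[str]:
    """Assemble a protege-pd argument list from the documented options."""
    cmd = ["protege-pd", "-s", seq, "-c", str(consensus), "-d", str(codons)]
    if no_gap:
        cmd.append("-g")
    if verbose:
        cmd.append("-v")
    return cmd

def run_protege(cmd: List[str]) -> int:
    """Run the command and return its exit code (0 means success)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # A non-zero code signals an error; show whatever the tool reported.
        print(result.stderr)
    return result.returncode
```

A call such as `run_protege(build_command("genes.fasta", consensus=80, no_gap=True))` mirrors `protege-pd -s genes.fasta -c 80 -d 7 -g`.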
