
Overview

This guide walks you through the complete workflow, from a FASTA file of protein-coding gene sequences to selecting suitable primer pairs for phylogenetic analysis.
Before starting, make sure Docker is installed and you have pulled the PROTÉGÉ PD image (docker pull ddelgadillo/protege_base:v1.0.2).

Prepare your input file

PROTÉGÉ PD requires a FASTA file containing nucleotide sequences of protein-coding genes. Your file should:
  • Contain multiple sequences (typically from different species or strains)
  • Use nucleotide sequences (not amino acids)
  • Represent the same gene across all sequences
  • Be in standard FASTA format
The tool translates these sequences to amino acids and then aligns the translations itself.
Example FASTA format:
>Species1_gyrB
ATGACCGATGCGATCGATCGATGGCATGCGATCGATCG...
>Species2_gyrB
ATGACCGATGCGATCGATCGATGGCATGCGATCGATCA...
>Species3_gyrB
ATGACCGATGCGATCGATCGATGGCATGCGATCGATCG...
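A quick local check before you start the container can catch the most common input problems. The bash sketch below uses a hypothetical function, check_fasta, which is not part of PROTÉGÉ PD. It assumes a usable coding sequence contains only A, C, G, T or N and has a length divisible by 3. Adjust those rules if your data legitimately differ.

```bash
#!/usr/bin/env bash
# check_fasta FILE: hypothetical pre-flight check, not part of PROTÉGÉ PD.
# Assumed rules: every sequence uses only A, C, G, T or N (case-insensitive;
# gap characters such as "-" are flagged) and has a length divisible by 3,
# and the file holds at least two sequences.
# Prints one line per problem; returns 1 if anything failed, 0 otherwise.
check_fasta() {
  awk '
    # Check the record collected so far (no-op before the first header).
    function flush() {
      if (name == "") return
      n++
      if (seq ~ /[^ACGTN]/) { print name ": contains characters other than A/C/G/T/N"; bad = 1 }
      if (length(seq) % 3 != 0) { print name ": length " length(seq) " is not divisible by 3"; bad = 1 }
    }
    /^>/ { flush(); name = substr($0, 2); seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }   # join wrapped lines
    END {
      flush()
      if (n < 2) { print "found " n + 0 " sequence(s); at least 2 are needed"; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Example:
#   check_fasta myseqs.fna && echo "input looks OK"
```

The function prints nothing and returns 0 when every rule passes. Otherwise it prints one line per problem.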

Run PROTÉGÉ PD

1

Navigate to your data directory

Open your terminal and navigate to the directory containing your FASTA file:
cd /path/to/your/fasta/files
2

Run the Docker container

Execute the following command, replacing the placeholders with your actual values:
docker run --rm \
  --mount type=bind,source=/your/files/path/,target=/root/. \
  --name protege \
  -p 127.0.0.1:8050:8050 \
  --cpus 4 \
  ddelgadillo/protege_base:v1.0.2 \
  protege-pd -s myseqs.fna
Parameters explained:
  • --rm - Automatically remove the container when it exits
  • --mount type=bind,source=/your/files/path/,target=/root/. - Mount your local directory into the container
    • Replace /your/files/path/ with the full absolute path to your FASTA file directory
  • --name protege - Name the running container “protege”
  • -p 127.0.0.1:8050:8050 - Publish the web interface on host port 8050, reachable only from your own machine (127.0.0.1)
  • --cpus 4 - Allocate 4 CPU cores (adjust based on your system)
  • -s myseqs.fna - Your FASTA file name (replace myseqs.fna with your actual filename)
Use the absolute path for the source directory, not a relative path. For example:
  • Linux/Mac: /home/username/data/ or $(pwd) for current directory
  • Windows: C:/Users/username/data/
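If you prefer not to type the absolute path by hand, a small helper can resolve it and confirm the FASTA file is where Docker will look. This is a sketch for Linux/macOS shells. protege_mount_arg is a hypothetical name, and the target=/root/. value is copied from the command above.

```bash
#!/usr/bin/env bash
# protege_mount_arg DIR FASTA: hypothetical helper, not part of PROTÉGÉ PD.
# Resolves DIR (relative or absolute) to an absolute path, checks that FASTA
# exists inside it, and prints the matching --mount value.
protege_mount_arg() {
  local dir=$1 fasta=$2 abs
  # cd + pwd -P resolves relative paths and symlinks without needing realpath.
  abs=$(cd "$dir" 2>/dev/null && pwd -P) || { echo "no such directory: $dir" >&2; return 1; }
  if [ ! -f "$abs/$fasta" ]; then
    echo "file not found: $abs/$fasta" >&2
    return 1
  fi
  printf 'type=bind,source=%s,target=/root/.\n' "$abs"
}

# Example (commented out so the snippet itself starts nothing):
#   mount=$(protege_mount_arg . myseqs.fna) && docker run --rm --mount "$mount" \
#     --name protege -p 127.0.0.1:8050:8050 --cpus 4 \
#     ddelgadillo/protege_base:v1.0.2 protege-pd -s myseqs.fna
```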
3

Wait for processing

PROTÉGÉ PD will:
  1. Translate your nucleotide sequences to amino acids
  2. Align the sequences using MUSCLE
  3. Calculate consensus sequences with degeneracies
  4. Identify all possible primer positions
  5. Launch the web interface
You’ll see progress messages in the terminal. Wait until you see:
Dash is running on http://0.0.0.0:8050/
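The container runs in the foreground, so you can also watch for the interface from a second terminal. The sketch below polls the address with curl until it responds. wait_for_protege, the 2-second poll interval and the default 600-second timeout are all assumptions. Choose a timeout that suits the size of your dataset.

```bash
#!/usr/bin/env bash
# wait_for_protege [URL] [TIMEOUT]: hypothetical helper, not part of PROTÉGÉ PD.
# Polls URL every 2 seconds until it answers (curl -f also treats HTTP error
# codes as failures) or roughly TIMEOUT seconds of waiting have passed.
# Requires curl.
wait_for_protege() {
  local url=${1:-http://127.0.0.1:8050} timeout=${2:-600} waited=0
  until curl -fs -o /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no response from $url after ${waited}s" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "interface is up at $url"
}

# Example (run in a second terminal while the container is processing):
#   wait_for_protege http://127.0.0.1:8050 900 && echo "open it in your browser"
```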
4

Access the web interface

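Open your web browser and navigate to the host address you mapped with -p. The log shows 0.0.0.0 because the server listens on all interfaces inside the container.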
http://127.0.0.1:8050
You should see the PROTÉGÉ PD interactive interface.

Using the web interface

The interface provides interactive visualization for primer selection:

Main scatter plot

The top scatter plot shows all primer candidates:
  • X-axis: Primer position in the alignment
  • Y-axis: Number of degeneracies (log scale)
  • Green dots: Forward primer candidates
Lower degeneracy values mean fewer primer variants, which is generally preferred. Hover over points to see the primer sequence.
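To make the degeneracy axis concrete: for a primer written with IUPAC ambiguity codes, the number of variants is usually computed as the product of the number of bases each position allows. For example, GAYCCNAA encodes 2 × 4 = 8 sequences. The bash sketch below assumes PROTÉGÉ PD counts degeneracy the same way. iupac_bases, degeneracy and expand_primer are hypothetical helpers, not functions of the tool.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of PROTÉGÉ PD, assuming degeneracy is the
# product of the number of bases allowed at each IUPAC position.

# iupac_bases CODE: print the plain bases an uppercase IUPAC code stands for.
iupac_bases() {
  case $1 in
    A|C|G|T) echo "$1" ;;
    R) echo AG ;;  Y) echo CT ;;  S) echo CG ;;
    W) echo AT ;;  K) echo GT ;;  M) echo AC ;;
    B) echo CGT ;; D) echo AGT ;; H) echo ACT ;; V) echo ACG ;;
    N) echo ACGT ;;
    *) echo "unknown IUPAC code: $1" >&2; return 1 ;;
  esac
}

# degeneracy PRIMER: number of plain sequences the primer encodes.
degeneracy() {
  local primer total=1 i bases
  primer=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  for (( i = 0; i < ${#primer}; i++ )); do
    bases=$(iupac_bases "${primer:i:1}") || return 1
    total=$(( total * ${#bases} ))
  done
  echo "$total"
}

# expand_primer PRIMER: print every plain A/C/G/T variant, one per line.
expand_primer() {
  local primer i b v bases
  local -a variants=("") next
  primer=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  for (( i = 0; i < ${#primer}; i++ )); do
    bases=$(iupac_bases "${primer:i:1}") || return 1
    next=()
    for v in "${variants[@]}"; do            # extend each partial variant
      for (( b = 0; b < ${#bases}; b++ )); do
        next+=("$v${bases:b:1}")
      done
    done
    variants=("${next[@]}")
  done
  printf '%s\n' "${variants[@]}"
}

# Example:
#   degeneracy GAYCCNAA    # prints 8
#   expand_primer RY       # prints AC, AT, GC, GT (one per line)
```

The count grows multiplicatively, so one extra N multiplies the number of variants by 4. That is why the y-axis uses a log scale.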

Select a forward primer

Click on any green dot in the top plot to select a forward primer. The primer sequence and degeneracy information will appear below the plot.

Select a reverse primer

After selecting a forward primer, the bottom scatter plot updates to show reverse primer candidates:
  • Red dots: Reverse primer candidates
  • Click on a red dot to select your reverse primer

Melting temperature analysis

Once both primers are selected, the bottom panel shows melting temperature distributions for:
  • Forward primer: All degenerate variants
  • Reverse primer: All degenerate variants
You can select different Tm calculation methods:
  • Tm Wallace “Rule of thumb” (default): Simple calculation based on base composition (Tm = 4 × (G + C) + 2 × (A + T), where G, C, A and T are base counts)
  • Approx 2 Based on GC content: Uses the formula Tm = 64.9 + 41 × ((GC - 16.4) / length), where GC is the number of G and C bases and length is the primer length in nucleotides
  • Approx 3 Based on GC content: BioPython’s GC-based method
  • Nearest neighbor: Thermodynamic calculation using nearest-neighbor parameters; generally the most accurate of the four
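The first two formulas are simple enough to check by hand. This bash/awk sketch implements them exactly as written above for a plain A/C/G/T sequence. tm_wallace and tm_gc are hypothetical names. The sketch does not reproduce BioPython's methods or the nearest-neighbor calculation.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of PROTÉGÉ PD, implementing the two simplest
# formulas listed above for a plain A/C/G/T primer. They ignore salt and
# primer concentration, so treat the results as rough estimates.

# tm_wallace SEQ: Tm = 4*(G+C) + 2*(A+T), in degrees C.
tm_wallace() {
  printf '%s\n' "$1" | awk '{
    s = toupper($0)
    gc = gsub(/[GC]/, "", s)      # gsub returns the number of replacements
    at = gsub(/[AT]/, "", s)
    print 4 * gc + 2 * at
  }'
}

# tm_gc SEQ: Tm = 64.9 + 41 * (nG + nC - 16.4) / N, in degrees C, 1 decimal.
tm_gc() {
  printf '%s\n' "$1" | awk '{
    s = toupper($0); n = length(s)
    gc = gsub(/[GC]/, "", s)
    printf "%.1f\n", 64.9 + 41 * (gc - 16.4) / n
  }'
}

# Example (20 nt, 10 G/C):
#   tm_wallace ATGCATGCATGCATGCATGC   # prints 60
#   tm_gc      ATGCATGCATGCATGCATGC   # prints 51.8
```

For a degenerate primer, applying the same function to every expanded variant gives a per-variant distribution similar to what the interface plots.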

Advanced parameters

You can customize the analysis by adding optional parameters:
docker run --rm \
  --mount type=bind,source=$(pwd),target=/root/. \
  --name protege \
  -p 127.0.0.1:8050:8050 \
  --cpus 4 \
  ddelgadillo/protege_base:v1.0.2 \
  protege-pd -s myseqs.fna -c 85 -d 8

Available parameters

| Parameter | Description | Default |
|---|---|---|
| -s, --seq | FASTA file with gene sequences (required) | None |
| -c, --consensus | Consensus percentage threshold (0-100) | 90 |
| -g, --nogapconsensus | Do not consider consensus with gaps | True (gaps allowed) |
| -d, --codon | Codon primer length (number of codons) | 7 |
| -v, --verbose | Increase output verbosity | False |
  • Consensus percentage: Higher values (e.g., 95) require the consensus to represent a larger share of the sequences, which results in more degenerate primers
  • Codon primer length: Determines the primer length in codons (7 codons = 21 nucleotides)

View all parameters

To see the complete help message:
docker run --rm \
  --mount type=bind,source=$(pwd),target=/root/. \
  --name protege \
  -p 127.0.0.1:8050:8050 \
  --cpus 4 \
  ddelgadillo/protege_base:v1.0.2 \
  protege-pd --help

Export results

Once you’ve identified suitable primer pairs:
  1. Click the “Download CSV” button in the web interface
  2. The CSV file contains all primer candidates with:
    • Position in the alignment
    • Degeneracies
    • Forward primer sequences
    • Reverse primer sequences
The file is saved as pd_protege_[timestamp].csv in your mounted directory.
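To locate and preview the newest export from a shell, you can use a sketch like the following. latest_protege_csv is a hypothetical helper. It picks the file by modification time rather than parsing the timestamp in the name, because the exact timestamp format is not documented here.

```bash
#!/usr/bin/env bash
# latest_protege_csv [DIR]: hypothetical helper, not part of PROTÉGÉ PD.
# Prints the most recently modified pd_protege_*.csv in DIR (default: .).
latest_protege_csv() {
  local dir=${1:-.} newest="" f
  for f in "$dir"/pd_protege_*.csv; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] || { echo "no pd_protege_*.csv in $dir" >&2; return 1; }
  printf '%s\n' "$newest"
}

# Example: show the header and first few primer candidates of the latest export.
#   head -n 5 "$(latest_protege_csv)"
```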

Example workflow

Here’s a complete example using a hypothetical gyrB gene dataset:
# Navigate to data directory
cd ~/phylogenetics/gyrB_data/

# Run PROTÉGÉ PD with custom consensus threshold
docker run --rm \
  --mount type=bind,source=$(pwd),target=/root/. \
  --name protege \
  -p 127.0.0.1:8050:8050 \
  --cpus 4 \
  ddelgadillo/protege_base:v1.0.2 \
  protege-pd -s gyrB_genes.fasta -c 85

# Open browser to http://127.0.0.1:8050
# Select primers interactively
# Download CSV results

Stopping the container

To stop PROTÉGÉ PD:
  1. Press Ctrl+C in the terminal where the container is running
  2. The container will automatically remove itself (due to the --rm flag)
Alternatively, from another terminal:
docker stop protege

Troubleshooting

If you see an error about port 8050 being in use, either:
  • Stop any other services using port 8050
  • Change the port mapping: -p 127.0.0.1:8051:8050 (then access via http://127.0.0.1:8051)
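To check the port before starting the container, or to pick the next free one automatically, the sketch below probes 127.0.0.1 using bash's /dev/tcp redirection. port_in_use and find_free_port are hypothetical helpers. A port that accepts no connection on 127.0.0.1 may still be bound on another interface, so treat this as a quick check rather than a guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of PROTÉGÉ PD. They rely on bash's /dev/tcp
# redirection, which most Linux and macOS builds of bash support.

# port_in_use PORT: succeed if something accepts TCP connections on 127.0.0.1:PORT.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# find_free_port [START]: print the first port from START upward with no listener.
find_free_port() {
  local port=${1:-8050}
  while port_in_use "$port"; do
    port=$(( port + 1 ))
    [ "$port" -le 65535 ] || return 1
  done
  echo "$port"
}

# Example: publish the interface on the first free host port from 8050,
# then open http://127.0.0.1:<that port> in your browser.
#   port=$(find_free_port 8050)
#   docker run --rm --mount type=bind,source="$(pwd)",target=/root/. \
#     --name protege -p "127.0.0.1:${port}:8050" --cpus 4 \
#     ddelgadillo/protege_base:v1.0.2 protege-pd -s myseqs.fna
```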
If PROTÉGÉ PD cannot find your FASTA file, make sure:
  • You’re using the absolute path in the source parameter
  • Your FASTA filename is spelled correctly
  • The file is in the directory you’re mounting
If MUSCLE alignment fails:
  • Check that your sequences are valid nucleotide sequences
  • Ensure sequences represent protein-coding genes
  • Verify there are no stop codons in your sequences
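To look for stop codons before rerunning the analysis, the sketch below reads each sequence from its first base in steps of three. This assumes your sequences start in frame 1. It lists every TAA, TAG or TGA it finds. find_stop_codons is a hypothetical helper. It also reports a stop in the final complete codon, labelled "(last codon)", so you can tell that case apart from internal stops.

```bash
#!/usr/bin/env bash
# find_stop_codons FILE: hypothetical check, not part of PROTÉGÉ PD.
# Reads each FASTA sequence in frame 1 (from its first base, in steps of 3)
# and prints every TAA, TAG or TGA codon with its codon number; a stop in the
# final complete codon is marked "(last codon)".
# Returns 1 if any stop codon was found, 0 otherwise.
find_stop_codons() {
  awk '
    function scan(   i, k, last, codon, note) {
      if (name == "") return
      last = int(length(seq) / 3)
      for (i = 1; i + 2 <= length(seq); i += 3) {
        codon = substr(seq, i, 3)
        if (codon == "TAA" || codon == "TAG" || codon == "TGA") {
          k = (i + 2) / 3
          note = (k == last) ? " (last codon)" : ""
          print name ": stop codon " codon " at codon " k note
          found = 1
        }
      }
    }
    /^>/ { scan(); name = substr($0, 2); seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }   # join wrapped lines
    END { scan(); exit (found ? 1 : 0) }
  ' "$1"
}

# Example:
#   find_stop_codons myseqs.fna || echo "stop codons found; see the list above"
```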
If no suitable primers are identified:
  • Try lowering the consensus threshold with -c 80 or -c 75
  • Adjust the codon length with -d 6 or -d 8
  • Check that your sequences have sufficient overlap

Next steps

  • Explore the interactive plots to understand degeneracy patterns
  • Compare melting temperatures across different calculation methods
  • Export multiple primer pairs and test them experimentally
  • Adjust consensus thresholds to balance specificity and degeneracy
For best results, aim for primers with degeneracy values between 1 and 100, and ensure forward and reverse primers have similar melting temperatures.
