Overview
This guide walks you through the complete workflow, from a FASTA file containing aligned gene sequences to selecting optimal primer pairs for phylogenetic analysis.
Before starting, make sure you have installed Docker and pulled the PROTÉGÉ PD image.
Prepare your input file
PROTÉGÉ PD requires a FASTA file containing nucleotide sequences of protein-coding genes. Your file should:
- Contain multiple sequences (typically from different species or strains)
- Use nucleotide sequences (not amino acids)
- Represent the same gene across all sequences
- Be in standard FASTA format
Sequences should be protein-coding genes. The tool will translate them to amino acids for alignment.
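The requirements above can be checked before starting the container. The following is a minimal sketch, not part of PROTÉGÉ PD: the function names `parse_fasta` and `check_input` are hypothetical, and the allowed alphabet (A, C, G, T, N and the gap character) and the multiple-of-three length rule are assumptions about what a translatable coding sequence looks like.

```python
from typing import Dict, List

# Assumed alphabet: unambiguous bases plus N and the gap character.
ALLOWED = set("ACGTN-")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


def check_input(records: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(records) < 2:
        problems.append("fewer than two sequences")
    for name, seq in records.items():
        unexpected = set(seq) - ALLOWED
        if unexpected:
            problems.append(f"{name}: non-nucleotide characters {sorted(unexpected)}")
        length = len(seq.replace("-", ""))
        if length % 3 != 0:
            problems.append(f"{name}: length {length} is not a multiple of 3")
    return problems


example = ">sp1\nATGGCTAAA\n>sp2\nATGGCAAAA\n"
print(check_input(parse_fasta(example)))  # prints []
```

An amino acid file fails this check because letters such as M, K, or L are outside the assumed alphabet.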
Run PROTÉGÉ PD
Navigate to your data directory
Open your terminal and navigate to the directory containing your FASTA file:
Run the Docker container
Execute the following command, replacing the placeholders with your actual values:
Parameters explained:
- --rm: Automatically remove the container when it exits
- --mount type=bind,source=/your/files/path/,target=/root/.: Mount your local directory into the container
  - Replace /your/files/path/ with the full absolute path to your FASTA file directory
- --name protege: Name the running container “protege”
- -p 127.0.0.1:8050:8050: Map port 8050 for the web interface
- --cpus 4: Allocate 4 CPU cores (adjust based on your system)
- -s myseqs.fna: Your FASTA file name (replace myseqs.fna with your actual filename)
Wait for processing
PROTÉGÉ PD will:
- Translate your nucleotide sequences to amino acids
- Align the sequences using MUSCLE
- Calculate consensus sequences with degeneracies
- Identify all possible primer positions
- Launch the web interface
Using the web interface
The interface provides interactive visualization for primer selection:
Main scatter plot
The top scatter plot shows all primer candidates:
- X-axis: Primer position in the alignment
- Y-axis: Number of degeneracies (log scale)
- Green dots: Forward primer candidates
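To make the Y-axis concrete, here is a small sketch of how a degeneracy count can be computed for a primer written with IUPAC nucleotide codes. It assumes that "number of degeneracies" means the number of distinct plain sequences the degenerate primer represents; the helper name `degeneracy` is hypothetical and this is not taken from the PROTÉGÉ PD source.

```python
from math import prod

# Number of bases each IUPAC nucleotide code can stand for.
CODE_SIZES = {
    **dict.fromkeys("ACGT", 1),
    **dict.fromkeys("RYSWKM", 2),
    **dict.fromkeys("BDHV", 3),
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Product of the per-position choices (assumed meaning of the Y-axis)."""
    return prod(CODE_SIZES[base] for base in primer.upper())


print(degeneracy("ATGGCN"))  # 4: only the final N is ambiguous
print(degeneracy("RYN"))     # 16: 2 * 2 * 4
```

Because the count is a product, it grows quickly with each ambiguous position, which is one reason a log scale is a natural choice for this axis.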
Select a forward primer
Click on any green dot in the top plot to select a forward primer. The primer sequence and degeneracy information will appear below the plot.
Select a reverse primer
After selecting a forward primer, the bottom scatter plot updates to show reverse primer candidates:
- Red dots: Reverse primer candidates
- Click on a red dot to select your reverse primer
Melting temperature analysis
Once both primers are selected, the bottom panel shows melting temperature distributions for:
- Forward primer: All degenerate variants
- Reverse primer: All degenerate variants
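"All degenerate variants" refers to every plain sequence a degenerate primer can stand for. A minimal sketch of that expansion follows, using the standard IUPAC nucleotide codes; the function name `expand_variants` is hypothetical, and how PROTÉGÉ PD enumerates variants internally is not stated in this guide.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_variants(primer: str) -> List[str]:
    """List every non-degenerate sequence matched by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


print(expand_variants("AY"))  # ['AC', 'AT']
```

A melting temperature can then be computed for each variant, which gives a distribution rather than a single value.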
Melting temperature calculation methods
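- Tm Wallace “Rule of thumb” (default): Simple calculation based on base composition (Tm = 4×GC + 2×AT, where GC and AT are the numbers of G/C and A/T bases)
- Approx 2 Based on GC content: Uses the formula Tm = 64.9 + 41×((GC - 16.4) / length), where GC is the number of G and C bases and length is the primer length in nucleotides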
- Approx 3 Based on GC content: BioPython’s GC-based method
- Nearest neighbor: Most accurate thermodynamic calculation using nearest-neighbor parameters
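The first two formulas are simple enough to work through by hand. The sketch below implements them exactly as written in the list above for a non-degenerate primer; the function names are hypothetical, and the result may differ slightly from what the web interface reports if the tool applies extra corrections.

```python
def tm_wallace(seq: str) -> float:
    """Wallace rule of thumb: Tm = 4 x (G + C) + 2 x (A + T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4.0 * gc + 2.0 * at


def tm_gc_approx(seq: str) -> float:
    """GC-based approximation: Tm = 64.9 + 41 x (G + C - 16.4) / length."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


primer = "ATGC" * 5 + "A"  # 21 nucleotides: 10 G/C and 11 A/T
print(tm_wallace(primer))              # 62.0
print(round(tm_gc_approx(primer), 2))  # 52.4
```

The two estimates differ by almost 10 °C for this 21-mer, which illustrates why comparing methods is worthwhile. Biopython's `Bio.SeqUtils.MeltingTemp` module provides `Tm_Wallace`, `Tm_GC`, and `Tm_NN`; whether PROTÉGÉ PD calls those functions directly is an assumption.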
Advanced parameters
You can customize the analysis by adding optional parameters:
Available parameters
| Parameter | Description | Default |
|---|---|---|
| -s, --seq | FASTA file with gene sequences (required) | None |
| -c, --consensus | Consensus percentage threshold (0-100) | 90 |
| -g, --nogapconsensus | Do not consider consensus with gaps | True (gaps allowed) |
| -d, --codon | Codon primer length (number of codons) | 7 |
| -v, --verbose | Increase output verbosity | False |
- Consensus percentage: Higher values (e.g., 95) demand that the consensus cover more of the input sequences at each position, so primers tend to be more degenerate
- Codon primer length: Determines the primer length in codons (7 codons = 21 nucleotides)
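The effect of these two parameters can be illustrated with a small sketch. It rests on an assumption about how the threshold works: at each alignment column, the most frequent bases are kept until together they cover at least the given percentage of sequences. This is an interpretation, not PROTÉGÉ PD's documented algorithm, and the names `covering_bases` and `primer_length_nt` are hypothetical.

```python
from collections import Counter
from typing import List


def covering_bases(column: str, threshold: float) -> List[str]:
    """Smallest set of most frequent bases covering `threshold` percent of a column."""
    counts = Counter(column.upper())
    total = sum(counts.values())
    kept, covered = [], 0
    for base, count in counts.most_common():
        kept.append(base)
        covered += count
        if covered * 100 >= threshold * total:
            break
    return sorted(kept)


def primer_length_nt(codons: int) -> int:
    """Each codon contributes three nucleotides."""
    return 3 * codons


column = "A" * 92 + "G" * 8        # one alignment column across 100 sequences
print(covering_bases(column, 90))  # ['A']
print(covering_bases(column, 95))  # ['A', 'G']
print(primer_length_nt(7))         # 21
```

Under this assumption, raising the threshold from 90 to 95 turns an unambiguous A into an A/G position, which is how a stricter threshold can increase degeneracy.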
View all parameters
To see the complete help message:
Export results
Once you’ve identified suitable primer pairs:
- Click the “Download CSV” button in the web interface
- The CSV file contains all primer candidates with:
  - Position in the alignment
  - Degeneracies
  - Forward primer sequences
  - Reverse primer sequences
The file is saved as pd_protege_[timestamp].csv in your mounted directory.
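For downstream analysis, the exported file can be located and loaded with the Python standard library. This is a minimal sketch: it relies only on the pd_protege_[timestamp].csv naming pattern mentioned above, the helper names are hypothetical, and no column names are assumed, so inspect the header row of your own file first.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def latest_export(directory: str) -> Optional[Path]:
    """Return the most recently modified pd_protege_*.csv file, if any."""
    files = sorted(
        Path(directory).glob("pd_protege_*.csv"),
        key=lambda path: path.stat().st_mtime,
    )
    return files[-1] if files else None


def load_rows(path: Path) -> List[Dict[str, str]]:
    """Read the CSV into a list of dicts keyed by the file's own header row."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


export = latest_export(".")
if export is not None:
    rows = load_rows(export)
    print(export.name, len(rows), "rows")
    if rows:
        print("columns:", list(rows[0].keys()))
```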
Example workflow
Here’s a complete example using a hypothetical gyrB gene dataset:
Stopping the container
To stop PROTÉGÉ PD:
- Press Ctrl+C in the terminal where the container is running
- The container will automatically remove itself (due to the --rm flag)
Troubleshooting
Port 8050 already in use
If you see an error about port 8050 being in use, either:
- Stop any other services using port 8050
- Change the port mapping: -p 127.0.0.1:8051:8050 (then access via http://127.0.0.1:8051)
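Before choosing an alternative port, you can check whether it is free on the loopback interface. The sketch below uses only the Python standard library; `port_is_free` is a hypothetical helper name, and the check reflects the state of the port at the moment it runs, so another process could still claim the port afterwards.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a failure means something else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


for candidate in (8050, 8051):
    status = "free" if port_is_free(candidate) else "in use"
    print(f"port {candidate}: {status}")
```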
File not found errors
Make sure:
- You’re using the absolute path in the source parameter
- Your FASTA filename is spelled correctly
- The file is in the directory you’re mounting
Alignment fails
If MUSCLE alignment fails:
- Check that your sequences are valid nucleotide sequences
- Ensure sequences represent protein-coding genes
- Verify there are no stop codons in your sequences
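The stop codon check in the list above can be scripted. This sketch assumes the standard genetic code (TAA, TAG, TGA) and that every sequence starts in reading frame 0; both are assumptions, since the translation table PROTÉGÉ PD uses is not stated here. The function name `in_frame_stops` is hypothetical.

```python
from typing import List

# Stop codons of the standard genetic code (assumed translation table).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq: str) -> List[int]:
    """Return 0-based nucleotide offsets of stop codons in reading frame 0."""
    seq = seq.upper().replace("-", "").replace("U", "T")
    return [
        offset
        for offset in range(0, len(seq) - 2, 3)
        if seq[offset:offset + 3] in STOP_CODONS
    ]


print(in_frame_stops("ATGTAAGGG"))  # [3]
print(in_frame_stops("ATGAAATTT"))  # []
```

A stop reported at the very end of a sequence is usually the natural terminator of the gene, whereas one in the middle more likely points to a frameshift or a sequence that is not in frame.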
No primers found
If no suitable primers are identified:
- Try lowering the consensus threshold with -c 80 or -c 75
- Adjust the codon length with -d 6 or -d 8
- Check that your sequences have sufficient overlap
Next steps
- Explore the interactive plots to understand degeneracy patterns
- Compare melting temperatures across different calculation methods
- Export multiple primer pairs and test them experimentally
- Adjust consensus thresholds to balance specificity and degeneracy