Project Overview
PROTÉGÉ PD is an open-source tool for phylogenetic primer design based on the Phylotag approach. The project welcomes contributions from the bioinformatics and software development communities.
Repository: https://github.com/ddelgadillod/ProtegePD
Maintainer: Diego Delgadillo Duran
Contact: [email protected]
PROTÉGÉ PD stands for PROTEin coding GEne for phylogenetic tag and identification - Primer Design tool.
Ways to Contribute
Reporting Issues
Help improve PROTÉGÉ PD by reporting bugs, documentation errors, or unexpected behavior. Before submitting an issue:
- Search existing issues to avoid duplicates
- Verify the issue with the latest version (v1.0.2)
- Collect relevant information (error messages, input files, system details)
Local Installation
Install the MUSCLE binary, then the pinned Python dependencies:
- biopython==1.83 - Sequence analysis and MUSCLE integration
- dash==2.14.2 - Web interface framework
- pandas==2.2.0 - Data manipulation
- plotly==5.18.0 - Interactive visualizations
- numpy==1.26.3 - Numerical operations
- scipy==1.12.0 - Statistical computations
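Before running the tool, it can help to confirm that the active environment matches the pins listed above. The following is a minimal sketch, not part of the repository: `PINNED` and `find_mismatches` are hypothetical names, and the check relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from the dependency list above.
PINNED = {
    "biopython": "1.83",
    "dash": "2.14.2",
    "pandas": "2.2.0",
    "plotly": "5.18.0",
    "numpy": "1.26.3",
    "scipy": "1.12.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that are missing or off-pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version. The `get_version` parameter exists only so the function can be exercised without touching the real environment.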
Running Locally
The test_files/ directory contains sample datasets for development and testing. These range from small test files to larger gyrB gene datasets.
Code Structure
Main Components
protege.py (/home/daytona/workspace/source/protege.py:1)
Main application file containing:
- Command-line argument parsing
- Sequence reading and translation
- MUSCLE alignment execution
- Consensus calculation algorithm
- Dash web application setup
- Interactive plotting callbacks
- Sequence translation: Lines 132-161
- MUSCLE alignment: Lines 179-183
- Consensus calculation: Lines 246-300
- Degeneracy computation: Lines 305-323
- Dash callbacks: Lines 584-823
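For orientation, the consensus and degeneracy steps listed above can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in protege.py: it assumes the consensus records the set of nucleotides observed in each alignment column, and that the degeneracy of a candidate primer window is the product of those per-column counts. The names `column_sets` and `window_degeneracy` are hypothetical.

```python
from math import prod


def column_sets(aligned_seqs):
    """Collect the set of nucleotides observed in each alignment column (gaps ignored)."""
    if len({len(s) for s in aligned_seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        {base for base in column if base != "-"}
        for column in zip(*(s.upper() for s in aligned_seqs))
    ]


def window_degeneracy(columns, start, length):
    """Number of distinct primers needed to cover a window: product of column sizes."""
    window = columns[start:start + length]
    # An all-gap column contributes a factor of 1 rather than zeroing the product.
    return prod(max(len(col), 1) for col in window)


alignment = ["ATGCA", "ATGTA", "ACG-A"]  # toy alignment, not from test_files/
cols = column_sets(alignment)
print(window_degeneracy(cols, 0, 5))  # 1 * 2 * 1 * 2 * 1 = 4
```

Columns 2 and 4 of the toy alignment each contain two different bases, so a primer spanning all five positions would need four variants.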
phl.py (/home/daytona/workspace/source/phl.py:1)
Helper module with primer analysis classes and functions:
- primerDeg class (lines 16-193): Primer degeneracy calculations
  - primerCheck(): Validate primer sequence
  - primerNP(): Count possible primers
  - primerComb(): Generate all primer combinations
  - TmWallace(), TmAp2(), TmAp3(), TmNN(): Melting temperature calculations
- Plotting functions:
  - posDegScatter(): Forward primer scatter plot (lines 196-231)
  - zoomDegScatter(): Reverse primer scatter plot (lines 234-264)
- Utility functions:
  - filterDF(): Filter by degeneracy range (lines 269-272)
  - degEquivalent(): Convert nucleotide combinations to IUPAC codes (lines 321-333)
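The ideas behind counting possible primers, generating primer combinations, and converting nucleotide combinations to IUPAC codes can be sketched with the standard IUPAC ambiguity table. The functions below (`to_iupac`, `count_primers`, `expand_primer`) are hypothetical stand-ins written for illustration; they are not the signatures of primerNP(), primerComb(), or degEquivalent() in phl.py, which may differ.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: a sorted string of bases -> its IUPAC code.
BASES_TO_CODE = {"".join(sorted(bases)): code for code, bases in IUPAC.items()}


def to_iupac(bases):
    """Map a collection of nucleotides, e.g. {'A', 'G'}, to one IUPAC code."""
    return BASES_TO_CODE["".join(sorted(set(bases)))]


def count_primers(primer):
    """Degeneracy of a primer: product of the number of bases each code allows."""
    return prod(len(IUPAC[code]) for code in primer.upper())


def expand_primer(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(combo) for combo in product(*(IUPAC[c] for c in primer.upper()))]


print(to_iupac({"A", "G"}))    # R
print(count_primers("ATRGN"))  # 1 * 1 * 2 * 1 * 4 = 8
print(expand_primer("AYG"))    # ['ACG', 'ATG']
```

The length of `expand_primer(p)` always equals `count_primers(p)`, which makes a convenient sanity check when modifying either function.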
Directory Structure
- Make your changes
  - Write clean, documented code
  - Follow existing code style
  - Add comments for complex logic
- Test your changes
- Commit your changes
- Push to your fork
- Create a Pull Request
  - Visit your fork on GitHub
  - Click “New Pull Request”
  - Provide a clear description of changes
Pull Request Guidelines
PR description template:
- Explain why, not what
- Document complex algorithms
- Reference research papers for scientific methods
Testing
Manual Testing
Test with provided datasets:
Testing Checklist
- Application starts without errors
- MUSCLE alignment completes successfully
- Output files created (sequences.csv, alSequences.csv, protege_consensus.csv)
- Web interface loads at http://127.0.0.1:8050
- Scatter plots display data points
- Primer selection updates temperature distribution
- CSV download works
- No Python warnings or errors in console
- Tested on multiple datasets (small, medium, large)
- Parameter variations work as expected
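Part of the checklist above can be automated. The sketch below checks only the output-file item, using the three file names given in the checklist. The helper `missing_outputs` is a hypothetical name, and the assumption that outputs are written to the current working directory should be adjusted to match your run.

```python
from pathlib import Path

# Output files named in the checklist above.
EXPECTED_OUTPUTS = ("sequences.csv", "alSequences.csv", "protege_consensus.csv")


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or empty in output_dir."""
    root = Path(output_dir)
    return [
        name for name in expected
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    problems = missing_outputs(".")  # assumption: outputs land in the working directory
    print("all outputs present" if not problems else f"missing or empty: {problems}")
```

Treating an empty file as a failure catches runs where a file was created but the pipeline stopped before writing any rows.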
Docker Testing
Test containerized version:
Contribution Areas
High-Priority Improvements
Performance Optimization
Opportunities:
- Parallelize consensus calculation for large alignments
- Optimize degeneracy computation (currently O(n²))
- Implement caching for temperature calculations
- Add progress bars for long-running operations
- Use multiprocessing for CPU-bound tasks
- Consider numba for numerical computations
- Maintain backward compatibility with existing parameters
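As one possible shape for the caching idea, `functools.lru_cache` can memoize a pure temperature function so repeated primers are computed once. The sketch uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, as a stand-in. `tm_wallace` is a hypothetical function written for this example and is not the project's TmWallace() method, whose signature is not shown here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tm_wallace(primer):
    """Wallace-rule melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primers = ["ATGCATGC", "GGCCGGCC", "ATGCATGC"]  # the repeat is served from the cache
temps = [tm_wallace(p) for p in primers]
print(temps)                    # [24, 32, 24]
print(tm_wallace.cache_info())  # reports hits=1, misses=2
```

`lru_cache` requires hashable arguments, so this works for primer strings but would need adapting if a calculation takes lists or DataFrames. Caching leaves results unchanged, which fits the backward-compatibility guideline above.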
Enhanced Primer Analysis
Opportunities:
- Add primer specificity checking (BLAST integration)
- Calculate GC clamps and secondary structures
- Implement primer dimer detection
- Add primer3 integration for comprehensive analysis
- Keep as optional features (don’t break core workflow)
- Consider external dependencies carefully
- Provide clear documentation for new features
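To give a sense of scale for these features, the sketch below shows a GC-content calculation, a GC-clamp count, and a deliberately crude 3'-end complementarity test as a first approximation of primer-dimer detection. All function names are hypothetical, only unambiguous A/C/G/T primers are handled, and a real implementation would likely rely on a thermodynamic model or a dedicated tool such as primer3.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer (0.0 for an empty string)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_clamp(primer, tail=5):
    """Count G/C bases among the last `tail` bases at the 3' end."""
    end = primer.upper()[-tail:]
    return end.count("G") + end.count("C")


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def three_prime_overlap(primer_a, primer_b, length=4):
    """True if the last `length` bases of primer_a can pair with the 3' end of primer_b."""
    tail_a = primer_a.upper()[-length:]
    tail_b = primer_b.upper()[-length:]
    # Two 3' ends anneal antiparallel, so compare one tail
    # with the reverse complement of the other.
    return tail_a == tail_b.translate(COMPLEMENT)[::-1]


print(gc_content("ATGCCGTAGC"))                     # 0.6
print(gc_clamp("ATGCCGTAGC"))                       # 3 (last five bases: GTAGC)
print(three_prime_overlap("ACGTAAAC", "TTTTGTTT"))  # True
```

Because each check is a standalone function, it could be exposed as an optional analysis step without touching the core workflow.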
Improved User Interface
Opportunities:
- Add primer pair validation visualization
- Implement alignment viewer
- Add export options (PDF, PNG for plots)
- Improve mobile responsiveness
- Use existing Plotly/Dash components
- Test on multiple browsers
- Maintain clean, minimal design aesthetic
Cross-Platform Support
Opportunities:
- Add MUSCLE binaries for macOS and Windows
- Create native installation options (pip package)
- Improve Windows path handling
- Add conda distribution
- Test on all target platforms
- Provide platform-specific documentation
- Maintain Docker as primary distribution method
Documentation Needs
- Tutorial videos - Walkthrough of complete workflow
- Use case examples - Real-world phylogenetic studies
- API documentation - For using modules programmatically
- Troubleshooting guide expansion - More edge cases
- Comparative analysis - PROTÉGÉ PD vs other primer design tools
Community Guidelines
Code of Conduct
- Be respectful and constructive in discussions
- Welcome newcomers and help them get started
- Focus on the scientific merit and technical quality
- Credit others’ contributions appropriately
- Maintain professional communication
Getting Help
For development questions:
- Open a GitHub Discussion (preferred for general questions)
- Email maintainer: [email protected]
- Reference relevant code sections with line numbers
- Refer to Phylotag paper: Caro-Quintero, 2015
- Discuss phylogenetic primer design principles
- Share use cases and results
Recognition
Contributors will be recognized in:
- GitHub contributors list
- Future release notes
- README acknowledgments section (for significant contributions)
All contributions, big or small, are valuable. Whether you fix a typo, report a bug, or implement a major feature, thank you for helping improve PROTÉGÉ PD!