Dockerfile Architecture
PROTÉGÉ PD’s Docker configuration is built on Ubuntu 22.04 and provides a complete Python 3.10 environment with all necessary bioinformatics dependencies.
Base Image and Environment
The Dockerfile starts with Ubuntu 22.04 LTS and sets up three key environment variables. These variables define standard locations for source code (/usr/local/src/), binaries (/usr/local/bin/), and the home directory (/root/).
System Dependencies
The container installs essential system packages:
- python3.10 - Required Python version for Biopython compatibility
- python3-pip - Package manager for Python dependencies
- git - Version control for cloning the repository
- vim - Text editor for container debugging
- htop - Process monitoring tool
- wget - Download utility for MUSCLE binary
MUSCLE Binary Installation
PROTÉGÉ PD requires MUSCLE (Multiple Sequence Comparison by Log-Expectation) for protein sequence alignment. The muscle_lin binary is the Linux version of MUSCLE v3.8.31. The container makes it executable and places it in /usr/local/bin/ for global access.
Application Setup
The Dockerfile clones the repository and installs all Python dependencies:
- biopython 1.83 - Core bioinformatics library for sequence manipulation
- dash 2.14.2 - Web application framework for the GUI
- pandas 2.2.0 - Data manipulation and analysis
- plotly 5.18.0 - Interactive visualization library
- numpy 1.26.3 - Numerical computing
- scipy 1.12.0 - Scientific computing algorithms
- Flask 3.0.1 - Web server backend
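When debugging inside the container, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of PROTÉGÉ PD itself: the `PINNED` mapping simply restates the list above, and `installed_version` and `find_mismatches` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions restated from the dependency list above.
PINNED = {
    "biopython": "1.83",
    "dash": "2.14.2",
    "pandas": "2.2.0",
    "plotly": "5.18.0",
    "numpy": "1.26.3",
    "scipy": "1.12.0",
    "Flask": "3.0.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) tuple."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run inside the container, the script prints nothing when every package matches its pin.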
Container Runtime Configuration
Port Mapping
The application runs a Dash web server on port 8050, published to the host with a mapping of the form 127.0.0.1:8050:8050.
Port Mapping Breakdown
- 127.0.0.1 - Binds to localhost only (security best practice)
- 8050 (first) - Host machine port
- 8050 (second) - Container internal port
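If you script container launches, the three parts of the mapping can be assembled programmatically. This is a small sketch under the assumption that the value is passed to `docker run -p`; `publish_arg` is a hypothetical helper, not something shipped with PROTÉGÉ PD.

```python
def publish_arg(host_port, container_port, bind_address="127.0.0.1"):
    """Build the value for docker run's -p flag: ADDRESS:HOST_PORT:CONTAINER_PORT."""
    for port in (host_port, container_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return f"{bind_address}:{host_port}:{container_port}"


print(publish_arg(8050, 8050))  # 127.0.0.1:8050:8050
```

Changing only the first argument, for example `publish_arg(9000, 8050)`, exposes the same container port on a different host port, which is useful when 8050 is already in use on the host.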
Volume Mounts
PROTÉGÉ PD uses bind mounts to access your FASTA files:
- type=bind - Creates a direct mount of a host directory
- source - Absolute path on your host machine (e.g., /home/user/data/)
- target - Mount location inside container (/root/.)
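The three fields above combine into a single comma-separated value for `docker run --mount`. A minimal sketch follows, assuming a Linux or macOS host (the absolute-path check uses POSIX rules) and taking the default target from the list above; `bind_mount_arg` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def bind_mount_arg(source, target="/root/."):
    """Build the value for docker run's --mount flag for a bind mount."""
    # Bind mounts need an absolute host path as the source.
    if not PurePosixPath(source).is_absolute():
        raise ValueError(f"source must be an absolute path: {source}")
    return f"type=bind,source={source},target={target}"


print(bind_mount_arg("/home/user/data/"))
```

Rejecting relative paths early gives a clearer error than the one Docker reports at container start.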
CPU Allocation
The --cpus flag limits the CPU resources available to the container.
Recommended CPU allocation:
- Small datasets (under 50 sequences): 2 CPUs
- Medium datasets (50-200 sequences): 4 CPUs
- Large datasets (over 200 sequences): 6-8 CPUs
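For automated runs, the recommendations above can be encoded as a simple lookup. The sketch below is illustrative only: `recommended_cpus` and `cpus_flag` are hypothetical helpers, and picking the lower bound of the 6-8 range for large datasets is an assumption rather than project guidance.

```python
def recommended_cpus(sequence_count):
    """Return the (min, max) CPU range suggested above for a dataset size."""
    if sequence_count < 0:
        raise ValueError("sequence_count must be non-negative")
    if sequence_count < 50:
        return (2, 2)
    if sequence_count <= 200:
        return (4, 4)
    return (6, 8)


def cpus_flag(sequence_count):
    """Format a --cpus flag using the lower bound of the recommended range."""
    low, _high = recommended_cpus(sequence_count)
    return f"--cpus={low}"


print(cpus_flag(120))  # --cpus=4
```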
Container Cleanup
The --rm flag automatically removes the container after it stops.
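Putting the runtime options together, a launch script might assemble the full `docker run` invocation as an argument list. This is a sketch under stated assumptions: `docker_run_command` is a hypothetical helper, and `example/protege-pd:latest` is a placeholder image name, not the project's published image.

```python
import shlex


def docker_run_command(image, data_dir, cpus=2, port=8050):
    """Assemble a docker run invocation as an argument list."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it stops
        f"--cpus={cpus}",                     # CPU limit
        "-p", f"127.0.0.1:{port}:8050",       # localhost-only port mapping
        "--mount", f"type=bind,source={data_dir},target=/root/.",
        image,
    ]


# Placeholder image name; substitute the image you actually use.
command = docker_run_command("example/protege-pd:latest", "/home/user/data/")
print(shlex.join(command))
```

Building a list instead of a single string means the result can be passed straight to `subprocess.run(command, check=True)` without shell quoting issues.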
Building Custom Images
You can build a custom image with modifications to the codebase.
Clone and Modify
Build Custom Image
- -t protege-custom:latest - Tag the image with a name and version
- . - Build context (current directory)
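The two arguments described above map directly onto a `docker build` invocation. As a minimal sketch for scripted builds (`docker_build_command` is a hypothetical helper name):

```python
import shlex


def docker_build_command(tag="protege-custom:latest", context="."):
    """Assemble a docker build invocation as an argument list."""
    return ["docker", "build", "-t", tag, context]


print(shlex.join(docker_build_command()))  # docker build -t protege-custom:latest .
```

Run it from the directory containing the modified Dockerfile, or pass that directory as `context`.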
Run Custom Image
Environment Variables
PROTÉGÉ PD doesn’t require custom environment variables, but you can pass them if needed. The application does not currently read them, though they can be helpful for scripting and automation.
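For scripted launches, a mapping of variables can be expanded into repeated `-e KEY=VALUE` arguments for `docker run`. In this sketch, `env_flags` is a hypothetical helper and `RUN_LABEL` is an invented variable name used purely for illustration; the application does not read it.

```python
def env_flags(variables):
    """Expand a mapping into repeated -e KEY=VALUE arguments for docker run."""
    flags = []
    for key, value in variables.items():
        if not key or "=" in key:
            raise ValueError(f"invalid variable name: {key!r}")
        flags += ["-e", f"{key}={value}"]
    return flags


print(env_flags({"RUN_LABEL": "batch-01"}))  # ['-e', 'RUN_LABEL=batch-01']
```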
Container Resource Monitoring
Check Container Resources
While PROTÉGÉ PD is running, you can monitor its resource usage.
Access Container Shell
For debugging, you can access a shell inside the running container.
Useful Container Commands
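The monitoring and shell-access tasks described above correspond to standard Docker CLI commands. The sketch below builds them as argument lists; the helper names are hypothetical, and `protege` is a placeholder for whatever container name or ID `docker ps` reports on your machine.

```python
import shlex


def list_command():
    """List running containers (to find the container name or ID)."""
    return ["docker", "ps"]


def stats_command(container):
    """One-shot snapshot of CPU, memory, and I/O usage for a container."""
    return ["docker", "stats", "--no-stream", container]


def shell_command(container, shell="/bin/bash"):
    """Open an interactive shell inside a running container."""
    return ["docker", "exec", "-it", container, shell]


# "protege" is a placeholder container name.
for cmd in (list_command(), stats_command("protege"), shell_command("protege")):
    print(shlex.join(cmd))
```

Omitting `--no-stream` from the stats command gives a continuously updating view instead of a single snapshot.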