Prerequisites
Before running AlphaFold 3 with Docker, ensure you have:
Linux System
AlphaFold 3 requires Linux. Other operating systems are not supported.
NVIDIA GPU
Compute Capability 8.0+ (A100, H100 recommended)
A100 80GB: Up to 5,120 tokens
H100 80GB: Up to 5,120 tokens
A100 40GB: Up to 4,352 tokens (with config changes)
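The limits above can be summarized in a small Python sketch (an illustrative helper, not part of AlphaFold 3; the dictionary simply encodes the figures listed above):

```python
# Illustrative lookup of the per-GPU token limits listed above.
GPU_TOKEN_LIMITS = {
    "A100 80GB": 5120,
    "H100 80GB": 5120,
    "A100 40GB": 4352,  # only with config changes (see Out of Memory below)
}

def fits_on(gpu: str, num_tokens: int) -> bool:
    """True if an input of num_tokens is within the listed limit for gpu."""
    return num_tokens <= GPU_TOKEN_LIMITS[gpu]
```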
System Resources
RAM: Minimum 64 GB (genetic search can use more)
Disk: Up to 1 TB for databases (SSD recommended)
CUDA: Version 12.6 on host machine
Docker Installed
Rootless Docker recommended. See installation section below.
Installation
Installing Docker
These instructions are for Ubuntu 22.04 LTS. Adjust for your distribution.
Ubuntu 22.04
Rootless Setup
Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
-o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
Add the repository and install:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
docker-buildx-plugin docker-compose-plugin
# Test installation
sudo docker run hello-world
Enable rootless Docker for better security:
sudo apt-get install -y uidmap systemd-container
sudo machinectl shell $(whoami)@ /bin/bash -c \
'dockerd-rootless-setuptool.sh install && \
sudo loginctl enable-linger $(whoami) && \
DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock docker context use rootless'
Installing NVIDIA GPU Support
Install NVIDIA Drivers
sudo apt-get -y install alsa-utils ubuntu-drivers-common
sudo ubuntu-drivers install
sudo nvidia-smi --gpu-reset
nvidia-smi # Verify installation
If you see “NVIDIA-SMI has failed”, reboot with sudo reboot now
Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
Verify GPU Access
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
You should see your GPU listed.
Obtaining AlphaFold 3 Source Code
git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3
Downloading Databases
Download size: ~252 GB compressed, ~630 GB uncompressed. Use an SSD for best performance.
cd alphafold3
./fetch_databases.sh /path/to/databases
Do NOT use a subdirectory of the AlphaFold 3 repository. This would slow Docker builds.
Expected directory structure:
/path/to/databases/
├── mmcif_files/ # ~200k PDB mmCIF files
├── bfd-first_non_consensus_sequences.fasta
├── mgy_clusters_2022_05.fa
├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
├── pdb_seqres_2022_09_28.fasta
├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
├── rnacentral_active_seq_id_90_cov_80_linclust.fasta
├── uniprot_all_2021_04.fa
└── uniref90_2022_05.fa
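To sanity-check a completed download, a short Python sketch (illustrative only; the `missing_entries` helper is hypothetical, file names are taken from the listing above) can confirm the expected entries exist:

```python
from pathlib import Path

# Files and directories expected under the database root (from the listing above).
EXPECTED = [
    "mmcif_files",
    "bfd-first_non_consensus_sequences.fasta",
    "mgy_clusters_2022_05.fa",
    "nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta",
    "pdb_seqres_2022_09_28.fasta",
    "rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta",
    "rnacentral_active_seq_id_90_cov_80_linclust.fasta",
    "uniprot_all_2021_04.fa",
    "uniref90_2022_05.fa",
]

def missing_entries(db_dir) -> list:
    """Return the expected entries absent from db_dir."""
    root = Path(db_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```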
Obtaining Model Parameters
Model parameters require approval. Complete this form. Expect a response within 2-3 business days.
Download parameters to a directory (e.g., $HOME/af3_models).
Do NOT place in AlphaFold 3 repository directory. Store separately.
Building Docker Image
cd alphafold3
docker build -t alphafold3 -f docker/Dockerfile .
This creates an image with all Python dependencies and environment configuration.
Running AlphaFold 3
Basic Usage
Create an input JSON file (see Input Format) and save it to $HOME/af_input/fold_input.json:
{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
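Input files like this can also be generated programmatically. A minimal Python sketch (the short sequence here is a placeholder, not the 2PV7 sequence above; substitute your own):

```python
import json

# Sketch: build an AlphaFold 3 input JSON programmatically.
fold_input = {
    "name": "2PV7",
    "sequences": [
        {
            "protein": {
                "id": ["A", "B"],  # two identical chains
                # Placeholder sequence for illustration only.
                "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGG",
            }
        }
    ],
    "modelSeeds": [1],
    "dialect": "alphafold3",
    "version": 1,
}

with open("fold_input.json", "w") as f:
    json.dump(fold_input, f, indent=2)
```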
Run AlphaFold 3:
docker run -it \
--volume $HOME/af_input:/root/af_input \
--volume $HOME/af_output:/root/af_output \
--volume $HOME/af3_models:/root/models \
--volume /path/to/databases:/root/public_databases \
--gpus all \
alphafold3 \
python run_alphafold.py \
--json_path=/root/af_input/fold_input.json \
--model_dir=/root/models \
--output_dir=/root/af_output
Directory Mounts Explained
/root/af_input: Input JSON files. Must be readable by the container.
/root/af_output: Output directory for predictions. Must be writable.
/root/models: Model parameters from Google DeepMind.
/root/public_databases: Genetic databases for MSA and template search.
You may need to run chmod 755 $HOME/af_input $HOME/af_output to ensure proper permissions.
Multiple Database Directories
For optimal performance with SSD + HDD setup:
docker run -it \
--volume $HOME/af_input:/root/af_input \
--volume $HOME/af_output:/root/af_output \
--volume $HOME/af3_models:/root/models \
--volume /mnt/ssd/databases:/root/public_databases \
--volume /mnt/hdd/databases:/root/public_databases_fallback \
--gpus all \
alphafold3 \
python run_alphafold.py \
--json_path=/root/af_input/fold_input.json \
--model_dir=/root/models \
--db_dir=/root/public_databases \
--db_dir=/root/public_databases_fallback \
--output_dir=/root/af_output
AlphaFold 3 checks SSD first, then falls back to slower storage.
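The fallback behavior can be pictured with a short Python sketch (this is an illustration of first-match lookup in `--db_dir` order, not AlphaFold 3's actual implementation; `resolve_db_file` is a hypothetical name):

```python
from pathlib import Path

def resolve_db_file(filename: str, db_dirs: list) -> Path:
    """Return the path to filename in the first db_dir that contains it,
    mirroring the order in which --db_dir flags are given."""
    for d in db_dirs:
        candidate = Path(d) / filename
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{filename} not found in any --db_dir")
```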
Directory of JSONs
docker run -it \
--volume $HOME/af_input:/root/af_input \
--volume $HOME/af_output:/root/af_output \
--volume $HOME/af3_models:/root/models \
--volume /path/to/databases:/root/public_databases \
--gpus all \
alphafold3 \
python run_alphafold.py \
--input_dir=/root/af_input \
--model_dir=/root/models \
--output_dir=/root/af_output
This processes all .json files in the input directory.
Batch Script
#!/bin/bash
for json_file in "$HOME"/af_input/*.json; do
echo "Processing $json_file"
docker run -it \
--volume $HOME/af_input:/root/af_input \
--volume $HOME/af_output:/root/af_output \
--volume $HOME/af3_models:/root/models \
--volume /path/to/databases:/root/public_databases \
--gpus all \
alphafold3 \
python run_alphafold.py \
--json_path=/root/af_input/$(basename "$json_file") \
--model_dir=/root/models \
--output_dir=/root/af_output
done
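The same loop can be driven from Python; a sketch (the `docker_cmd` wrapper is illustrative, with paths and image name as in the shell version above):

```python
import subprocess
from pathlib import Path

def docker_cmd(json_file: Path) -> list:
    """Assemble the docker invocation for one input JSON (paths assumed)."""
    home = Path.home()
    return [
        "docker", "run", "-it",
        "--volume", f"{home}/af_input:/root/af_input",
        "--volume", f"{home}/af_output:/root/af_output",
        "--volume", f"{home}/af3_models:/root/models",
        "--volume", "/path/to/databases:/root/public_databases",
        "--gpus", "all",
        "alphafold3",
        "python", "run_alphafold.py",
        f"--json_path=/root/af_input/{json_file.name}",
        "--model_dir=/root/models",
        "--output_dir=/root/af_output",
    ]

if __name__ == "__main__":
    for json_file in sorted((Path.home() / "af_input").glob("*.json")):
        print("Processing", json_file)
        subprocess.run(docker_cmd(json_file), check=True)
```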
Common Flags
Pipeline Control
--run_data_pipeline
Run genetic and template search (CPU-only, time-consuming)
--run_inference
Run model inference (requires GPU)
--norun_data_pipeline
Skip data pipeline (requires pre-computed MSA/templates in input)
--norun_inference
Skip inference (generates MSA/templates only)
Output Control
--output_dir
Directory for output files
--force_output_dir
Overwrite existing output directory
--save_embeddings
Save single and pair embeddings (~6 GB for 5k tokens)
Save distogram predictions (~3 GB for 5k tokens)
--buckets
Compilation bucket sizes (e.g., 256,512,1024,2048,5120)
--jax_compilation_cache_dir
Directory for JAX compilation cache (avoids recompilation)
Database Paths
--db_dir
Database directory (can be specified multiple times)
Running in Stages
For optimal resource utilization, run data pipeline and inference separately:
Stage 1: Data Pipeline (CPU-only)
docker run -it \
--volume $HOME/af_input:/root/af_input \
--volume $HOME/af_output:/root/af_output \
--volume /path/to/databases:/root/public_databases \
alphafold3 \
python run_alphafold.py \
--json_path=/root/af_input/fold_input.json \
--db_dir=/root/public_databases \
--output_dir=/root/af_output \
--norun_inference
This generates <job>_data.json with MSAs and templates.
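When handing many jobs to stage 2, a short Python sketch can collect the stage-1 outputs (the `stage1_outputs` helper is hypothetical; it assumes only the `<job>_data.json` name pattern stated above and searches recursively, since the exact output layout may vary):

```python
from pathlib import Path

def stage1_outputs(output_dir) -> list:
    """Collect <job>_data.json files written by the data pipeline."""
    return sorted(Path(output_dir).rglob("*_data.json"))
```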
Stage 2: Inference (GPU required)
docker run -it \
--volume $HOME/af_output:/root/af_output \
--volume $HOME/af3_models:/root/models \
--gpus all \
alphafold3 \
python run_alphafold.py \
--json_path=/root/af_output/<job>_data.json \
--model_dir=/root/models \
--output_dir=/root/af_output \
--norun_data_pipeline
This approach allows running genetic search on CPU-only machines, then inference on GPU machines.
Troubleshooting
Permission Errors
# Error: permission denied
chmod 755 $HOME/af_input $HOME/af_output
# Error: database permissions
sudo chmod 755 --recursive /path/to/databases
GPU Not Detected
# Verify host can see GPU
nvidia-smi
# Verify Docker can see GPU
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
# Check NVIDIA Container Toolkit
systemctl --user status docker
Out of Memory
Enable Unified Memory
For inputs >5,120 tokens or GPUs with <80 GB, edit docker/Dockerfile and rebuild:
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2
Reduce Batch Size
For A100 40GB, edit the model configuration:
pair_transition_shard_spec: Sequence[_Shape2DType] = (
    (2048, None),
    (3072, 1024),
    (None, 512),
)
Compilation Issues
For V100 or other Compute Capability 7.x GPUs:
ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
Getting Help
View All Flags
docker run alphafold3 python run_alphafold.py --help
Check Logs
Docker logs are displayed in real time. Redirect them to a file:
docker run ... alphafold3 python run_alphafold.py ... 2>&1 | tee log.txt
Next Steps
Singularity: Run AlphaFold 3 with Singularity instead.
Performance: Optimize for speed and throughput.