
Prerequisites

Before running AlphaFold 3 with Docker, ensure you have:
1. Linux System

AlphaFold 3 requires Linux. Other operating systems are not supported.
2. NVIDIA GPU

Compute Capability 8.0+ (A100, H100 recommended)
  • A100 80GB: Up to 5,120 tokens
  • H100 80GB: Up to 5,120 tokens
  • A100 40GB: Up to 4,352 tokens (with config changes)
3. System Resources

  • RAM: Minimum 64 GB (genetic search can use more)
  • Disk: Up to 1 TB for databases (SSD recommended)
  • CUDA: Version 12.6 on host machine
4. Docker Installed

Rootless Docker recommended. See installation section below.
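A quick way to confirm the CUDA requirement is to parse what the driver reports; cuda_version below is a hypothetical helper for illustration, not part of AlphaFold 3:

```shell
# Extract the CUDA version the NVIDIA driver reports (sketch)
cuda_version() {
  grep -o 'CUDA Version: *[0-9][0-9.]*' | grep -o '[0-9][0-9.]*$'
}
# Usage: nvidia-smi | cuda_version   # prints e.g. 12.6
```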

Installation

Installing Docker

These instructions are for Ubuntu 22.04 LTS. Adjust for your distribution.
Add Docker’s official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
Add repository and install:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin

# Test installation
sudo docker run hello-world

Installing NVIDIA GPU Support

1. Install NVIDIA Drivers

sudo apt-get -y install alsa-utils ubuntu-drivers-common
sudo ubuntu-drivers install
sudo nvidia-smi --gpu-reset
nvidia-smi  # Verify installation
If you see “NVIDIA-SMI has failed”, reboot with sudo reboot now and run nvidia-smi again.
2. Install NVIDIA Container Toolkit

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
3. Verify GPU Access

docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
You should see your GPU listed.

Obtaining AlphaFold 3 Source Code

git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3

Downloading Databases

Download size: ~252 GB compressed, ~630 GB uncompressed. Use SSD for best performance.
cd alphafold3
./fetch_databases.sh /path/to/databases
Do NOT place the databases in a subdirectory of the AlphaFold 3 repository: Docker would copy them into the build context, drastically slowing image builds.
Expected directory structure:
/path/to/databases/
├── mmcif_files/                  # ~200k PDB mmCIF files
├── bfd-first_non_consensus_sequences.fasta
├── mgy_clusters_2022_05.fa
├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
├── pdb_seqres_2022_09_28.fasta
├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
├── rnacentral_active_seq_id_90_cov_80_linclust.fasta
├── uniprot_all_2021_04.fa
└── uniref90_2022_05.fa
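Once the download finishes, it is worth confirming that every expected file landed. The check_dbs function below is a hypothetical sketch built from the file names in the tree above:

```shell
# Verify that all expected database files exist under a directory (sketch)
check_dbs() {
  dir="$1"; missing=0
  for f in bfd-first_non_consensus_sequences.fasta \
           mgy_clusters_2022_05.fa \
           nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta \
           pdb_seqres_2022_09_28.fasta \
           rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta \
           rnacentral_active_seq_id_90_cov_80_linclust.fasta \
           uniprot_all_2021_04.fa \
           uniref90_2022_05.fa; do
    [ -e "$dir/$f" ] || { echo "missing: $f"; missing=1; }
  done
  [ -d "$dir/mmcif_files" ] || { echo "missing: mmcif_files/"; missing=1; }
  [ "$missing" -eq 0 ] && echo "all databases present"
}
# Usage: check_dbs /path/to/databases
```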

Obtaining Model Parameters

Model parameters require approval from Google DeepMind. Complete the access request form linked from the AlphaFold 3 repository; expect a response within 2-3 business days.
Download parameters to a directory (e.g., $HOME/af3_models).
Do NOT place in AlphaFold 3 repository directory. Store separately.

Building Docker Image

cd alphafold3
docker build -t alphafold3 -f docker/Dockerfile .
This creates an image with all Python dependencies and environment configuration.

Running AlphaFold 3

Basic Usage

Create an input JSON file (see Input Format) and save to $HOME/af_input/fold_input.json:
fold_input.json
{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
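For simple single-chain jobs, this JSON can also be generated from the shell. make_input is a hypothetical helper, not part of AlphaFold 3; the hard-coded chain ID and seed are assumptions to adapt:

```shell
# Emit a minimal single-chain input JSON on stdout (sketch)
make_input() {  # usage: make_input NAME SEQUENCE
  cat <<EOF
{
  "name": "$1",
  "sequences": [{"protein": {"id": ["A"], "sequence": "$2"}}],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
# make_input myjob MVLSEGEWQLVLHVWAKVE > $HOME/af_input/fold_input.json
```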
Run AlphaFold 3:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume $HOME/af3_models:/root/models \
    --volume /path/to/databases:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output

Directory Mounts Explained

  • af_input (volume): Input JSON files; must be readable by the container.
  • af_output (volume): Output directory for predictions; must be writable by the container.
  • models (volume): Model parameters obtained from Google DeepMind.
  • public_databases (volume): Genetic databases for MSA and template search.
You may need to run chmod 755 $HOME/af_input $HOME/af_output to ensure proper permissions.

Multiple Database Directories

For optimal performance with SSD + HDD setup:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume $HOME/af3_models:/root/models \
    --volume /mnt/ssd/databases:/root/public_databases \
    --volume /mnt/hdd/databases:/root/public_databases_fallback \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --db_dir=/root/public_databases_fallback \
    --output_dir=/root/af_output
AlphaFold 3 looks for each database file in the first directory (here the SSD) and falls back to the next directory only for files missing from it.

Processing Multiple Inputs

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume $HOME/af3_models:/root/models \
    --volume /path/to/databases:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --input_dir=/root/af_input \
    --model_dir=/root/models \
    --output_dir=/root/af_output
Processes all .json files in the input directory.
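To prepare such a batch, one option is to write one JSON per sequence from a tab-separated name/sequence list. write_jobs is a hypothetical sketch; the single chain ID, seed, and file layout are assumptions:

```shell
# Write one input JSON per line of a "name<TAB>sequence" file (sketch)
write_jobs() {  # usage: write_jobs jobs.tsv /output/dir
  while IFS="$(printf '\t')" read -r name seq; do
    printf '{"name": "%s", "sequences": [{"protein": {"id": ["A"], "sequence": "%s"}}], "modelSeeds": [1], "dialect": "alphafold3", "version": 1}\n' \
      "$name" "$seq" > "$2/$name.json"
  done < "$1"
}
# Usage: write_jobs jobs.tsv $HOME/af_input
```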

Common Flags

Pipeline Control

  • --run_data_pipeline (boolean, default: true): Run genetic and template search (CPU-only, time-consuming).
  • --run_inference (boolean, default: true): Run model inference (requires a GPU).
  • --norun_data_pipeline (flag): Skip the data pipeline (requires pre-computed MSAs/templates in the input).
  • --norun_inference (flag): Skip inference (generates MSAs/templates only).

Output Control

  • --output_dir (path, required): Directory for output files.
  • --force_output_dir (flag): Overwrite an existing output directory.
  • --save_embeddings (boolean, default: false): Save single and pair embeddings (~6 GB for 5k tokens).
  • --save_distogram (boolean, default: false): Save distogram predictions (~3 GB for 5k tokens).

Performance Flags

  • --buckets (list): Compilation bucket sizes (e.g., 256,512,1024,2048,5120).
  • --jax_compilation_cache_dir (path): Directory for the JAX compilation cache (avoids recompilation across runs).
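For example, the compilation cache can be persisted across container runs by mounting a host directory for it. The cache path and mount below are illustrative choices, not defaults:

```shell
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume $HOME/af3_models:/root/models \
    --volume /path/to/databases:/root/public_databases \
    --volume $HOME/af3_jax_cache:/root/jax_cache \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output \
    --jax_compilation_cache_dir=/root/jax_cache \
    --buckets=256,512,1024,2048,5120
```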

Database Paths

  • --db_dir (path): Database directory (can be specified multiple times).

Running in Stages

For optimal resource utilization, run data pipeline and inference separately:

Stage 1: Data Pipeline (CPU-only)

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume /path/to/databases:/root/public_databases \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output \
    --norun_inference
This generates <job>_data.json with MSAs and templates.

Stage 2: Inference (GPU required)

docker run -it \
    --volume $HOME/af_output:/root/af_output \
    --volume $HOME/af3_models:/root/models \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_output/<job>_data.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output \
    --norun_data_pipeline
This approach allows running genetic search on CPU-only machines, then inference on GPU machines.
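The two stages can also be chained for a batch of jobs. The loop below is a sketch that runs stage 2 for every stage-1 data JSON, assuming each job's output landed under $HOME/af_output/<job>/<job>_data.json:

```shell
# Run inference for every pre-computed data JSON (sketch)
for data_json in "$HOME"/af_output/*/*_data.json; do
  [ -e "$data_json" ] || continue            # skip if the glob matched nothing
  rel=${data_json#"$HOME"/af_output/}        # path relative to the mount point
  docker run -it \
      --volume "$HOME/af_output:/root/af_output" \
      --volume "$HOME/af3_models:/root/models" \
      --gpus all \
      alphafold3 \
      python run_alphafold.py \
      --json_path="/root/af_output/$rel" \
      --model_dir=/root/models \
      --output_dir=/root/af_output \
      --norun_data_pipeline
done
```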

Troubleshooting

Permission Errors

# Error: permission denied
chmod 755 $HOME/af_input $HOME/af_output

# Error: database permissions
sudo chmod 755 --recursive /path/to/databases

GPU Not Detected

# Verify host can see GPU
nvidia-smi

# Verify Docker can see GPU
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi

# Check that the rootless Docker service is running
systemctl --user status docker

Out of Memory

For inputs larger than 5,120 tokens, or on GPUs with less than 80 GB of memory, enable unified memory: add the lines below to docker/Dockerfile and rebuild the image:
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2

Compilation Issues

For V100 or other Compute Capability 7.x GPUs, add the following to docker/Dockerfile and rebuild:
ENV XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"

Getting Help

View All Flags

docker run alphafold3 python run_alphafold.py --help

Check Logs

Logs stream to your terminal in real time. To also capture them to a file:
docker run ... alphafold3 python run_alphafold.py ... 2>&1 | tee log.txt

Next Steps

  • Singularity: Run AlphaFold 3 with Singularity instead of Docker.
  • Performance: Optimize for speed and throughput.
