Quick Start Guide

This guide will help you run your first AlphaFold 3 predictions using various input types.

Before starting, ensure you have completed the installation and have obtained model parameters.

Basic Workflow

1. Create input JSON: define your biomolecular structure prediction task.
2. Run AlphaFold 3: execute the Docker container with your input.
3. Analyze outputs: review predicted structures and confidence metrics.

Example 1: Simple Protein Structure

Let’s start with a basic homodimer protein prediction.

Create Input File

Create $HOME/af_input/fold_input.json:

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

The "id": ["A", "B"] specifies a homodimer with two copies of the same protein chain.
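If you would rather generate the input file from a script, the following is a minimal Python sketch. It only uses the fields shown above; the helper name `make_homodimer_input` and the shortened placeholder sequence are illustrative and not part of AlphaFold 3.

```python
import json


def make_homodimer_input(name, sequence, seed=1):
    """Build an input dict with two copies (chains A and B) of one protein."""
    return {
        "name": name,
        "sequences": [{"protein": {"id": ["A", "B"], "sequence": sequence}}],
        "modelSeeds": [seed],
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder sequence for illustration; substitute the full chain sequence.
fold_input = make_homodimer_input("2PV7", "GMRESYANENQFGFKTINSDIHKIVIVGG")
fold_input_json = json.dumps(fold_input, indent=2)
print(fold_input_json)
```

Write `fold_input_json` to $HOME/af_input/fold_input.json to use it with the command in the next section.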

Run Prediction

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output

Expected Output

The prediction creates an output directory named after the job. AlphaFold 3 sanitizes the job name (for example, by lower-casing it), so the job 2PV7 is written to $HOME/af_output/2pv7/, containing:

2pv7_model.cif - Top-ranked predicted structure
2pv7_confidences.json - Full confidence metrics
2pv7_summary_confidences.json - Summary confidence scores
2pv7_data.json - Input with MSA and template data
ranking_scores.csv - Scores for all predictions (prefixed with the job name in some releases)
seed-1_sample-0/ through seed-1_sample-4/ - Individual sample predictions
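To find the best sample without opening the file by hand, you can parse the ranking scores CSV. The sketch below assumes the file has the columns `seed`, `sample`, and `ranking_score`; check the header of your own file, since this layout is an assumption here. The scores in the example are made up.

```python
import csv
import io


def best_sample(ranking_csv_text):
    """Return (seed, sample, ranking_score) for the highest-scoring row."""
    rows = csv.DictReader(io.StringIO(ranking_csv_text))
    best = max(rows, key=lambda row: float(row["ranking_score"]))
    return int(best["seed"]), int(best["sample"]), float(best["ranking_score"])


# Made-up scores, only to show the assumed shape of the file.
example_csv = "seed,sample,ranking_score\n1,0,0.71\n1,1,0.84\n1,2,0.79\n"
print(best_sample(example_csv))  # -> (1, 1, 0.84)
```

In practice, pass the contents of the CSV file from the output directory instead of `example_csv`.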

Example 2: Protein-Ligand Complex

Predict a protein bound to an ATP molecule.

{
  "name": "protein_atp_complex",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKVLWAALLVTFLAGCQAKVDQIAEGAVRKIEEELGAIAAAH"
      }
    },
    {
      "ligand": {
        "id": "L",
        "ccdCodes": ["ATP"]
      }
    }
  ],
  "modelSeeds": [1, 2, 3],
  "dialect": "alphafold3",
  "version": 1
}

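Each entry in modelSeeds produces an independent set of samples from a different random seed. More seeds give you more candidate structures to rank, and agreement between them is a useful sanity check; extra seeds do not make any single prediction more accurate.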

Example 3: RNA Structure with Modifications

Predict an RNA structure with modified nucleotides.

{
  "name": "modified_rna",
  "sequences": [
    {
      "rna": {
        "id": "R",
        "sequence": "AGCUAGCUAGCUAGCU",
        "modifications": [
          {"modificationType": "2MG", "basePosition": 1},
          {"modificationType": "5MC", "basePosition": 4}
        ]
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

Example 4: Protein with Post-Translational Modifications

{
  "name": "modified_protein",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "PVLSCGEWQLVLHVWAKVEADVAGHGQDILIRLFK",
        "modifications": [
          {"ptmType": "HY3", "ptmPosition": 1},
          {"ptmType": "P1L", "ptmPosition": 5}
        ]
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

Modifications are specified using CCD codes and 1-based residue positions. The first residue in the example won’t be a proline (P) but HY3 instead.
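The 1-based indexing is easy to get wrong, so it can help to preview which residues a modification list replaces. This is a small illustrative check, not an AlphaFold 3 function; `apply_ptms` is a hypothetical helper name.

```python
def apply_ptms(sequence, modifications):
    """Return a per-residue list with modified positions replaced by CCD codes."""
    residues = list(sequence)
    for mod in modifications:
        position = mod["ptmPosition"]  # 1-based, as in the input JSON
        if not 1 <= position <= len(sequence):
            raise ValueError(f"ptmPosition {position} is outside 1..{len(sequence)}")
        residues[position - 1] = mod["ptmType"]
    return residues


sequence = "PVLSCGEWQLVLHVWAKVEADVAGHGQDILIRLFK"
mods = [
    {"ptmType": "HY3", "ptmPosition": 1},
    {"ptmType": "P1L", "ptmPosition": 5},
]
print(apply_ptms(sequence, mods)[:6])  # -> ['HY3', 'V', 'L', 'S', 'P1L', 'G']
```

The output shows that position 1 (P) becomes HY3 and position 5 (C) becomes P1L.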

Example 5: DNA-Protein Complex

{
  "name": "dna_protein_complex",
  "sequences": [
    {
      "protein": {
        "id": "P",
        "sequence": "MTEKLTSAELGTRGVGLAKVAADGYVPDEAVRKAL"
      }
    },
    {
      "dna": {
        "id": "D1",
        "sequence": "GACCTCT"
      }
    },
    {
      "dna": {
        "id": "D2",
        "sequence": "AGAGGTC"
      }
    }
  ],
  "modelSeeds": [1, 2],
  "dialect": "alphafold3",
  "version": 1
}
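The two DNA chains in this example are reverse complements of each other, so they can pair into a duplex. A small sketch for deriving the second strand from the first, assuming both sequences are written 5' to 3' (the helper name is illustrative):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("GACCTCT"))  # -> AGAGGTC
```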

Example 6: Covalent Ligand with Bond Specification

For covalent ligands, you can specify bonds between atoms.

{
  "name": "covalent_ligand",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTIIALSYIFCLVFADYKDDDDK"
      }
    },
    {
      "ligand": {
        "id": "L",
        "ccdCodes": ["HEM"]
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 12, "SG"], ["L", 1, "FE"]]
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

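Bonds are specified as [[entity_id, residue_position, atom_name], [entity_id, residue_position, atom_name]]. Residue positions are 1-based and must exist in the referenced chain (a single-residue ligand uses position 1). Atom names must match CCD definitions.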
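Because an out-of-range residue position is an easy mistake, a quick pre-flight check can be worthwhile. The sketch below is a hypothetical helper, not part of AlphaFold 3. It only verifies that each bonded chain ID exists and that the position falls inside that chain; it does not validate atom names against the CCD.

```python
def chain_lengths(sequences):
    """Map each chain ID to its residue count (ligands: one per CCD code)."""
    lengths = {}
    for entry in sequences:
        kind, body = next(iter(entry.items()))
        ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
        if kind == "ligand":
            # Assumption: a ligand without ccdCodes counts as one residue.
            length = len(body.get("ccdCodes", [None]))
        else:
            length = len(body["sequence"])
        for chain_id in ids:
            lengths[chain_id] = length
    return lengths


def check_bonds(fold_input):
    """Return a list of problems found in bondedAtomPairs (empty if none)."""
    lengths = chain_lengths(fold_input["sequences"])
    problems = []
    for pair in fold_input.get("bondedAtomPairs", []):
        for chain_id, position, atom in pair:
            if chain_id not in lengths:
                problems.append(f"unknown chain {chain_id!r}")
            elif not 1 <= position <= lengths[chain_id]:
                problems.append(
                    f"{chain_id}:{position}:{atom} is outside 1..{lengths[chain_id]}"
                )
    return problems


example = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MKTC"}},
        {"ligand": {"id": "L", "ccdCodes": ["HEM"]}},
    ],
    "bondedAtomPairs": [
        [["A", 4, "SG"], ["L", 1, "FE"]],  # valid: residue 4 exists in chain A
        [["A", 9, "SG"], ["L", 1, "FE"]],  # invalid: chain A has 4 residues
    ],
}
print(check_bonds(example))  # -> ['A:9:SG is outside 1..4']
```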

Running Pipeline in Stages

You can split the pipeline into stages to optimize resource usage.

Stage 1: Data Pipeline Only (CPU)

Generate MSAs and templates without running inference:

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <DB_DIR>:/root/public_databases \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --output_dir=/root/af_output \
    --norun_inference

This stage is CPU-only and doesn’t require a GPU. Run it on a cheaper CPU-only instance to save costs.

Stage 2: Inference Only (GPU)

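Run inference using pre-computed MSAs and templates. Point --json_path at the data JSON written by Stage 1, replacing job_name with the name of your job's output directory: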

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_output/job_name/job_name_data.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output \
    --norun_data_pipeline
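When scripting the two stages, you need the path of the data JSON that Stage 1 wrote. The sketch below assumes the output directory and file prefix are derived from the job name by lower-casing it, replacing spaces with underscores, and dropping characters other than letters, digits, `_`, `-`, and `.`. Treat that rule as an assumption and confirm it against your actual output directory; both helper names are illustrative.

```python
import string

ALLOWED = set(string.ascii_lowercase + string.digits + "_-.")


def sanitised_name(job_name):
    """Lower-case, replace spaces with underscores, drop other characters."""
    lowered = job_name.lower().replace(" ", "_")
    return "".join(ch for ch in lowered if ch in ALLOWED)


def data_json_path(output_dir, job_name):
    """Path of the Stage 1 output to pass to --json_path in Stage 2."""
    name = sanitised_name(job_name)
    return f"{output_dir}/{name}/{name}_data.json"


print(data_json_path("/root/af_output", "2PV7"))
# -> /root/af_output/2pv7/2pv7_data.json
```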

Processing Multiple Inputs

Process a directory of JSON files:

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --input_dir=/root/af_input \
    --model_dir=/root/models \
    --output_dir=/root/af_output

Understanding Output Files

Structure File (`.cif`)

The mmCIF file contains the predicted 3D coordinates:

Compatible with PyMOL, ChimeraX, and other structural biology tools
Contains all atoms for all chains
Includes B-factor values corresponding to pLDDT confidence

Confidence Metrics

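The summary confidences file contains fields like the following. The values are illustrative and the // comments are annotations; the actual file is plain JSON without comments.

{
  "ptm": 0.87,              // Predicted TM-score (0-1, higher is better)
  "iptm": 0.82,             // Interface predicted TM-score (0-1, higher is better)
  "ranking_score": 0.855,   // Overall ranking score (see Ranking Score below)
  "has_clash": false,       // Significant clashes detected?
  "fraction_disordered": 0.05  // Fraction of the structure predicted to be disordered
}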

Interpreting Confidence Scores

pLDDT (per-atom)

90-100: Very high confidence
70-90: Confident
50-70: Low confidence
<50: Very low confidence
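The bands above translate directly into a small classifier. This is an illustrative sketch; values that fall exactly on a boundary (such as 90 or 70) are assigned to the higher band here, which is an assumption since the ranges above overlap at their edges.

```python
def plddt_band(plddt):
    """Map a pLDDT value (0-100) to one of the four confidence bands."""
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"


atom_plddts = [96.2, 88.0, 71.5, 64.3, 41.9]  # made-up values
print([plddt_band(p) for p in atom_plddts])
# -> ['very high', 'confident', 'confident', 'low', 'very low']
```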

pTM / ipTM

ipTM >0.8: Confident, high-quality prediction
ipTM 0.6-0.8: Gray zone (may or may not be correct)
ipTM <0.6: Likely a failed prediction
pTM >0.5: Overall fold might be similar to the true structure

PAE (Predicted Aligned Error)

Low values (0-5 Å): High confidence in relative positions
High values (>10 Å): Low confidence in relative positions
Useful for identifying domains and interfaces
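One way to use PAE for interfaces is to average it over token pairs that span two chains. The sketch below assumes you have loaded a square PAE matrix and a per-token list of chain IDs (for example from keys such as `pae` and `token_chain_ids` in the full confidences JSON; treat those key names as an assumption and check your file). The numbers are made up.

```python
def mean_interchain_pae(pae, token_chain_ids, chain_a, chain_b):
    """Average PAE over token pairs with one token in each chain, both directions."""
    values = [
        pae[i][j]
        for i, ci in enumerate(token_chain_ids)
        for j, cj in enumerate(token_chain_ids)
        if {ci, cj} == {chain_a, chain_b}
    ]
    if not values:
        raise ValueError("no token pairs between the requested chains")
    return sum(values) / len(values)


# Tiny made-up example: two tokens in chain A, one token in chain B.
pae = [
    [0.5, 1.0, 12.0],
    [1.2, 0.4, 9.0],
    [11.0, 8.0, 0.6],
]
chains = ["A", "A", "B"]
print(mean_interchain_pae(pae, chains, "A", "B"))  # -> 10.0
```

A mean around 10 Å, as in this toy example, would suggest low confidence in how the two chains are placed relative to each other.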

Ranking Score

Range: -100 to 1.5
Formula: 0.8×ipTM + 0.2×pTM + 0.5×disorder - 100×clash
Use to rank predictions across multiple seeds
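As a worked example of the formula, the sketch below recomputes the score from summary metrics. It assumes that "disorder" corresponds to fraction_disordered and "clash" to has_clash (counted as 1 when true); the function name is illustrative.

```python
def ranking_score(iptm, ptm, fraction_disordered, has_clash):
    """Combine summary metrics: 0.8*ipTM + 0.2*pTM + 0.5*disorder - 100*clash."""
    return (
        0.8 * iptm
        + 0.2 * ptm
        + 0.5 * fraction_disordered
        - 100.0 * float(has_clash)
    )


score = ranking_score(iptm=0.82, ptm=0.87, fraction_disordered=0.05, has_clash=False)
print(round(score, 3))  # -> 0.855
```

With the same metrics and has_clash set to true, the clash penalty dominates and the score drops to about -99.145, which is why clashing predictions sort to the bottom.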

Advanced Options

Custom MSA and Templates

Provide your own MSA in A3M format, with the query sequence as the first entry:

{
  "protein": {
    "id": "A",
    "sequence": "MKTAYIAKQRQ",
    "unpairedMsa": ">query\nMKTAYIAKQRQ\n>seq1\nMKT-YIAKQRQ\n>seq2\nMKTAYI-KQRQ\n",
    "pairedMsa": "",
    "templates": []
  }
}
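Writing the A3M string by hand is error-prone because of the embedded newlines. A minimal sketch for assembling it from aligned sequences follows; `to_a3m` is a hypothetical helper, and the column check relies on the A3M convention that lowercase letters mark insertions relative to the query.

```python
def to_a3m(query, hits):
    """Build an A3M string with the query first, then (name, aligned_seq) hits."""
    lines = [">query", query]
    for name, aligned in hits:
        # Lowercase letters are insertions and do not occupy a query column.
        columns = sum(1 for ch in aligned if not ch.islower())
        if columns != len(query):
            raise ValueError(
                f"{name}: {columns} aligned columns, expected {len(query)}"
            )
        lines += [f">{name}", aligned]
    return "\n".join(lines) + "\n"


msa = to_a3m("MKTAYIAKQRQ", [("seq1", "MKT-YIAKQRQ"), ("seq2", "MKTAYI-KQRQ")])
print(repr(msa))
```

The resulting string matches the unpairedMsa value shown above. Serializing it with a JSON library takes care of escaping the newlines.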

Running MSA-Free

Predict a structure without an MSA or template search by setting both MSA fields to empty strings and templates to an empty list:

{
  "protein": {
    "id": "A",
    "sequence": "MKTAYIAKQRQ",
    "unpairedMsa": "",
    "pairedMsa": "",
    "templates": []
  }
}

Multiple Seeds for Better Sampling

{
  "name": "my_prediction",
  "modelSeeds": [1, 2, 3, 4, 5],
  "sequences": [...]
}

By default, AlphaFold 3 generates 5 samples per seed. With 5 seeds, you’ll get 25 total predictions to choose from.
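The arithmetic can be checked by enumerating the per-sample directory names, following the seed-N_sample-M pattern from the Expected Output section. The helper name is illustrative.

```python
def sample_dirs(model_seeds, samples_per_seed=5):
    """List the per-sample output directory names for the given seeds."""
    return [
        f"seed-{seed}_sample-{sample}"
        for seed in model_seeds
        for sample in range(samples_per_seed)
    ]


dirs = sample_dirs([1, 2, 3, 4, 5])
print(len(dirs), dirs[0], dirs[-1])  # -> 25 seed-1_sample-0 seed-5_sample-4
```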

Performance Tips

Use SSD for Databases

Place genetic databases on SSD or RAM-backed filesystem for 5-10x faster MSA search

Reuse MSA Data

For multiple predictions with the same chains, run data pipeline once and reuse the output JSON

Enable JAX Compilation Cache

Use --jax_compilation_cache_dir to avoid recompiling between runs

Optimize Bucket Sizes

Adjust --buckets flag to minimize recompilation for your typical input sizes
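To reason about bucket choices, it helps to see which bucket each of your inputs would fall into. The sketch below assumes an input is padded up to the smallest bucket that fits its token count, so inputs that share a bucket can share a compiled model. The bucket list is hypothetical and not necessarily the default.

```python
import bisect


def padded_size(num_tokens, buckets):
    """Return the smallest bucket that fits num_tokens, or None if none does."""
    ordered = sorted(buckets)
    index = bisect.bisect_left(ordered, num_tokens)
    return ordered[index] if index < len(ordered) else None


# Hypothetical bucket list for illustration; not necessarily the defaults.
buckets = [256, 512, 768, 1024]
for tokens in (300, 512, 900):
    print(tokens, "->", padded_size(tokens, buckets))
# 300 -> 512
# 512 -> 512
# 900 -> 1024
```

If most of your inputs land just above a bucket boundary, adding a bucket slightly larger than their size reduces padding at the cost of one extra compilation.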

Common Issues

Out of Memory

# Enable unified memory in Dockerfile
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2

RDKit Conformer Generation Failed

If you see “Failed to construct RDKit reference structure”:

Try increasing iterations: --conformer_max_iterations=10000
Or provide a reference structure using user-provided CCD format

Invalid JSON

Backslashes in SMILES strings must be escaped (as \\) for the input to be valid JSON:

# Use jq to escape SMILES
jq -R . <<< 'CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO'
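If you build inputs from a script, a JSON library handles the escaping for you. A minimal Python sketch using the same SMILES string (the ligand ID is a placeholder):

```python
import json

# Raw string literal so the backslashes are kept exactly as written.
smiles = r"CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO"
ligand = {"ligand": {"id": "L", "smiles": smiles}}

encoded = json.dumps(ligand)
print(encoded)
```

In the printed JSON each backslash appears doubled, and parsing the text again returns the original SMILES string.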

Next Steps

Input Documentation

Complete JSON format specification

Output Documentation

Detailed output format guide

Performance Tuning

Optimize for speed and throughput

Getting Help

If you encounter issues:

Check Known Issues
Search GitHub Issues
Contact the team at [email protected]
