
Quick Start Guide

This guide will help you run your first AlphaFold 3 predictions using various input types.
Before starting, ensure you have completed the installation and have obtained model parameters.

Basic Workflow

  1. Create input JSON: define your biomolecular structure prediction task
  2. Run AlphaFold 3: execute the Docker container with your input
  3. Analyze outputs: review predicted structures and confidence metrics

Example 1: Simple Protein Structure

Let’s start with a basic homodimer protein prediction.

Create Input File

Create $HOME/af_input/fold_input.json:
{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
The "id": ["A", "B"] specifies a homodimer with two copies of the same protein chain.
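The same file can also be generated from a script, which helps when sequences come from another tool. The following is a minimal sketch using only Python's standard library. The helper name make_protein_input is hypothetical and not part of AlphaFold 3; only the JSON layout is taken from the example above.

```python
import json


def make_protein_input(name, sequence, chain_ids, seeds=(1,)):
    """Build an AlphaFold 3 input dict for one protein entity.

    Hypothetical helper: only the JSON layout comes from the example above.
    """
    if not chain_ids:
        raise ValueError("at least one chain ID is required")
    # A single copy uses a plain string ID; several copies use a list of IDs.
    entity_id = chain_ids[0] if len(chain_ids) == 1 else list(chain_ids)
    return {
        "name": name,
        "sequences": [{"protein": {"id": entity_id, "sequence": sequence}}],
        "modelSeeds": list(seeds),
        "dialect": "alphafold3",
        "version": 1,
    }


# Short placeholder sequence for illustration, not the 2PV7 sequence.
fold_input = make_protein_input("demo_dimer", "MKTAYIAKQRQ", ["A", "B"])
print(json.dumps(fold_input, indent=2))
```

Writing the result with json.dump to $HOME/af_input/fold_input.json would give a file of the same shape as the hand-written one.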

Run Prediction

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output

Expected Output

The prediction creates an output directory named after the sanitised (lowercased) job name, $HOME/af_output/2pv7/, containing:
  • 2pv7_model.cif - Top-ranked predicted structure
  • 2pv7_confidences.json - Full confidence metrics
  • 2pv7_summary_confidences.json - Summary confidence scores
  • 2pv7_data.json - Input with MSA and template data
  • ranking_scores.csv - Scores for all predictions
  • seed-1_sample-0/ through seed-1_sample-4/ - Individual sample predictions
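To find which sample was ranked first, the ranking scores file can be read with Python's csv module. This is a sketch only: it assumes the file has the columns seed, sample and ranking_score, and it uses made-up scores in an in-memory string rather than a real output file.

```python
import csv
import io


def best_sample(csv_text):
    """Return (seed, sample, score) for the row with the highest ranking_score."""
    rows = csv.DictReader(io.StringIO(csv_text))
    best = max(rows, key=lambda row: float(row["ranking_score"]))
    return int(best["seed"]), int(best["sample"]), float(best["ranking_score"])


# Made-up scores for illustration; the column names are an assumption.
example_csv = """seed,sample,ranking_score
1,0,0.71
1,1,0.84
1,2,0.79
"""
print(best_sample(example_csv))
```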

Example 2: Protein-Ligand Complex

Predict a protein bound to an ATP molecule.
{
  "name": "protein_atp_complex",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKVLWAALLVTFLAGCQAKVDQIAEGAVRKIEEELGAIAAAH"
      }
    },
    {
      "ligand": {
        "id": "L",
        "ccdCodes": ["ATP"]
      }
    }
  ],
  "modelSeeds": [1, 2, 3],
  "dialect": "alphafold3",
  "version": 1
}
Listing several modelSeeds runs the model once per seed, each with a different random seed. This gives you more candidate structures to rank and lets you check whether independent runs agree.

Example 3: RNA Structure with Modifications

Predict an RNA structure with modified nucleotides.
{
  "name": "modified_rna",
  "sequences": [
    {
      "rna": {
        "id": "R",
        "sequence": "AGCUAGCUAGCUAGCU",
        "modifications": [
          {"modificationType": "2MG", "basePosition": 1},
          {"modificationType": "5MC", "basePosition": 4}
        ]
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

Example 4: Protein with Post-Translational Modifications

{
  "name": "modified_protein",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "PVLSCGEWQLVLHVWAKVEADVAGHGQDILIRLFK",
        "modifications": [
          {"ptmType": "HY3", "ptmPosition": 1},
          {"ptmType": "P1L", "ptmPosition": 5}
        ]
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
Modifications are specified using CCD codes and 1-based residue positions. The first residue in the example won’t be a proline (P) but HY3 instead.
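Because positions are 1-based, it is easy to be off by one when computing them in code. The sketch below (the helper describe_ptms is hypothetical) looks up which residue each modification in the example replaces and rejects positions outside the sequence.

```python
def describe_ptms(sequence, modifications):
    """Map each modification to the residue letter it replaces (1-based positions)."""
    described = []
    for mod in modifications:
        pos = mod["ptmPosition"]
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"ptmPosition {pos} is outside 1..{len(sequence)}")
        # Convert the 1-based position to Python's 0-based index.
        described.append((pos, sequence[pos - 1], mod["ptmType"]))
    return described


sequence = "PVLSCGEWQLVLHVWAKVEADVAGHGQDILIRLFK"
mods = [
    {"ptmType": "HY3", "ptmPosition": 1},
    {"ptmType": "P1L", "ptmPosition": 5},
]
for pos, residue, ccd in describe_ptms(sequence, mods):
    print(f"position {pos}: {residue} -> {ccd}")
```

For the example input this reports that HY3 replaces the proline at position 1 and P1L replaces the cysteine at position 5.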

Example 5: DNA-Protein Complex

{
  "name": "dna_protein_complex",
  "sequences": [
    {
      "protein": {
        "id": "P",
        "sequence": "MTEKLTSAELGTRGVGLAKVAADGYVPDEAVRKAL"
      }
    },
    {
      "dna": {
        "id": "D1",
        "sequence": "GACCTCT"
      }
    },
    {
      "dna": {
        "id": "D2",
        "sequence": "AGAGGTC"
      }
    }
  ],
  "modelSeeds": [1, 2],
  "dialect": "alphafold3",
  "version": 1
}
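The two DNA chains in this example are reverse complements of each other, so they can pair as a duplex. A quick check of that relationship, as a minimal sketch (the helper name is hypothetical and it handles only the letters A, C, G and T):

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


strand_1 = "GACCTCT"  # chain D1 from the example
strand_2 = "AGAGGTC"  # chain D2 from the example
print(reverse_complement(strand_1) == strand_2)  # True
```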

Example 6: Covalent Ligand with Bond Specification

For covalent ligands, you can specify bonds between atoms.
{
  "name": "covalent_ligand",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTIIALSYIFCLVFADYKDDDDK"
      }
    },
    {
      "ligand": {
        "id": "L",
        "ccdCodes": ["HEM"]
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 12, "SG"], ["L", 1, "FE"]]
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
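Bonds are specified as [[entity_id, residue_position, atom_name], [entity_id, residue_position, atom_name]]. Residue positions are 1-based and must exist in the referenced entity (use 1 for a single-component ligand). Atom names must match CCD definitions.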
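A bond that points at a residue position beyond the end of a chain is an easy mistake to make. The sketch below checks positions before a job is submitted. The function check_bonds is hypothetical, and it assumes polymers have one position per residue and CCD ligands one position per listed CCD code.

```python
def check_bonds(fold_input):
    """Return a list of problems found in bondedAtomPairs (empty if none)."""
    sizes = {}
    for entry in fold_input["sequences"]:
        for entity in entry.values():
            ids = entity["id"] if isinstance(entity["id"], list) else [entity["id"]]
            if "sequence" in entity:
                size = len(entity["sequence"])  # one position per residue
            else:
                size = len(entity.get("ccdCodes", []))  # one per CCD component
            for chain_id in ids:
                sizes[chain_id] = size
    problems = []
    for pair in fold_input.get("bondedAtomPairs", []):
        for chain_id, position, atom in pair:
            if chain_id not in sizes:
                problems.append(f"unknown entity {chain_id!r}")
            elif not 1 <= position <= sizes[chain_id]:
                problems.append(
                    f"{chain_id}:{position}:{atom} is outside 1..{sizes[chain_id]}"
                )
    return problems


# Shortened illustrative input; the cysteine is residue 12 of chain A.
example = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MKTIIALSYIFC"}},
        {"ligand": {"id": "L", "ccdCodes": ["HEM"]}},
    ],
    "bondedAtomPairs": [[["A", 12, "SG"], ["L", 1, "FE"]]],
}
print(check_bonds(example))  # []
```

The check covers positions only; it does not verify that the atom names exist in the CCD entries.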

Running Pipeline in Stages

You can split the pipeline into stages to optimize resource usage.

Stage 1: Data Pipeline Only (CPU)

Generate MSAs and templates without running inference:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <DB_DIR>:/root/public_databases \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --output_dir=/root/af_output \
    --norun_inference
This stage is CPU-only and doesn’t require a GPU. Run it on a cheaper CPU-only instance to save costs.

Stage 2: Inference Only (GPU)

Run inference using pre-computed MSAs and templates:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_output/job_name/job_name_data.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output \
    --norun_data_pipeline
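When the two stages are launched from a script, it can help to build both commands in one place so that the volumes and flags stay in sync. The sketch below only assembles argument lists that mirror the two commands above; it does not run Docker. The function name and the directory paths in the usage lines are hypothetical placeholders.

```python
import os
import shlex


def stage_commands(db_dir, model_dir, input_json, data_json):
    """Return (data_stage, inference_stage) argument lists for docker."""
    home = os.path.expanduser("~")
    volumes = [
        "--volume", f"{home}/af_input:/root/af_input",
        "--volume", f"{home}/af_output:/root/af_output",
    ]
    # Stage 1: databases mounted, no GPU, inference disabled.
    data_stage = [
        "docker", "run", "-it", *volumes,
        "--volume", f"{db_dir}:/root/public_databases",
        "alphafold3", "python", "run_alphafold.py",
        f"--json_path={input_json}",
        "--output_dir=/root/af_output",
        "--norun_inference",
    ]
    # Stage 2: model parameters and GPU, data pipeline disabled.
    inference_stage = [
        "docker", "run", "-it", *volumes,
        "--volume", f"{model_dir}:/root/models",
        "--gpus", "all",
        "alphafold3", "python", "run_alphafold.py",
        f"--json_path={data_json}",
        "--model_dir=/root/models",
        "--output_dir=/root/af_output",
        "--norun_data_pipeline",
    ]
    return data_stage, inference_stage


# Placeholder host paths; substitute your own directories and job name.
data_cmd, inference_cmd = stage_commands(
    "/data/af3_databases",
    "/data/af3_models",
    "/root/af_input/fold_input.json",
    "/root/af_output/job_name/job_name_data.json",
)
print(shlex.join(data_cmd))
print(shlex.join(inference_cmd))
```

Each list could be passed to subprocess.run on the machine that is appropriate for that stage.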

Processing Multiple Inputs

Process a directory of JSON files:
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --input_dir=/root/af_input \
    --model_dir=/root/models \
    --output_dir=/root/af_output

Understanding Output Files

Structure File (.cif)

The mmCIF file contains the predicted 3D coordinates:
  • Compatible with PyMOL, ChimeraX, and other structural biology tools
  • Contains all atoms for all chains
  • Includes B-factor values corresponding to pLDDT confidence

Confidence Metrics

{
  "ptm": 0.87,              // Predicted TM-score (0-1, higher is better)
  "iptm": 0.82,             // Interface TM-score (0-1)
  "ranking_score": 0.86,    // Overall ranking score (0.8×ipTM + 0.2×pTM + 0.5×disorder)
  "has_clash": false,       // Significant clashes detected?
  "fraction_disordered": 0.05
}

Interpreting Confidence Scores

pLDDT (per-atom)

  • 90-100: Very high confidence
  • 70-90: Confident
  • 50-70: Low confidence
  • <50: Very low confidence
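These bands translate directly into a small lookup, for example when colouring or filtering atoms by confidence. A minimal sketch follows; the function name is hypothetical, and assigning boundary values such as 90 or 70 to the higher band is an assumption, since the ranges above overlap at their edges.

```python
def plddt_band(plddt):
    """Classify a pLDDT value (0-100) into the confidence bands listed above."""
    if not 0 <= plddt <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    if plddt >= 90:
        return "very high"
    if plddt >= 70:
        return "confident"
    if plddt >= 50:
        return "low"
    return "very low"


for value in (95.2, 81.0, 55.5, 32.1):
    print(value, plddt_band(value))
```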

pTM / ipTM

  • pTM > 0.5: Overall fold might be similar to the true structure
  • ipTM > 0.8: High quality prediction
  • ipTM 0.6-0.8: Gray zone (may or may not be correct)
  • ipTM < 0.6: Likely a failed prediction

PAE (Predicted Aligned Error)

  • Low values (0-5 Å): High confidence in relative positions
  • High values (>10 Å): Low confidence in relative positions
  • Useful for identifying domains and interfaces
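One way to use PAE for interfaces is to average the entries that relate tokens of one chain to tokens of another. The sketch below does this on a toy matrix. It assumes the PAE is available as a square list of lists with a parallel list of per-token chain IDs (the key names pae and token_chain_ids in the confidences file are an assumption here), and the function name is hypothetical.

```python
def mean_interchain_pae(pae, token_chain_ids, chain_a, chain_b):
    """Average PAE over all token pairs between two chains, in both directions."""
    rows = [i for i, chain in enumerate(token_chain_ids) if chain == chain_a]
    cols = [j for j, chain in enumerate(token_chain_ids) if chain == chain_b]
    if not rows or not cols:
        raise ValueError("both chains must have at least one token")
    # PAE is not symmetric, so include both the a->b and b->a blocks.
    values = [pae[i][j] for i in rows for j in cols]
    values += [pae[j][i] for i in rows for j in cols]
    return sum(values) / len(values)


# Toy 3-token example: two tokens in chain A, one in chain B.
toy_pae = [
    [0.5, 1.0, 12.0],
    [1.0, 0.5, 14.0],
    [11.0, 13.0, 0.5],
]
toy_chains = ["A", "A", "B"]
print(mean_interchain_pae(toy_pae, toy_chains, "A", "B"))  # 12.5
```

A high inter-chain average, as in this toy case, suggests the relative placement of the two chains is uncertain even if each chain is confident on its own.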

Ranking Score

  • Range: -100 to 1.5
  • Formula: 0.8×ipTM + 0.2×pTM + 0.5×disorder - 100×clash
  • Use to rank predictions across multiple seeds
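The formula can be written out as a short function to see how each term contributes. This is a sketch of the formula as stated above, not AlphaFold 3's own code; it assumes that "disorder" is the fraction_disordered value and that "clash" is 1 when has_clash is true and 0 otherwise. The input values are made up.

```python
def ranking_score(iptm, ptm, fraction_disordered, has_clash):
    """Evaluate 0.8*ipTM + 0.2*pTM + 0.5*disorder - 100*clash."""
    clash_penalty = 100.0 if has_clash else 0.0
    return 0.8 * iptm + 0.2 * ptm + 0.5 * fraction_disordered - clash_penalty


# Made-up values for illustration.
print(round(ranking_score(0.7, 0.8, 0.1, False), 3))
print(round(ranking_score(0.7, 0.8, 0.1, True), 3))
```

The clash term dominates everything else, so any prediction flagged with a clash ranks below every clash-free one.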

Advanced Options

Custom MSA and Templates

Provide your own MSA (in A3M format):
{
  "protein": {
    "id": "A",
    "sequence": "MKTAYIAKQRQ",
    "unpairedMsa": ">query\nMKTAYIAKQRQ\n>seq1\nMKT-YIAKQRQ\n>seq2\nMKTAYI-KQRQ\n",
    "pairedMsa": "",
    "templates": []
  }
}
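The unpairedMsa value is a single string with embedded newlines, which is awkward to write by hand. A minimal sketch for assembling it from aligned sequences is shown below. The helper to_a3m is hypothetical; it relies on the A3M convention that uppercase letters and "-" are match columns while lowercase letters are insertions.

```python
def to_a3m(query, hits):
    """Build an A3M string with the query first, then (name, aligned_seq) hits."""
    lines = [">query", query]
    for name, aligned in hits:
        # Uppercase letters and '-' occupy match columns; lowercase letters are
        # insertions and do not count towards the query length.
        match_columns = sum(1 for ch in aligned if ch == "-" or ch.isupper())
        if match_columns != len(query):
            raise ValueError(
                f"{name}: {match_columns} match columns, expected {len(query)}"
            )
        lines += [f">{name}", aligned]
    return "\n".join(lines) + "\n"


msa = to_a3m(
    "MKTAYIAKQRQ",
    [("seq1", "MKT-YIAKQRQ"), ("seq2", "MKTAYI-KQRQ")],
)
print(repr(msa))
```

Passing the resulting string through json.dumps takes care of encoding the newlines as \n in the input file.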

Running MSA-Free

Predict structure without MSA search:
{
  "protein": {
    "id": "A",
    "sequence": "MKTAYIAKQRQ",
    "unpairedMsa": "",
    "pairedMsa": "",
    "templates": []
  }
}

Multiple Seeds for Better Sampling

{
  "name": "my_prediction",
  "modelSeeds": [1, 2, 3, 4, 5],
  "sequences": [...]
}
By default, AlphaFold 3 generates 5 samples per seed. With 5 seeds, you’ll get 25 total predictions to choose from.
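The count follows from seeds × samples per seed. As a small sketch, the expected sample directory names can be enumerated using the seed-N_sample-M naming shown in the output listing earlier (the helper name is hypothetical):

```python
def sample_dirs(seeds, samples_per_seed=5):
    """List the per-sample directory names expected for the given seeds."""
    return [
        f"seed-{seed}_sample-{index}"
        for seed in seeds
        for index in range(samples_per_seed)
    ]


dirs = sample_dirs([1, 2, 3, 4, 5])
print(len(dirs))          # 25
print(dirs[0], dirs[-1])  # seed-1_sample-0 seed-5_sample-4
```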

Performance Tips

Use SSD for Databases

Place genetic databases on SSD or RAM-backed filesystem for 5-10x faster MSA search

Reuse MSA Data

For multiple predictions with the same chains, run data pipeline once and reuse the output JSON

Enable JAX Compilation Cache

Use --jax_compilation_cache_dir to avoid recompiling between runs

Optimize Bucket Sizes

Adjust --buckets flag to minimize recompilation for your typical input sizes

Common Issues

Out of Memory

# Enable unified memory in Dockerfile
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2

RDKit Conformer Generation Failed

If you see “Failed to construct RDKit reference structure”:
  1. Try increasing iterations: --conformer_max_iterations=10000
  2. Or provide a reference structure using user-provided CCD format

Invalid JSON

SMILES strings often contain backslashes, which must be escaped (\ becomes \\) to be valid inside a JSON string:
# Use jq to escape SMILES
jq -R . <<< 'CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO'
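If the input file is generated from Python, json.dumps performs the same escaping. The sketch below assumes the ligand is given through a smiles field in the ligand entry; the SMILES string is the one from the jq example.

```python
import json

# Raw string literal so Python itself does not interpret the backslashes.
smiles = r"CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO"

ligand = {"ligand": {"id": "L", "smiles": smiles}}
encoded = json.dumps(ligand)
print(encoded)

# Decoding restores the original single backslashes.
decoded = json.loads(encoded)
print(decoded["ligand"]["smiles"] == smiles)  # True
```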

Next Steps

Input Documentation

Complete JSON format specification

Output Documentation

Detailed output format guide

Performance Tuning

Optimize for speed and throughput

Getting Help

If you encounter issues:
  1. Check Known Issues
  2. Search GitHub Issues
  3. Contact the team at [email protected]
