Specifying Covalent Bonds

AlphaFold 3 supports explicit specification of covalent bonds between atoms, either across different entities or between components of the same multi-component ligand. This is essential for modeling covalent ligands, glycans, and other covalently linked molecules.

Overview

Covalent bonds are defined in the bondedAtomPairs field as pairs of atoms, where each atom is uniquely identified by:

Chain ID (entity ID)
Residue ID (1-based position within the chain)
Atom Name (unique name within the residue)

{
  "bondedAtomPairs": [
    [["A", 145, "SG"], ["L", 1, "C04"]]
  ]
}

All bonds specified in bondedAtomPairs are implicitly covalent bonds. Other bond types are not currently supported.

Bond Specification Format

Atom Identification

Each atom is specified as a three-element tuple:

["chain_id", residue_id, "atom_name"]

Chain ID (String)

The entity identifier from the sequences section:

Must consist of uppercase letters only (digits are not allowed)
Example: "A", "B", "L"

Residue ID (Integer)

1-based position within the chain:

For proteins/RNA/DNA: position in the sequence (1 = first residue)
For single-component ligands: always 1
For multi-component ligands: component position (1, 2, 3, …)

Atom Name (String)

Unique atom identifier within the residue:

For proteins/RNA/DNA: standard PDB atom names ("CA", "N", "SG")
For CCD ligands: atom names from the CCD definition
For custom CCD: atom names you defined in _chem_comp_atom.atom_id

Bond Pair Format

A bond is an array of two atoms:

[
  ["source_chain", source_residue, "source_atom"],
  ["dest_chain", dest_residue, "dest_atom"]
]
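
When inputs are generated by a script, the format above maps directly onto nested Python lists. The following is a minimal sketch, not part of AlphaFold 3: the `AtomId` alias and the `make_bond` helper are hypothetical names introduced here for illustration.

```python
import json

# Hypothetical alias for illustration: (chain_id, residue_id, atom_name).
AtomId = tuple[str, int, str]


def make_bond(source: AtomId, dest: AtomId) -> list:
    """Return one bondedAtomPairs entry as JSON-ready nested lists."""
    return [list(source), list(dest)]


# One bond between Cys145 SG on chain A and atom C04 of the ligand on chain L.
bonds = [make_bond(("A", 145, "SG"), ("L", 1, "C04"))]
fragment = json.dumps({"bondedAtomPairs": bonds})
print(fragment)
# prints {"bondedAtomPairs": [[["A", 145, "SG"], ["L", 1, "C04"]]]}
```

Keeping atoms as tuples in Python and converting to lists only at serialization time makes them hashable, which is convenient for duplicate checks.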

Use Cases

Use Case 1: Covalent Ligands

Binding a ligand to a protein residue:

{
  "name": "Covalent ligand example",
  "modelSeeds": [42],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKTLTGKTITLEVEPS..."
      }
    },
    {
      "ligand": {
        "id": "L",
        "ccdCodes": ["HEM"]
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 145, "SG"], ["L", 1, "FE"]]
  ],
  "dialect": "alphafold3",
  "version": 4
}

This creates a bond between:

Chain A, residue 145 (cysteine), atom SG (sulfur)
Chain L, residue 1 (heme), atom FE (iron)

Use Case 2: Glycans

Defining multi-component glycans with internal bonds:

The first example below is a linear glycan; the second is a branched glycan.

{
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKTNLTGK..."
      }
    },
    {
      "ligand": {
        "id": "G",
        "ccdCodes": ["NAG", "NAG", "BMA"],
        "description": "N-glycan core"
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 8, "ND2"], ["G", 1, "C1"]],
    [["G", 1, "O4"], ["G", 2, "C1"]],
    [["G", 2, "O4"], ["G", 3, "C1"]]
  ]
}

Bonds:

Protein Asn8 → First NAG
First NAG → Second NAG
Second NAG → BMA

{
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKTNLTGK..."
      }
    },
    {
      "ligand": {
        "id": "G",
        "ccdCodes": ["NAG", "NAG", "BMA", "MAN", "MAN"],
        "description": "Branched N-glycan"
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 8, "ND2"], ["G", 1, "C1"]],
    [["G", 1, "O4"], ["G", 2, "C1"]],
    [["G", 2, "O4"], ["G", 3, "C1"]],
    [["G", 3, "O3"], ["G", 4, "C1"]],
    [["G", 3, "O6"], ["G", 5, "C1"]]
  ]
}

Structure:

 ⋮
THR                  MAN (4)
 |                    |
ASN ―― NAG ―― NAG ―― BMA ―― MAN (5)
 |      (1)    (2)    (3)
LEU
 ⋮
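
For larger glycans, writing every bond by hand is error-prone. The sketch below generates the intra-glycan bonds from a compact linkage list. It is an illustration only: `glycan_bonds` is a hypothetical helper, and it assumes every child sugar attaches through its anomeric carbon `C1`, as in the examples above.

```python
def glycan_bonds(chain_id, linkages):
    """Build bonds inside one multi-component ligand.

    Each linkage is (parent_position, parent_atom, child_position).
    Assumption: the child always bonds through atom "C1".
    """
    return [
        [[chain_id, parent, atom], [chain_id, child, "C1"]]
        for parent, atom, child in linkages
    ]


# Protein attachment first, then the four glycosidic bonds of the branched glycan.
branched = [[["A", 8, "ND2"], ["G", 1, "C1"]]] + glycan_bonds(
    "G", [(1, "O4", 2), (2, "O4", 3), (3, "O3", 4), (3, "O6", 5)]
)
```

The resulting `branched` list matches the `bondedAtomPairs` array of the branched example and can be serialized with `json.dumps`.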

Use Case 3: Disulfide Bonds

Defining disulfide bridges between cysteines:

{
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKCTLTGKCITILEVEPCS..."
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 7, "SG"], ["A", 13, "SG"]],
    [["A", 22, "SG"], ["A", 42, "SG"]]
  ]
}

AlphaFold 3 may automatically detect some standard disulfide bonds. Explicitly defining them ensures they are modeled correctly.
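
Because residue IDs are 1-based, disulfide endpoints are easy to get off by one. A quick pre-check, sketched below, confirms that every `SG` endpoint actually lands on a cysteine. Both function names are hypothetical, and the sequence is a short illustrative one rather than a real protein.

```python
def cysteine_positions(sequence):
    """Return the 1-based positions of all cysteines in a one-letter sequence."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == "C"]


def non_cysteine_endpoints(sequence, chain_id, bonds):
    """Return SG endpoints on chain_id that are out of range or not on a cysteine."""
    bad = []
    for bond in bonds:
        for chain, res_id, atom in bond:
            if chain != chain_id or atom != "SG":
                continue
            in_range = 1 <= res_id <= len(sequence)
            if not in_range or sequence[res_id - 1] != "C":
                bad.append((chain, res_id, atom))
    return bad


seq = "MQIFVKCTLTGKCITILEVEPCS"  # illustrative sequence, 23 residues
print(cysteine_positions(seq))  # prints [7, 13, 22]
print(non_cysteine_endpoints(seq, "A", [[["A", 7, "SG"], ["A", 12, "SG"]]]))
# prints [('A', 12, 'SG')] because residue 12 is a lysine
```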

Use Case 4: Cross-Chain Bonds

Bonds between different chains, here an interchain disulfide bridge between Cys7 of chain A and Cys8 of chain B:

{
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKCTLTGK..."
      }
    },
    {
      "protein": {
        "id": "B",
        "sequence": "RDWHALECIDEV..."
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 7, "SG"], ["B", 8, "SG"]]
  ]
}

Restrictions

SMILES Ligands Cannot Be Bonded

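Ligands defined using SMILES strings cannot participate in bonds because SMILES doesn’t provide unique atom names. The following input is therefore invalid: there is no atom name that could identify an atom on chain L.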

{
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKCTLTGK..."
      }
    },
    {
      "ligand": {
        "id": "L",
        "smiles": "CC(=O)OC1C[NH+]2CCC1CC2"
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 7, "SG"], ["L", 1, "???"]]  // No atom names in SMILES!
  ]
}

Polymer-Polymer Bonds Not Supported

Defining covalent bonds between or within polymer entities (protein-protein, RNA-RNA, etc.) is not currently supported, except for disulfide bridges.

Finding Atom Names

For Standard Residues

Use the RCSB PDB Chemical Component Dictionary:

Navigate to CCD: visit https://www.rcsb.org/ligand/
Search for Component: search for your component (e.g., “ATP”, “NAG”, “HEM”)
View Atom Names: view the 2D or 3D structure with atom labels to find the exact atom names

For Custom Ligands

Atom names are defined in your user-provided CCD:

loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
MY-LIG C01 C
MY-LIG C02 C
MY-LIG C03 C
MY-LIG O04 O
MY-LIG N05 N

Use the atom_id values in your bonds.
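
To list the available atom names programmatically, the `atom_id` column can be read out of the loop. The sketch below is deliberately minimal and is not a general mmCIF parser: `atom_ids_from_ccd` is a hypothetical helper that assumes a single `_chem_comp_atom` loop with unquoted, whitespace-separated values and one atom per row. For real CCD files a dedicated mmCIF parser is the safer choice.

```python
def atom_ids_from_ccd(ccd_text):
    """Collect _chem_comp_atom.atom_id values from a simple loop_ block."""
    columns = []   # column names in the order they are declared
    atom_ids = []
    for line in ccd_text.splitlines():
        line = line.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_chem_comp_atom."):
            columns.append(line)
        elif "_chem_comp_atom.atom_id" in columns:
            fields = line.split()
            # Only accept rows whose field count matches the declared columns.
            if len(fields) == len(columns):
                atom_ids.append(fields[columns.index("_chem_comp_atom.atom_id")])
    return atom_ids


ccd = """loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
MY-LIG C01 C
MY-LIG C02 C
MY-LIG C03 C
MY-LIG O04 O
MY-LIG N05 N
"""
print(atom_ids_from_ccd(ccd))  # prints ['C01', 'C02', 'C03', 'O04', 'N05']
```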

Complete Example: N-Glycosylated Protein

{
  "name": "N-glycosylated protein",
  "modelSeeds": [42],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFCKTNLTGKTINLEVEPS",
        "description": "Illustrative sequence with Cys5, Asn8, and Asn15"
      }
    },
    {
      "ligand": {
        "id": "G",
        "ccdCodes": ["NAG", "NAG", "BMA", "MAN", "MAN"],
        "description": "N-glycan on first site"
      }
    },
    {
      "ligand": {
        "id": "H",
        "ccdCodes": ["NAG"],
        "description": "Single NAG on second site"
      }
    },
    {
      "ligand": {
        "id": "L",
        "ccdCodes": ["HEM"],
        "description": "Heme cofactor"
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 8, "ND2"], ["G", 1, "C1"]],
    [["G", 1, "O4"], ["G", 2, "C1"]],
    [["G", 2, "O4"], ["G", 3, "C1"]],
    [["G", 3, "O3"], ["G", 4, "C1"]],
    [["G", 3, "O6"], ["G", 5, "C1"]],
    [["A", 15, "ND2"], ["H", 1, "C1"]],
    [["A", 5, "SG"], ["L", 1, "FE"]]
  ],
  "dialect": "alphafold3",
  "version": 4
}

Bonds:

First glycan tree: Asn8 → first NAG of chain G, followed by the four glycosidic bonds within chain G
Second glycan: Asn15 → the single NAG of chain H
Covalent heme: Cys5 SG → heme FE on chain L

JSON does not allow comments, so the groups above are described here rather than inside bondedAtomPairs.

Validation

AlphaFold 3 validates bonds during input parsing:

Chain IDs must exist in sequences
Residue IDs must be within valid range (1 to chain length)
No duplicate bonds allowed
SMILES ligands cannot be in bonds
All bonds are converted to tuples internally

Invalid bonds will cause the input to be rejected with a clear error message.
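
It can be convenient to run similar checks before submitting a job. The following sketch mirrors the rules listed above on an already-parsed input dictionary. It is an approximation for illustration, not AlphaFold 3's own validation: `check_bonds` is a hypothetical function, it treats two bonds as duplicates only when they list the same atoms in the same order, and it does not verify atom names.

```python
def check_bonds(fold_input):
    """Return a list of problems found in bondedAtomPairs (empty if none)."""
    lengths = {}          # chain ID -> number of residues or ligand components
    smiles_chains = set()
    for entry in fold_input.get("sequences", []):
        for kind, entity in entry.items():
            ids = entity["id"] if isinstance(entity["id"], list) else [entity["id"]]
            for chain_id in ids:
                if "smiles" in entity:
                    smiles_chains.add(chain_id)
                    lengths[chain_id] = 1
                elif kind == "ligand":
                    lengths[chain_id] = len(entity["ccdCodes"])
                else:
                    lengths[chain_id] = len(entity["sequence"])

    problems = []
    seen = set()
    for bond in fold_input.get("bondedAtomPairs", []):
        key = tuple(tuple(atom) for atom in bond)
        if key in seen:
            problems.append(f"duplicate bond: {bond}")
        seen.add(key)
        for chain_id, res_id, _atom_name in bond:
            if chain_id not in lengths:
                problems.append(f"unknown chain ID {chain_id!r}")
            elif chain_id in smiles_chains:
                problems.append(f"chain {chain_id!r} is a SMILES ligand")
            elif not 1 <= res_id <= lengths[chain_id]:
                problems.append(f"residue {res_id} out of range for chain {chain_id!r}")
    return problems


example = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MQIFVKTNLTGK"}},
        {"ligand": {"id": "G", "ccdCodes": ["NAG", "NAG", "BMA"]}},
        {"ligand": {"id": "S", "smiles": "CCO"}},
    ],
    "bondedAtomPairs": [
        [["A", 8, "ND2"], ["G", 1, "C1"]],
        [["G", 3, "O3"], ["G", 4, "C1"]],
        [["A", 1, "N"], ["S", 1, "C1"]],
    ],
}
for problem in check_bonds(example):
    print(problem)
```

In this example the second bond is reported because chain G has only three components, and the third because chain S is defined by a SMILES string.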

Visual Representation

{
  "bondedAtomPairs": [
    [["A", 145, "SG"], ["L", 1, "C04"]],
    [["I", 1, "O6"], ["I", 2, "C1"]]
  ]
}

Breakdown:

First bond: Chain A, residue 145, atom SG ↔ Chain L, residue 1, atom C04
- Cross-chain bond (protein to ligand)
- Covalent ligand attachment
Second bond: Chain I, residue 1, atom O6 ↔ Chain I, residue 2, atom C1
- Within-chain bond
- Connects two components of a multi-component ligand

Code Reference

From folding_input.py:958-959:

bonded_atom_pairs: Sequence[tuple[BondAtomId, BondAtomId]] | None = None
# BondAtomId: TypeAlias = tuple[str, int, str]

From folding_input.py:1189-1234:

if bonds := raw_json.get('bondedAtomPairs'):
  bonded_atom_pairs = []
  for bond in bonds:
    if len(bond) != 2:
      raise ValueError(f'Bond {bond} must have 2 atoms.')
    bond_beg, bond_end = bond
    if len(bond_beg) != 3 or not isinstance(bond_beg[0], str) ...
      raise ValueError(
        'Atom must have 3 components: '
        '(chain_id: str, res_id: int, atom_name: str).'
      )
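
The excerpt above is abbreviated. For experimenting with the same kind of shape check outside the library, a self-contained sketch might look like the following. It is not the code from folding_input.py: `parse_bonds` and `_check_atom` are hypothetical names, and the exact type checks are an assumption inferred from the error message in the excerpt.

```python
from typing import Optional

BondAtomId = tuple[str, int, str]


def _check_atom(atom) -> BondAtomId:
    """Validate one [chain_id, res_id, atom_name] entry and return it as a tuple."""
    if (
        len(atom) != 3
        or not isinstance(atom[0], str)
        or not isinstance(atom[1], int)
        or not isinstance(atom[2], str)
    ):
        raise ValueError(
            'Atom must have 3 components: '
            '(chain_id: str, res_id: int, atom_name: str).'
        )
    return (atom[0], atom[1], atom[2])


def parse_bonds(raw_json: dict) -> Optional[list[tuple[BondAtomId, BondAtomId]]]:
    """Convert the bondedAtomPairs field to tuples, or return None if absent."""
    bonds = raw_json.get('bondedAtomPairs')
    if not bonds:
        return None
    parsed = []
    for bond in bonds:
        if len(bond) != 2:
            raise ValueError(f'Bond {bond} must have 2 atoms.')
        parsed.append((_check_atom(bond[0]), _check_atom(bond[1])))
    return parsed


print(parse_bonds({'bondedAtomPairs': [[['A', 145, 'SG'], ['L', 1, 'C04']]]}))
# prints [(('A', 145, 'SG'), ('L', 1, 'C04'))]
```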
