
Entity Types

AlphaFold 3 supports four primary entity types in the sequences array:

  • Protein: amino acid sequences with MSAs, templates, and modifications
  • RNA: RNA sequences with modifications and MSAs
  • DNA: DNA sequences with modifications
  • Ligand: small molecules specified via CCD codes or SMILES

Protein Entities

Proteins are the most feature-rich entity type, supporting MSA, templates, and modifications.

Basic Structure

{
  "protein": {
    "id": "A",
    "sequence": "PVLSCGEWQL",
    "modifications": [
      {"ptmType": "HY3", "ptmPosition": 1},
      {"ptmType": "P1L", "ptmPosition": 5}
    ],
    "description": "10-residue protein with 2 modifications",
    "unpairedMsa": null,
    "pairedMsa": null,
    "templates": []
  }
}

Field Specifications

  • id (string | array<string>, required): Uppercase letter(s) giving a unique chain ID. Use an array for homomers: ["A", "B", "C"].
  • sequence (string, required): Amino acid sequence in standard 1-letter codes.
  • modifications (array): Post-translational modifications. Each entry has ptmType (CCD code) and ptmPosition (1-based).
  • description (string): Optional free-text description (version 4+).
  • unpairedMsa (string): Unpaired MSA in A3M format, or an empty string. Mutually exclusive with unpairedMsaPath.
  • unpairedMsaPath (string): Path to an A3M MSA file (absolute, or relative to the input JSON).
  • pairedMsa (string): Paired MSA in A3M format. Using unpairedMsa instead is recommended.
  • pairedMsaPath (string): Path to a paired MSA file.
  • templates (array): Structural templates in mmCIF format with alignment mappings.

Homodimer Example

{
  "protein": {
    "id": ["A", "B"],
    "sequence": "MKLLVVSGGSGS",
    "description": "Homodimer with two copies"
  }
}
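When many homomers have to be generated, the entry above can be built programmatically. The following is a minimal Python sketch; `make_protein_entity` is a hypothetical helper written for this page, not part of AlphaFold 3, and it assumes single-letter chain IDs only.

```python
import json
import string


def make_protein_entity(sequence, copies=1, first_id="A", description=None):
    """Build one 'protein' entry for the sequences array (hypothetical helper)."""
    start = string.ascii_uppercase.index(first_id)
    ids = list(string.ascii_uppercase[start:start + copies])
    if len(ids) != copies:
        raise ValueError("this sketch only handles single-letter chain IDs")
    # A single copy uses a plain string ID; several copies use an array of IDs.
    entity = {"id": ids[0] if copies == 1 else ids, "sequence": sequence}
    if description is not None:
        entity["description"] = description
    return {"protein": entity}


homotrimer = make_protein_entity("MKLLVVSGGSGS", copies=3,
                                 description="Homotrimer with three copies")
print(json.dumps(homotrimer, indent=2))
```

Passing `copies=1` yields a string `id`, which matches the single-chain form shown in Basic Structure.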

With Custom MSA

{
  "protein": {
    "id": "A",
    "sequence": "DEEP",
    "unpairedMsa": ">query\nDEEP\n>match1\nD--P\n>match2\nDD-P",
    "pairedMsa": "",
    "templates": []
  }
}
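A custom MSA is easy to get subtly wrong, so it can help to sanity-check the A3M string before embedding it. The sketch below assumes two conventions: that the first record is the query itself, and that lowercase letters mark insertions (the usual A3M convention), so every row should match the query length once lowercase letters are removed. `parse_a3m` and `check_msa` are hypothetical helpers, not AlphaFold 3 functions.

```python
def parse_a3m(a3m):
    """Return a list of (header, sequence) pairs from an A3M string."""
    records = []
    for line in a3m.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def check_msa(query, a3m):
    """Raise ValueError on an inconsistent MSA; return the number of rows."""
    records = parse_a3m(a3m)
    if not records:
        raise ValueError("MSA contains no sequences")
    if records[0][1] != query:
        raise ValueError("first MSA row is expected to be the query sequence")
    for header, seq in records:
        # Lowercase letters are insertions and do not occupy a query column.
        aligned = "".join(c for c in seq if not c.islower())
        if len(aligned) != len(query):
            raise ValueError(f"row {header!r} does not match the query length")
    return len(records)


msa = ">query\nDEEP\n>match1\nD--P\n>match2\nDD-P"
print(check_msa("DEEP", msa))
```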

RNA Entities

RNA sequences support modifications and MSA.
{
  "rna": {
    "id": "E",
    "sequence": "AGCU",
    "modifications": [
      {"modificationType": "2MG", "basePosition": 1},
      {"modificationType": "5MC", "basePosition": 4}
    ],
    "description": "4-base RNA with modifications",
    "unpairedMsa": null
  }
}

Field Specifications

  • id (string | array<string>, required): Uppercase letter(s) for the chain ID.
  • sequence (string, required): RNA sequence using only A, C, G, U.
  • modifications (array): Each entry has modificationType (CCD code) and basePosition (1-based).
  • unpairedMsa (string): MSA in A3M format.
  • unpairedMsaPath (string): Path to an MSA file.

DNA Entities

DNA sequences support modifications but not MSA or templates.
{
  "dna": {
    "id": "C",
    "sequence": "GACCTCT",
    "modifications": [
      {"modificationType": "6OG", "basePosition": 1},
      {"modificationType": "6MA", "basePosition": 2}
    ],
    "description": "7-base DNA strand"
  }
}

Field Specifications

  • id (string | array<string>, required): Uppercase letter(s) for the chain ID.
  • sequence (string, required): DNA sequence using only A, C, G, T.
  • modifications (array): Each entry has modificationType (CCD code) and basePosition (1-based).
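The RNA and DNA rules above (restricted alphabets, 1-based modification positions) can be checked before a job is submitted. This is a minimal sketch under those stated rules only; `check_nucleic_entity` is a hypothetical helper and does not reproduce AlphaFold 3's own validation.

```python
ALPHABETS = {"rna": set("ACGU"), "dna": set("ACGT")}


def check_nucleic_entity(entry):
    """Return a list of problems found in a single 'rna' or 'dna' entry."""
    (kind, entity), = entry.items()  # each entry holds exactly one entity
    if kind not in ALPHABETS:
        return [f"unsupported entity type: {kind}"]
    problems = []
    sequence = entity["sequence"]
    bad = sorted(set(sequence) - ALPHABETS[kind])
    if bad:
        problems.append(f"invalid {kind} letters: {''.join(bad)}")
    for mod in entity.get("modifications", []):
        pos = mod["basePosition"]
        # Positions are 1-based, so valid values run from 1 to len(sequence).
        if not 1 <= pos <= len(sequence):
            problems.append(f"basePosition {pos} outside 1..{len(sequence)}")
    return problems


dna = {"dna": {"id": "C", "sequence": "GACCTCT",
               "modifications": [{"modificationType": "6OG", "basePosition": 1},
                                 {"modificationType": "6MA", "basePosition": 2}]}}
print(check_nucleic_entity(dna))
```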

Ligand Entities

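Ligands can be specified in three ways: by standard CCD codes, by custom (user-provided) CCD codes, or by a SMILES string. The example below uses standard CCD codes: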
{
  "ligand": {
    "id": ["F", "G", "H"],
    "ccdCodes": ["ATP"],
    "description": "Three ATP molecules"
  }
}
Use standard Chemical Component Dictionary (CCD) codes. Ligands specified by CCD code support covalent bonds to other entities.

Field Specifications

  • id (string | array<string>, required): Uppercase letter(s) for the ligand ID.
  • ccdCodes (array<string>): List of CCD codes (standard or custom). Mutually exclusive with smiles.
  • smiles (string): SMILES definition. Mutually exclusive with ccdCodes.

SMILES JSON Escaping

Backslashes in SMILES must be escaped. Use jq or Python to properly escape:
jq -R . <<< 'CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO'
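The Python route mentioned above can be sketched with the standard `json` module, which escapes backslashes automatically when serializing. The ligand ID "Z" here is only an illustrative choice.

```python
import json

# A raw string keeps the backslashes exactly as they appear in the SMILES.
smiles = r"CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO"

ligand = {"ligand": {"id": "Z", "smiles": smiles}}

# json.dumps writes each backslash as an escaped pair in the JSON text.
encoded = json.dumps(ligand)
print(encoded)

# Parsing the JSON text recovers the original SMILES unchanged.
decoded = json.loads(encoded)["ligand"]["smiles"]
print(decoded == smiles)
```

Building the input as a Python dictionary and serializing it in one step avoids hand-editing escape sequences altogether.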

Ions as Ligands

Ions are treated as ligands. For example, a magnesium ion:
{
  "ligand": {
    "id": "M",
    "ccdCodes": ["MG"]
  }
}

Structural Templates

Templates are only supported for proteins.
"templates": [
  {
    "mmcif": "data_template\n_entry.id template\n...",
    "queryIndices": [0, 1, 2, 4, 5, 6],
    "templateIndices": [0, 1, 2, 3, 4, 8]
  }
]
  • mmcif (string): Single-chain protein template in mmCIF format. Mutually exclusive with mmcifPath.
  • mmcifPath (string): Path to an mmCIF file (may be gzip, xz, or zstd compressed).
  • queryIndices (array<integer>, required): 0-based indices in the query sequence.
  • templateIndices (array<integer>, required): 0-based indices in the template sequence (account for unresolved residues).
mmCIF files may have unresolved residues. These must be counted when specifying templateIndices.
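To make the index mapping concrete, the sketch below pairs the two lists from the example position by position. It assumes the lists must have equal length, since each query index is matched with the template index at the same position; `template_mapping` is a hypothetical helper, not an AlphaFold 3 function.

```python
def template_mapping(query_indices, template_indices, query_length=None):
    """Pair 0-based query indices with 0-based template indices."""
    if len(query_indices) != len(template_indices):
        raise ValueError("queryIndices and templateIndices differ in length")
    if any(i < 0 for i in list(query_indices) + list(template_indices)):
        raise ValueError("indices are 0-based and must be non-negative")
    if query_length is not None and any(i >= query_length for i in query_indices):
        raise ValueError("queryIndices entry beyond the end of the query sequence")
    return dict(zip(query_indices, template_indices))


mapping = template_mapping([0, 1, 2, 4, 5, 6], [0, 1, 2, 3, 4, 8], query_length=7)
for query_index, template_index in mapping.items():
    print(f"query residue {query_index} -> template residue {template_index}")
```

In this example query residue 3 has no template counterpart, and template residues 5 to 7 are skipped, which is the kind of gap that unresolved residues can introduce.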

Covalent Bonds

Define bonds between or within entities using bondedAtomPairs:
"bondedAtomPairs": [
  [["A", 145, "SG"], ["L", 1, "C04"]],
  [["J", 1, "O6"], ["J", 2, "C1"]]
]
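Each bond is a pair of atoms, and each atom is addressed as [entityId, residueId, atomName]:
  • Entity ID: chain ID from the entity's id field
  • Residue ID: 1-based position within the chain; for a ligand, the 1-based position of the component in ccdCodes
  • Atom name: atom name from the component's CCD definition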
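The addressing scheme above lends itself to a quick structural check. The following sketch verifies only what this section states: two atoms per bond, three fields per atom, a declared entity ID, and a 1-based integer residue ID. `known_ids` and `check_bonds` are hypothetical helpers, and atom names are not validated against the CCD.

```python
def known_ids(sequences):
    """Collect every chain or ligand ID declared in the sequences array."""
    ids = set()
    for entry in sequences:
        for entity in entry.values():
            raw = entity["id"]
            ids.update([raw] if isinstance(raw, str) else raw)
    return ids


def check_bonds(job):
    """Return a list of problems found in job['bondedAtomPairs']."""
    ids = known_ids(job["sequences"])
    problems = []
    for n, bond in enumerate(job.get("bondedAtomPairs", [])):
        if len(bond) != 2:
            problems.append(f"bond {n}: expected 2 atoms, got {len(bond)}")
            continue
        for atom in bond:
            if len(atom) != 3:
                problems.append(f"bond {n}: atom must be [entityId, residueId, atomName]")
                continue
            entity_id, residue_id, atom_name = atom
            if entity_id not in ids:
                problems.append(f"bond {n}: unknown entity ID {entity_id!r}")
            if not isinstance(residue_id, int) or residue_id < 1:
                problems.append(f"bond {n}: residue ID must be a 1-based integer")
    return problems


job = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "PVLSCGEWQL"}},
        {"ligand": {"id": ["F", "G", "H"], "ccdCodes": ["ATP"]}},
    ],
    "bondedAtomPairs": [[["A", 1, "CA"], ["G", 1, "CHA"]]],
}
print(check_bonds(job))
```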

Glycan Example

{
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "...ASN..."
      }
    },
    {
      "ligand": {
        "id": "B",
        "ccdCodes": ["CMP1", "CMP2", "CMP3", "CMP4"]
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 42, "ND2"], ["B", 1, "C1"]],
    [["B", 1, "O4"], ["B", 2, "C1"]],
    [["B", 2, "O3"], ["B", 3, "C1"]],
    [["B", 2, "O6"], ["B", 4, "C1"]]
  ]
}

Complete Example

Here’s a comprehensive input demonstrating multiple entity types:
{
  "name": "Complex Structure",
  "modelSeeds": [10, 42],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "PVLSCGEWQL",
        "modifications": [
          {"ptmType": "HY3", "ptmPosition": 1},
          {"ptmType": "P1L", "ptmPosition": 5}
        ],
        "description": "Protein with modifications"
      }
    },
    {
      "protein": {
        "id": "B",
        "sequence": "RPACQLW",
        "templates": []
      }
    },
    {
      "dna": {
        "id": "C",
        "sequence": "GACCTCT",
        "modifications": [
          {"modificationType": "6OG", "basePosition": 1}
        ]
      }
    },
    {
      "rna": {
        "id": "E",
        "sequence": "AGCU",
        "modifications": [
          {"modificationType": "2MG", "basePosition": 1}
        ]
      }
    },
    {
      "ligand": {
        "id": ["F", "G", "H"],
        "ccdCodes": ["ATP"]
      }
    },
    {
      "ligand": {
        "id": "Z",
        "smiles": "CC(=O)OC1C[NH+]2CCC1CC2"
      }
    }
  ],
  "bondedAtomPairs": [
    [["A", 1, "CA"], ["G", 1, "CHA"]]
  ],
  "dialect": "alphafold3",
  "version": 4
}
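As a rough cross-check of an input like the one above, the chains per entity type can be counted and duplicate IDs flagged, since IDs are described as unique. The sketch below uses a trimmed copy of the example that keeps only the id fields; `summarize_chains` is a hypothetical helper, not an AlphaFold 3 function.

```python
from collections import Counter


def summarize_chains(sequences):
    """Return (chains per entity type, sorted list of duplicated IDs)."""
    per_type = Counter()
    seen = Counter()
    for entry in sequences:
        for kind, entity in entry.items():
            raw = entity["id"]
            ids = [raw] if isinstance(raw, str) else list(raw)
            per_type[kind] += len(ids)
            seen.update(ids)
    duplicates = sorted(i for i, count in seen.items() if count > 1)
    return dict(per_type), duplicates


# Trimmed copy of the complete example: only the id fields are kept.
sequences = [
    {"protein": {"id": "A"}},
    {"protein": {"id": "B"}},
    {"dna": {"id": "C"}},
    {"rna": {"id": "E"}},
    {"ligand": {"id": ["F", "G", "H"]}},
    {"ligand": {"id": "Z"}},
]
counts, duplicates = summarize_chains(sequences)
print(counts)
print(duplicates)
```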
