
Overview

The Structure class is the primary data structure for representing molecular structures in AlphaFold 3. It exposes chain-, residue-, atom-, and bond-level data for proteins, DNA, RNA, ligands, and other molecular entities.

Class Definition

from alphafold3.structure import Structure
The Structure class represents molecular structures with a hierarchical organization:
  • Chains: Individual molecular chains (proteins, nucleic acids, ligands)
  • Residues: Amino acids, nucleotides, or chemical components
  • Atoms: Individual atomic coordinates and properties
  • Bonds: Chemical bonds between atoms

Constructor

Structure(
    *,
    name: str = 'unset',
    release_date: datetime.date | None = None,
    resolution: float | None = None,
    structure_method: str | None = None,
    bioassembly_data: BioassemblyData | None = None,
    chemical_components_data: ChemicalComponentsData | None = None,
    chains: Chains,
    residues: Residues,
    atoms: Atoms,
    bonds: Bonds,
    skip_validation: bool = False
)
name
str
default:"'unset'"
Name or identifier for the structure (e.g., PDB ID)
release_date
datetime.date | None
default:"None"
Release date of the structure
resolution
float | None
default:"None"
Resolution of the structure in Angstroms (for experimental structures)
structure_method
str | None
default:"None"
Experimental method used (e.g., ‘x-ray diffraction’, ‘electron microscopy’)
bioassembly_data
BioassemblyData | None
default:"None"
Biological assembly information
chemical_components_data
ChemicalComponentsData | None
default:"None"
Chemical component dictionary data for non-standard residues
chains
Chains
required
Table containing chain-level data (IDs, types, entity information)
residues
Residues
required
Table containing residue-level data (names, IDs, author naming)
atoms
Atoms
required
Table containing atomic data (names, elements, coordinates, b-factors)
bonds
Bonds
required
Table containing bond connectivity information
skip_validation
bool
default:"False"
Skip validation of foreign keys and table ordering (use with caution)
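The foreign-key and ordering checks that skip_validation bypasses can be pictured with a small stand-alone sketch. The list-based table layout and the helper name validate_tables are assumptions made for illustration, not the AlphaFold 3 implementation: each residue row points at a chain key, each atom row points at a residue key, and child rows are expected to be grouped in parent order.

```python
# Hypothetical, simplified tables: not the real Chains/Residues/Atoms classes.
chain_keys = [0, 1]
residue_chain_keys = [0, 0, 1]        # residue i belongs to chain residue_chain_keys[i]
atom_residue_keys = [0, 0, 1, 2, 2]   # atom i belongs to residue atom_residue_keys[i]


def validate_tables(chain_keys, residue_chain_keys, atom_residue_keys):
    """Returns a list of problems; an empty list means the tables are consistent."""
    problems = []
    known_chains = set(chain_keys)
    known_residues = set(range(len(residue_chain_keys)))
    # Foreign keys: every child row must reference an existing parent row.
    if any(key not in known_chains for key in residue_chain_keys):
        problems.append('residue references unknown chain')
    if any(key not in known_residues for key in atom_residue_keys):
        problems.append('atom references unknown residue')
    # Ordering: child rows must be sorted by their parent key.
    if residue_chain_keys != sorted(residue_chain_keys):
        problems.append('residues not ordered by chain')
    if atom_residue_keys != sorted(atom_residue_keys):
        problems.append('atoms not ordered by residue')
    return problems


print(validate_tables(chain_keys, residue_chain_keys, atom_residue_keys))  # []
print(validate_tables(chain_keys, [0, 1, 0], [0, 3]))
```

Skipping checks of this kind saves time on tables that are already known to be consistent, which is why the parameter carries a "use with caution" note.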

Properties

Structure Metadata

name
str
Structure name or identifier
release_date
datetime.date | None
Structure release date
resolution
float | None
Structure resolution in Angstroms
structure_method
str | None
Experimental method used to determine the structure
num_atoms
int
Total number of atoms in the structure
num_chains
int
Total number of chains in the structure
num_models
int
Number of models (for NMR structures or ensembles)

Tables

chains_table
Chains
Access to the chains table with chain-level data
residues_table
Residues
Access to the residues table with residue-level data
atoms_table
Atoms
Access to the atoms table with atomic data
bonds_table
Bonds
Access to the bonds table with connectivity information
present_chains
Chains
Table of chains that have at least one resolved atom
present_residues
Residues
Table of residues that have at least one resolved atom
unresolved_residues
Residues
Table of residues with no resolved atoms (missing from structure)
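The split between present and unresolved residues can be illustrated without the library. The sketch below uses hypothetical stand-in data (plain tuples and lists, not the real table classes) and treats a residue as present when at least one atom row refers to it.

```python
# Hypothetical stand-in data: one (chain_id, res_id, res_name) row per residue,
# plus the index of the residue that each resolved atom belongs to.
residues = [
    ('A', 1, 'MET'),
    ('A', 2, 'ALA'),
    ('A', 3, 'GLY'),
    ('B', 1, 'SER'),
]
atom_residue_index = [1, 1, 1, 2]  # residues 0 and 3 have no resolved atoms

resolved = set(atom_residue_index)
present_residues = [row for i, row in enumerate(residues) if i in resolved]
unresolved_residues = [row for i, row in enumerate(residues) if i not in resolved]
present_chains = sorted({chain_id for chain_id, _, _ in present_residues})

print(present_residues)     # [('A', 2, 'ALA'), ('A', 3, 'GLY')]
print(unresolved_residues)  # [('A', 1, 'MET'), ('B', 1, 'SER')]
print(present_chains)       # ['A']
```

In this toy data, counting only present residues gives 2 and counting all residues gives 4, which mirrors the distinction made by the count_unresolved argument of num_residues.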

Atom-Level Arrays

atom_key
np.ndarray
Unique integer keys for each atom
atom_name
np.ndarray
Atom names (e.g., ‘CA’, ‘N’, ‘C’, ‘O’)
atom_element
np.ndarray
Element symbols (e.g., ‘C’, ‘N’, ‘O’, ‘S’)
atom_x
np.ndarray
X coordinates in Angstroms
atom_y
np.ndarray
Y coordinates in Angstroms
atom_z
np.ndarray
Z coordinates in Angstroms
atom_b_factor
np.ndarray
B-factors (temperature factors) for each atom
atom_occupancy
np.ndarray
Occupancy values for each atom (typically 1.0)
chain_id
np.ndarray
Chain ID for each atom (internal label_asym_id)
chain_type
np.ndarray
Chain type for each atom (e.g., ‘polypeptide(L)’, ‘polyribonucleotide’)
chain_auth_asym_id
np.ndarray
Author chain ID for each atom (from original PDB file)
chain_entity_id
np.ndarray
Entity ID for each atom
chain_entity_desc
np.ndarray
Entity description for each atom
res_id
np.ndarray
Residue sequence ID for each atom (1-indexed)
res_name
np.ndarray
Residue names for each atom (e.g., ‘ALA’, ‘DA’, ‘U’, ‘HOH’)
res_auth_seq_id
np.ndarray
Author residue sequence ID for each atom
res_insertion_code
np.ndarray
Insertion codes for each atom

Masks

is_protein_mask
np.ndarray
Boolean mask indicating protein atoms
is_dna_mask
np.ndarray
Boolean mask indicating DNA atoms
is_rna_mask
np.ndarray
Boolean mask indicating RNA atoms
is_nucleic_mask
np.ndarray
Boolean mask indicating all nucleic acid atoms (DNA or RNA)
is_ligand_mask
np.ndarray
Boolean mask indicating ligand atoms
is_water_mask
np.ndarray
Boolean mask indicating water molecules
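As a rough illustration of how per-atom masks relate to one another, the sketch below derives them from a chain_type column using plain Python lists in place of NumPy arrays. Which chain_type strings feed which mask is an assumption here (the strings are standard mmCIF entity types), and the real properties return NumPy boolean arrays that would normally be combined with | and &.

```python
# Hypothetical per-atom columns; plain lists stand in for np.ndarray.
chain_type = [
    'polypeptide(L)',
    'polypeptide(L)',
    'polydeoxyribonucleotide',
    'polyribonucleotide',
    'non-polymer',
    'water',
]
atom_name = ['N', 'CA', 'P', 'P', 'C1', 'O']

is_protein = [t == 'polypeptide(L)' for t in chain_type]
is_dna = [t == 'polydeoxyribonucleotide' for t in chain_type]
is_rna = [t == 'polyribonucleotide' for t in chain_type]
# The nucleic mask is the element-wise OR of the DNA and RNA masks.
is_nucleic = [d or r for d, r in zip(is_dna, is_rna)]
# Masks compose: polymer atoms are protein or nucleic acid atoms.
is_polymer = [p or n for p, n in zip(is_protein, is_nucleic)]

polymer_atom_names = [name for name, keep in zip(atom_name, is_polymer) if keep]
print(is_nucleic)          # [False, False, True, True, False, False]
print(polymer_atom_names)  # ['N', 'CA', 'P', 'P']
```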

Methods

num_residues

def num_residues(*, count_unresolved: bool) -> int
Returns the number of residues in the structure.
count_unresolved
bool
required
If True, includes unresolved residues (missing from coordinates). If False, only counts residues with at least one resolved atom.
Returns: Number of residues in the structure
Example:
# Count only resolved residues
resolved_count = structure.num_residues(count_unresolved=False)

# Count all residues including missing ones
total_count = structure.num_residues(count_unresolved=True)

filter

def filter(
    mask: np.ndarray | None = None,
    *,
    apply_per_element: bool = False,
    invert: bool = False,
    cascade_delete: CascadeDelete = CascadeDelete.CHAINS,
    **predicate_by_field_name
) -> Structure
Filters the structure by field values and returns a new structure.
mask
np.ndarray | None
default:"None"
Optional boolean array with length equal to num_atoms for direct masking
apply_per_element
bool
default:"False"
If True, each callable predicate is applied to every element individually. If False, it is applied once to the whole column array.
invert
bool
default:"False"
If True, removes entities matching the predicates instead of retaining them
cascade_delete
CascadeDelete
default:"CascadeDelete.CHAINS"
Controls deletion of unresolved residues/chains:
  • FULL: Delete all unresolved residues and empty chains
  • CHAINS: Delete only chains with no resolved residues (default)
  • NONE: Keep all unresolved residues and chains
**predicate_by_field_name
FilterPredicate
Field-based filters using pattern <table>_<column> (e.g., chain_id='A', atom_name=('CA', 'N'))
Returns: Filtered Structure object
Example:
# Filter to chain A, backbone atoms only
filtered = structure.filter(
    chain_id='A',
    atom_name=('N', 'CA', 'C', 'O')
)

# Filter to residues with res_id <= 100
filtered = structure.filter(
    res_id=lambda res_id: res_id <= 100
)

# Filter by multiple chains
filtered = structure.filter(
    chain_id=('A', 'B', 'C')
)

# Filter using a boolean mask
low_bfactor_mask = structure.atom_b_factor < 50.0
filtered = structure.filter(low_bfactor_mask)
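The three predicate forms used above (a single value, a tuple of allowed values, and a callable) can each be thought of as producing a boolean mask over a column, with several predicates combined by logical AND. The helper resolve_predicate below is a hypothetical sketch of that idea on plain lists; it is not the library's code, and accepting a set as well as a tuple is an assumption.

```python
def resolve_predicate(column, predicate, *, apply_per_element=False, invert=False):
    """Turns one predicate into a list of booleans, one per element of column."""
    if callable(predicate):
        if apply_per_element:
            # Call the predicate once per element.
            mask = [bool(predicate(value)) for value in column]
        else:
            # Call the predicate once on the whole column.
            mask = [bool(flag) for flag in predicate(column)]
    elif isinstance(predicate, (tuple, set, frozenset)):
        allowed = set(predicate)
        mask = [value in allowed for value in column]
    else:
        mask = [value == predicate for value in column]
    return [not flag for flag in mask] if invert else mask


chain_id = ['A', 'A', 'A', 'B']
atom_name = ['N', 'CA', 'CB', 'CA']
res_id = [1, 1, 1, 2]

masks = [
    resolve_predicate(chain_id, 'A'),
    resolve_predicate(atom_name, ('N', 'CA', 'C', 'O')),
]
keep = [all(flags) for flags in zip(*masks)]  # AND across predicates
print(keep)  # [True, True, False, False]

per_element = resolve_predicate(res_id, lambda r: r <= 1, apply_per_element=True)
whole_column = resolve_predicate(res_id, lambda col: [r <= 1 for r in col])
print(per_element == whole_column)  # True
```

With NumPy columns, a whole-column callable such as lambda res_id: res_id <= 100 works directly because the comparison broadcasts over the array, which is why the examples above do not need apply_per_element=True.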

filter_out

def filter_out(*args, **kwargs) -> Structure
Removes entities matching the predicates (inverse of filter). Takes the same parameters as filter(), with invert=True applied automatically.
Example:
# Remove water molecules
no_water = structure.filter_out(res_name='HOH')

# Remove high B-factor atoms
filtered = structure.filter_out(
    atom_b_factor=lambda b: b > 100.0
)

filter_to_entity_type

def filter_to_entity_type(
    *,
    protein: bool = True,
    dna: bool = True,
    rna: bool = True,
    ligand: bool = True
) -> Structure
Filters structure to specific entity types.
protein
bool
default:"True"
Include protein chains
dna
bool
default:"True"
Include DNA chains
rna
bool
default:"True"
Include RNA chains
ligand
bool
default:"True"
Include ligand molecules
Example:
# Get only protein chains
protein_only = structure.filter_to_entity_type(
    protein=True,
    dna=False,
    rna=False,
    ligand=False
)

# Get protein and nucleic acids only
no_ligands = structure.filter_to_entity_type(ligand=False)

to_mmcif

def to_mmcif(*, coords_decimal_places: int = 3) -> str
Converts the structure to an mmCIF format string.
coords_decimal_places
int
default:"3"
Number of decimal places for atomic coordinates
Returns: String representation in mmCIF format
Example:
# Write structure to mmCIF file
mmcif_string = structure.to_mmcif()
with open('output.cif', 'w') as f:
    f.write(mmcif_string)

# Higher precision coordinates
high_precision = structure.to_mmcif(coords_decimal_places=5)
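To see what the number of decimal places means for the written values, here is a minimal sketch of fixed-point formatting. Whether to_mmcif formats coordinates exactly this way is an assumption; format_coord is a hypothetical helper used only to show the precision trade-off.

```python
def format_coord(value: float, decimal_places: int = 3) -> str:
    """Formats a coordinate with a fixed number of digits after the point."""
    return f'{value:.{decimal_places}f}'


x = 12.3456789
print(format_coord(x))     # 12.346
print(format_coord(x, 5))  # 12.34568

# Reading the text back loses at most half a unit in the last written digit.
round_trip_error = abs(float(format_coord(x)) - x)
print(round_trip_error <= 0.0005)  # True
```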

to_mmcif_dict

def to_mmcif_dict(*, coords_decimal_places: int = 3) -> Mmcif
Returns an Mmcif (CifDict) object representing the structure.
Returns: Mmcif dictionary object

iter_atoms

def iter_atoms() -> Iterator[Mapping[str, Any]]
Iterates over all atoms in the structure, yielding dictionaries with atom, residue, and chain information.
Returns: Iterator of dictionaries with atom properties
Example:
for atom in structure.iter_atoms():
    print(f"Chain {atom['chain_id']}, "
          f"Residue {atom['res_name']} {atom['res_id']}, "
          f"Atom {atom['atom_name']}: "
          f"({atom['atom_x']}, {atom['atom_y']}, {atom['atom_z']})")
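Because each yielded item is a plain mapping, ordinary Python tools are enough to aggregate over atoms. The sketch below uses a hand-written list of dictionaries as a stand-in for structure.iter_atoms(); the keys match those in the example above, and the values are made up for illustration.

```python
from collections import Counter

# Stand-in for structure.iter_atoms(): hypothetical per-atom mappings.
atoms = [
    {'chain_id': 'A', 'res_name': 'ALA', 'res_id': 1, 'atom_name': 'N',
     'atom_x': 0.0, 'atom_y': 0.0, 'atom_z': 0.0},
    {'chain_id': 'A', 'res_name': 'ALA', 'res_id': 1, 'atom_name': 'CA',
     'atom_x': 1.5, 'atom_y': 0.0, 'atom_z': 0.0},
    {'chain_id': 'B', 'res_name': 'GLY', 'res_id': 1, 'atom_name': 'CA',
     'atom_x': 9.0, 'atom_y': 2.0, 'atom_z': 1.0},
]

# Count atoms per chain.
atoms_per_chain = Counter(atom['chain_id'] for atom in atoms)

# Collect C-alpha coordinates as (x, y, z) tuples.
ca_coords = [
    (atom['atom_x'], atom['atom_y'], atom['atom_z'])
    for atom in atoms
    if atom['atom_name'] == 'CA'
]

print(dict(atoms_per_chain))  # {'A': 2, 'B': 1}
print(ca_coords)              # [(1.5, 0.0, 0.0), (9.0, 2.0, 1.0)]
```

For large structures, the array properties (atom_x, chain_id, and so on) are likely to be faster than iterating atom by atom.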

iter_residues

def iter_residues(include_unresolved: bool = False) -> Iterator[Mapping[str, Any]]
Iterates over residues in the structure.
include_unresolved
bool
default:"False"
If True, includes residues with no resolved atoms
Returns: Iterator of dictionaries with residue and chain properties
Example:
for residue in structure.iter_residues():
    print(f"Chain {residue['chain_id']}: "
          f"{residue['res_name']} {residue['res_id']}")

copy_and_update

def copy_and_update(**updates) -> Structure
Creates a copy of the structure with specified fields updated.
**updates
Keyword arguments for fields to update (e.g., name='new_name', atom_x=new_coords)
Returns: New Structure with updated fields
Example:
# Update structure name
renamed = structure.copy_and_update(name='my_protein')

# Update atomic coordinates
new_coords = structure.atom_x + 10.0  # Translate by 10 Angstroms along the x axis
translated = structure.copy_and_update(atom_x=new_coords)
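The copy-and-update style means the original object is never modified; every change produces a new object. A minimal sketch of the same pattern with a frozen dataclass is shown below. ToyStructure is a hypothetical stand-in with two fields and says nothing about how Structure itself is implemented.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ToyStructure:
    """Hypothetical immutable container with two of the documented fields."""
    name: str
    atom_x: tuple[float, ...]

    def copy_and_update(self, **updates) -> 'ToyStructure':
        # dataclasses.replace builds a new instance with the given fields changed.
        return dataclasses.replace(self, **updates)


original = ToyStructure(name='unset', atom_x=(1.0, 2.0))
renamed = original.copy_and_update(name='my_protein')
translated = original.copy_and_update(
    atom_x=tuple(x + 10.0 for x in original.atom_x)
)

print(original)    # ToyStructure(name='unset', atom_x=(1.0, 2.0))
print(renamed)     # ToyStructure(name='my_protein', atom_x=(1.0, 2.0))
print(translated)  # ToyStructure(name='unset', atom_x=(11.0, 12.0))
```

The benefit of this design is that a structure can be shared between pipeline stages without any stage accidentally changing what another stage sees.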

copy_and_update_coords

def copy_and_update_coords(coords: np.ndarray) -> Structure
Updates atomic coordinates with a new coordinate array.
coords
np.ndarray
required
New coordinates array with shape (..., num_atoms, 3) containing [x, y, z] coordinates
Returns: New Structure with updated coordinates
Example:
import numpy as np

# Replace all coordinates with random values (for illustration only)
new_coords = np.random.randn(structure.num_atoms, 3)
updated = structure.copy_and_update_coords(new_coords)

Factory Functions

from_mmcif

from alphafold3.structure import from_mmcif

def from_mmcif(
    mmcif_string: str | bytes,
    *,
    name: str | None = None,
    fix_mse_residues: bool = False,
    fix_arginines: bool = False,
    fix_unknown_dna: bool = False,
    include_water: bool = False,
    include_other: bool = False,
    include_bonds: bool = False,
    model_id: int | ModelID = ModelID.FIRST
) -> Structure
Constructs a Structure from an mmCIF string or file contents.
mmcif_string
str | bytes
required
Contents of an mmCIF file
name
str | None
default:"None"
Optional structure name (defaults to mmCIF data_ field)
fix_mse_residues
bool
default:"False"
Convert selenomethionine (MSE) SE atoms to SD (sulphur)
fix_arginines
bool
default:"False"
Swap NH1/NH2 in arginine to ensure NH1 is closer to CD
fix_unknown_dna
bool
default:"False"
Replace ‘N’ residue names in DNA chains with ‘DN’
include_water
bool
default:"False"
Include water (HOH) molecules
include_other
bool
default:"False"
Include non-standard entity types
include_bonds
bool
default:"False"
Parse and include bond connectivity information
model_id
int | ModelID
default:"ModelID.FIRST"
Which model to parse (integer ID, ModelID.FIRST, or ModelID.ALL)
Returns: Structure object
Example:
from alphafold3.structure import from_mmcif

# Load from file
with open('structure.cif', 'r') as f:
    mmcif_string = f.read()

structure = from_mmcif(
    mmcif_string,
    name='my_structure',
    include_water=True,
    include_bonds=True
)

print(f"Loaded structure with {structure.num_atoms} atoms")
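The rule behind fix_arginines (NH1 should be the nitrogen closer to CD) is simple enough to sketch on its own. The helper fix_arginine and the dictionary layout below are hypothetical and only illustrate the swap criterion described above; the parser applies it to the parsed atom data, not to dictionaries like this one.

```python
import math


def fix_arginine(atoms):
    """Returns a copy of the atom-name-to-xyz mapping with NH1 closer to CD than NH2."""
    fixed = dict(atoms)
    nh1_dist = math.dist(atoms['NH1'], atoms['CD'])
    nh2_dist = math.dist(atoms['NH2'], atoms['CD'])
    if nh1_dist > nh2_dist:
        # The labels are the wrong way round, so swap the two positions.
        fixed['NH1'], fixed['NH2'] = atoms['NH2'], atoms['NH1']
    return fixed


# Made-up coordinates in which NH1 is farther from CD than NH2.
arg = {'CD': (0.0, 0.0, 0.0), 'NH1': (4.0, 0.0, 0.0), 'NH2': (3.0, 1.0, 0.0)}
fixed = fix_arginine(arg)
print(fixed['NH1'])  # (3.0, 1.0, 0.0)
print(fixed['NH2'])  # (4.0, 0.0, 0.0)
```

Applying the function a second time leaves the result unchanged, since NH1 is already the closer atom.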

from_res_arrays

from alphafold3.structure import from_res_arrays

def from_res_arrays(
    atom_mask: np.ndarray,
    **kwargs
) -> Structure
Creates a Structure from arrays with a residue dimension.
atom_mask
np.ndarray
required
Shape (num_res, num_atom) indicating which atoms are present (nonzero = present)
**kwargs
Field name to values mapping. Arrays should have shape (num_res,) for chain/residue fields or (num_res, num_atom) for atom fields
Example:
from alphafold3.structure import from_res_arrays
import numpy as np

num_res = 100
num_atom = 37  # Standard atom types

atom_mask = np.random.rand(num_res, num_atom) > 0.5
atom_positions = np.random.randn(num_res, num_atom, 3)
res_names = np.array(['ALA'] * num_res)

structure = from_res_arrays(
    atom_mask=atom_mask,
    atom_x=atom_positions[..., 0],
    atom_y=atom_positions[..., 1],
    atom_z=atom_positions[..., 2],
    res_name=res_names,
    name='generated_structure'
)
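Conceptually, building a structure from residue-shaped arrays means flattening the (num_res, num_atom) grid into one row per present atom, with residue-level values repeated for each atom. The sketch below shows that flattening on nested lists; the atom_names column labels and the output layout are assumptions for illustration, not what from_res_arrays does internally.

```python
# Hypothetical inputs with num_res = 2 and num_atom = 3.
atom_names = ['N', 'CA', 'C']            # one name per atom column
res_name = ['ALA', 'GLY']                # shape (num_res,)
atom_mask = [[1, 1, 1], [1, 1, 0]]       # shape (num_res, num_atom), nonzero = present
atom_x = [[0.0, 1.5, 2.5], [3.5, 5.0, 0.0]]

flat_atoms = []
for res_index, mask_row in enumerate(atom_mask):
    for atom_index, present in enumerate(mask_row):
        if present:
            flat_atoms.append({
                'res_name': res_name[res_index],  # residue field repeated per atom
                'atom_name': atom_names[atom_index],
                'atom_x': atom_x[res_index][atom_index],
            })

print(len(flat_atoms))  # 5
print(flat_atoms[-1])   # {'res_name': 'GLY', 'atom_name': 'CA', 'atom_x': 5.0}
```

Masked-out positions (here the C atom of GLY) contribute no row, so their placeholder coordinate values are never used.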

CascadeDelete

from alphafold3.structure import CascadeDelete

class CascadeDelete(enum.Enum):
    NONE = 0   # Keep all unresolved residues and chains
    FULL = 1   # Delete all unresolved residues and empty chains
    CHAINS = 2 # Delete only chains with no resolved residues (default)
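The difference between the three modes is easiest to see on a small case. The following sketch reimplements the documented behaviour on a list of (chain_id, res_id) pairs; ToyCascadeDelete and surviving_residues are hypothetical names, and the logic is an interpretation of the comments above, not the library's code.

```python
import enum


class ToyCascadeDelete(enum.Enum):
    NONE = 0
    FULL = 1
    CHAINS = 2


def surviving_residues(residues, resolved, mode):
    """residues: all (chain_id, res_id) pairs; resolved: those with an atom left."""
    if mode is ToyCascadeDelete.NONE:
        return list(residues)
    if mode is ToyCascadeDelete.FULL:
        return [res for res in residues if res in resolved]
    # CHAINS: drop a chain only when none of its residues is resolved.
    live_chains = {chain_id for chain_id, _ in resolved}
    return [res for res in residues if res[0] in live_chains]


residues = [('A', 1), ('A', 2), ('B', 1)]
resolved = {('A', 1)}  # after filtering, only one residue still has atoms

for mode in ToyCascadeDelete:
    print(mode.name, surviving_residues(residues, resolved, mode))
# NONE [('A', 1), ('A', 2), ('B', 1)]
# FULL [('A', 1)]
# CHAINS [('A', 1), ('A', 2)]
```

Under CHAINS, the unresolved residue ('A', 2) is kept because chain A still has a resolved residue, while chain B disappears entirely.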

Bond

from alphafold3.structure import Bond

class Bond(NamedTuple):
    from_atom: Mapping[str, str | int | float | np.ndarray]
    dest_atom: Mapping[str, str | int | float | np.ndarray]
    bond_info: Mapping[str, str | int]
Represents a chemical bond between two atoms.

