Overview
The Structure class is the primary data structure for representing molecular structures in AlphaFold 3. It provides a comprehensive interface for working with protein, DNA, RNA, ligand, and other molecular entities.
Class Definition
from alphafold3.structure import Structure
The Structure class represents molecular structures with a hierarchical organization:
- Chains: Individual molecular chains (proteins, nucleic acids, ligands)
- Residues: Amino acids, nucleotides, or chemical components
- Atoms: Individual atomic coordinates and properties
- Bonds: Chemical bonds between atoms
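One way to picture this hierarchy is as flat tables linked by keys, where per-atom views are produced by looking up each atom's residue and chain. The sketch below uses plain dataclasses to show the idea; the class and field names (`ChainRow`, `ResidueRow`, `AtomRow`, `chain_key`, `res_key`) are hypothetical and are not the library's table classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRow:
    key: int
    chain_id: str
    chain_type: str


@dataclass(frozen=True)
class ResidueRow:
    key: int
    chain_key: int  # foreign key into the chains table
    res_id: int
    res_name: str


@dataclass(frozen=True)
class AtomRow:
    key: int
    res_key: int  # foreign key into the residues table
    name: str
    xyz: tuple[float, float, float]


# Hypothetical toy data: one chain, two residues, three atoms.
chains = [ChainRow(0, "A", "polypeptide(L)")]
residues = [ResidueRow(0, 0, 1, "ALA"), ResidueRow(1, 0, 2, "GLY")]
atoms = [
    AtomRow(0, 0, "N", (0.0, 0.0, 0.0)),
    AtomRow(1, 0, "CA", (1.5, 0.0, 0.0)),
    AtomRow(2, 1, "N", (3.8, 0.0, 0.0)),
]

# Index rows by key so each atom can be resolved to its residue and chain.
residue_by_key = {r.key: r for r in residues}
chain_by_key = {c.key: c for c in chains}

# Per-atom views: one entry per atom, broadcast down from the parent tables.
atom_res_name = [residue_by_key[a.res_key].res_name for a in atoms]
atom_chain_id = [
    chain_by_key[residue_by_key[a.res_key].chain_key].chain_id for a in atoms
]

print(atom_res_name)
print(atom_chain_id)
```

Storing chain and residue data once and broadcasting on demand keeps the tables small while still allowing per-atom arrays such as a chain ID for every atom.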
Constructor
Structure(
    *,
    name: str = 'unset',
    release_date: datetime.date | None = None,
    resolution: float | None = None,
    structure_method: str | None = None,
    bioassembly_data: BioassemblyData | None = None,
    chemical_components_data: ChemicalComponentsData | None = None,
    chains: Chains,
    residues: Residues,
    atoms: Atoms,
    bonds: Bonds,
    skip_validation: bool = False
)
name
str
default:"'unset'"
Name or identifier for the structure (e.g., PDB ID)
release_date
datetime.date | None
default:"None"
Release date of the structure
resolution
float | None
default:"None"
Resolution of the structure in Angstroms (for experimental structures)
structure_method
str | None
default:"None"
Experimental method used (e.g., 'x-ray diffraction', 'electron microscopy')
bioassembly_data
BioassemblyData | None
default:"None"
Biological assembly information
chemical_components_data
ChemicalComponentsData | None
default:"None"
Chemical component dictionary data for non-standard residues
chains
Chains
Table containing chain-level data (IDs, types, entity information)
residues
Residues
Table containing residue-level data (names, IDs, author naming)
atoms
Atoms
Table containing atomic data (names, elements, coordinates, B-factors)
bonds
Bonds
Table containing bond connectivity information
skip_validation
bool
default:"False"
Skip validation of foreign keys and table ordering (use with caution)
Properties
Structure name or identifier
Structure resolution in Angstroms
Experimental method used to determine the structure
Total number of atoms in the structure
Total number of chains in the structure
Number of models (for NMR structures or ensembles)
Tables
Access to the chains table with chain-level data
Access to the residues table with residue-level data
Access to the atoms table with atomic data
Access to the bonds table with connectivity information
Table of chains that have at least one resolved atom
Table of residues that have at least one resolved atom
Table of residues with no resolved atoms (missing from structure)
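The split between present and unresolved residues can be illustrated with a toy example: a residue counts as present if at least one atom refers to it, and as unresolved otherwise. This is a minimal sketch on plain lists; the data and the helper `count_residues` are hypothetical and only mirror the behaviour described here.

```python
# Toy residue table: (residue key, residue name). Hypothetical data.
residues = [(0, "MET"), (1, "ALA"), (2, "GLY"), (3, "SER")]

# Toy atom table: each atom records the key of the residue it belongs to.
atom_res_keys = [1, 1, 1, 3, 3]

resolved_keys = set(atom_res_keys)

# Residues with at least one atom are "present"; the rest are "unresolved".
present = [r for r in residues if r[0] in resolved_keys]
unresolved = [r for r in residues if r[0] not in resolved_keys]


def count_residues(*, count_unresolved: bool) -> int:
    """Count all residues, or only those with at least one resolved atom."""
    return len(residues) if count_unresolved else len(present)


print(present)
print(unresolved)
print(count_residues(count_unresolved=True), count_residues(count_unresolved=False))
```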
Atom-Level Arrays
Unique integer keys for each atom
Atom names (e.g., 'CA', 'N', 'C', 'O')
Element symbols (e.g., 'C', 'N', 'O', 'S')
X coordinates in Angstroms
Y coordinates in Angstroms
Z coordinates in Angstroms
B-factors (temperature factors) for each atom
Occupancy values for each atom (typically 1.0)
Chain ID for each atom (internal label_asym_id)
Chain type for each atom (e.g., 'polypeptide(L)', 'polyribonucleotide')
Author chain ID for each atom (from original PDB file)
Entity description for each atom
Residue sequence ID for each atom (1-indexed)
Residue names for each atom (e.g., 'ALA', 'DA', 'U', 'HOH')
Author residue sequence ID for each atom
Insertion codes for each atom
Masks
Boolean mask indicating protein atoms
Boolean mask indicating DNA atoms
Boolean mask indicating RNA atoms
Boolean mask indicating all nucleic acid atoms (DNA or RNA)
Boolean mask indicating ligand atoms
Boolean mask indicating water molecules
Methods
num_residues
def num_residues(*, count_unresolved: bool) -> int
Returns the number of residues in the structure.
count_unresolved
bool
If True, includes unresolved residues (missing from coordinates). If False, only counts residues with at least one resolved atom.
Returns: Number of residues in the structure
Example:
# Count only resolved residues
resolved_count = structure.num_residues(count_unresolved=False)
# Count all residues including missing ones
total_count = structure.num_residues(count_unresolved=True)
filter
def filter(
    mask: np.ndarray | None = None,
    *,
    apply_per_element: bool = False,
    invert: bool = False,
    cascade_delete: CascadeDelete = CascadeDelete.CHAINS,
    **predicate_by_field_name
) -> Structure
Filters the structure by field values and returns a new structure.
mask
np.ndarray | None
default:"None"
Optional boolean array with length equal to num_atoms for direct masking
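apply_per_element
bool
default:"False"
If True, callable predicates are applied to each element individually; if False, they are applied to the whole column array
invert
bool
default:"False"
If True, removes entities matching the predicates instead of retaining them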
cascade_delete
CascadeDelete
default:"CascadeDelete.CHAINS"
Controls deletion of unresolved residues/chains:
- FULL: Delete all unresolved residues and empty chains
- CHAINS: Delete only chains with no resolved residues (default)
- NONE: Keep all unresolved residues and chains
**predicate_by_field_name
Field-based filters using pattern <table>_<column> (e.g., chain_id='A', atom_name=('CA', 'N'))
Returns: Filtered Structure object
Example:
# Filter to chain A, backbone atoms only
filtered = structure.filter(
    chain_id='A',
    atom_name=('N', 'CA', 'C', 'O')
)
# Filter to residues 1-100
filtered = structure.filter(
    res_id=lambda res_id: res_id <= 100
)
# Filter by multiple chains
filtered = structure.filter(
    chain_id=('A', 'B', 'C')
)
# Filter using a boolean mask
low_bfactor_mask = structure.atom_b_factor < 50.0
filtered = structure.filter(low_bfactor_mask)
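The predicate forms used above (a single value, a collection of allowed values, or a callable) can be pictured as different ways of building a boolean mask over one column. The sketch below is a toy reimplementation on plain lists, written only to illustrate the `apply_per_element` and `invert` semantics described in this section; `build_mask` is a hypothetical helper, not part of the library.

```python
def build_mask(column, predicate, *, apply_per_element=False, invert=False):
    """Return a list of booleans selecting entries of `column`."""
    if callable(predicate):
        if apply_per_element:
            # The callable is invoked once per entry.
            mask = [bool(predicate(value)) for value in column]
        else:
            # The callable sees the whole column and returns one flag per entry.
            mask = [bool(flag) for flag in predicate(column)]
    elif isinstance(predicate, (tuple, list, set, frozenset)):
        # A collection selects entries equal to any of its members.
        mask = [value in predicate for value in column]
    else:
        # A single value selects entries equal to it.
        mask = [value == predicate for value in column]
    return [not flag for flag in mask] if invert else mask


chain_id = ["A", "A", "B", "C"]
res_id = [1, 150, 2, 3]

mask_scalar = build_mask(chain_id, "A")
mask_tuple = build_mask(chain_id, ("A", "B"))
mask_per_element = build_mask(res_id, lambda r: r <= 100, apply_per_element=True)
mask_whole_column = build_mask(res_id, lambda col: [r <= 100 for r in col])
mask_inverted = build_mask(chain_id, "A", invert=True)

print(mask_scalar, mask_tuple, mask_per_element, mask_whole_column, mask_inverted)
```

With array columns, a whole-column callable such as `lambda res_id: res_id <= 100` works because the comparison is vectorised; the per-element mode is useful when the predicate only makes sense on single values.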
filter_out
def filter_out(*args, **kwargs) -> Structure
Removes entities matching the predicates (inverse of filter).
Same parameters as filter() but with invert=True applied automatically.
Example:
# Remove water molecules
no_water = structure.filter_out(res_name='HOH')
# Remove high B-factor atoms
filtered = structure.filter_out(
    atom_b_factor=lambda b: b > 100.0
)
filter_to_entity_type
def filter_to_entity_type(
    *,
    protein: bool = True,
    dna: bool = True,
    rna: bool = True,
    ligand: bool = True
) -> Structure
Filters structure to specific entity types.
Example:
# Get only protein chains
protein_only = structure.filter_to_entity_type(
    protein=True,
    dna=False,
    rna=False,
    ligand=False
)
# Get protein and nucleic acids only
no_ligands = structure.filter_to_entity_type(ligand=False)
to_mmcif
def to_mmcif(*, coords_decimal_places: int = 3) -> str
Converts the structure to an mmCIF format string.
coords_decimal_places
int
default:"3"
Number of decimal places for atomic coordinates
Returns: String representation in mmCIF format
Example:
# Write structure to mmCIF file
mmcif_string = structure.to_mmcif()
with open('output.cif', 'w') as f:
    f.write(mmcif_string)
# Higher precision coordinates
high_precision = structure.to_mmcif(coords_decimal_places=5)
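To make the effect of `coords_decimal_places` concrete, the following sketch formats one coordinate at two precisions using fixed-point formatting. It illustrates the general idea only; `format_coord` is a hypothetical helper, and the library's own serialisation may differ in detail.

```python
def format_coord(value: float, decimal_places: int = 3) -> str:
    """Format a coordinate with a fixed number of decimal places."""
    return f"{value:.{decimal_places}f}"


x = 12.3456789
default_text = format_coord(x)
precise_text = format_coord(x, decimal_places=5)

# Writing fewer decimals shortens the file but rounds the stored coordinate.
rounding_error = abs(float(default_text) - x)

print(default_text, precise_text)
```

At three decimal places the rounding error per coordinate is at most 0.0005 Angstroms, which is why higher precision is rarely needed outside of exact round-trip checks.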
to_mmcif_dict
def to_mmcif_dict(*, coords_decimal_places: int = 3) -> Mmcif
Returns an Mmcif (CifDict) object representing the structure.
Returns: Mmcif dictionary object
iter_atoms
def iter_atoms() -> Iterator[Mapping[str, Any]]
Iterates over all atoms in the structure, yielding dictionaries with atom, residue, and chain information.
Returns: Iterator of dictionaries with atom properties
Example:
for atom in structure.iter_atoms():
    print(f"Chain {atom['chain_id']}, "
          f"Residue {atom['res_name']} {atom['res_id']}, "
          f"Atom {atom['atom_name']}: "
          f"({atom['atom_x']}, {atom['atom_y']}, {atom['atom_z']})")
iter_residues
def iter_residues(include_unresolved: bool = False) -> Iterator[Mapping[str, Any]]
Iterates over residues in the structure.
include_unresolved
bool
default:"False"
If True, includes residues with no resolved atoms
Returns: Iterator of dictionaries with residue and chain properties
Example:
for residue in structure.iter_residues():
    print(f"Chain {residue['chain_id']}: "
          f"{residue['res_name']} {residue['res_id']}")
copy_and_update
def copy_and_update(**updates) -> Structure
Creates a copy of the structure with specified fields updated.
**updates
Keyword arguments for fields to update (e.g., name='new_name', atom_x=new_coords)
Returns: New Structure with updated fields
Example:
# Update structure name
renamed = structure.copy_and_update(name='my_protein')
# Update atomic coordinates
new_coords = structure.atom_x + 10.0  # Translate by 10 Angstroms along x
translated = structure.copy_and_update(atom_x=new_coords)
copy_and_update_coords
def copy_and_update_coords(coords: np.ndarray) -> Structure
Updates atomic coordinates with a new coordinate array.
coords
np.ndarray
New coordinates array with shape (..., num_atoms, 3) containing [x, y, z] coordinates
Returns: New Structure with updated coordinates
Example:
import numpy as np
# Update all coordinates
new_coords = np.random.randn(structure.num_atoms, 3)
updated = structure.copy_and_update_coords(new_coords)
Factory Functions
from_mmcif
from alphafold3.structure import from_mmcif
def from_mmcif(
    mmcif_string: str | bytes,
    *,
    name: str | None = None,
    fix_mse_residues: bool = False,
    fix_arginines: bool = False,
    fix_unknown_dna: bool = False,
    include_water: bool = False,
    include_other: bool = False,
    include_bonds: bool = False,
    model_id: int | ModelID = ModelID.FIRST
) -> Structure
Constructs a Structure from an mmCIF string or file contents.
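mmcif_string
str | bytes
Contents of an mmCIF file
name
str | None
default:"None"
Optional structure name (defaults to the mmCIF data_ field)
fix_mse_residues
bool
default:"False"
Convert selenomethionine (MSE) SE atoms to SD (sulphur)
fix_arginines
bool
default:"False"
Swap NH1/NH2 in arginine to ensure NH1 is closer to CD
fix_unknown_dna
bool
default:"False"
Replace 'N' residue names in DNA chains with 'DN'
include_water
bool
default:"False"
Include water (HOH) molecules
include_other
bool
default:"False"
Include non-standard entity types
include_bonds
bool
default:"False"
Parse and include bond connectivity information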
model_id
int | ModelID
default:"ModelID.FIRST"
Which model to parse (integer ID, ModelID.FIRST, or ModelID.ALL)
Returns: Structure object
Example:
from alphafold3.structure import from_mmcif
# Load from file
with open('structure.cif', 'r') as f:
    mmcif_string = f.read()
structure = from_mmcif(
    mmcif_string,
    name='my_structure',
    include_water=True,
    include_bonds=True
)
print(f"Loaded structure with {structure.num_atoms} atoms")
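The `fix_arginines` option described above enforces a naming convention: of the two terminal nitrogens, NH1 should be the one closer to CD. A minimal sketch of that rule on bare coordinates follows; `fix_arginine` is a hypothetical helper written for illustration, and how the library applies the swap internally is not shown here.

```python
import math


def fix_arginine(cd, nh1, nh2):
    """Return (nh1, nh2), swapped if needed so NH1 is the atom nearer to CD."""
    if math.dist(nh1, cd) > math.dist(nh2, cd):
        return nh2, nh1
    return nh1, nh2


# Hypothetical coordinates in Angstroms.
cd = (0.0, 0.0, 0.0)
near = (1.0, 0.0, 0.0)
far = (3.0, 0.0, 0.0)

# Mislabelled input: NH1 is the farther atom, so the labels are swapped.
fixed_nh1, fixed_nh2 = fix_arginine(cd, nh1=far, nh2=near)

# Correctly labelled input is returned unchanged.
kept_nh1, kept_nh2 = fix_arginine(cd, nh1=near, nh2=far)

print(fixed_nh1, fixed_nh2)
```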
from_res_arrays
from alphafold3.structure import from_res_arrays
def from_res_arrays(
    atom_mask: np.ndarray,
    **kwargs
) -> Structure
Creates a Structure from arrays with a residue dimension.
atom_mask
np.ndarray
Array of shape (num_res, num_atom) indicating which atoms are present (nonzero = present)
**kwargs
Mapping from field name to values. Arrays should have shape (num_res,) for chain/residue fields or (num_res, num_atom) for atom fields
Example:
from alphafold3.structure import from_res_arrays
import numpy as np
num_res = 100
num_atom = 37 # Standard atom types
atom_mask = np.random.rand(num_res, num_atom) > 0.5
atom_positions = np.random.randn(num_res, num_atom, 3)
res_names = np.array(['ALA'] * num_res)
structure = from_res_arrays(
    atom_mask=atom_mask,
    atom_x=atom_positions[..., 0],
    atom_y=atom_positions[..., 1],
    atom_z=atom_positions[..., 2],
    res_name=res_names,
    name='generated_structure'
)
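Conceptually, building a flat structure from residue-shaped arrays means keeping only the positions where `atom_mask` is nonzero and repeating each residue-level value once per kept atom. The sketch below shows that flattening on small nested lists; it is an illustration under that assumption and not the library's implementation.

```python
# Toy inputs with shape (num_res=2, num_atom=3). Hypothetical data.
atom_mask = [
    [1, 1, 0],  # residue 0: atoms 0 and 1 present
    [1, 0, 1],  # residue 1: atoms 0 and 2 present
]
atom_x = [
    [0.0, 1.5, 9.9],  # 9.9 marks a masked-out placeholder value
    [3.8, 9.9, 5.1],
]
res_name = ["ALA", "GLY"]  # shape (num_res,)

flat_x = []
flat_res_name = []
for res_index, mask_row in enumerate(atom_mask):
    for atom_index, present in enumerate(mask_row):
        if present:
            flat_x.append(atom_x[res_index][atom_index])
            # Residue-level values are repeated once per present atom.
            flat_res_name.append(res_name[res_index])

print(flat_x)
print(flat_res_name)
```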
CascadeDelete
from alphafold3.structure import CascadeDelete
class CascadeDelete(enum.Enum):
    NONE = 0    # Keep all unresolved residues and chains
    FULL = 1    # Delete all unresolved residues and empty chains
    CHAINS = 2  # Delete only chains with no resolved residues (default)
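The difference between the three options is easiest to see on a small example. The sketch below models residues as `(chain_id, res_id)` pairs and decides which ones survive after some atoms have been filtered away. `Cascade` is a local stand-in enum and `surviving_residues` is a hypothetical helper; both only mirror the behaviour described in the comments above.

```python
import enum


class Cascade(enum.Enum):  # local stand-in mirroring the three options
    NONE = 0
    FULL = 1
    CHAINS = 2


def surviving_residues(residues, resolved, mode):
    """Return the residues kept under the given cascade mode.

    residues: list of (chain_id, res_id) pairs in the structure.
    resolved: set of (chain_id, res_id) pairs that still have atoms.
    """
    if mode is Cascade.NONE:
        return list(residues)
    if mode is Cascade.FULL:
        return [r for r in residues if r in resolved]
    # CHAINS: drop a residue only if its whole chain has no resolved residue.
    resolved_chains = {chain for chain, _ in resolved}
    return [r for r in residues if r[0] in resolved_chains]


residues = [("A", 1), ("A", 2), ("B", 1)]
resolved = {("A", 1)}  # only residue A/1 still has atoms after filtering

kept_none = surviving_residues(residues, resolved, Cascade.NONE)
kept_full = surviving_residues(residues, resolved, Cascade.FULL)
kept_chains = surviving_residues(residues, resolved, Cascade.CHAINS)

print(kept_none, kept_full, kept_chains)
```

Under the default, chain B disappears because nothing in it remains resolved, while the unresolved residue A/2 is kept so that the sequence of chain A stays complete.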
Bond
from alphafold3.structure import Bond
class Bond(NamedTuple):
    from_atom: Mapping[str, str | int | float | np.ndarray]
    dest_atom: Mapping[str, str | int | float | np.ndarray]
    bond_info: Mapping[str, str | int]
Represents a chemical bond between two atoms.
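As a rough illustration of how a tuple with this shape can be consumed, the sketch below defines a local stand-in with the same three fields and unpacks it. `BondSketch` and the mapping keys and values used here (`chain_id`, `res_id`, `atom_name`, `type`, `'covale'`) are assumptions chosen for the example, not a statement of what the library stores.

```python
from collections.abc import Mapping
from typing import NamedTuple


class BondSketch(NamedTuple):  # local stand-in with the same three fields
    from_atom: Mapping[str, object]
    dest_atom: Mapping[str, object]
    bond_info: Mapping[str, object]


def describe(atom: Mapping[str, object]) -> str:
    """Build a short label such as 'A1:C' from an atom mapping."""
    return f"{atom['chain_id']}{atom['res_id']}:{atom['atom_name']}"


bond = BondSketch(
    from_atom={"chain_id": "A", "res_id": 1, "atom_name": "C"},
    dest_atom={"chain_id": "A", "res_id": 2, "atom_name": "N"},
    bond_info={"type": "covale"},  # hypothetical key and value
)

# A NamedTuple unpacks positionally and also supports attribute access.
from_atom, dest_atom, bond_info = bond
label = f"{describe(from_atom)}-{describe(dest_atom)}"

print(label, bond.bond_info["type"])
```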
See Also