Structure

Overview

The Structure class is the primary data structure for representing molecular structures in AlphaFold 3. It exposes chain-, residue-, atom-, and bond-level data for proteins, DNA, RNA, ligands, and other molecular entities, along with methods for filtering, iteration, and mmCIF export.

Class Definition

from alphafold3.structure import Structure

The Structure class represents molecular structures with a hierarchical organization:

Chains: Individual molecular chains (proteins, nucleic acids, ligands)
Residues: Amino acids, nucleotides, or chemical components
Atoms: Individual atomic coordinates and properties
Bonds: Chemical bonds between atoms
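
One way to picture the hierarchy above is as four flat tables linked by integer keys. The following is a toy sketch using plain dataclasses; the class and field names (ChainRow, chain_key, res_key, and so on) are illustrative assumptions and are not the alphafold3 table classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRow:
    key: int
    chain_id: str
    chain_type: str


@dataclass(frozen=True)
class ResidueRow:
    key: int
    chain_key: int  # points at ChainRow.key
    res_id: int
    res_name: str


@dataclass(frozen=True)
class AtomRow:
    key: int
    res_key: int  # points at ResidueRow.key
    name: str
    element: str
    xyz: tuple[float, float, float]


@dataclass(frozen=True)
class BondRow:
    from_atom_key: int  # points at AtomRow.key
    dest_atom_key: int  # points at AtomRow.key


# Made-up data: one chain, two residues, three atoms, one bond.
chains = [ChainRow(0, "A", "polypeptide(L)")]
residues = [ResidueRow(0, 0, 1, "GLY"), ResidueRow(1, 0, 2, "ALA")]
atoms = [
    AtomRow(0, 0, "N", "N", (0.0, 0.0, 0.0)),
    AtomRow(1, 0, "CA", "C", (1.5, 0.0, 0.0)),
    AtomRow(2, 1, "N", "N", (3.8, 0.0, 0.0)),
]
bonds = [BondRow(0, 1)]

# Walk from an atom up to its residue and chain via the keys.
res_by_key = {r.key: r for r in residues}
chain_by_key = {c.key: c for c in chains}
atom = atoms[2]
residue = res_by_key[atom.res_key]
chain = chain_by_key[residue.chain_key]
label = f"{chain.chain_id}:{residue.res_name}{residue.res_id}:{atom.name}"
print(label)
```

Storing each level once and linking by key is what lets per-atom views such as chain_id or res_name be derived on demand instead of being duplicated on every atom.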

Constructor

Structure(
    *,
    name: str = 'unset',
    release_date: datetime.date | None = None,
    resolution: float | None = None,
    structure_method: str | None = None,
    bioassembly_data: BioassemblyData | None = None,
    chemical_components_data: ChemicalComponentsData | None = None,
    chains: Chains,
    residues: Residues,
    atoms: Atoms,
    bonds: Bonds,
    skip_validation: bool = False
)

name

str

default:"'unset'"

Name or identifier for the structure (e.g., PDB ID)

release_date

datetime.date | None

default:"None"

Release date of the structure

resolution

float | None

default:"None"

Resolution of the structure in Angstroms (for experimental structures)

structure_method

str | None

default:"None"

Experimental method used (e.g., ‘x-ray diffraction’, ‘electron microscopy’)

bioassembly_data

BioassemblyData | None

default:"None"

Biological assembly information

chemical_components_data

ChemicalComponentsData | None

default:"None"

Chemical component dictionary data for non-standard residues

chains

Chains

required

Table containing chain-level data (IDs, types, entity information)

residues

Residues

required

Table containing residue-level data (names, IDs, author naming)

atoms

Atoms

required

Table containing atomic data (names, elements, coordinates, b-factors)

bonds

Bonds

required

Table containing bond connectivity information

skip_validation

bool

default:"False"

Skip validation of foreign keys and table ordering (use with caution)

Properties

Structure Metadata

name

str

Structure name or identifier

release_date

datetime.date | None

Structure release date

resolution

float | None

Structure resolution in Angstroms

structure_method

str | None

Experimental method used to determine the structure

num_atoms

int

Total number of atoms in the structure

num_chains

int

Total number of chains in the structure

num_models

int

Number of models (for NMR structures or ensembles)

Tables

chains_table

Chains

Access to the chains table with chain-level data

residues_table

Residues

Access to the residues table with residue-level data

atoms_table

Atoms

Access to the atoms table with atomic data

bonds_table

Bonds

Access to the bonds table with connectivity information

present_chains

Chains

Table of chains that have at least one resolved atom

present_residues

Residues

Table of residues that have at least one resolved atom

unresolved_residues

Residues

Table of residues with no resolved atoms (missing from structure)
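
The split between present and unresolved residues can be sketched with plain Python collections. The tuples below are made-up stand-ins for table rows, not the library's table types; the point is only that a residue counts as present when at least one atom refers to it.

```python
# Toy residue table: (chain_id, res_id, res_name). Values are made up.
residues = [
    ("A", 1, "MET"),
    ("A", 2, "GLY"),
    ("A", 3, "SER"),
    ("B", 1, "ALA"),
]
# Toy atom table: (chain_id, res_id, atom_name) for resolved atoms only.
atoms = [
    ("A", 1, "CA"),
    ("A", 1, "N"),
    ("A", 3, "CA"),
]

# A residue is resolved if any atom points at it.
resolved_keys = {(chain_id, res_id) for chain_id, res_id, _ in atoms}

present = [r for r in residues if (r[0], r[1]) in resolved_keys]
unresolved = [r for r in residues if (r[0], r[1]) not in resolved_keys]
present_chains = sorted({r[0] for r in present})

print(len(present), len(unresolved), present_chains)
```

In this toy model, len(present) plays the role of a resolved-only residue count and len(residues) the role of a count that includes unresolved residues.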

Atom-Level Arrays

atom_key

np.ndarray

Unique integer keys for each atom

atom_name

np.ndarray

Atom names (e.g., ‘CA’, ‘N’, ‘C’, ‘O’)

atom_element

np.ndarray

Element symbols (e.g., ‘C’, ‘N’, ‘O’, ‘S’)

atom_x

np.ndarray

X coordinates in Angstroms

atom_y

np.ndarray

Y coordinates in Angstroms

atom_z

np.ndarray

Z coordinates in Angstroms

atom_b_factor

np.ndarray

B-factors (temperature factors) for each atom

atom_occupancy

np.ndarray

Occupancy values for each atom (typically 1.0)

chain_id

np.ndarray

Chain ID for each atom (internal label_asym_id)

chain_type

np.ndarray

Chain type for each atom (e.g., ‘polypeptide(L)’, ‘polyribonucleotide’)

chain_auth_asym_id

np.ndarray

Author chain ID for each atom (from original PDB file)

chain_entity_id

np.ndarray

Entity ID for each atom

chain_entity_desc

np.ndarray

Entity description for each atom

res_id

np.ndarray

Residue sequence ID for each atom (1-indexed)

res_name

np.ndarray

Residue names for each atom (e.g., ‘ALA’, ‘DA’, ‘U’, ‘HOH’)

res_auth_seq_id

np.ndarray

Author residue sequence ID for each atom

res_insertion_code

np.ndarray

Insertion codes for each atom

Masks

is_protein_mask

np.ndarray

Boolean mask indicating protein atoms

is_dna_mask

np.ndarray

Boolean mask indicating DNA atoms

is_rna_mask

np.ndarray

Boolean mask indicating RNA atoms

is_nucleic_mask

np.ndarray

Boolean mask indicating all nucleic acid atoms (DNA or RNA)

is_ligand_mask

np.ndarray

Boolean mask indicating ligand atoms

is_water_mask

np.ndarray

Boolean mask indicating water molecules
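
As a rough illustration of how such masks relate to per-atom chain types, the sketch below builds them from a list of type strings. The data is invented, and the exact rules the library uses to classify ligands and other entities are not shown on this page, so only the protein, DNA, RNA, and water cases are modeled.

```python
# Per-atom chain types, using mmCIF-style type strings (toy data).
chain_type = [
    "polypeptide(L)",
    "polypeptide(L)",
    "polydeoxyribonucleotide",
    "polyribonucleotide",
    "water",
]
atom_name = ["N", "CA", "P", "P", "O"]

is_protein = [t == "polypeptide(L)" for t in chain_type]
is_dna = [t == "polydeoxyribonucleotide" for t in chain_type]
is_rna = [t == "polyribonucleotide" for t in chain_type]
is_water = [t == "water" for t in chain_type]

# The nucleic mask is the element-wise OR of the DNA and RNA masks.
is_nucleic = [d or r for d, r in zip(is_dna, is_rna)]

# A mask selects the matching entries of any other per-atom array.
nucleic_atom_names = [n for n, keep in zip(atom_name, is_nucleic) if keep]
print(is_nucleic, nucleic_atom_names)
```

With NumPy boolean arrays, the same combination is written with the | operator and the selection with boolean indexing.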

Methods

num_residues

def num_residues(*, count_unresolved: bool) -> int

Returns the number of residues in the structure.

count_unresolved

bool

required

If True, includes unresolved residues (missing from coordinates). If False, only counts residues with at least one resolved atom.

Returns: Number of residues in the structure

Example:

# Count only resolved residues
resolved_count = structure.num_residues(count_unresolved=False)

# Count all residues including missing ones
total_count = structure.num_residues(count_unresolved=True)

filter

def filter(
    mask: np.ndarray | None = None,
    *,
    apply_per_element: bool = False,
    invert: bool = False,
    cascade_delete: CascadeDelete = CascadeDelete.CHAINS,
    **predicate_by_field_name
) -> Structure

Filters the structure by field values and returns a new structure.

mask

np.ndarray | None

default:"None"

Optional boolean array with length equal to num_atoms for direct masking

apply_per_element

bool

default:"False"

If True, each predicate is called once per element of the column; if False, it is called once with the whole column array

invert

bool

default:"False"

If True, removes entities matching the predicates instead of retaining them

cascade_delete

CascadeDelete

default:"CascadeDelete.CHAINS"

Controls deletion of unresolved residues/chains:

FULL: Delete all unresolved residues and empty chains
CHAINS: Delete only chains with no resolved residues (default)
NONE: Keep all unresolved residues and chains

**predicate_by_field_name

FilterPredicate

Field-based filters using pattern <table>_<column> (e.g., chain_id='A', atom_name=('CA', 'N'))

Returns: Filtered Structure object

Example:

# Filter to chain A, backbone atoms only
filtered = structure.filter(
    chain_id='A',
    atom_name=('N', 'CA', 'C', 'O')
)

# Filter to residues with res_id up to 100
filtered = structure.filter(
    res_id=lambda res_id: res_id <= 100
)

# Filter by multiple chains
filtered = structure.filter(
    chain_id=('A', 'B', 'C')
)

# Filter using a boolean mask
low_bfactor_mask = structure.atom_b_factor < 50.0
filtered = structure.filter(low_bfactor_mask)
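
The predicate forms used above (a single value, a tuple of allowed values, or a callable) and the apply_per_element switch can be sketched as follows. The matches helper is hypothetical and only models the behaviour described on this page; combining fields with a logical AND is an assumption based on the "chain A, backbone atoms only" example.

```python
from collections.abc import Sequence


def matches(column: Sequence, predicate, *, apply_per_element: bool = False) -> list[bool]:
    """Toy evaluation of one field predicate against one column."""
    if callable(predicate):
        if apply_per_element:
            # Call the predicate once per value.
            return [bool(predicate(value)) for value in column]
        # Call the predicate once with the whole column.
        return [bool(flag) for flag in predicate(column)]
    if isinstance(predicate, (tuple, list, set, frozenset)):
        # A collection means "any of these values".
        return [value in predicate for value in column]
    # A single value means equality.
    return [value == predicate for value in column]


chain_id = ["A", "A", "B", "B"]
atom_name = ["N", "CB", "CA", "O"]
res_id = [1, 1, 150, 150]

by_chain = matches(chain_id, "A")
by_name = matches(atom_name, ("N", "CA", "C", "O"))
by_res = matches(res_id, lambda r: r <= 100, apply_per_element=True)
by_res_column = matches(res_id, lambda col: [r <= 100 for r in col])

# Assumed here: predicates on different fields are combined with AND.
keep = [a and b for a, b in zip(by_chain, by_name)]
print(keep, by_res, by_res_column)
```

A whole-column predicate suits vectorised comparisons on arrays, while a per-element predicate suits logic that only makes sense for one value at a time.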

filter_out

def filter_out(*args, **kwargs) -> Structure

Removes entities matching the predicates (the inverse of filter). Takes the same parameters as filter(), with invert=True applied automatically.

Example:

# Remove water molecules
no_water = structure.filter_out(res_name='HOH')

# Remove high B-factor atoms
filtered = structure.filter_out(
    atom_b_factor=lambda b: b > 100.0
)

filter_to_entity_type

def filter_to_entity_type(
    *,
    protein: bool = True,
    dna: bool = True,
    rna: bool = True,
    ligand: bool = True
) -> Structure

Filters structure to specific entity types.

protein

bool

default:"True"

Include protein chains

dna

bool

default:"True"

Include DNA chains

rna

bool

default:"True"

Include RNA chains

ligand

bool

default:"True"

Include ligand molecules

Example:

# Get only protein chains
protein_only = structure.filter_to_entity_type(
    protein=True,
    dna=False,
    rna=False,
    ligand=False
)

# Get protein and nucleic acids only
no_ligands = structure.filter_to_entity_type(ligand=False)

to_mmcif

def to_mmcif(*, coords_decimal_places: int = 3) -> str

Converts the structure to an mmCIF format string.

coords_decimal_places

int

default:"3"

Number of decimal places for atomic coordinates

Returns: String representation in mmCIF format

Example:

# Write structure to mmCIF file
mmcif_string = structure.to_mmcif()
with open('output.cif', 'w') as f:
    f.write(mmcif_string)

# Higher precision coordinates
high_precision = structure.to_mmcif(coords_decimal_places=5)
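
To get a feel for what a decimal-places setting means for stored precision, the sketch below formats one coordinate with fixed-point notation. The format_coord helper is hypothetical; how the mmCIF writer formats values internally is not shown on this page.

```python
def format_coord(value: float, decimal_places: int = 3) -> str:
    """Format one coordinate with a fixed number of decimal places."""
    return f"{value:.{decimal_places}f}"


x = 12.3456789
default_text = format_coord(x)
precise_text = format_coord(x, decimal_places=5)

# Rounding error introduced by writing and re-reading the value.
default_error = abs(float(default_text) - x)
precise_error = abs(float(precise_text) - x)
print(default_text, precise_text)
```

With n decimal places the round-trip error is at most half a unit in the last written digit, so 3 places keeps coordinates within 0.0005 Angstroms of the original values.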

to_mmcif_dict

def to_mmcif_dict(*, coords_decimal_places: int = 3) -> Mmcif

Returns an Mmcif (CifDict) object representing the structure.

Returns: Mmcif dictionary object

iter_atoms

def iter_atoms() -> Iterator[Mapping[str, Any]]

Iterates over all atoms in the structure, yielding dictionaries with atom, residue, and chain information.

Returns: Iterator of dictionaries with atom properties

Example:

for atom in structure.iter_atoms():
    print(f"Chain {atom['chain_id']}, "
          f"Residue {atom['res_name']} {atom['res_id']}, "
          f"Atom {atom['atom_name']}: "
          f"({atom['atom_x']}, {atom['atom_y']}, {atom['atom_z']})")
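
Because each yielded item is a plain mapping, per-chain summaries can be accumulated in a single pass. The sketch below computes a centroid per chain from a hand-written list of dictionaries that stands in for the iterator; the keys match those used in the example above and the values are made up.

```python
from collections import defaultdict

# Stand-in for the iterator: dictionaries with the keys used above.
atom_rows = [
    {"chain_id": "A", "res_name": "GLY", "res_id": 1, "atom_name": "CA",
     "atom_x": 0.0, "atom_y": 0.0, "atom_z": 0.0},
    {"chain_id": "A", "res_name": "ALA", "res_id": 2, "atom_name": "CA",
     "atom_x": 2.0, "atom_y": 4.0, "atom_z": 0.0},
    {"chain_id": "B", "res_name": "HOH", "res_id": 1, "atom_name": "O",
     "atom_x": 1.0, "atom_y": 1.0, "atom_z": 1.0},
]

# Running [sum_x, sum_y, sum_z, atom_count] per chain.
sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
for atom in atom_rows:
    acc = sums[atom["chain_id"]]
    acc[0] += atom["atom_x"]
    acc[1] += atom["atom_y"]
    acc[2] += atom["atom_z"]
    acc[3] += 1

centroids = {
    chain_id: (sx / n, sy / n, sz / n)
    for chain_id, (sx, sy, sz, n) in sums.items()
}
print(centroids)
```

For large structures, the per-atom arrays (atom_x, chain_id, and so on) are likely a faster route to the same result than a Python-level loop.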

iter_residues

def iter_residues(include_unresolved: bool = False) -> Iterator[Mapping[str, Any]]

Iterates over residues in the structure.

include_unresolved

bool

default:"False"

If True, includes residues with no resolved atoms

Returns: Iterator of dictionaries with residue and chain properties

Example:

for residue in structure.iter_residues():
    print(f"Chain {residue['chain_id']}: "
          f"{residue['res_name']} {residue['res_id']}")

copy_and_update

def copy_and_update(**updates) -> Structure

Creates a copy of the structure with specified fields updated.

**updates

Keyword arguments for fields to update (e.g., name='new_name', atom_x=new_coords)

Returns: New Structure with updated fields

Example:

# Update structure name
renamed = structure.copy_and_update(name='my_protein')

# Update atomic coordinates
new_coords = structure.atom_x + 10.0  # Translate by 10 Angstroms along the x axis
translated = structure.copy_and_update(atom_x=new_coords)
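
The copy-with-updates pattern can be sketched with a frozen dataclass and dataclasses.replace. ToyStructure is a hypothetical two-field stand-in, and the sketch assumes the original object is left unchanged, as the word "copy" in the method name suggests.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ToyStructure:
    """Hypothetical immutable stand-in with two fields."""

    name: str
    atom_x: tuple[float, ...]

    def copy_and_update(self, **updates) -> "ToyStructure":
        # dataclasses.replace builds a new instance; self is untouched.
        return dataclasses.replace(self, **updates)


original = ToyStructure(name="unset", atom_x=(1.0, 2.0))
shifted = original.copy_and_update(
    name="shifted",
    atom_x=tuple(x + 10.0 for x in original.atom_x),
)
print(original)
print(shifted)
```

Returning a new object rather than mutating in place means intermediate results can be kept and compared without defensive copying.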

copy_and_update_coords

def copy_and_update_coords(coords: np.ndarray) -> Structure

Updates atomic coordinates with a new coordinate array.

coords

np.ndarray

required

New coordinates array with shape (..., num_atoms, 3) containing [x, y, z] coordinates

Returns: New Structure with updated coordinates

Example:

import numpy as np

# Replace all coordinates with random values
new_coords = np.random.randn(structure.num_atoms, 3)
updated = structure.copy_and_update_coords(new_coords)
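
The (num_atoms, 3) layout relates to the separate atom_x, atom_y, and atom_z columns by a simple transpose. The sketch below shows that relationship with nested lists and made-up values; it ignores the optional leading dimensions and is not the library's implementation.

```python
# Toy coordinates with shape (num_atoms, 3): one [x, y, z] row per atom.
coords = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# Check the trailing dimension before unpacking.
if any(len(row) != 3 for row in coords):
    raise ValueError("each atom needs exactly three coordinates")

# Transpose the rows into the three per-atom columns.
atom_x, atom_y, atom_z = (list(column) for column in zip(*coords))
print(atom_x, atom_y, atom_z)
```

With a NumPy array the same columns are coords[..., 0], coords[..., 1], and coords[..., 2], which also works when leading dimensions are present.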

Factory Functions

from_mmcif

from alphafold3.structure import from_mmcif

def from_mmcif(
    mmcif_string: str | bytes,
    *,
    name: str | None = None,
    fix_mse_residues: bool = False,
    fix_arginines: bool = False,
    fix_unknown_dna: bool = False,
    include_water: bool = False,
    include_other: bool = False,
    include_bonds: bool = False,
    model_id: int | ModelID = ModelID.FIRST
) -> Structure

Constructs a Structure from an mmCIF string or file contents.

mmcif_string

str | bytes

required

Contents of an mmCIF file

name

str | None

default:"None"

Optional structure name (defaults to mmCIF data_ field)

fix_mse_residues

bool

default:"False"

Convert selenomethionine (MSE) SE atoms to SD (sulphur)

fix_arginines

bool

default:"False"

Swap NH1/NH2 in arginine to ensure NH1 is closer to CD

fix_unknown_dna

bool

default:"False"

Replace ‘N’ residue names in DNA chains with ‘DN’

include_water

bool

default:"False"

Include water (HOH) molecules

include_other

bool

default:"False"

Include non-standard entity types

include_bonds

bool

default:"False"

Parse and include bond connectivity information

model_id

int | ModelID

default:"ModelID.FIRST"

Which model to parse (integer ID, ModelID.FIRST, or ModelID.ALL)

Returns: Structure object

Example:

from alphafold3.structure import from_mmcif

# Load from file
with open('structure.cif', 'r') as f:
    mmcif_string = f.read()

structure = from_mmcif(
    mmcif_string,
    name='my_structure',
    include_water=True,
    include_bonds=True
)

print(f"Loaded structure with {structure.num_atoms} atoms")
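
The rule described for fix_arginines (swap NH1 and NH2 so that NH1 is the one closer to CD) can be sketched on a single residue. The fix_arginine helper and the coordinates are hypothetical; this models the rule as stated on this page and is not the parser's code.

```python
import math

Coord = tuple[float, float, float]


def fix_arginine(atoms: dict[str, Coord]) -> dict[str, Coord]:
    """Return a copy in which NH1 is the nitrogen nearer to CD."""
    fixed = dict(atoms)
    d_nh1 = math.dist(atoms["CD"], atoms["NH1"])
    d_nh2 = math.dist(atoms["CD"], atoms["NH2"])
    if d_nh2 < d_nh1:
        # Swap the two coordinate sets; the atom names stay in place.
        fixed["NH1"], fixed["NH2"] = atoms["NH2"], atoms["NH1"]
    return fixed


# Made-up coordinates in which NH2 starts out closer to CD.
arg = {
    "CD": (0.0, 0.0, 0.0),
    "NH1": (3.0, 0.0, 0.0),
    "NH2": (2.0, 0.0, 0.0),
}
fixed = fix_arginine(arg)
print(fixed["NH1"], fixed["NH2"])
```

Applying the function a second time changes nothing, which is the property that makes this kind of normalisation safe to run on every arginine.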

from_res_arrays

from alphafold3.structure import from_res_arrays

def from_res_arrays(
    atom_mask: np.ndarray,
    **kwargs
) -> Structure

Creates a Structure from arrays with a residue dimension.

atom_mask

np.ndarray

required

Shape (num_res, num_atom) indicating which atoms are present (nonzero = present)

**kwargs

Field name to values mapping. Arrays should have shape (num_res,) for chain/residue fields or (num_res, num_atom) for atom fields

Example:

from alphafold3.structure import from_res_arrays
import numpy as np

num_res = 100
num_atom = 37  # Standard atom types

atom_mask = np.random.rand(num_res, num_atom) > 0.5
atom_positions = np.random.randn(num_res, num_atom, 3)
res_names = np.array(['ALA'] * num_res)

structure = from_res_arrays(
    atom_mask=atom_mask,
    atom_x=atom_positions[..., 0],
    atom_y=atom_positions[..., 1],
    atom_z=atom_positions[..., 2],
    res_name=res_names,
    name='generated_structure'
)
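
Conceptually, going from residue-shaped arrays to a flat atom list means keeping only the entries where atom_mask is nonzero and repeating residue-level values once per kept atom. The sketch below shows that flattening with nested lists and made-up values; it is an illustration of the idea, not the library's implementation.

```python
# Toy inputs with num_res = 2 and num_atom = 3 (values are made up).
atom_mask = [
    [1, 1, 0],
    [1, 0, 1],
]
atom_x = [
    [0.0, 1.5, 9.9],  # 9.9 marks padding that the mask hides
    [3.8, 9.9, 5.1],
]
res_name = ["GLY", "ALA"]  # shape (num_res,)

flat_x = []
flat_res_name = []
for res_index, mask_row in enumerate(atom_mask):
    for atom_index, present in enumerate(mask_row):
        if present:  # nonzero means the atom exists
            flat_x.append(atom_x[res_index][atom_index])
            # Residue-level values repeat once per present atom.
            flat_res_name.append(res_name[res_index])

print(flat_x, flat_res_name)
```

The flat lists have one entry per present atom, so their length equals the number of nonzero entries in atom_mask.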

Related Types

CascadeDelete

from alphafold3.structure import CascadeDelete

class CascadeDelete(enum.Enum):
    NONE = 0   # Keep all unresolved residues and chains
    FULL = 1   # Delete all unresolved residues and empty chains
    CHAINS = 2 # Delete only chains with no resolved residues (default)
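
The three modes can be compared on a toy residue list. The surviving_residues function below is a hypothetical model of the behaviour described in the comments above, with the enum redefined locally so the sketch runs on its own; it is not the library's filtering code.

```python
import enum


class CascadeDelete(enum.Enum):
    # Local copy of the enum above so the sketch is self-contained.
    NONE = 0
    FULL = 1
    CHAINS = 2


def surviving_residues(residues, resolved, mode):
    """Toy model: residues are (chain_id, res_id) pairs; resolved is the
    subset that still has at least one atom after filtering."""
    if mode is CascadeDelete.NONE:
        return list(residues)
    if mode is CascadeDelete.FULL:
        return [r for r in residues if r in resolved]
    # CHAINS: drop a chain only when none of its residues is resolved.
    live_chains = {chain_id for chain_id, _ in resolved}
    return [r for r in residues if r[0] in live_chains]


residues = [("A", 1), ("A", 2), ("B", 1)]
resolved = {("A", 1)}  # after filtering, only A/1 still has atoms

kept = {
    mode.name: surviving_residues(residues, resolved, mode)
    for mode in CascadeDelete
}
print(kept)
```

In this model CHAINS keeps the unresolved residue A/2 because chain A still has a resolved residue, while chain B disappears entirely.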

Bond

from alphafold3.structure import Bond

class Bond(NamedTuple):
    from_atom: Mapping[str, str | int | float | np.ndarray]
    dest_atom: Mapping[str, str | int | float | np.ndarray]
    bond_info: Mapping[str, str | int]

Represents a chemical bond between two atoms.
