Overview
The template processing module searches for and featurises structural templates from the Protein Data Bank. Templates provide structural constraints that guide AlphaFold 3’s predictions, especially for proteins with known homologous structures.
Classes
Hit
Represents a single template hit from structure database search.
@dataclasses.dataclass(frozen=True, kw_only=True)
class Hit:
    pdb_id: str
    auth_chain_id: str
    hmmsearch_sequence: str
    structure_sequence: str
    unresolved_res_indices: Sequence[int] | None
    query_sequence: str
    start_index: int
    end_index: int
    full_length: int
    release_date: datetime.date
    chain_poly_type: str
pdb_id: PDB ID of the hit (lowercase).
auth_chain_id: Author chain ID from the PDB structure.
hmmsearch_sequence: Hit sequence as returned by hmmsearch in A3M format (may contain gaps and lowercase insertions).
structure_sequence: Full sequence from the PDB structure.
unresolved_res_indices: 0-based indices of unresolved residues in the structure. None if the structure is unavailable.
query_sequence: The query sequence used for template search.
start_index: Start index of the alignment relative to the full PDB seqres sequence (0-based, inclusive).
end_index: End index of the alignment relative to the full PDB seqres sequence (0-based, exclusive).
full_length: Length of the full PDB seqres sequence.
release_date: Release date of the PDB structure.
chain_poly_type: Polymer type (PROTEIN_CHAIN, RNA_CHAIN, or DNA_CHAIN).
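Because start_index is inclusive and end_index is exclusive, the pair maps directly onto a Python slice of the seqres sequence. A minimal sketch with made-up values (the sequence and indices below are illustrative and not taken from a real PDB entry):

```python
# Hypothetical values for illustration only.
full_seqres = "MKTAYIAKQRQISFVKSHFSRQLE"
start_index = 2   # 0-based, inclusive
end_index = 10    # 0-based, exclusive

# A half-open range is exactly what slicing expects.
aligned_region = full_seqres[start_index:end_index]
full_length = len(full_seqres)

# The number of covered residues is simply end_index - start_index.
assert len(aligned_region) == end_index - start_index
print(aligned_region, full_length)
```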
Properties:
query_to_hit_mapping: 0-based query index to hit structure index mapping. Handles realignment when the seqres sequence doesn't match the structure sequence.
Hit sequence with deletions uppercased and gaps removed.
output_templates_sequence: Final template sequence aligned to the query (gaps represented as '-').
Ratio of hit sequence length to query length.
Ratio of aligned residues to query length.
Whether the hit can be used as a template (has resolved residues at alignment positions).
full_name: Full template name in the format {pdb_id}_{auth_chain_id}.
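To make the ratio-style properties concrete, here is a rough sketch of how an aligned-residue ratio and the full name could be derived from an A3M hit row. It assumes the usual A3M conventions (uppercase letters are aligned to a query column, lowercase letters are insertions, '-' is a gap). The helper names are hypothetical and this is not the module's implementation.

```python
def count_aligned(a3m_row: str) -> int:
    """Counts residues aligned to a query column (uppercase letters)."""
    return sum(1 for c in a3m_row if c.isupper())


def align_ratio_sketch(a3m_row: str, query_sequence: str) -> float:
    """Ratio of aligned residues to query length."""
    return count_aligned(a3m_row) / len(query_sequence)


def full_name_sketch(pdb_id: str, auth_chain_id: str) -> str:
    """Formats the template name as {pdb_id}_{auth_chain_id}."""
    return f"{pdb_id}_{auth_chain_id}"


query = "MKTAYIAKQR"
# One gap ('-') and two lowercase insertions ('ak') relative to the query.
row = "MKT-YIakAKQR"

print(count_aligned(row))
print(align_ratio_sketch(row, query))
print(full_name_sketch("4pqx", "A"))
```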
Methods
keep
Determine if hit should be kept based on filtering criteria.
def keep(
    self,
    *,
    release_date_cutoff: datetime.date | None,
    max_subsequence_ratio: float | None,
    min_hit_length: int | None,
    min_align_ratio: float | None,
) -> bool
release_date_cutoff: Maximum release date for templates. Hits with later dates are excluded.
max_subsequence_ratio: Maximum length ratio for exact subsequences. Excludes hits that are exact subsequences of the query and exceed this ratio (prevents ground truth leakage).
min_hit_length: Minimum residue count. Excludes shorter hits.
min_align_ratio: Minimum ratio of aligned residues to query length. Excludes hits whose ratio is lower.
Returns: True if the hit passes all filters and has resolved residues, False otherwise.
Example:
import datetime

should_keep = hit.keep(
    release_date_cutoff=datetime.date(2021, 1, 1),
    max_subsequence_ratio=0.95,
    min_hit_length=30,
    min_align_ratio=0.25
)
if should_keep:
    print(f"Keeping template: {hit.full_name}")
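The four criteria can be read as a chain of early exits. The sketch below mirrors the parameter descriptions on a simplified stand-in class. ToyHit and keep_sketch are hypothetical names, the assumption that a criterion set to None is skipped is mine, and the check for resolved residues is left out.

```python
import dataclasses
import datetime
from typing import Optional


@dataclasses.dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in for Hit with only the fields the filters need."""
    sequence: str            # hit residues, gaps removed
    aligned_residues: int    # residues aligned to the query
    release_date: datetime.date


def keep_sketch(
    hit: ToyHit,
    query_sequence: str,
    *,
    release_date_cutoff: Optional[datetime.date],
    max_subsequence_ratio: Optional[float],
    min_hit_length: Optional[int],
    min_align_ratio: Optional[float],
) -> bool:
    # Assumption: a criterion set to None is skipped.
    if release_date_cutoff is not None and hit.release_date > release_date_cutoff:
        return False
    if max_subsequence_ratio is not None:
        length_ratio = len(hit.sequence) / len(query_sequence)
        # Exact subsequences that cover too much of the query are excluded.
        if hit.sequence in query_sequence and length_ratio > max_subsequence_ratio:
            return False
    if min_hit_length is not None and len(hit.sequence) < min_hit_length:
        return False
    if min_align_ratio is not None:
        if hit.aligned_residues / len(query_sequence) < min_align_ratio:
            return False
    return True


query = "MKTAYIAKQRQISFVKSHFSRQLE"
criteria = dict(
    release_date_cutoff=datetime.date(2021, 1, 1),
    max_subsequence_ratio=0.95,
    min_hit_length=10,
    min_align_ratio=0.25,
)
exact_copy = ToyHit(query, 24, datetime.date(2020, 1, 1))
partial = ToyHit("MKTAYIAKQRQI", 12, datetime.date(2020, 1, 1))
too_new = ToyHit("MKTAYIAKQRQI", 12, datetime.date(2022, 1, 1))

print(keep_sketch(exact_copy, query, **criteria))  # rejected: exact copy of the query
print(keep_sketch(partial, query, **criteria))     # kept
print(keep_sketch(too_new, query, **criteria))     # rejected: released after the cutoff
```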
Templates
Container for template hits with featurisation and filtering capabilities.
@dataclasses.dataclass(init=False)
class Templates:
    def __init__(
        self,
        *,
        query_sequence: str,
        hits: Sequence[Hit],
        max_template_date: datetime.date,
        structure_store: structure_stores.StructureStore,
        query_release_date: datetime.date | None = None,
    )
query_sequence: The query sequence for which templates were found.
hits: Template hits found for the query.
max_template_date: Maximum template date for filtering (prevents test set leakage).
structure_store: Structure store for fetching template structures.
query_release_date: Release date of the query structure. Used to ensure templates don't leak future structural information.
Properties:
Template hits (immutable).
Query release date if provided.
Effective release date cutoff: the earlier of max_template_date and the date 60 days before query_release_date.
structures (Iterator[structure.Structure]): Iterator over unique template structures. Yields one Structure per unique PDB ID.
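The effective cutoff described above amounts to a single min() over two dates. A minimal sketch follows. The function name is hypothetical, and the behaviour when no query release date is given (falling back to max_template_date alone) is an assumption.

```python
import datetime
from typing import Optional

DAYS_BEFORE_QUERY_DATE = 60  # margin stated in the property description


def effective_cutoff(
    max_template_date: datetime.date,
    query_release_date: Optional[datetime.date] = None,
) -> datetime.date:
    if query_release_date is None:
        # Assumption: without a query date only max_template_date applies.
        return max_template_date
    margin = datetime.timedelta(days=DAYS_BEFORE_QUERY_DATE)
    return min(max_template_date, query_release_date - margin)


max_date = datetime.date(2021, 9, 30)
print(effective_cutoff(max_date))
print(effective_cutoff(max_date, datetime.date(2021, 6, 1)))
print(effective_cutoff(max_date, datetime.date(2022, 1, 1)))
```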
Class Methods
from_seq_and_a3m
Create templates by building an HMM profile from a custom MSA and running hmmsearch with it against a sequence database.
@classmethod
def from_seq_and_a3m(
    cls,
    *,
    query_sequence: str,
    msa_a3m: str,
    max_template_date: datetime.date,
    database_path: os.PathLike[str] | str,
    hmmsearch_config: msa_config.HmmsearchConfig,
    max_a3m_query_sequences: int | None,
    structure_store: structure_stores.StructureStore,
    filter_config: msa_config.TemplateFilterConfig | None = None,
    query_release_date: datetime.date | None = None,
    chain_poly_type: str = mmcif_names.PROTEIN_CHAIN,
) -> Self
msa_a3m: MSA in A3M format used to create the HMM profile for hmmsearch.
max_template_date: Maximum template release date (for training, prevents ground truth leakage).
database_path: Path to the sequence database to search for templates.
hmmsearch_config: Hmmsearch configuration.
max_a3m_query_sequences: Maximum number of MSA sequences to use for profile construction.
structure_store: Structure store to fetch template structures.
filter_config: Optional filtering configuration. More performant than constructing all templates and then filtering.
query_release_date: Query release date for temporal filtering.
chain_poly_type: Polymer type of templates. Defaults to mmcif_names.PROTEIN_CHAIN.
Returns: Templates object with hits initialized from structure store metadata and alignments.
Example:
import datetime
from alphafold3.data import templates, structure_stores, msa_config
from alphafold3.constants import mmcif_names

# Load structure store
store = structure_stores.PdbStructureStore(
    pdb_dir="/data/pdb_mmcif",
    obsolete_pdbs_path="/data/obsolete.dat"
)

# Configure hmmsearch
hmmsearch_cfg = msa_config.HmmsearchConfig(
    hmmsearch_binary_path="/usr/bin/hmmsearch",
    hmmbuild_binary_path="/usr/bin/hmmbuild",
    e_value=0.0001,
    alphabet="amino"
)

# Configure filtering
filter_cfg = msa_config.TemplateFilterConfig(
    max_template_date=datetime.date(2021, 9, 30),
    max_subsequence_ratio=0.95,
    min_align_ratio=0.1,
    min_hit_length=10,
    deduplicate_sequences=True,
    max_hits=20
)

# Create templates from MSA
# (msa_a3m_string is assumed to already hold the query MSA in A3M format)
templates_obj = templates.Templates.from_seq_and_a3m(
    query_sequence="MKTAYIAKQRQISFVKSHFSRQLE",
    msa_a3m=msa_a3m_string,
    max_template_date=datetime.date(2021, 9, 30),
    database_path="/data/pdb_seqres.txt",
    hmmsearch_config=hmmsearch_cfg,
    max_a3m_query_sequences=512,
    structure_store=store,
    filter_config=filter_cfg,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)
print(f"Found {templates_obj.num_hits} template hits")
from_hmmsearch_a3m
Create templates from hmmsearch results in A3M format.
@classmethod
def from_hmmsearch_a3m(
    cls,
    *,
    query_sequence: str,
    a3m: str,
    max_template_date: datetime.date,
    structure_store: structure_stores.StructureStore,
    filter_config: msa_config.TemplateFilterConfig | None = None,
    query_release_date: datetime.date | None = None,
    chain_poly_type: str = mmcif_names.PROTEIN_CHAIN,
) -> Self
a3m: Hmmsearch results in A3M format containing template alignments and PDB codes.
max_template_date: Maximum template release date.
structure_store: Structure store to fetch templates.
filter_config: Optional filtering configuration.
chain_poly_type: Polymer type. Defaults to mmcif_names.PROTEIN_CHAIN.
Returns: Templates object with hits from the A3M.
Example:
# Example hmmsearch output in A3M format
# (store is assumed to be an already constructed structure store)
hmmsearch_a3m = """>4pqx_A/2-217 [subseq from] mol:protein length:217 Free text
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK
>5g3r_A/1-55 [subseq from] mol:protein length:352
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQD-LSGAEK
"""
templates_obj = templates.Templates.from_hmmsearch_a3m(
    query_sequence="MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK",
    a3m=hmmsearch_a3m,
    max_template_date=datetime.date(2021, 9, 30),
    structure_store=store,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)
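The header lines in the A3M above carry the PDB ID, chain ID, aligned range, and full sequence length. The following sketch shows one way such a header could be parsed into Hit-like fields. The regular expression, the function name, and the reading of the range as 1-based inclusive (converted here to a 0-based half-open range) are assumptions for illustration, not the module's parser.

```python
import re

# Assumed header layout: >{pdb_id}_{chain}/{start}-{end} ... length:{n} ...
_HEADER_RE = re.compile(
    r"^>(?P<pdb_id>[0-9a-zA-Z]{4})_(?P<chain>\w+)/(?P<start>\d+)-(?P<end>\d+)"
    r".*?length:(?P<length>\d+)"
)


def parse_hit_header(line: str) -> dict:
    """Extracts Hit-like metadata from one A3M description line."""
    match = _HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"Unrecognised hit header: {line!r}")
    return {
        "pdb_id": match["pdb_id"].lower(),
        "auth_chain_id": match["chain"],
        # Assumption: the header range is 1-based and inclusive, so only the
        # start needs shifting to obtain a 0-based, half-open range.
        "start_index": int(match["start"]) - 1,
        "end_index": int(match["end"]),
        "full_length": int(match["length"]),
    }


parsed = parse_hit_header(
    ">4pqx_A/2-217 [subseq from] mol:protein length:217 Free text"
)
print(parsed)
```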
Instance Methods
filter
Return new Templates object with filtered hits.
def filter(
    self,
    *,
    max_subsequence_ratio: float | None,
    min_align_ratio: float | None,
    min_hit_length: int | None,
    deduplicate_sequences: bool,
    max_hits: int | None,
) -> Self
max_subsequence_ratio: Exclude hits that are exact subsequences of the query and exceed this ratio.
min_align_ratio: Exclude hits where aligned residues are less than this proportion of the query length.
min_hit_length: Exclude hits with fewer residues than this.
deduplicate_sequences: Whether to exclude duplicate template sequences (keeps the first occurrence).
max_hits: Maximum number of hits to keep.
Returns: New Templates object with filtered hits.
Example:
filtered = templates_obj.filter(
    max_subsequence_ratio=0.95,
    min_align_ratio=0.1,
    min_hit_length=20,
    deduplicate_sequences=True,
    max_hits=20
)
print(f"Filtered from {templates_obj.num_hits} to {filtered.num_hits} hits")
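The interplay of deduplicate_sequences and max_hits can be illustrated on plain sequence strings. This is a simplified sketch with a hypothetical helper name. It assumes hits are visited in their original order and that collection stops once max_hits have been kept.

```python
from typing import List, Optional, Sequence


def dedupe_and_cap(
    sequences: Sequence[str],
    *,
    deduplicate_sequences: bool,
    max_hits: Optional[int],
) -> List[str]:
    kept: List[str] = []
    seen = set()
    for seq in sequences:
        # Stop as soon as enough hits have been collected.
        if max_hits is not None and len(kept) >= max_hits:
            break
        # Keep only the first occurrence of each sequence.
        if deduplicate_sequences and seq in seen:
            continue
        seen.add(seq)
        kept.append(seq)
    return kept


hits = ["AAA", "BBB", "AAA", "CCC", "DDD"]
print(dedupe_and_cap(hits, deduplicate_sequences=True, max_hits=3))
print(dedupe_and_cap(hits, deduplicate_sequences=False, max_hits=None))
```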
get_hits_with_structures
Get hits paired with their filtered Structure objects.
def get_hits_with_structures(self) -> Sequence[tuple[Hit, structure.Structure]]
Returns (Sequence[tuple[Hit, structure.Structure]]): List of (Hit, Structure) tuples. Each Structure is filtered to the hit's chain.
Raises:
InvalidTemplateError: If the hits still contain invalid entries because they were not filtered before the call.
Example:
try:
    hits_with_structs = filtered.get_hits_with_structures()
    for hit, struc in hits_with_structs:
        print(f"{hit.full_name}: {struc.num_atoms} atoms")
except templates.InvalidTemplateError as e:
    print(f"Must filter hits first: {e}")
featurize
Featurise templates for model input.
def featurize(
    self,
    include_ligand_features: bool = True,
) -> TemplateFeatures
include_ligand_features: Whether to compute ligand features from template structures.
Returns: Dictionary mapping feature names to values:
template_aatype: Encoded residue types (int32 array)
template_all_atom_masks: Atom presence masks (float64 array)
template_all_atom_positions: Atom coordinates (float64 array)
template_domain_names: Template names (bytes objects)
template_release_date: Release dates (bytes objects)
template_sequence: Template sequences (bytes objects)
ligand_features: (if include_ligand_features=True) Nested dict of ligand features per chain
Raises:
InvalidTemplateError: If hits haven’t been filtered before featurization
Example:
try:
    features = filtered.featurize(include_ligand_features=True)
    print("Template shapes:")
    print(f"  aatype: {features['template_aatype'].shape}")
    print(f"  positions: {features['template_all_atom_positions'].shape}")
    print(f"  masks: {features['template_all_atom_masks'].shape}")
    if 'ligand_features' in features:
        print(f"  ligand features: {len(features['ligand_features'])} chains")
except templates.InvalidTemplateError as e:
    print(f"Must filter hits first: {e}")
Functions
run_hmmsearch_with_a3m
Run hmmsearch to find template hits using an MSA.
def run_hmmsearch_with_a3m(
    *,
    database_path: os.PathLike[str] | str,
    hmmsearch_config: msa_config.HmmsearchConfig,
    max_a3m_query_sequences: int | None,
    a3m: str | None,
) -> str
database_path: Path to the sequence database (e.g., PDB seqres).
hmmsearch_config: Hmmsearch configuration.
max_a3m_query_sequences: Maximum number of MSA sequences to use for HMM profile construction. None uses all sequences.
a3m: MSA in A3M format. Used to build the HMM profile.
Returns: Hmmsearch results in A3M format.
Example:
from alphafold3.data import templates, msa_config

hmmsearch_cfg = msa_config.HmmsearchConfig(
    hmmsearch_binary_path="/usr/bin/hmmsearch",
    hmmbuild_binary_path="/usr/bin/hmmbuild",
    e_value=0.0001,
    inc_e=None,
    dom_e=None,
    incdom_e=None,
    alphabet="amino",
    filter_f1=0.02,
    filter_f2=0.001,
    filter_f3=0.0001,
    filter_max=False
)

# msa_a3m_string is assumed to already hold the query MSA in A3M format
hits_a3m = templates.run_hmmsearch_with_a3m(
    database_path="/data/pdb_seqres.txt",
    hmmsearch_config=hmmsearch_cfg,
    max_a3m_query_sequences=512,
    a3m=msa_a3m_string
)
print(f"Hmmsearch returned {len(hits_a3m.splitlines())} lines")
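As an illustration of what a cap such as max_a3m_query_sequences implies, the sketch below truncates an A3M string to its first N sequences. The helper name is hypothetical, and keeping the leading sequences (rather than some other selection) is an assumption.

```python
from typing import Optional


def truncate_a3m(a3m: str, max_sequences: Optional[int]) -> str:
    """Keeps at most max_sequences entries of an A3M string (None keeps all)."""
    if max_sequences is None:
        return a3m
    kept_lines = []
    num_sequences = 0
    for line in a3m.splitlines():
        if line.startswith(">"):
            # Each description line starts a new sequence entry.
            num_sequences += 1
            if num_sequences > max_sequences:
                break
        kept_lines.append(line)
    return "".join(f"{line}\n" for line in kept_lines)


example_a3m = ">query\nMKTAYI\n>hit1\nMKT-YI\n>hit2\nMKTAYa\n"
print(truncate_a3m(example_a3m, 2))
```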
get_polymer_features
Extract polymer features from a template structure chain.
def get_polymer_features(
    *,
    chain: structure.Structure,
    chain_poly_type: str,
    query_sequence_length: int,
    query_to_hit_mapping: Mapping[int, int],
) -> Mapping[str, Any]
chain: Structure object filtered to a single polymer chain.
chain_poly_type: Polymer type (PROTEIN_CHAIN, RNA_CHAIN, or DNA_CHAIN).
query_sequence_length: Length of the query sequence.
query_to_hit_mapping: 0-based query index to hit index mapping.
Returns: Dictionary with polymer features:
template_all_atom_positions: Atom coordinates aligned to query
template_all_atom_masks: Atom presence masks
template_sequence: Template sequence as bytes
template_aatype: Encoded residue types
template_domain_names: Template name as bytes
template_release_date: Release date as bytes
Raises:
ValueError: If structure doesn’t have a name, lacks release date, or contains multiple chains
Example:
from alphafold3 import structure
from alphafold3.data import templates
from alphafold3.constants import mmcif_names

# Load and filter structure to single chain
# (mmcif_content is assumed to hold the mmCIF file contents as a string)
struc = structure.from_mmcif(
    mmcif_string=mmcif_content,
    fix_mse_residues=True,
    fix_arginines=True,
    include_water=False
)
chain_struc = struc.filter(chain_id="A")

# Extract features (hit is assumed to be a Hit for this chain)
features = templates.get_polymer_features(
    chain=chain_struc,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN,
    query_sequence_length=100,
    query_to_hit_mapping=hit.query_to_hit_mapping
)
print(f"Atom positions shape: {features['template_all_atom_positions'].shape}")
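The role of query_to_hit_mapping is easiest to see on a toy example: per-residue values from the hit are scattered into query-length arrays, and a mask records which query positions were filled. The sketch uses one scalar per residue and plain lists to stay self-contained. The function name is hypothetical, and the real features hold per-atom coordinates.

```python
from typing import List, Mapping, Sequence, Tuple


def align_to_query(
    hit_values: Sequence[float],
    query_sequence_length: int,
    query_to_hit_mapping: Mapping[int, int],
) -> Tuple[List[float], List[float]]:
    """Scatters per-residue hit values into query-indexed lists plus a mask."""
    values = [0.0] * query_sequence_length
    mask = [0.0] * query_sequence_length
    for query_index, hit_index in query_to_hit_mapping.items():
        values[query_index] = hit_values[hit_index]
        mask[query_index] = 1.0  # this query position has template data
    return values, mask


# Query positions 1 and 4 have no counterpart in the hit.
values, mask = align_to_query(
    hit_values=[10.0, 11.0, 12.0],
    query_sequence_length=5,
    query_to_hit_mapping={0: 0, 2: 1, 3: 2},
)
print(values)
print(mask)
```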
package_template_features
Stack and package features from multiple template hits.
def package_template_features(
    *,
    hit_features: Sequence[Mapping[str, Any]],
    include_ligand_features: bool,
) -> Mapping[str, Any]
hit_features: List of feature dictionaries, one per hit.
include_ligand_features: Whether to include ligand features in the output.
Returns: Dictionary with stacked polymer features and unstacked ligand features (if included).
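A rough picture of the packaging step: polymer features are gathered per feature name across hits, while ligand features stay as one entry per hit. The sketch below collects values into plain lists where the real function would stack arrays. The function name and the handling of hits without ligand features are assumptions.

```python
from typing import Any, Dict, Mapping, Sequence


def package_sketch(
    hit_features: Sequence[Mapping[str, Any]],
    *,
    include_ligand_features: bool,
) -> Dict[str, Any]:
    packaged: Dict[str, Any] = {}
    for features in hit_features:
        for name, value in features.items():
            if name == "ligand_features":
                continue  # handled separately below
            # Gather each polymer feature across hits, in hit order.
            packaged.setdefault(name, []).append(value)
    if include_ligand_features:
        # Ligand features are kept per hit rather than combined.
        packaged["ligand_features"] = [
            dict(f.get("ligand_features", {})) for f in hit_features
        ]
    return packaged


toy_hits = [
    {"template_sequence": b"MKT", "ligand_features": {"A": 1}},
    {"template_sequence": b"MKA"},
]
print(package_sketch(toy_hits, include_ligand_features=True))
```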
Template Search Workflow
Complete workflow for finding and using templates:
import datetime
from alphafold3.data import templates, msa, structure_stores, msa_config
from alphafold3.constants import mmcif_names

# msa_run_config and hmmsearch_config are assumed to be defined already.

# 1. Get MSA for query
query_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK"
msa_result = msa.get_msa(
    target_sequence=query_seq,
    run_config=msa_run_config,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)

# 2. Convert MSA to A3M
msa_a3m = msa_result.to_a3m()

# 3. Search for templates
structure_store = structure_stores.PdbStructureStore(
    pdb_dir="/data/pdb_mmcif",
    obsolete_pdbs_path="/data/obsolete.dat"
)
templates_obj = templates.Templates.from_seq_and_a3m(
    query_sequence=query_seq,
    msa_a3m=msa_a3m,
    max_template_date=datetime.date(2021, 9, 30),
    database_path="/data/pdb_seqres.txt",
    hmmsearch_config=hmmsearch_config,
    max_a3m_query_sequences=512,
    structure_store=structure_store,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)
print(f"Found {templates_obj.num_hits} initial hits")

# 4. Filter templates
filtered = templates_obj.filter(
    max_subsequence_ratio=0.95,
    min_align_ratio=0.1,
    min_hit_length=20,
    deduplicate_sequences=True,
    max_hits=20
)
print(f"Kept {filtered.num_hits} hits after filtering")

# 5. Featurise for model input
template_features = filtered.featurize(include_ligand_features=True)
print("Template features ready for inference")
print(f"  Shape: {template_features['template_all_atom_positions'].shape}")
Error Handling
from alphafold3.data import templates

# query_seq, hmmsearch_a3m, max_date and store are assumed to be defined already.
try:
    templates_obj = templates.Templates.from_hmmsearch_a3m(
        query_sequence=query_seq,
        a3m=hmmsearch_a3m,
        max_template_date=max_date,
        structure_store=store
    )
    filtered = templates_obj.filter(
        max_subsequence_ratio=0.95,
        min_align_ratio=0.1,
        min_hit_length=20,
        deduplicate_sequences=True,
        max_hits=20
    )
    features = filtered.featurize()
except templates.HitDateError as e:
    print(f"Template date error: {e}")
except templates.InvalidTemplateError as e:
    print(f"Invalid template: {e}")
except ValueError as e:
    print(f"Validation error: {e}")