
Overview

The template processing module searches for and featurises structural templates from the Protein Data Bank. Templates provide structural constraints that guide AlphaFold 3’s predictions, especially for proteins with known homologous structures.

Classes

Hit

Represents a single template hit from a structure database search.
@dataclasses.dataclass(frozen=True, kw_only=True)
class Hit:
    pdb_id: str
    auth_chain_id: str
    hmmsearch_sequence: str
    structure_sequence: str
    unresolved_res_indices: Sequence[int] | None
    query_sequence: str
    start_index: int
    end_index: int
    full_length: int
    release_date: datetime.date
    chain_poly_type: str
pdb_id
str
required
PDB ID of the hit (lowercase).
auth_chain_id
str
required
Author chain ID from the PDB structure.
hmmsearch_sequence
str
required
Hit sequence as returned by hmmsearch in A3M format (may contain gaps and lowercase insertions).
structure_sequence
str
required
Full sequence from the PDB structure.
unresolved_res_indices
Sequence[int] | None
required
0-based indices of unresolved residues in the structure. None if structure is unavailable.
query_sequence
str
required
The query sequence used for template search.
start_index
int
required
Start index of alignment relative to full PDB seqres sequence (0-based, inclusive).
end_index
int
required
End index of alignment relative to full PDB seqres sequence (0-based, exclusive).
full_length
int
required
Length of the full PDB seqres sequence.
release_date
datetime.date
required
Release date of the PDB structure.
chain_poly_type
str
required
Polymer type (PROTEIN_CHAIN, RNA_CHAIN, or DNA_CHAIN).
Properties:
query_to_hit_mapping
Mapping[int, int]
0-based query index to hit structure index mapping. Handles realignment when seqres doesn’t match structure sequence.
matching_sequence
str
Hit sequence with lowercase (insertion) residues uppercased and '-' gaps removed.
output_templates_sequence
str
Final template sequence aligned to query (gaps represented as ’-’).
length_ratio
float
Ratio of hit sequence length to query length.
align_ratio
float
Ratio of aligned residues to query length.
is_valid
bool
Whether hit can be used as template (has resolved residues at alignment positions).
full_name
str
Full template name in format {pdb_id}_{auth_chain_id}.
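The alignment-derived properties above can be illustrated with a simplified sketch. It assumes standard A3M conventions (uppercase letters and '-' occupy one column per query position, lowercase letters are hit-only insertions) and that the seqres sequence matches the structure sequence, so it skips the realignment that the real query_to_hit_mapping performs. The function name `sketch_hit_alignment` is hypothetical and not part of the library.

```python
def sketch_hit_alignment(hit_a3m: str, query_length: int, start_index: int = 0) -> dict:
    """Hypothetical, simplified derivation of Hit-style alignment properties.

    Assumes A3M conventions: uppercase letters and '-' are match columns (one per
    query position) and lowercase letters are insertions present only in the hit.
    Hit indices are offset by start_index into the full seqres sequence.
    """
    match_columns = sum(1 for char in hit_a3m if not char.islower())
    if match_columns != query_length:
        raise ValueError(f"Expected {query_length} match columns, got {match_columns}")

    query_to_hit = {}  # 0-based query index -> 0-based hit index
    template_chars = ["-"] * query_length
    query_idx, hit_idx = 0, start_index
    for char in hit_a3m:
        if char.islower():
            # Insertion: a hit residue with no query counterpart.
            hit_idx += 1
        elif char == "-":
            # Gap: a query position with no aligned hit residue.
            query_idx += 1
        else:
            # Aligned pair: record the mapping and the template residue.
            query_to_hit[query_idx] = hit_idx
            template_chars[query_idx] = char
            query_idx += 1
            hit_idx += 1

    matching_sequence = hit_a3m.replace("-", "").upper()
    return {
        "query_to_hit_mapping": query_to_hit,
        "matching_sequence": matching_sequence,
        "output_templates_sequence": "".join(template_chars),
        "length_ratio": len(matching_sequence) / query_length,
        "align_ratio": len(query_to_hit) / query_length,
    }


props = sketch_hit_alignment("MKaT-IA", query_length=6)
print(props["output_templates_sequence"])  # MKT-IA
```

In this toy row, the lowercase `a` advances only the hit index, and the `-` leaves query position 3 unaligned, which is why align_ratio (5/6) is lower than length_ratio (6/6).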

Methods

keep
Determine if hit should be kept based on filtering criteria.
def keep(
    self,
    *,
    release_date_cutoff: datetime.date | None,
    max_subsequence_ratio: float | None,
    min_hit_length: int | None,
    min_align_ratio: float | None,
) -> bool
release_date_cutoff
datetime.date | None
required
Maximum release date for templates. Hits released after this date are excluded.
max_subsequence_ratio
float | None
required
Maximum length ratio for exact subsequences. Excludes hits that are exact subsequences of the query and whose length ratio exceeds this value (prevents ground truth leakage).
min_hit_length
int | None
required
Minimum residue count. Excludes shorter hits.
min_align_ratio
float | None
required
Minimum ratio of aligned residues to query length. Excludes hits that align to a smaller fraction of the query.
return
bool
True if hit passes all filters and has resolved residues, False otherwise.
Example:
import datetime

# `hit` is an existing templates.Hit instance.
should_keep = hit.keep(
    release_date_cutoff=datetime.date(2021, 1, 1),
    max_subsequence_ratio=0.95,
    min_hit_length=30,
    min_align_ratio=0.25
)

if should_keep:
    print(f"Keeping template: {hit.full_name}")
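A sketch of the checks that `keep` describes, written against a hypothetical stand-in class `ToyHit` rather than the real Hit. The order of the checks and the use of strict comparisons (`>` and `<`) are assumptions, so the library may behave differently at the boundaries.

```python
import dataclasses
import datetime
from typing import Optional


@dataclasses.dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in exposing only what the filter checks need."""

    query_sequence: str
    matching_sequence: str
    release_date: datetime.date
    num_aligned: int
    is_valid: bool = True

    @property
    def length_ratio(self) -> float:
        return len(self.matching_sequence) / len(self.query_sequence)

    @property
    def align_ratio(self) -> float:
        return self.num_aligned / len(self.query_sequence)


def keep_sketch(
    hit: ToyHit,
    *,
    release_date_cutoff: Optional[datetime.date],
    max_subsequence_ratio: Optional[float],
    min_hit_length: Optional[int],
    min_align_ratio: Optional[float],
) -> bool:
    if not hit.is_valid:
        return False  # no resolved residues at the aligned positions
    if release_date_cutoff is not None and hit.release_date > release_date_cutoff:
        return False  # released after the cutoff
    if (
        max_subsequence_ratio is not None
        and hit.length_ratio > max_subsequence_ratio
        and hit.matching_sequence in hit.query_sequence
    ):
        return False  # near-exact copy of the query (ground truth leakage)
    if min_hit_length is not None and len(hit.matching_sequence) < min_hit_length:
        return False  # too short
    if min_align_ratio is not None and hit.align_ratio < min_align_ratio:
        return False  # covers too little of the query
    return True
```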

Templates

Container for template hits with featurisation and filtering capabilities.
@dataclasses.dataclass(init=False)
class Templates:
    def __init__(
        self,
        *,
        query_sequence: str,
        hits: Sequence[Hit],
        max_template_date: datetime.date,
        structure_store: structure_stores.StructureStore,
        query_release_date: datetime.date | None = None,
    )
query_sequence
str
required
The query sequence for which templates were found.
hits
Sequence[Hit]
required
Template hits found for the query.
max_template_date
datetime.date
required
Maximum template date for filtering (prevents test set leakage).
structure_store
structure_stores.StructureStore
required
Structure store for fetching template structures.
query_release_date
datetime.date | None
Release date of query structure. Used to ensure templates don’t leak future structural information.
Properties:
query_sequence
str
The query sequence.
hits
tuple[Hit, ...]
Template hits (immutable).
num_hits
int
Number of template hits.
query_release_date
datetime.date | None
Query release date if provided.
release_date_cutoff
datetime.date
Effective release date cutoff: the earlier of max_template_date and query_release_date minus 60 days, or max_template_date alone when query_release_date is None.
structures
Iterator[structure.Structure]
Iterator over unique template structures. Yields one Structure per unique PDB ID.
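The release_date_cutoff rule reduces to a small date computation. This sketch follows the rule as documented above; the helper name `effective_release_date_cutoff` is hypothetical and this is not the library's code.

```python
import datetime
from typing import Optional

DAYS_BEFORE_QUERY_DATE = 60  # margin stated in the release_date_cutoff description


def effective_release_date_cutoff(
    max_template_date: datetime.date,
    query_release_date: Optional[datetime.date] = None,
) -> datetime.date:
    """Hypothetical helper mirroring the documented release_date_cutoff rule."""
    if query_release_date is None:
        return max_template_date
    return min(
        max_template_date,
        query_release_date - datetime.timedelta(days=DAYS_BEFORE_QUERY_DATE),
    )


print(effective_release_date_cutoff(datetime.date(2021, 9, 30), datetime.date(2021, 10, 15)))
# 2021-08-16
```

When the query was released long after max_template_date, the 60-day margin no longer matters and max_template_date wins.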

Class Methods

from_seq_and_a3m
Create templates by building an HMM profile from an MSA and running hmmsearch against a template sequence database.
@classmethod
def from_seq_and_a3m(
    cls,
    *,
    query_sequence: str,
    msa_a3m: str,
    max_template_date: datetime.date,
    database_path: os.PathLike[str] | str,
    hmmsearch_config: msa_config.HmmsearchConfig,
    max_a3m_query_sequences: int | None,
    structure_store: structure_stores.StructureStore,
    filter_config: msa_config.TemplateFilterConfig | None = None,
    query_release_date: datetime.date | None = None,
    chain_poly_type: str = mmcif_names.PROTEIN_CHAIN,
) -> Self
query_sequence
str
required
Target polymer sequence.
msa_a3m
str
required
MSA in A3M format used to create HMM profile for hmmsearch.
max_template_date
datetime.date
required
Maximum template release date (for training, prevents ground truth leakage).
database_path
os.PathLike[str] | str
required
Path to sequence database to search for templates.
hmmsearch_config
msa_config.HmmsearchConfig
required
Hmmsearch configuration.
max_a3m_query_sequences
int | None
required
Maximum MSA sequences to use for profile construction.
structure_store
structure_stores.StructureStore
required
Structure store to fetch template structures.
filter_config
msa_config.TemplateFilterConfig | None
Optional filtering configuration. Filtering during construction is more efficient than constructing all templates and filtering afterwards.
query_release_date
datetime.date | None
Query release date for temporal filtering.
chain_poly_type
str
default:"mmcif_names.PROTEIN_CHAIN"
Polymer type of templates.
return
Templates
Templates object with hits initialized from structure store metadata and alignments.
Example:
import datetime
from alphafold3.data import templates, structure_stores, msa_config
from alphafold3.constants import mmcif_names

# Load structure store (a directory of mmCIF files)
store = structure_stores.StructureStore("/data/pdb_mmcif")

# Configure hmmsearch
hmmsearch_cfg = msa_config.HmmsearchConfig(
    hmmsearch_binary_path="/usr/bin/hmmsearch",
    hmmbuild_binary_path="/usr/bin/hmmbuild",
    e_value=0.0001,
    inc_e=None,
    dom_e=None,
    incdom_e=None,
    alphabet="amino"
)

# Configure filtering
filter_cfg = msa_config.TemplateFilterConfig(
    max_template_date=datetime.date(2021, 9, 30),
    max_subsequence_ratio=0.95,
    min_align_ratio=0.1,
    min_hit_length=10,
    deduplicate_sequences=True,
    max_hits=20
)

# Create templates from MSA (msa_a3m_string holds the query MSA in A3M format)
templates_obj = templates.Templates.from_seq_and_a3m(
    query_sequence="MKTAYIAKQRQISFVKSHFSRQLE",
    msa_a3m=msa_a3m_string,
    max_template_date=datetime.date(2021, 9, 30),
    database_path="/data/pdb_seqres.txt",
    hmmsearch_config=hmmsearch_cfg,
    max_a3m_query_sequences=512,
    structure_store=store,
    filter_config=filter_cfg,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)

print(f"Found {templates_obj.num_hits} template hits")
from_hmmsearch_a3m
Create templates from hmmsearch results in A3M format.
@classmethod
def from_hmmsearch_a3m(
    cls,
    *,
    query_sequence: str,
    a3m: str,
    max_template_date: datetime.date,
    structure_store: structure_stores.StructureStore,
    filter_config: msa_config.TemplateFilterConfig | None = None,
    query_release_date: datetime.date | None = None,
    chain_poly_type: str = mmcif_names.PROTEIN_CHAIN,
) -> Self
query_sequence
str
required
Target polymer sequence.
a3m
str
required
Hmmsearch results in A3M format containing template alignments and PDB codes.
max_template_date
datetime.date
required
Maximum template release date.
structure_store
structure_stores.StructureStore
required
Structure store to fetch templates.
filter_config
msa_config.TemplateFilterConfig | None
Optional filtering configuration.
query_release_date
datetime.date | None
Query release date.
chain_poly_type
str
default:"mmcif_names.PROTEIN_CHAIN"
Polymer type.
return
Templates
Templates object with hits from A3M.
Example:
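import datetime
from alphafold3.data import templates
from alphafold3.constants import mmcif_names

# Example hmmsearch output in A3M format; `store` is a StructureStore as above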
hmmsearch_a3m = """>4pqx_A/2-217 [subseq from] mol:protein length:217 Free text
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK
>5g3r_A/1-55 [subseq from] mol:protein length:352
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQD-LSGAEK
"""

templates_obj = templates.Templates.from_hmmsearch_a3m(
    query_sequence="MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK",
    a3m=hmmsearch_a3m,
    max_template_date=datetime.date(2021, 9, 30),
    structure_store=store,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)
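The A3M headers in the example carry the PDB ID, chain, residue range and full seqres length. A hypothetical parser for that header shape is sketched below. It assumes that `/2-217` is a 1-based inclusive range and that `length:` is the full seqres length, then converts to the 0-based inclusive start / exclusive end convention used by Hit. The library's own parser may differ; `parse_hit_header` and `HitHeader` are illustrative names only.

```python
import re
from typing import NamedTuple


class HitHeader(NamedTuple):
    pdb_id: str
    auth_chain_id: str
    start_index: int  # 0-based, inclusive
    end_index: int  # 0-based, exclusive
    full_length: int


# Assumed header shape: ">{pdb_id}_{chain}/{start}-{end} ... length:{n} ..."
_HEADER_RE = re.compile(r"^>(\w{4})_([^/\s]+)/(\d+)-(\d+)\b.*?\blength:(\d+)")


def parse_hit_header(line: str) -> HitHeader:
    match = _HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"Unrecognised hit header: {line!r}")
    pdb_id, chain_id, start, end, length = match.groups()
    # Assumes a 1-based inclusive range, so only the start index shifts.
    return HitHeader(pdb_id.lower(), chain_id, int(start) - 1, int(end), int(length))


print(parse_hit_header(">5g3r_A/1-55 [subseq from] mol:protein length:352"))
```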

Instance Methods

filter
Return a new Templates object containing only the hits that pass all filters. Release-date filtering uses release_date_cutoff automatically.
def filter(
    self,
    *,
    max_subsequence_ratio: float | None,
    min_align_ratio: float | None,
    min_hit_length: int | None,
    deduplicate_sequences: bool,
    max_hits: int | None,
) -> Self
max_subsequence_ratio
float | None
required
Exclude hits that are exact subsequences of the query and whose length ratio exceeds this value.
min_align_ratio
float | None
required
Exclude hits whose aligned residues make up less than this proportion of the query length.
min_hit_length
int | None
required
Exclude hits with fewer residues than this.
deduplicate_sequences
bool
required
Whether to exclude duplicate template sequences (keeps the first occurrence).
max_hits
int | None
required
Maximum number of hits to keep.
return
Templates
New Templates object with filtered hits.
Example:
filtered = templates_obj.filter(
    max_subsequence_ratio=0.95,
    min_align_ratio=0.1,
    min_hit_length=20,
    deduplicate_sequences=True,
    max_hits=20
)

print(f"Filtered from {templates_obj.num_hits} to {filtered.num_hits} hits")
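A generic sketch of how per-hit filtering, deduplication and the max_hits cap might combine in a single pass. The function `filter_hits_sketch` is hypothetical; the order (keep check, then deduplication, then the cap) is an assumption about the library, chosen so that rejected or duplicate hits never use up a max_hits slot.

```python
from typing import Callable, Iterable, List, Optional, Set, TypeVar

HitT = TypeVar("HitT")


def filter_hits_sketch(
    hits: Iterable[HitT],
    *,
    keep: Callable[[HitT], bool],
    sequence_of: Callable[[HitT], str],
    deduplicate_sequences: bool,
    max_hits: Optional[int],
) -> List[HitT]:
    """Hypothetical sketch of a filter loop over template hits."""
    kept: List[HitT] = []
    seen: Set[str] = set()
    for hit in hits:
        if max_hits is not None and len(kept) >= max_hits:
            break
        if not keep(hit):  # e.g. a Hit.keep(...) style predicate
            continue
        sequence = sequence_of(hit)
        if deduplicate_sequences and sequence in seen:
            continue  # keep only the first occurrence of each sequence
        seen.add(sequence)
        kept.append(hit)
    return kept


toy_hits = [("a", "AAA", True), ("b", "AAA", True), ("c", "CCC", False), ("d", "DDD", True)]
result = filter_hits_sketch(toy_hits, keep=lambda h: h[2], sequence_of=lambda h: h[1],
                            deduplicate_sequences=True, max_hits=20)
print([name for name, _, _ in result])  # ['a', 'd']
```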
get_hits_with_structures
Get hits paired with their filtered Structure objects.
def get_hits_with_structures(self) -> Sequence[tuple[Hit, structure.Structure]]
return
Sequence[tuple[Hit, structure.Structure]]
List of (Hit, Structure) tuples. Each Structure is filtered to the hit’s chain.
Raises:
  • InvalidTemplateError: If any hit is invalid (call filter() first to remove invalid hits)
Example:
try:
    hits_with_structs = filtered.get_hits_with_structures()
    for hit, struc in hits_with_structs:
        print(f"{hit.full_name}: {struc.num_atoms} atoms")
except templates.InvalidTemplateError as e:
    print(f"Must filter hits first: {e}")
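The pairing step can be pictured with plain Python stand-ins: validate every hit, fetch each unique PDB entry once, then narrow it to the hit's chain. In this sketch `StubHit`, `pair_hits_with_chains` and the local `InvalidTemplateError` are hypothetical stand-ins, and a dict of chain ID to chain data replaces a real Structure.

```python
import dataclasses
from typing import Callable, Iterable, List, Mapping, Tuple


class InvalidTemplateError(Exception):
    """Stand-in for templates.InvalidTemplateError."""


@dataclasses.dataclass(frozen=True)
class StubHit:
    """Hypothetical stand-in with only the fields used below."""

    pdb_id: str
    auth_chain_id: str
    is_valid: bool = True


def pair_hits_with_chains(
    hits: Iterable[StubHit],
    fetch_structure: Callable[[str], Mapping[str, object]],
) -> List[Tuple[StubHit, object]]:
    hits = list(hits)
    if any(not hit.is_valid for hit in hits):
        raise InvalidTemplateError("Hits must be filtered before pairing.")
    # Fetch each unique PDB entry once, in first-seen order.
    unique_ids = dict.fromkeys(hit.pdb_id for hit in hits)
    structures = {pdb_id: fetch_structure(pdb_id) for pdb_id in unique_ids}
    # Narrow each entry to the hit's chain.
    return [(hit, structures[hit.pdb_id][hit.auth_chain_id]) for hit in hits]
```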
featurize
Featurise templates for model input.
def featurize(
    self,
    include_ligand_features: bool = True,
) -> TemplateFeatures
include_ligand_features
bool
default:"True"
Whether to compute ligand features from template structures.
return
TemplateFeatures
Dictionary mapping feature names to values:
  • template_aatype: Encoded residue types (int32 array)
  • template_all_atom_masks: Atom presence masks (float64 array)
  • template_all_atom_positions: Atom coordinates (float64 array)
  • template_domain_names: Template names (bytes objects)
  • template_release_date: Release dates (bytes objects)
  • template_sequence: Template sequences (bytes objects)
  • ligand_features: (if include_ligand_features=True) Nested dict of ligand features per chain
Raises:
  • InvalidTemplateError: If hits haven’t been filtered before featurization
Example:
try:
    features = filtered.featurize(include_ligand_features=True)
    
    print("Template shapes:")
    print(f"  aatype: {features['template_aatype'].shape}")
    print(f"  positions: {features['template_all_atom_positions'].shape}")
    print(f"  masks: {features['template_all_atom_masks'].shape}")
    
    if 'ligand_features' in features:
        print(f"  ligand features: {len(features['ligand_features'])} chains")
except templates.InvalidTemplateError as e:
    print(f"Must filter hits first: {e}")

Functions

run_hmmsearch_with_a3m

Build an HMM profile from an MSA with hmmbuild, then search a sequence database with hmmsearch to find template hits.
def run_hmmsearch_with_a3m(
    *,
    database_path: os.PathLike[str] | str,
    hmmsearch_config: msa_config.HmmsearchConfig,
    max_a3m_query_sequences: int | None,
    a3m: str | None,
) -> str
database_path
os.PathLike[str] | str
required
Path to sequence database (e.g., PDB seqres).
hmmsearch_config
msa_config.HmmsearchConfig
required
Hmmsearch configuration.
max_a3m_query_sequences
int | None
required
Maximum MSA sequences to use for HMM profile construction. None uses all sequences.
a3m
str | None
required
MSA in A3M format. Used to build HMM profile.
return
str
Hmmsearch results in A3M format.
Example:
from alphafold3.data import templates, msa_config

hmmsearch_cfg = msa_config.HmmsearchConfig(
    hmmsearch_binary_path="/usr/bin/hmmsearch",
    hmmbuild_binary_path="/usr/bin/hmmbuild",
    e_value=0.0001,
    inc_e=None,
    dom_e=None,
    incdom_e=None,
    alphabet="amino",
    filter_f1=0.02,
    filter_f2=0.001,
    filter_f3=0.0001,
    filter_max=False
)

hits_a3m = templates.run_hmmsearch_with_a3m(
    database_path="/data/pdb_seqres.txt",
    hmmsearch_config=hmmsearch_cfg,
    max_a3m_query_sequences=512,
    a3m=msa_a3m_string
)

print(f"Hmmsearch returned {len(hits_a3m.splitlines())} lines")
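max_a3m_query_sequences caps how many MSA records feed the HMM profile. A hypothetical helper that keeps the first N A3M records (counting the query as the first record) is sketched below. Whether the library counts the query toward the limit, or selects sequences in file order, is an assumption here.

```python
from typing import List, Optional


def truncate_a3m(a3m: str, max_sequences: Optional[int]) -> str:
    """Hypothetical helper: keep the first max_sequences A3M records."""
    if max_sequences is None:
        return a3m  # None means use every sequence
    records: List[List[str]] = []
    for line in a3m.splitlines():
        if line.startswith(">"):
            if len(records) == max_sequences:
                break
            records.append([line])
        elif records and line.strip():
            # Sequence lines (possibly wrapped) belong to the latest header.
            records[-1].append(line)
    return "".join(line + "\n" for record in records for line in record)


print(truncate_a3m(">query\nMKT\n>s1\nMKS\n>s2\nMRT\n", 2), end="")
```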

get_polymer_features

Extract polymer features from a template structure chain.
def get_polymer_features(
    *,
    chain: structure.Structure,
    chain_poly_type: str,
    query_sequence_length: int,
    query_to_hit_mapping: Mapping[int, int],
) -> Mapping[str, Any]
chain
structure.Structure
required
Structure object filtered to a single polymer chain.
chain_poly_type
str
required
Polymer type (PROTEIN_CHAIN, RNA_CHAIN, or DNA_CHAIN).
query_sequence_length
int
required
Length of the query sequence.
query_to_hit_mapping
Mapping[int, int]
required
0-based query index to hit index mapping.
return
Mapping[str, Any]
Dictionary with polymer features:
  • template_all_atom_positions: Atom coordinates aligned to query
  • template_all_atom_masks: Atom presence masks
  • template_sequence: Template sequence as bytes
  • template_aatype: Encoded residue types
  • template_domain_names: Template name as bytes
  • template_release_date: Release date as bytes
Raises:
  • ValueError: If structure doesn’t have a name, lacks release date, or contains multiple chains
Example:
from alphafold3 import structure
from alphafold3.data import templates
from alphafold3.constants import mmcif_names

# Load and filter structure to a single chain (mmcif_content holds the template mmCIF text)
struc = structure.from_mmcif(
    mmcif_string=mmcif_content,
    fix_mse_residues=True,
    fix_arginines=True,
    include_water=False
)
chain_struc = struc.filter(chain_id="A")

# Extract features
features = templates.get_polymer_features(
    chain=chain_struc,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN,
    query_sequence_length=len(hit.query_sequence),
    query_to_hit_mapping=hit.query_to_hit_mapping
)

print(f"Atom positions shape: {features['template_all_atom_positions'].shape}")
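"Aligned to query" means that per-residue template data is scattered into query order through query_to_hit_mapping. A pure-Python sketch of that scatter follows; the function `align_to_query` and the nested-list layout (residue, atom slot, xyz) are hypothetical stand-ins for the arrays the library builds from a Structure.

```python
from typing import List, Mapping, Sequence, Tuple


def align_to_query(
    hit_positions: Sequence[Sequence[Sequence[float]]],
    hit_masks: Sequence[Sequence[float]],
    query_sequence_length: int,
    query_to_hit_mapping: Mapping[int, int],
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Hypothetical sketch: scatter per-hit-residue atom data into query order.

    hit_positions[h][a] is an (x, y, z) triple for atom slot a of hit residue h;
    hit_masks[h][a] is 1.0 if that atom is present. Query positions absent from
    the mapping keep zero coordinates and a zero mask.
    """
    num_atoms = len(hit_masks[0]) if hit_masks else 0
    positions = [[[0.0, 0.0, 0.0] for _ in range(num_atoms)]
                 for _ in range(query_sequence_length)]
    masks = [[0.0] * num_atoms for _ in range(query_sequence_length)]
    for query_idx, hit_idx in query_to_hit_mapping.items():
        positions[query_idx] = [list(xyz) for xyz in hit_positions[hit_idx]]
        masks[query_idx] = list(hit_masks[hit_idx])
    return positions, masks
```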

package_template_features

Stack and package features from multiple template hits.
def package_template_features(
    *,
    hit_features: Sequence[Mapping[str, Any]],
    include_ligand_features: bool,
) -> Mapping[str, Any]
hit_features
Sequence[Mapping[str, Any]]
required
List of feature dictionaries, one per hit.
include_ligand_features
bool
required
Whether to include ligand features in output.
return
Mapping[str, Any]
Dictionary with stacked polymer features and unstacked ligand features (if included).
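Stacking here means gathering each polymer feature across hits along a new leading template axis. The pure-Python sketch below uses lists in place of NumPy arrays and keeps ligand features per hit instead of stacking them, following the description above. The name `package_sketch`, the exact ligand container and the empty-input behaviour are assumptions.

```python
from typing import Any, Dict, Mapping, Sequence


def package_sketch(
    hit_features: Sequence[Mapping[str, Any]],
    include_ligand_features: bool,
) -> Dict[str, Any]:
    """Hypothetical sketch: stack polymer features along a leading hit axis."""
    if not hit_features:
        return {}  # assumed behaviour for zero hits
    polymer_keys = [key for key in hit_features[0] if key != "ligand_features"]
    # One list per feature, indexed by hit: the pure-Python analogue of stacking.
    packaged: Dict[str, Any] = {
        key: [features[key] for features in hit_features] for key in polymer_keys
    }
    if include_ligand_features:
        # Ligand features are not stacked; each hit keeps its own mapping.
        packaged["ligand_features"] = [
            features.get("ligand_features", {}) for features in hit_features
        ]
    return packaged
```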

Template Search Workflow

Complete workflow for finding and using templates:
import datetime
from alphafold3.data import templates, msa, structure_stores, msa_config
from alphafold3.constants import mmcif_names

# 1. Get MSA for query (msa_run_config and hmmsearch_config are assumed to be defined elsewhere)
query_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK"
msa_result = msa.get_msa(
    target_sequence=query_seq,
    run_config=msa_run_config,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)

# 2. Convert MSA to A3M
msa_a3m = msa_result.to_a3m()

# 3. Search for templates
structure_store = structure_stores.StructureStore("/data/pdb_mmcif")

templates_obj = templates.Templates.from_seq_and_a3m(
    query_sequence=query_seq,
    msa_a3m=msa_a3m,
    max_template_date=datetime.date(2021, 9, 30),
    database_path="/data/pdb_seqres.txt",
    hmmsearch_config=hmmsearch_config,
    max_a3m_query_sequences=512,
    structure_store=structure_store,
    chain_poly_type=mmcif_names.PROTEIN_CHAIN
)

print(f"Found {templates_obj.num_hits} initial hits")

# 4. Filter templates
filtered = templates_obj.filter(
    max_subsequence_ratio=0.95,
    min_align_ratio=0.1,
    min_hit_length=20,
    deduplicate_sequences=True,
    max_hits=20
)

print(f"Kept {filtered.num_hits} hits after filtering")

# 5. Featurise for model input
template_features = filtered.featurize(include_ligand_features=True)

print("Template features ready for inference")
print(f"  Shape: {template_features['template_all_atom_positions'].shape}")

Error Handling

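Template construction, filtering, and featurisation can raise the exceptions caught below:
from alphafold3.data import templates

# query_seq, hmmsearch_a3m, max_date and store are defined as in the earlier examples.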

try:
    templates_obj = templates.Templates.from_hmmsearch_a3m(
        query_sequence=query_seq,
        a3m=hmmsearch_a3m,
        max_template_date=max_date,
        structure_store=store
    )
    
    filtered = templates_obj.filter(
        max_subsequence_ratio=0.95,
        min_align_ratio=0.1,
        min_hit_length=20,
        deduplicate_sequences=True,
        max_hits=20
    )
    
    features = filtered.featurize()
    
except templates.HitDateError as e:
    print(f"Template date error: {e}")
except templates.InvalidTemplateError as e:
    print(f"Invalid template: {e}")
except ValueError as e:
    print(f"Validation error: {e}")
