
Overview

The residue_names module provides comprehensive constants for working with amino acid and nucleic acid residues in AlphaFold 3. It includes mappings, conversion functions, and standard residue type definitions.
Module Path: alphafold3.constants.residue_names

Functions

letters_three_to_one

@functools.lru_cache(maxsize=64)
def letters_three_to_one(restype: str, *, default: str) -> str
Returns the single-letter code for a residue name if one exists; otherwise returns default. Results are memoized with functools.lru_cache (up to 64 entries).
Parameters:
  • restype (str, required): Three-letter residue code (e.g., ‘ARG’, ‘MSE’, ‘ALA’).
  • default (str, required, keyword-only): Value returned if the residue is not found in the mapping.
Returns: Single-letter code (e.g., ‘R’, ‘M’, ‘A’), or the default value.
Example:
from alphafold3.constants.residue_names import letters_three_to_one

# Standard amino acids
print(letters_three_to_one('ARG', default='X'))  # Returns: 'R'
print(letters_three_to_one('MSE', default='X'))  # Returns: 'M'
print(letters_three_to_one('ZZZ', default='X'))  # Returns: 'X'
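The behaviour described above (a lookup with a fallback, a keyword-only default, and an LRU cache) can be sketched as follows. This is an illustrative reimplementation that assumes the function is a plain `.get()` on its mapping; `three_to_one_sketch` and the three-entry `_THREE_TO_ONE` table are hypothetical stand-ins, not the module's code.

```python
import functools

# Hypothetical three-entry stand-in for the module's lookup table.
_THREE_TO_ONE = {'ARG': 'R', 'MSE': 'M', 'ALA': 'A'}


@functools.lru_cache(maxsize=64)
def three_to_one_sketch(restype: str, *, default: str) -> str:
    """Returns the one-letter code for restype, or default if it is unknown."""
    return _THREE_TO_ONE.get(restype, default)


print(three_to_one_sketch('ARG', default='X'))  # R
print(three_to_one_sketch('ZZZ', default='X'))  # X

# The bare `*` makes `default` keyword-only; passing it positionally fails.
try:
    three_to_one_sketch('ARG', 'X')
except TypeError as err:
    print('TypeError:', err)

# A repeated call with identical arguments is served from the cache.
three_to_one_sketch('ARG', default='X')
print(three_to_one_sketch.cache_info().hits)  # 1
```

Because the cache key includes the keyword argument, `default='X'` and `default='?'` are cached separately for the same residue.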

Mappings

CCD_NAME_TO_ONE_LETTER

CCD_NAME_TO_ONE_LETTER: Mapping[str, str]
Comprehensive mapping from CCD (Chemical Component Dictionary) component codes to single-letter codes. Keys are usually three characters long, but nucleotide codes can be shorter (e.g., 'A', 'DA'). Contains over 1,400 entries including:
  • Standard amino acids (e.g., 'ALA': 'A', 'ARG': 'R')
  • Modified amino acids (e.g., 'MSE': 'M', 'SEP': 'S')
  • Nucleic acids (e.g., 'A': 'A', 'DA': 'A')
  • Non-standard residues and modifications
Example:
from alphafold3.constants.residue_names import CCD_NAME_TO_ONE_LETTER

print(CCD_NAME_TO_ONE_LETTER['ALA'])  # 'A'
print(CCD_NAME_TO_ONE_LETTER['MSE'])  # 'M' (selenomethionine)
print(CCD_NAME_TO_ONE_LETTER['PHE'])  # 'F'
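A typical use of this table is turning the residue names of a parsed chain into a one-letter sequence. The sketch below assumes a plain dictionary lookup with 'X' as the fallback; `ccd_names_to_sequence` and the five-entry `CCD_SUBSET` are hypothetical, and in practice you would pass the real CCD_NAME_TO_ONE_LETTER.

```python
from collections.abc import Iterable, Mapping

# Hypothetical subset of CCD_NAME_TO_ONE_LETTER, for illustration only.
CCD_SUBSET: Mapping[str, str] = {
    'ALA': 'A', 'PHE': 'F', 'MSE': 'M', 'SEP': 'S', 'DA': 'A',
}


def ccd_names_to_sequence(
    names: Iterable[str], mapping: Mapping[str, str], unknown: str = 'X'
) -> str:
    """Joins the one-letter code of each CCD name, using `unknown` for misses."""
    return ''.join(mapping.get(name, unknown) for name in names)


print(ccd_names_to_sequence(['MSE', 'ALA', 'SEP', 'PHE', 'HOH'], CCD_SUBSET))
# MASFX
```

Modified residues such as MSE and SEP collapse onto the letter of their parent amino acid, which suits sequence search but discards the modification.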

Protein Mappings

PROTEIN_COMMON_ONE_TO_THREE: Mapping[str, str]
Maps single-letter amino acid codes to three-letter codes for the 20 standard amino acids.
Definition:
PROTEIN_COMMON_ONE_TO_THREE = {
    'A': 'ALA', 'R': 'ARG', 'N': 'ASN', 'D': 'ASP',
    'C': 'CYS', 'Q': 'GLN', 'E': 'GLU', 'G': 'GLY',
    'H': 'HIS', 'I': 'ILE', 'L': 'LEU', 'K': 'LYS',
    'M': 'MET', 'F': 'PHE', 'P': 'PRO', 'S': 'SER',
    'T': 'THR', 'W': 'TRP', 'Y': 'TYR', 'V': 'VAL',
}
PROTEIN_COMMON_THREE_TO_ONE: Mapping[str, str]
Inverse mapping of PROTEIN_COMMON_ONE_TO_THREE.
Example:
print(PROTEIN_COMMON_THREE_TO_ONE['ALA'])  # 'A'
print(PROTEIN_COMMON_THREE_TO_ONE['TRP'])  # 'W'
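Because each standard amino acid has exactly one three-letter code, the inverse can be built by swapping keys and values. The sketch assumes that construction (the module may build it differently) and adds hypothetical `to_three` and `to_one` helpers that fall back to 'UNK' and 'X' for unrecognized input.

```python
PROTEIN_COMMON_ONE_TO_THREE = {
    'A': 'ALA', 'R': 'ARG', 'N': 'ASN', 'D': 'ASP',
    'C': 'CYS', 'Q': 'GLN', 'E': 'GLU', 'G': 'GLY',
    'H': 'HIS', 'I': 'ILE', 'L': 'LEU', 'K': 'LYS',
    'M': 'MET', 'F': 'PHE', 'P': 'PRO', 'S': 'SER',
    'T': 'THR', 'W': 'TRP', 'Y': 'TYR', 'V': 'VAL',
}

# Swap keys and values: three-letter code -> one-letter code.
PROTEIN_COMMON_THREE_TO_ONE = {
    three: one for one, three in PROTEIN_COMMON_ONE_TO_THREE.items()
}
# The inversion is lossless only because no three-letter code repeats.
assert len(PROTEIN_COMMON_THREE_TO_ONE) == len(PROTEIN_COMMON_ONE_TO_THREE)


def to_three(seq: str) -> list[str]:
    """One-letter sequence -> three-letter names ('UNK' for unknown letters)."""
    return [PROTEIN_COMMON_ONE_TO_THREE.get(c, 'UNK') for c in seq]


def to_one(names: list[str]) -> str:
    """Three-letter names -> one-letter sequence ('X' for unknown names)."""
    return ''.join(PROTEIN_COMMON_THREE_TO_ONE.get(n, 'X') for n in names)


names = to_three('MKWB')
print(names)          # ['MET', 'LYS', 'TRP', 'UNK']
print(to_one(names))  # MKWX
```

The round trip is exact for the 20 common residues and lossy for anything else: 'B' becomes 'UNK' and then 'X'.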
PROTEIN_TYPES_ONE_LETTER_TO_INT: Mapping[str, int]
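Maps single-letter amino acid codes to integers (0-19) in the order of PROTEIN_TYPES_ONE_LETTER, which is alphabetical by three-letter name rather than by one-letter code (e.g., 'A' → 0, 'R' → 1, 'V' → 19).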
PROTEIN_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT: Mapping[str, int]
Maps single-letter amino acid codes to integers (0-20), including ‘X’ for unknown.
PROTEIN_TYPES_ONE_LETTER_WITH_UNKNOWN_AND_GAP_TO_INT: Mapping[str, int]
Maps single-letter amino acid codes to integers (0-21), including ‘X’ and ’-’ (gap).
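The three integer tables differ only in what they append after the 20 standard residues, so the first 20 indices agree across all of them. The sketch below assumes each table is the enumeration of the corresponding one-letter tuple (which gives 'A' → 0); the table names are shortened for readability and `encode_alignment_row` is a hypothetical helper.

```python
PROTEIN_TYPES_ONE_LETTER = (
    'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
    'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V',
)
WITH_UNKNOWN = PROTEIN_TYPES_ONE_LETTER + ('X',)
WITH_UNKNOWN_AND_GAP = WITH_UNKNOWN + ('-',)

# Assumption: each table maps a code to its position in its tuple.
TO_INT = {r: i for i, r in enumerate(PROTEIN_TYPES_ONE_LETTER)}
WITH_UNKNOWN_TO_INT = {r: i for i, r in enumerate(WITH_UNKNOWN)}
WITH_UNKNOWN_AND_GAP_TO_INT = {
    r: i for i, r in enumerate(WITH_UNKNOWN_AND_GAP)
}


def encode_alignment_row(row: str) -> list[int]:
    """Encodes an MSA row; letters outside the table become 'X' (unknown)."""
    table = WITH_UNKNOWN_AND_GAP_TO_INT
    return [table.get(c, table['X']) for c in row]


# The smaller tables are prefixes of the larger ones.
assert all(WITH_UNKNOWN_TO_INT[r] == i for r, i in TO_INT.items())
print(WITH_UNKNOWN_AND_GAP_TO_INT['X'], WITH_UNKNOWN_AND_GAP_TO_INT['-'])  # 20 21
print(encode_alignment_row('ARB-V'))  # [0, 1, 20, 21, 19]
```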

Nucleic Acid Mappings

DNA_COMMON_ONE_TO_TWO: Mapping[str, str] = {
    'A': 'DA',
    'G': 'DG',
    'C': 'DC',
    'T': 'DT',
}
Maps single-letter DNA codes to two-letter codes.
RNA_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT: Mapping[str, int]
Maps RNA single-letter codes (A, G, C, U, N) to integers (0-4).
DNA_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT: Mapping[str, int]
Maps DNA single-letter codes (A, G, C, T, N) to integers (0-4).
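Converting a DNA sequence with these tables is a per-base lookup. In the sketch below, `dna_to_ccd` and `dna_to_ints` are hypothetical helpers; they assume unrecognized bases should land in the unknown slot, using 'DN' (the module's UNK_DNA) for CCD names and 'N' for the integer encoding.

```python
DNA_COMMON_ONE_TO_TWO = {'A': 'DA', 'G': 'DG', 'C': 'DC', 'T': 'DT'}

# Order (A, G, C, T, N) as documented above; 'N' is the unknown slot.
DNA_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT = {
    r: i for i, r in enumerate(('A', 'G', 'C', 'T', 'N'))
}


def dna_to_ccd(seq: str) -> list[str]:
    """Maps each base to its two-letter CCD code; unknown bases become 'DN'."""
    return [DNA_COMMON_ONE_TO_TWO.get(base, 'DN') for base in seq.upper()]


def dna_to_ints(seq: str) -> list[int]:
    """Integer-encodes a DNA sequence, sending unrecognized bases to 'N'."""
    table = DNA_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT
    return [table.get(base, table['N']) for base in seq.upper()]


print(dna_to_ccd('gatN'))   # ['DG', 'DA', 'DT', 'DN']
print(dna_to_ints('GATR'))  # [1, 0, 3, 4]
```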

Polymer Type Mappings

POLYMER_TYPES_ORDER: Mapping[str, int]
Maps polymer residue types to their order indices (29 total: 20 amino acids + 1 unknown + 8 nucleotides).
POLYMER_TYPES_ORDER_WITH_UNKNOWN: Mapping[str, int]
Maps polymer residue types to their order indices including unknown (30 total).
POLYMER_TYPES_ORDER_WITH_UNKNOWN_AND_GAP: Mapping[str, int]
Maps polymer residue types to their order indices including unknown and gap (31 total).
POLYMER_TYPES_ORDER_WITH_ALL_UNKS_AND_GAP: Mapping[str, int]
Maps polymer residue types to their order indices including all unknown types and gap (32 total).
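A common pattern is looking up a residue's index and falling back to the unknown slot of the right chain type. The sketch below assumes each *_ORDER mapping is the enumeration of the matching tuple from "Polymer Type Tuples"; `polymer_index` is a hypothetical helper.

```python
PROTEIN_TYPES = (
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
)
NUCLEIC_TYPES = ('A', 'G', 'C', 'U', 'DA', 'DG', 'DC', 'DT')

# PROTEIN_TYPES_WITH_UNKNOWN + NUCLEIC_TYPES_WITH_UNKNOWN, spelled out.
POLYMER_TYPES_WITH_UNKNOWN = PROTEIN_TYPES + ('UNK',) + NUCLEIC_TYPES + ('N',)

# Assumption: the order mapping is the enumeration of the tuple.
POLYMER_TYPES_ORDER_WITH_UNKNOWN = {
    r: i for i, r in enumerate(POLYMER_TYPES_WITH_UNKNOWN)
}


def polymer_index(resname: str, is_protein: bool) -> int:
    """Returns the order index, falling back to the chain type's unknown."""
    order = POLYMER_TYPES_ORDER_WITH_UNKNOWN
    fallback = 'UNK' if is_protein else 'N'
    return order.get(resname, order[fallback])


print(len(POLYMER_TYPES_ORDER_WITH_UNKNOWN))   # 30
print(polymer_index('ALA', is_protein=True))   # 0
print(polymer_index('MSE', is_protein=True))   # 20
print(polymer_index('DT', is_protein=False))   # 28
print(polymer_index('PSU', is_protein=False))  # 29
```

Modified residues such as MSE fall into the unknown slot here; mapping them to their parent residue first (for example through CCD_NAME_TO_ONE_LETTER) preserves more information.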

Protein Constants

Standard Amino Acids (Interned Strings)

ALA = sys.intern('ALA')  # Alanine
ARG = sys.intern('ARG')  # Arginine
ASN = sys.intern('ASN')  # Asparagine
ASP = sys.intern('ASP')  # Aspartic acid
CYS = sys.intern('CYS')  # Cysteine
GLN = sys.intern('GLN')  # Glutamine
GLU = sys.intern('GLU')  # Glutamic acid
GLY = sys.intern('GLY')  # Glycine
HIS = sys.intern('HIS')  # Histidine
ILE = sys.intern('ILE')  # Isoleucine
LEU = sys.intern('LEU')  # Leucine
LYS = sys.intern('LYS')  # Lysine
MET = sys.intern('MET')  # Methionine
PHE = sys.intern('PHE')  # Phenylalanine
PRO = sys.intern('PRO')  # Proline
SER = sys.intern('SER')  # Serine
THR = sys.intern('THR')  # Threonine
TRP = sys.intern('TRP')  # Tryptophan
TYR = sys.intern('TYR')  # Tyrosine
VAL = sys.intern('VAL')  # Valine
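`sys.intern` returns one canonical object per distinct string, so equal residue names that have been interned are the same object. The sketch below shows the effect. Interning is commonly used to save memory and speed up dictionary lookups when the same names recur across many residues; treat that as the likely motivation rather than a documented one, and keep using `==` for comparisons.

```python
import sys

ALA = sys.intern('ALA')

# A name assembled at runtime (e.g., while parsing a file) is a new object.
parsed = ''.join(['AL', 'A'])
print(parsed == ALA)  # True
print(parsed is ALA)  # False in CPython: equal value, different object

# Interning it returns the canonical object already stored for 'ALA'.
parsed = sys.intern(parsed)
print(parsed is ALA)  # True
```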

Special Residue Names

UNK = sys.intern('UNK')  # Unknown amino acid
GAP = sys.intern('-')    # Gap character
UNL = sys.intern('UNL')  # Unknown ligand
MSE = sys.intern('MSE')  # Selenomethionine (non-standard but common in PDB)

Protein Type Tuples

PROTEIN_TYPES: tuple[str, ...] = (
    ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, 
    LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL,
)
The 20 standard protein amino acids (no unknown).
PROTEIN_TYPES_WITH_UNKNOWN: tuple[str, ...] = PROTEIN_TYPES + (UNK,)
The 20 standard protein amino acids plus UNK (21 total).
PROTEIN_TYPES_ONE_LETTER: tuple[str, ...] = (
    'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
    'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V',
)
Single-letter codes in the same order as PROTEIN_TYPES, i.e., alphabetical by three-letter name (ALA, ARG, ASN, ...), not by one-letter code.
PROTEIN_TYPES_ONE_LETTER_WITH_UNKNOWN: tuple[str, ...] = (
    PROTEIN_TYPES_ONE_LETTER + ('X',)
)
Single-letter codes including ‘X’ for unknown.
PROTEIN_TYPES_ONE_LETTER_WITH_UNKNOWN_AND_GAP: tuple[str, ...] = (
    PROTEIN_TYPES_ONE_LETTER_WITH_UNKNOWN + (GAP,)
)
Single-letter codes including ‘X’ and gap ’-’.
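Because PROTEIN_TYPES and PROTEIN_TYPES_ONE_LETTER are written in the same order, zipping them pairs each one-letter code with its three-letter name. The sketch below checks that alignment and the ordering convention using only the tuples listed above; `three_by_one` is a hypothetical local name.

```python
PROTEIN_TYPES = (
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
)
PROTEIN_TYPES_ONE_LETTER = (
    'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
    'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V',
)

# The two tuples are position-aligned: index i names the same residue.
three_by_one = dict(zip(PROTEIN_TYPES_ONE_LETTER, PROTEIN_TYPES))
print(three_by_one['W'])  # TRP

# The shared order is alphabetical by three-letter name...
print(list(PROTEIN_TYPES) == sorted(PROTEIN_TYPES))  # True
# ...which is not alphabetical by one-letter code.
print(list(PROTEIN_TYPES_ONE_LETTER) == sorted(PROTEIN_TYPES_ONE_LETTER))  # False
```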

Nucleic Acid Constants

RNA Bases

A = sys.intern('A')  # Adenine
G = sys.intern('G')  # Guanine
C = sys.intern('C')  # Cytosine
U = sys.intern('U')  # Uracil

DNA Bases

DA = sys.intern('DA')  # Deoxyadenosine
DG = sys.intern('DG')  # Deoxyguanosine
DC = sys.intern('DC')  # Deoxycytidine
DT = sys.intern('DT')  # Deoxythymidine
T = sys.intern('T')    # Thymine (single letter)

Unknown Nucleic Acids

UNK_NUCLEIC_ONE_LETTER = sys.intern('N')   # Unknown nucleic acid
UNK_RNA = sys.intern('N')                  # Unknown RNA
UNK_DNA = sys.intern('DN')                 # Unknown DNA

Nucleic Acid Type Tuples

RNA_TYPES: tuple[str, ...] = (A, G, C, U)
The 4 standard RNA bases.
DNA_TYPES: tuple[str, ...] = (DA, DG, DC, DT)
The 4 standard DNA bases (two-letter codes).
DNA_TYPES_ONE_LETTER: tuple[str, ...] = (A, G, C, T)
The 4 standard DNA bases (single-letter codes).
NUCLEIC_TYPES: tuple[str, ...] = RNA_TYPES + DNA_TYPES
All 8 standard nucleic acid types (4 RNA + 4 DNA).
NUCLEIC_TYPES_WITH_UNKNOWN: tuple[str, ...] = (
    NUCLEIC_TYPES + (UNK_NUCLEIC_ONE_LETTER,)
)
All nucleic acid types plus one unknown type (9 total).
NUCLEIC_TYPES_WITH_2_UNKS: tuple[str, ...] = (
    NUCLEIC_TYPES + (UNK_RNA, UNK_DNA,)
)
All nucleic acid types plus separate unknowns for RNA and DNA (10 total).
RNA_TYPES_ONE_LETTER_WITH_UNKNOWN: tuple[str, ...] = RNA_TYPES + (UNK_RNA,)
RNA bases plus unknown (5 total).
DNA_TYPES_WITH_UNKNOWN: tuple[str, ...] = DNA_TYPES + (UNK_DNA,)
DNA bases (two-letter) plus unknown (5 total).
DNA_TYPES_ONE_LETTER_WITH_UNKNOWN: tuple[str, ...] = (
    DNA_TYPES_ONE_LETTER + (UNK_NUCLEIC_ONE_LETTER,)
)
DNA bases (single-letter) plus unknown (5 total).
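The difference between the one-unknown and two-unknown variants shows up once the tuples are turned into index tables. The sketch below assumes such tables are built by enumerating each tuple (`one_unk` and `two_unks` are hypothetical names). Note that UNK_NUCLEIC_ONE_LETTER and UNK_RNA are the same string 'N'.

```python
RNA_TYPES = ('A', 'G', 'C', 'U')
DNA_TYPES = ('DA', 'DG', 'DC', 'DT')
UNK_NUCLEIC_ONE_LETTER = 'N'
UNK_RNA = 'N'
UNK_DNA = 'DN'

NUCLEIC_TYPES = RNA_TYPES + DNA_TYPES
NUCLEIC_TYPES_WITH_UNKNOWN = NUCLEIC_TYPES + (UNK_NUCLEIC_ONE_LETTER,)
NUCLEIC_TYPES_WITH_2_UNKS = NUCLEIC_TYPES + (UNK_RNA, UNK_DNA)

# Hypothetical index tables built by enumerating each tuple.
one_unk = {r: i for i, r in enumerate(NUCLEIC_TYPES_WITH_UNKNOWN)}
two_unks = {r: i for i, r in enumerate(NUCLEIC_TYPES_WITH_2_UNKS)}

print(UNK_NUCLEIC_ONE_LETTER == UNK_RNA)  # True
print(len(one_unk), len(two_unks))        # 9 10
# With one unknown, an unknown DNA residue ('DN') has no slot of its own...
print(UNK_DNA in one_unk)  # False
# ...whereas with two unknowns, RNA and DNA unknowns get separate indices.
print(two_unks[UNK_RNA], two_unks[UNK_DNA])  # 8 9
```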

Polymer Type Constants

Polymer Type Tuples

STANDARD_POLYMER_TYPES: tuple[str, ...] = PROTEIN_TYPES + NUCLEIC_TYPES
All standard polymer types: 20 amino acids + 8 nucleotides = 28 total.
POLYMER_TYPES: tuple[str, ...] = (
    PROTEIN_TYPES_WITH_UNKNOWN + NUCLEIC_TYPES
)
Polymer types including protein unknown: 21 amino acids + 8 nucleotides = 29 total.
POLYMER_TYPES_WITH_UNKNOWN: tuple[str, ...] = (
    PROTEIN_TYPES_WITH_UNKNOWN + NUCLEIC_TYPES_WITH_UNKNOWN
)
Polymer types with unknowns: 21 amino acids + 9 nucleotides = 30 total.
POLYMER_TYPES_WITH_GAP: tuple[str, ...] = (
    PROTEIN_TYPES + (GAP,) + NUCLEIC_TYPES
)
Polymer types with gap: 20 amino acids + 1 gap + 8 nucleotides = 29 total.
POLYMER_TYPES_WITH_UNKNOWN_AND_GAP: tuple[str, ...] = (
    PROTEIN_TYPES_WITH_UNKNOWN + (GAP,) + NUCLEIC_TYPES_WITH_UNKNOWN
)
Polymer types with unknown and gap: 21 amino acids + 1 gap + 9 nucleotides = 31 total.
POLYMER_TYPES_WITH_ALL_UNKS_AND_GAP: tuple[str, ...] = (
    PROTEIN_TYPES_WITH_UNKNOWN + (GAP,) + NUCLEIC_TYPES_WITH_2_UNKS
)
Polymer types with all unknowns and gap: 21 amino acids + 1 gap + 10 nucleotides = 32 total.

Polymer Type Counts

POLYMER_TYPES_NUM = 29                              # len(POLYMER_TYPES)
POLYMER_TYPES_NUM_WITH_UNKNOWN = 30                 # len(POLYMER_TYPES_WITH_UNKNOWN)
POLYMER_TYPES_NUM_WITH_GAP = 29                     # len(POLYMER_TYPES_WITH_GAP)
POLYMER_TYPES_NUM_WITH_UNKNOWN_AND_GAP = 31         # len(POLYMER_TYPES_WITH_UNKNOWN_AND_GAP)
POLYMER_TYPES_NUM_ORDER_WITH_ALL_UNKS_AND_GAP = 32  # len(POLYMER_TYPES_WITH_ALL_UNKS_AND_GAP)
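The counts above can be checked by rebuilding the tuples from their documented definitions. The sketch also confirms that no tuple contains duplicates (a duplicate would make an index mapping shorter than its tuple) and reports where the first nucleotide lands. `POLYMER_TUPLES` is a hypothetical helper dictionary.

```python
PROTEIN_TYPES = (
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
)
PROTEIN_TYPES_WITH_UNKNOWN = PROTEIN_TYPES + ('UNK',)
GAP = '-'
NUCLEIC_TYPES = ('A', 'G', 'C', 'U', 'DA', 'DG', 'DC', 'DT')
NUCLEIC_TYPES_WITH_UNKNOWN = NUCLEIC_TYPES + ('N',)
NUCLEIC_TYPES_WITH_2_UNKS = NUCLEIC_TYPES + ('N', 'DN')

POLYMER_TUPLES = {
    'POLYMER_TYPES': PROTEIN_TYPES_WITH_UNKNOWN + NUCLEIC_TYPES,
    'POLYMER_TYPES_WITH_UNKNOWN': (
        PROTEIN_TYPES_WITH_UNKNOWN + NUCLEIC_TYPES_WITH_UNKNOWN
    ),
    'POLYMER_TYPES_WITH_GAP': PROTEIN_TYPES + (GAP,) + NUCLEIC_TYPES,
    'POLYMER_TYPES_WITH_UNKNOWN_AND_GAP': (
        PROTEIN_TYPES_WITH_UNKNOWN + (GAP,) + NUCLEIC_TYPES_WITH_UNKNOWN
    ),
    'POLYMER_TYPES_WITH_ALL_UNKS_AND_GAP': (
        PROTEIN_TYPES_WITH_UNKNOWN + (GAP,) + NUCLEIC_TYPES_WITH_2_UNKS
    ),
}

for name, types in POLYMER_TUPLES.items():
    # A repeated entry would make the tuple longer than its index mapping.
    assert len(set(types)) == len(types), name
    print(f'{name}: {len(types)}, first nucleotide at index {types.index("A")}')
```

The output shows why indices are only meaningful alongside the table that produced them: index 20 is UNK in POLYMER_TYPES but the gap in POLYMER_TYPES_WITH_GAP, and 'A' moves from index 21 to 22 once both the unknown and the gap are present.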

Other Constants

Water Types

WATER_TYPES: tuple[str, ...] = ('HOH', 'DOD')
Standard water molecule types (H₂O and D₂O).

Unknown Types

UNKNOWN_TYPES: tuple[str, ...] = (UNK, UNK_RNA, UNK_DNA, UNL)
All unknown residue types.
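These two tuples are convenient for filtering parsed residue lists. The sketch below spells out UNKNOWN_TYPES with the string values defined earlier on this page; `categorize` is a hypothetical helper, and plain tuples are kept because membership tests on two to four items are already cheap.

```python
from collections import Counter

WATER_TYPES = ('HOH', 'DOD')
UNKNOWN_TYPES = ('UNK', 'N', 'DN', 'UNL')  # UNK, UNK_RNA, UNK_DNA, UNL


def categorize(resname: str) -> str:
    """Buckets a residue name as 'water', 'unknown', or 'other'."""
    if resname in WATER_TYPES:
        return 'water'
    if resname in UNKNOWN_TYPES:
        return 'unknown'
    return 'other'


resnames = ['ALA', 'HOH', 'UNL', 'DOD', 'GLY', 'DN']
counts = Counter(categorize(r) for r in resnames)
print(counts['water'], counts['unknown'], counts['other'])  # 2 2 2

# Drop waters before further processing.
print([r for r in resnames if r not in WATER_TYPES])
# ['ALA', 'UNL', 'GLY', 'DN']
```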

Usage Examples

Converting Residue Names

from alphafold3.constants import residue_names

# Three-letter to one-letter
one_letter = residue_names.letters_three_to_one('ARG', default='X')
print(one_letter)  # 'R'

# Using mapping directly
from alphafold3.constants.residue_names import PROTEIN_COMMON_THREE_TO_ONE
print(PROTEIN_COMMON_THREE_TO_ONE['PHE'])  # 'F'

# One-letter to three-letter
from alphafold3.constants.residue_names import PROTEIN_COMMON_ONE_TO_THREE
print(PROTEIN_COMMON_ONE_TO_THREE['W'])  # 'TRP'

Working with Polymer Types

from alphafold3.constants.residue_names import (
    POLYMER_TYPES_ORDER,
    PROTEIN_TYPES,
    RNA_TYPES,
)

# Check if a residue is a standard protein type
if 'ALA' in PROTEIN_TYPES:
    print("ALA is a standard amino acid")

# Get the index of a polymer type
index = POLYMER_TYPES_ORDER['ALA']
print(f"ALA has index {index}")

# Iterate over RNA types
for rna_base in RNA_TYPES:
    print(f"RNA base: {rna_base}")
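Membership tests like the ones above extend naturally to a small chain classifier. The sketch below is hypothetical (`chain_kind` is not part of the module) and converts the tuples to frozensets so the subset checks are cheap; that conversion is a choice made here, not something the module requires.

```python
PROTEIN_TYPES = (
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
)
RNA_TYPES = ('A', 'G', 'C', 'U')
DNA_TYPES = ('DA', 'DG', 'DC', 'DT')

_PROTEIN = frozenset(PROTEIN_TYPES)
_RNA = frozenset(RNA_TYPES)
_DNA = frozenset(DNA_TYPES)


def chain_kind(resnames: list[str]) -> str:
    """Guesses a chain's polymer type from its standard residue names."""
    names = set(resnames)
    for kind, allowed in (('protein', _PROTEIN), ('rna', _RNA), ('dna', _DNA)):
        if names and names <= allowed:
            return kind
    return 'mixed_or_other'


print(chain_kind(['MET', 'LYS', 'ALA']))  # protein
print(chain_kind(['G', 'A', 'U']))        # rna
print(chain_kind(['DA', 'DT']))           # dna
print(chain_kind(['ALA', 'A']))           # mixed_or_other
```

Only standard residues are recognized, so a chain containing a modified residue such as MSE is reported as mixed_or_other.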

Integer Encoding

from alphafold3.constants.residue_names import (
    PROTEIN_TYPES_ONE_LETTER_TO_INT,
    RNA_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT,
)

# Encode amino acid to integer
aa_code = PROTEIN_TYPES_ONE_LETTER_TO_INT['A']  # 0
print(f"Alanine code: {aa_code}")

# Encode RNA base to integer
rna_code = RNA_TYPES_ONE_LETTER_WITH_UNKNOWN_TO_INT['G']  # 1
print(f"Guanine code: {rna_code}")
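Integer codes are often expanded into one-hot rows before being fed to a model. The plain-Python sketch below assumes the RNA table enumerates (A, G, C, U, N), consistent with 'G' → 1 above; `one_hot_rna` is a hypothetical helper and is not claimed to match how AlphaFold 3 builds its features.

```python
RNA_TYPES_ONE_LETTER_WITH_UNKNOWN = ('A', 'G', 'C', 'U', 'N')
# Assumption: the integer table enumerates this tuple (so 'G' -> 1).
RNA_TO_INT = {r: i for i, r in enumerate(RNA_TYPES_ONE_LETTER_WITH_UNKNOWN)}


def one_hot_rna(seq: str) -> list[list[int]]:
    """One-hot encodes an RNA sequence; unrecognized bases map to 'N'."""
    num_classes = len(RNA_TO_INT)
    rows = []
    for base in seq:
        index = RNA_TO_INT.get(base, RNA_TO_INT['N'])
        rows.append([1 if j == index else 0 for j in range(num_classes)])
    return rows


for row in one_hot_rna('GUX'):
    print(row)
# [0, 1, 0, 0, 0]
# [0, 0, 0, 1, 0]
# [0, 0, 0, 0, 1]
```

With NumPy, the same result for an index list `indices` is `np.eye(5, dtype=np.int32)[indices]`.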
