The Bio.Data module provides collections of biological data constants, including IUPAC alphabets, codon tables, and molecular weights.

IUPACData

The Bio.Data.IUPACData module contains standard IUPAC definitions for biological alphabets and molecular weights.

Protein Alphabets

from Bio.Data import IUPACData

# Standard protein letters (20 amino acids)
protein_letters = IUPACData.protein_letters
# 'ACDEFGHIKLMNPQRSTVWY'

# Extended protein letters (including ambiguity codes)
extended_protein_letters = IUPACData.extended_protein_letters
# 'ACDEFGHIKLMNPQRSTVWYBXZJUO'

Constants:
  • protein_letters - The 20 standard amino acid one-letter codes
  • extended_protein_letters - The standard codes plus B (Asx), X (unknown), Z (Glx), J (Xle), U (Sec, selenocysteine) and O (Pyl, pyrrolysine)
  • protein_letters_1to3 - Dictionary mapping one-letter codes to title-case three-letter codes (e.g. 'A' -> 'Ala')
  • protein_letters_3to1 - Dictionary mapping title-case three-letter codes to one-letter codes (e.g. 'Ala' -> 'A')
  • protein_letters_1to3_extended - Like protein_letters_1to3, but also covering B, X, Z, J, U and O
  • protein_letters_3to1_extended - Like protein_letters_3to1, but also covering Asx, Xaa, Glx, Xle, Sec and Pyl
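The two extended dictionaries can be combined into small conversion helpers. This is a minimal sketch, assuming the input is a plain string and that three-letter input is a run of concatenated codes such as "MetLys"; the function names `one_to_three` and `three_to_one` are hypothetical. For production code, Biopython's own `Bio.SeqUtils.seq1` and `Bio.SeqUtils.seq3` cover the same ground.

```python
from Bio.Data import IUPACData


def one_to_three(seq: str) -> str:
    """Convert one-letter codes (e.g. 'MKX') to concatenated three-letter codes."""
    # The _extended table also covers the ambiguity codes B, X, Z, J and U, O.
    table = IUPACData.protein_letters_1to3_extended
    return "".join(table[aa] for aa in seq.upper())


def three_to_one(seq: str) -> str:
    """Convert concatenated three-letter codes (e.g. 'MetLys') to one-letter codes."""
    if len(seq) % 3:
        raise ValueError("three-letter input must have a length divisible by 3")
    table = IUPACData.protein_letters_3to1_extended
    # Keys are title case ('Met'), so normalise each code before the lookup.
    codes = (seq[i:i + 3].capitalize() for i in range(0, len(seq), 3))
    return "".join(table[code] for code in codes)


print(one_to_three("MKX"))        # MetLysXaa
print(three_to_one("METLYSXAA"))  # MKX
```

Normalising with `str.capitalize()` lets the reverse conversion accept upper-case PDB-style input as well as the title-case form stored in IUPACData.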

Nucleotide Alphabets

# DNA alphabets
unambiguous_dna_letters = IUPACData.unambiguous_dna_letters  # 'GATC'
ambiguous_dna_letters = IUPACData.ambiguous_dna_letters      # 'GATCRYWSMKHBVDN'

# RNA alphabets
unambiguous_rna_letters = IUPACData.unambiguous_rna_letters  # 'GAUC'
ambiguous_rna_letters = IUPACData.ambiguous_rna_letters      # 'GAUCRYWSMKHBVDN'

# Ambiguous nucleotide values
ambiguous_dna_values = IUPACData.ambiguous_dna_values
# {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T', 'M': 'AC', 'R': 'AG', ...}

ambiguous_rna_values = IUPACData.ambiguous_rna_values
# {'A': 'A', 'C': 'C', 'G': 'G', 'U': 'U', 'M': 'AC', 'R': 'AG', ...}

Constants:
  • unambiguous_dna_letters - The four standard DNA bases
  • ambiguous_dna_letters - The DNA bases plus the IUPAC ambiguity codes
  • unambiguous_rna_letters - The four standard RNA bases
  • ambiguous_rna_letters - The RNA bases plus the IUPAC ambiguity codes
  • ambiguous_dna_values - Maps each DNA letter, including ambiguity codes, to the bases it can represent
  • ambiguous_rna_values - Maps each RNA letter, including ambiguity codes, to the bases it can represent
  • ambiguous_dna_complement - Maps each DNA letter, including ambiguity codes, to its complement (e.g. 'R' -> 'Y')
  • ambiguous_rna_complement - Maps each RNA letter, including ambiguity codes, to its complement
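The value and complement tables are enough to handle ambiguous DNA directly. The sketch below, with hypothetical helper names `reverse_complement` and `matches`, reverse-complements a sequence that may contain ambiguity codes and tests whether an unambiguous sequence fits an ambiguous pattern. It assumes the sequence passed to `matches` contains only A, C, G and T; Biopython's `Seq.reverse_complement()` is the library route for the first task.

```python
from Bio.Data import IUPACData


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA that may contain IUPAC ambiguity codes."""
    complement = IUPACData.ambiguous_dna_complement
    return "".join(complement[base] for base in reversed(seq.upper()))


def matches(pattern: str, seq: str) -> bool:
    """Return True if an unambiguous DNA sequence fits an ambiguous pattern."""
    values = IUPACData.ambiguous_dna_values
    if len(pattern) != len(seq):
        return False
    # Each pattern letter expands to the bases it allows, e.g. 'R' -> 'AG'.
    return all(base in values[code] for code, base in zip(pattern.upper(), seq.upper()))


print(reverse_complement("ATR"))     # YAT
print(matches("GANTTC", "GAATTC"))  # True
print(matches("ATR", "ATC"))        # False
```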

Molecular Weights

# Average masses of the free amino acids (not residues within a chain)
protein_weights = IUPACData.protein_weights
# {'A': 89.0932, 'C': 121.1582, 'D': 133.1027, ...}

# Monoisotopic masses of the free amino acids
monoisotopic_protein_weights = IUPACData.monoisotopic_protein_weights
# {'A': 89.047678, 'C': 121.019749, ...}

# Average masses of DNA and RNA nucleoside monophosphates
unambiguous_dna_weights = IUPACData.unambiguous_dna_weights
unambiguous_rna_weights = IUPACData.unambiguous_rna_weights

# Atomic weights
atom_weights = IUPACData.atom_weights
# {'H': 1.00794, 'C': 12.0107, 'N': 14.0067, 'O': 15.9994, ...}

Constants:
  • protein_weights - Average masses of the free amino acids, keyed by one-letter code
  • monoisotopic_protein_weights - Monoisotopic masses of the free amino acids
  • extended_protein_values - Maps ambiguous protein codes (e.g. B, Z, X) to the amino acids they can represent; an alphabet table rather than a weight table
  • unambiguous_dna_weights - Average masses of the DNA nucleotides as deoxynucleoside monophosphates
  • unambiguous_rna_weights - Average masses of the RNA nucleotides as nucleoside monophosphates
  • monoisotopic_unambiguous_dna_weights - Monoisotopic masses of the DNA nucleotides
  • monoisotopic_unambiguous_rna_weights - Monoisotopic masses of the RNA nucleotides
  • atom_weights - Average atomic masses, keyed by element symbol
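As a sanity check, atom_weights can rebuild the tabulated masses from elemental formulas, and the nucleotide weights can be turned into an oligonucleotide mass. This is a sketch under stated assumptions: `formula_weight` and `ssdna_weight` are hypothetical helpers, the formula parser handles only flat formulas without brackets, and the oligo is assumed to be linear single-stranded DNA with a 5' phosphate, so each of the n - 1 phosphodiester bonds removes one water. `Bio.SeqUtils.molecular_weight` is the library routine for real use.

```python
import re

from Bio.Data import IUPACData

# One element symbol followed by an optional count, e.g. "C3" or "N".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_weight(formula: str) -> float:
    """Average mass of a flat formula such as 'C3H7NO2' (no brackets)."""
    total, pos = 0.0, 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters before this token
            raise ValueError(f"cannot parse formula {formula!r}")
        element, count = match.groups()
        total += IUPACData.atom_weights[element] * (int(count) if count else 1)
        pos = match.end()
    if pos != len(formula):  # unparsed characters at the end
        raise ValueError(f"cannot parse formula {formula!r}")
    return total


WATER = formula_weight("H2O")  # about 18.015 Da


def ssdna_weight(seq: str) -> float:
    """Average mass of linear single-stranded DNA with a 5' phosphate (assumed)."""
    weights = IUPACData.unambiguous_dna_weights
    # Joining n nucleoside monophosphates forms n - 1 phosphodiester bonds,
    # and each bond releases one water.
    return sum(weights[base] for base in seq.upper()) - (len(seq) - 1) * WATER


print(round(formula_weight("C3H7NO2"), 2))  # 89.09, alanine
print(round(ssdna_weight("GATTACA"), 2))
```

The alanine result agrees with protein_weights["A"], which confirms that the protein table stores free amino acids rather than residues.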

PDBData

The Bio.Data.PDBData module contains protein structure-specific data from the wwPDB.

from Bio.Data import PDBData

# Extended 3-to-1 letter protein code mapping (includes modified residues)
protein_letters_3to1_extended = PDBData.protein_letters_3to1_extended
# {'ALA': 'A', 'CYS': 'C', 'MSE': 'M', 'SEP': 'S', ...}

protein_letters_1to3 = PDBData.protein_letters_1to3
# {'A': 'ALA', 'C': 'CYS', 'D': 'ASP', ...}

Constants:
  • protein_letters_3to1 - Standard 3-to-1 letter mapping, keyed by uppercase PDB residue names (e.g. 'ALA' -> 'A')
  • protein_letters_1to3 - Standard 1-to-3 letter mapping
  • protein_letters_3to1_extended - The standard mapping plus modified and non-standard residues (e.g. 'MSE' -> 'M')
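A common use of the extended table is turning the residue names of a structure into a one-letter sequence. The helper below, `residues_to_sequence`, is hypothetical; it assumes the names arrive as plain strings (for example from `Residue.get_resname()` in Bio.PDB) and replaces anything missing from the table, such as ligands or waters, with a placeholder letter.

```python
from Bio.Data import PDBData


def residues_to_sequence(res_names, unknown="X"):
    """Map PDB residue names to a one-letter string (hypothetical helper)."""
    table = PDBData.protein_letters_3to1_extended
    # Keys are uppercase PDB names; names absent from the table fall back
    # to the placeholder letter instead of raising KeyError.
    return "".join(table.get(name.strip().upper(), unknown) for name in res_names)


print(residues_to_sequence(["MSE", "ALA", "CYS"]))  # MAC
```

Using `dict.get` with a default keeps the output aligned one-to-one with the input residues, which helps when mapping positions back to the structure.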

Example Usage

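from itertools import product

from Bio.Data import IUPACData, PDBData

# Calculate the average molecular weight of a linear peptide.
# protein_weights holds free amino acid masses, so one water (H2O)
# is subtracted for each peptide bond formed.
water = 18.0153  # average mass of H2O in Da
peptide = "ACDEFGH"
weight = sum(IUPACData.protein_weights[aa] for aa in peptide) - (len(peptide) - 1) * water
print(f"Peptide weight: {weight:.2f} Da")

# Decode ambiguous DNA
ambiguous_codon = "ATR"  # R = A or G
possible_bases = IUPACData.ambiguous_dna_values['R']
print(f"R can be: {possible_bases}")  # R can be: AG
# Expand the codon into every unambiguous codon it can represent
options = [IUPACData.ambiguous_dna_values[base] for base in ambiguous_codon]
print(["".join(codon) for codon in product(*options)])  # ['ATA', 'ATG']

# Convert protein codes
three_letter = "MSE"  # selenomethionine, a modified methionine
if three_letter in PDBData.protein_letters_3to1_extended:
    one_letter = PDBData.protein_letters_3to1_extended[three_letter]
    print(f"{three_letter} -> {one_letter}")  # MSE -> M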
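The peptide mass calculation can be wrapped in a reusable function that also supports monoisotopic masses. Because protein_weights and monoisotopic_protein_weights store free amino acids, each peptide bond removes one water. This is a sketch: `peptide_weight` and the water constants are illustrative names, the water masses (18.0153 Da average, 18.010565 Da monoisotopic) are standard values assumed here, and `Bio.SeqUtils.molecular_weight` is called only as a cross-check.

```python
from Bio.Data import IUPACData
from Bio.SeqUtils import molecular_weight

# Mass of the water released per peptide bond, in Da.
WATER_AVERAGE = 18.0153
WATER_MONOISOTOPIC = 18.010565


def peptide_weight(peptide: str, monoisotopic: bool = False) -> float:
    """Mass of a linear peptide computed from free amino acid masses."""
    if not peptide:
        return 0.0
    if monoisotopic:
        table, water = IUPACData.monoisotopic_protein_weights, WATER_MONOISOTOPIC
    else:
        table, water = IUPACData.protein_weights, WATER_AVERAGE
    return sum(table[aa] for aa in peptide.upper()) - (len(peptide) - 1) * water


for mono in (False, True):
    ours = peptide_weight("ACDEFGH", monoisotopic=mono)
    ref = molecular_weight("ACDEFGH", seq_type="protein", monoisotopic=mono)
    print(f"monoisotopic={mono}: {ours:.4f} Da (Bio.SeqUtils: {ref:.4f} Da)")
```

Average masses suit bulk measurements such as SDS-PAGE estimates, while monoisotopic masses match the peaks seen in high-resolution mass spectrometry.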
