Bio.Data

The Bio.Data module provides collections of biological data constants, including IUPAC alphabets, codon tables, and molecular weights.

IUPACData

The Bio.Data.IUPACData module contains standard IUPAC definitions for biological alphabets and molecular weights.

Protein Alphabets

from Bio.Data import IUPACData

# Standard protein letters (20 amino acids)
protein_letters = IUPACData.protein_letters
# 'ACDEFGHIKLMNPQRSTVWY'

# Extended protein letters (ambiguity codes plus selenocysteine and pyrrolysine)
extended_protein_letters = IUPACData.extended_protein_letters
# 'ACDEFGHIKLMNPQRSTVWYBXZJUO'

Constants:

protein_letters - Standard 20 amino acid one-letter codes
extended_protein_letters - Includes B (Asx), X (unknown), Z (Glx), J (Xle), U (Sec), O (Pyl)
protein_letters_1to3 - Dictionary mapping 1-letter to 3-letter codes in title case (e.g. 'A' -> 'Ala')
protein_letters_3to1 - Dictionary mapping title-case 3-letter codes to 1-letter codes (e.g. 'Ala' -> 'A')
protein_letters_1to3_extended - As protein_letters_1to3, plus B, X, Z, J, U, and O
protein_letters_3to1_extended - As protein_letters_3to1, plus Asx, Xaa, Glx, Xle, Sec, and Pyl
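As a minimal sketch of how these mappings can be used, the snippet below spells out a one-letter sequence in three-letter codes. The helper name to_three_letter and its separator parameter are illustrative assumptions, not part of Biopython.

```python
from Bio.Data import IUPACData


def to_three_letter(sequence, separator="-"):
    """Spell out a one-letter protein sequence using three-letter codes.

    Hypothetical helper for illustration; it raises KeyError for any
    letter that is not in the extended IUPAC protein alphabet.
    """
    table = IUPACData.protein_letters_1to3_extended
    return separator.join(table[aa] for aa in sequence.upper())


# U (selenocysteine) is only present in the extended table.
print(to_three_letter("MKU"))  # Met-Lys-Sec
```

Using the extended table means letters such as U or X do not raise a KeyError; switch to protein_letters_1to3 if only the 20 standard amino acids should be accepted.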

Nucleotide Alphabets

# DNA alphabets
unambiguous_dna_letters = IUPACData.unambiguous_dna_letters  # 'GATC'
ambiguous_dna_letters = IUPACData.ambiguous_dna_letters      # 'GATCRYWSMKHBVDN'

# RNA alphabets
unambiguous_rna_letters = IUPACData.unambiguous_rna_letters  # 'GAUC'
ambiguous_rna_letters = IUPACData.ambiguous_rna_letters      # 'GAUCRYWSMKHBVDN'

# Ambiguous nucleotide values
ambiguous_dna_values = IUPACData.ambiguous_dna_values
# {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T', 'M': 'AC', 'R': 'AG', ...}

ambiguous_rna_values = IUPACData.ambiguous_rna_values
# {'A': 'A', 'C': 'C', 'G': 'G', 'U': 'U', 'M': 'AC', 'R': 'AG', ...}

Constants:

unambiguous_dna_letters - Four standard DNA bases
ambiguous_dna_letters - DNA with IUPAC ambiguity codes
unambiguous_rna_letters - Four standard RNA bases
ambiguous_rna_letters - RNA with IUPAC ambiguity codes
ambiguous_dna_values - Maps each DNA letter to the bases it can represent (e.g. 'R' -> 'AG')
ambiguous_rna_values - Maps each RNA letter to the bases it can represent (e.g. 'R' -> 'AG')
ambiguous_dna_complement - Maps each DNA letter, including ambiguity codes, to its complement (e.g. 'R' -> 'Y')
ambiguous_rna_complement - Maps each RNA letter, including ambiguity codes, to its complement
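The value and complement dictionaries above can be combined with the standard library to work with ambiguous sequences. The following is a sketch only; expand_ambiguous_dna and complement_dna are hypothetical helper names introduced for this example.

```python
from itertools import product

from Bio.Data import IUPACData


def expand_ambiguous_dna(sequence):
    """Return every unambiguous DNA sequence matching an IUPAC pattern."""
    # One string of candidate bases per position, e.g. 'R' -> 'AG'.
    choices = [IUPACData.ambiguous_dna_values[base] for base in sequence.upper()]
    return ["".join(combo) for combo in product(*choices)]


def complement_dna(sequence):
    """Complement a DNA sequence letter by letter, keeping ambiguity codes."""
    table = IUPACData.ambiguous_dna_complement
    return "".join(table[base] for base in sequence.upper())


print(expand_ambiguous_dna("ATR"))  # ['ATA', 'ATG']
print(complement_dna("ATR"))        # TAY
```

The number of expansions grows multiplicatively with each ambiguous position, so this approach is only practical for short patterns such as codons or primer sites.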

Molecular Weights

# Protein weights (average masses)
protein_weights = IUPACData.protein_weights
# {'A': 89.0932, 'C': 121.1582, 'D': 133.1027, ...}

# Monoisotopic protein weights
monoisotopic_protein_weights = IUPACData.monoisotopic_protein_weights
# {'A': 89.047678, 'C': 121.019749, ...}

# DNA/RNA nucleotide weights
unambiguous_dna_weights = IUPACData.unambiguous_dna_weights
unambiguous_rna_weights = IUPACData.unambiguous_rna_weights

# Atomic weights
atom_weights = IUPACData.atom_weights
# {'H': 1.00794, 'C': 12.0107, 'N': 14.0067, 'O': 15.9994, ...}

Constants:

protein_weights - Average molecular weights of free amino acids (not residues within a chain)
monoisotopic_protein_weights - Monoisotopic weights of free amino acids
extended_protein_values - Maps each extended protein letter to the amino acids it can represent (an alphabet mapping rather than a weight table)
unambiguous_dna_weights - Weights of DNA nucleotides (deoxynucleoside monophosphates)
unambiguous_rna_weights - Weights of RNA nucleotides (nucleoside monophosphates)
monoisotopic_unambiguous_dna_weights - Monoisotopic DNA weights
monoisotopic_unambiguous_rna_weights - Monoisotopic RNA weights
atom_weights - Atomic weights for elements
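Because the protein weight tables describe free amino acids, a chain's mass is the sum of its letters minus one water per peptide bond. The sketch below compares average and monoisotopic masses under that assumption. The helper name peptide_mass and the two hard-coded water masses are assumptions made for this example, not constants exposed by IUPACData.

```python
from Bio.Data import IUPACData

# Assumed masses of water in Da (average and monoisotopic).
AVERAGE_WATER = 18.0153
MONOISOTOPIC_WATER = 18.010565


def peptide_mass(sequence, monoisotopic=False):
    """Estimate the mass of a linear peptide from free amino acid weights."""
    if not sequence:
        return 0.0
    if monoisotopic:
        table = IUPACData.monoisotopic_protein_weights
        water = MONOISOTOPIC_WATER
    else:
        table = IUPACData.protein_weights
        water = AVERAGE_WATER
    free_total = sum(table[aa] for aa in sequence)
    # Each peptide bond releases one water molecule.
    return free_total - (len(sequence) - 1) * water


print(f"Average:      {peptide_mass('GG'):.4f} Da")
print(f"Monoisotopic: {peptide_mass('GG', monoisotopic=True):.4f} Da")
```

For production use, Bio.SeqUtils.molecular_weight performs this kind of calculation and also handles nucleotide sequences.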

PDBData

The Bio.Data.PDBData module contains protein structure-specific data from the wwPDB.

from Bio.Data import PDBData

# Extended 3-to-1 letter protein code mapping (includes modified residues)
protein_letters_3to1_extended = PDBData.protein_letters_3to1_extended
# {'ALA': 'A', 'CYS': 'C', 'MSE': 'M', 'SEP': 'S', ...}

protein_letters_1to3 = PDBData.protein_letters_1to3
# {'A': 'ALA', 'C': 'CYS', 'D': 'ASP', ...}

Constants:

protein_letters_3to1 - Standard 3-to-1 letter mapping (uppercase)
protein_letters_1to3 - Standard 1-to-3 letter mapping
protein_letters_3to1_extended - Includes modified/non-standard amino acids
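A common use of the extended mapping is turning residue names read from a structure file into a one-letter sequence. This is a minimal sketch; residues_to_sequence is a hypothetical helper, and falling back to 'X' for unrecognised names is a design choice made for this example.

```python
from Bio.Data import PDBData


def residues_to_sequence(residue_names, unknown="X"):
    """Convert PDB residue names to a one-letter protein sequence.

    Names missing from the extended table (waters, ligands, typos)
    are replaced by the `unknown` placeholder instead of raising.
    """
    table = PDBData.protein_letters_3to1_extended
    return "".join(
        table.get(name.strip().upper(), unknown) for name in residue_names
    )


# MSE (selenomethionine) and SEP (phosphoserine) map to their parent residues.
print(residues_to_sequence(["MSE", "ALA", "SEP", "GLY", "NOT_A_RESIDUE"]))  # MASGX
```

Returning a placeholder keeps the output the same length as the input, which helps when the sequence is later aligned against residue indices.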

Example Usage

from Bio.Data import IUPACData, PDBData

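# Estimate the average molecular weight of a peptide.
# protein_weights holds free amino acid masses, so subtract one water
# (about 18.0153 Da) for each peptide bond formed.
peptide = "ACDEFGH"
water = 18.0153
weight = sum(IUPACData.protein_weights[aa] for aa in peptide) - (len(peptide) - 1) * water
print(f"Peptide weight: {weight:.2f} Da")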

# Decode the ambiguous base at the end of a codon
ambiguous_codon = "ATR"  # R = A or G
ambiguous_base = ambiguous_codon[-1]
possible_bases = IUPACData.ambiguous_dna_values[ambiguous_base]
print(f"{ambiguous_base} can be: {possible_bases}")  # R can be: AG

# Convert a modified residue name to its one-letter parent code
three_letter = "MSE"  # Selenomethionine, a modified methionine
if three_letter in PDBData.protein_letters_3to1_extended:
    one_letter = PDBData.protein_letters_3to1_extended[three_letter]
    print(f"{three_letter} -> {one_letter}")  # MSE -> M
