Morphological Analysis

Overview

Morphy is WordNet’s morphological analyzer that reduces inflected word forms to their base lemmas. bun_nltk provides both JavaScript and native implementations for finding dictionary base forms.

Method

morphy

Reduces an inflected or variant word form to its base lemma found in WordNet.

import { loadWordNetMini } from 'bun_nltk';

const wordnet = loadWordNetMini();

// Verb inflections
console.log(wordnet.morphy('running', 'v'));   // "run"
console.log(wordnet.morphy('ran', 'v'));       // "run"
console.log(wordnet.morphy('runs', 'v'));      // "run"

// Noun inflections
console.log(wordnet.morphy('dogs', 'n'));      // "dog"
console.log(wordnet.morphy('children', 'n'));  // "child"
console.log(wordnet.morphy('mice', 'n'));      // "mouse"

// Adjective inflections
console.log(wordnet.morphy('better', 'a'));    // "good"
console.log(wordnet.morphy('biggest', 'a'));   // "big"

// Without POS, tries all categories
console.log(wordnet.morphy('running'));        // "run"

// Returns null if no base form found
console.log(wordnet.morphy('xyzabc'));         // null

Parameters:

word: string - Inflected word form to analyze
pos?: WordNetPos - Optional part of speech: “n” (noun), “v” (verb), “a” (adjective), or “r” (adverb)

Returns:

string | null - Base lemma if found in WordNet, null otherwise

Details:

Tries native morphological algorithm first via wordnetMorphyAsciiNative()
Falls back to rule-based candidate generation
Returns first candidate found in WordNet’s lemma index
Without POS, tries all morphological rules across all parts of speech
Always validates results against actual WordNet entries

Native Function

wordnetMorphyAsciiNative

Direct access to the native morphological analyzer (lower-level API).

import { wordnetMorphyAsciiNative } from 'bun_nltk';

// With a POS code
console.log(wordnetMorphyAsciiNative('running', 'v'));  // "run"
console.log(wordnetMorphyAsciiNative('dogs', 'n'));     // "dog"

// Without POS (tries all)
console.log(wordnetMorphyAsciiNative('running'));       // "run"

// Returns empty string if not found
console.log(wordnetMorphyAsciiNative('xyzabc'));        // ""

Parameters:

word: string - Word to analyze
pos?: "n" | "v" | "a" | "r" - Optional part of speech

Returns:

string - Base form (empty string if not found)

Details:

Implemented in native code for performance
Uses WordNet’s morphological database
Returns empty string (not null) when no base form found
Does NOT validate against WordNet entries (unlike morphy() method)
Lower-level interface; prefer WordNet.morphy() for most use cases
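
Because the native function signals "not found" with an empty string while morphy() uses null, code that mixes the two may want a single convention. The following is a minimal sketch that assumes only the return contract described above. emptyToNull is a hypothetical helper name, not part of bun_nltk, and the literal arguments stand in for values wordnetMorphyAsciiNative() would return.

```typescript
// Hypothetical helper (not part of bun_nltk): maps the native function's
// empty-string "not found" result onto the null convention used by morphy().
function emptyToNull(result: string): string | null {
  return result === "" ? null : result;
}

// Stand-in values for what wordnetMorphyAsciiNative() is documented to return.
const found = emptyToNull("run");
const missing = emptyToNull("");
console.log(found, missing);
```

Note that this only unifies the "not found" value; it does not add the WordNet validation that morphy() performs.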

Morphological Rules

bun_nltk includes rule-based morphology as a fallback:

Noun Rules

function nounMorphCandidates(word: string): string[] {
  const lower = normalizeLemma(word);
  const out = [lower];
  
  // -ies and -ves plurals
  if (lower.endsWith("ies") && lower.length > 3)
    out.push(`${lower.slice(0, -3)}y`);      // "babies" → "baby"
  
  if (lower.endsWith("ves") && lower.length > 3)
    out.push(`${lower.slice(0, -3)}f`);      // "wolves" → "wolf"
  
  // Regular plurals
  if (lower.endsWith("es") && lower.length > 2)
    out.push(lower.slice(0, -2));            // "boxes" → "box"
  
  if (lower.endsWith("s") && lower.length > 1)
    out.push(lower.slice(0, -1));            // "dogs" → "dog"
  
  return unique(out);
}
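
The noun rules rely on two helpers, normalizeLemma and unique, that are not shown on this page. The sketch below fills them in with assumed definitions (trim plus lowercase, and order-preserving de-duplication) so the candidate list can be inspected; the real helpers in bun_nltk may behave differently.

```typescript
// Assumed helper: the real normalizeLemma may apply further normalization.
function normalizeLemma(word: string): string {
  return word.trim().toLowerCase();
}

// Assumed helper: order-preserving de-duplication.
function unique(items: string[]): string[] {
  return Array.from(new Set(items));
}

// Same suffix rules as the noun listing above.
function nounMorphCandidates(word: string): string[] {
  const lower = normalizeLemma(word);
  const out = [lower];
  if (lower.endsWith("ies") && lower.length > 3) out.push(`${lower.slice(0, -3)}y`);
  if (lower.endsWith("ves") && lower.length > 3) out.push(`${lower.slice(0, -3)}f`);
  if (lower.endsWith("es") && lower.length > 2) out.push(lower.slice(0, -2));
  if (lower.endsWith("s") && lower.length > 1) out.push(lower.slice(0, -1));
  return unique(out);
}

console.log(nounMorphCandidates("Wolves")); // ["wolves", "wolf", "wolv", "wolve"]
console.log(nounMorphCandidates("babies")); // ["babies", "baby", "babi", "babie"]
```

Several candidates ("wolv", "wolve", "babi") are not words, which is why morphy() checks each one against the lemma index before returning it.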

Verb Rules

function verbMorphCandidates(word: string): string[] {
  const lower = normalizeLemma(word);
  const out = [lower];
  
  // -ies ending
  if (lower.endsWith("ies") && lower.length > 3)
    out.push(`${lower.slice(0, -3)}y`);      // "tries" → "try"
  
  // -ing ending (doubled consonants, as in "running", are not undone by these rules)
  if (lower.endsWith("ing") && lower.length > 4) {
    out.push(lower.slice(0, -3));            // "playing" → "play"
    out.push(`${lower.slice(0, -3)}e`);      // "making" → "make"
  }
  
  // -ed ending (doubled consonants, as in "stopped", are not undone by these rules)
  if (lower.endsWith("ed") && lower.length > 3) {
    out.push(lower.slice(0, -2));            // "played" → "play"
    out.push(lower.slice(0, -1));            // "baked" → "bake"
  }
  
  // -s ending
  if (lower.endsWith("s") && lower.length > 1)
    out.push(lower.slice(0, -1));            // "runs" → "run"
  
  return unique(out);
}

Adjective Rules

function adjectiveMorphCandidates(word: string): string[] {
  const lower = normalizeLemma(word);
  const out = [lower];
  
  // Comparative (doubled consonants, as in "bigger", are not undone by these rules)
  if (lower.endsWith("er") && lower.length > 2)
    out.push(lower.slice(0, -2));            // "taller" → "tall"
  
  // Superlative
  if (lower.endsWith("est") && lower.length > 3)
    out.push(lower.slice(0, -3));            // "tallest" → "tall"
  
  return unique(out);
}

Adverb Rules

// Adverbs have minimal morphology
if (pos === "r") return [normalizeLemma(word)];
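
The per-POS functions above are combined by a dispatcher, morphCandidates(word, pos), which this page references but does not list. The following is a sketch of how such a dispatcher could look, not the library's actual code. The factory makeMorphCandidates, the stripSuffix stand-in rules, and the merge order used when no POS is given are all assumptions made for illustration.

```typescript
type WordNetPos = "n" | "v" | "a" | "r";
type CandidateFn = (word: string) => string[];

const ALL_POS: WordNetPos[] = ["n", "v", "a", "r"];

// Builds a dispatcher from one candidate generator per part of speech.
function makeMorphCandidates(rules: Record<WordNetPos, CandidateFn>) {
  return function morphCandidates(word: string, pos?: WordNetPos): string[] {
    if (pos) return rules[pos](word);
    // Assumption: with no POS, merge every category's candidates in n, v, a, r order.
    const merged: string[] = [];
    for (const p of ALL_POS) {
      for (const candidate of rules[p](word)) {
        if (merged.indexOf(candidate) === -1) merged.push(candidate);
      }
    }
    return merged;
  };
}

// Deliberately tiny stand-in rules; substitute the per-POS functions listed above.
const stripSuffix = (suffix: string): CandidateFn => (word) => {
  const lower = word.toLowerCase();
  return lower.endsWith(suffix) && lower.length > suffix.length
    ? [lower, lower.slice(0, -suffix.length)]
    : [lower];
};

const morphCandidates = makeMorphCandidates({
  n: stripSuffix("s"),
  v: stripSuffix("ed"),
  a: stripSuffix("er"),
  r: (word) => [word.toLowerCase()], // adverbs: the word itself only
});

console.log(morphCandidates("walked", "v")); // ["walked", "walk"]
console.log(morphCandidates("dogs"));        // ["dogs", "dog"]
```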

Algorithm Flow

The morphy() method follows this algorithm:

morphy(word: string, pos?: WordNetPos): string | null {
  // 1. Try native morphy first
  const nativeCandidate = wordnetMorphyAsciiNative(word, pos);
  if (nativeCandidate) {
    // Validate against WordNet entries
    const rows = this.lemmaIndex.get(nativeCandidate);
    if (rows && rows.length > 0 && (!pos || rows.some((row) => row.pos === pos))) {
      return nativeCandidate;
    }
  }
  
  // 2. Try rule-based candidates
  for (const candidate of morphCandidates(word, pos)) {
    const rows = this.lemmaIndex.get(candidate);
    if (!rows || rows.length === 0) continue;
    if (!pos || rows.some((row) => row.pos === pos)) return candidate;
  }
  
  // 3. Not found
  return null;
}
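
To trace the three steps without loading WordNet, the flow can be reproduced against toy data. In this sketch the lemma index, nativeStub, and candidatesStub are hypothetical stand-ins for this.lemmaIndex, wordnetMorphyAsciiNative(), and morphCandidates(); only the control flow mirrors the listing above.

```typescript
type WordNetPos = "n" | "v" | "a" | "r";
interface LemmaRow { pos: WordNetPos }

// Toy lemma index standing in for WordNet's real one.
const lemmaIndex = new Map<string, LemmaRow[]>([
  ["run", [{ pos: "v" }, { pos: "n" }]],
  ["dog", [{ pos: "n" }]],
]);

// Stub for the native analyzer: knows one form, returns "" otherwise.
function nativeStub(word: string, _pos?: WordNetPos): string {
  return word === "running" ? "run" : "";
}

// Stub candidate generator: the word itself plus a plain "-s" strip.
function candidatesStub(word: string): string[] {
  const lower = word.toLowerCase();
  return lower.endsWith("s") && lower.length > 1 ? [lower, lower.slice(0, -1)] : [lower];
}

// True when the lemma has at least one entry (for the given POS, if any).
function hasEntry(lemma: string, pos?: WordNetPos): boolean {
  const rows = lemmaIndex.get(lemma);
  return !!rows && rows.length > 0 && (!pos || rows.some((row) => row.pos === pos));
}

function morphySketch(word: string, pos?: WordNetPos): string | null {
  // 1. Native result, accepted only if it validates
  const nativeCandidate = nativeStub(word, pos);
  if (nativeCandidate && hasEntry(nativeCandidate, pos)) return nativeCandidate;
  // 2. Rule-based candidates, first valid one wins
  for (const candidate of candidatesStub(word)) {
    if (hasEntry(candidate, pos)) return candidate;
  }
  // 3. Not found
  return null;
}

console.log(morphySketch("running", "v")); // "run" (native path)
console.log(morphySketch("dogs", "n"));    // "dog" (rule-based fallback)
console.log(morphySketch("dogs", "v"));    // null (no verb entry for "dog")
console.log(morphySketch("xyzabc"));       // null
```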

Usage Examples

Normalizing Text

import { loadWordNetMini } from 'bun_nltk';

const wordnet = loadWordNetMini();

function lemmatizeText(text: string, pos?: 'n' | 'v' | 'a' | 'r'): string[] {
  const words = text.toLowerCase().split(/\s+/);
  return words.map(word => wordnet.morphy(word, pos) || word);
}

const text = "The children were running quickly";
console.log(lemmatizeText(text));
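// e.g. ["the", "child", "be", "run", "quickly"]
// morphy() handles inflection only, so the adverb "quickly" is not reduced to "quick";
// exact results depend on the entries in the loaded database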

const verbsOnly = "running played swimming";
console.log(lemmatizeText(verbsOnly, 'v'));
// ["run", "play", "swim"]
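
Splitting on whitespace leaves punctuation attached to tokens, so "quickly." would be looked up with its trailing period. The following is a minimal sketch of a tokenizing step that avoids this. The lemmatizer is passed in as a parameter so the block runs without WordNet loaded. tokenize, lemmatizeTokens, and the toy lookup are hypothetical names for illustration, and the regular expression covers ASCII letters only.

```typescript
type Lemmatizer = (word: string) => string | null;

// Lowercase, then keep runs of ASCII letters and apostrophes as tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Lemmatize each token, keeping the token itself when no base form is found.
function lemmatizeTokens(text: string, lemmatize: Lemmatizer): string[] {
  return tokenize(text).map((token) => lemmatize(token) ?? token);
}

// Toy lookup standing in for wordnet.morphy.
const toyLemmas = new Map<string, string>([
  ["children", "child"],
  ["running", "run"],
]);
const toyMorphy: Lemmatizer = (word) => toyLemmas.get(word) ?? null;

console.log(lemmatizeTokens("The children, running!", toyMorphy));
// ["the", "child", "run"]
```

With a loaded database, the second argument would be something like (word) => wordnet.morphy(word).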

Handling Irregular Forms

import { loadWordNetMini } from 'bun_nltk';

const wordnet = loadWordNetMini();

const irregulars = [
  'children',  // → child
  'mice',      // → mouse
  'geese',     // → goose
  'feet',      // → foot
  'teeth',     // → tooth
  'men',       // → man
  'women',     // → woman
];

irregulars.forEach(word => {
  const base = wordnet.morphy(word, 'n');
  console.log(`${word} → ${base}`);
});

Comparing Lemmatization Approaches

import { loadWordNetMini, wordnetMorphyAsciiNative } from 'bun_nltk';

const wordnet = loadWordNetMini();

function compareMorphy(word: string, pos?: 'n' | 'v' | 'a' | 'r') {
  const nativeResult = wordnetMorphyAsciiNative(word, pos);
  const morphyResult = wordnet.morphy(word, pos);
  
  console.log(`Word: ${word}`);
  console.log(`  Native: ${nativeResult || '(empty)'}`);
  console.log(`  Morphy: ${morphyResult || '(null)'}`);
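  // Native returns "" and morphy() returns null when nothing is found, so normalize before comparing
  console.log(`  Match: ${(nativeResult || null) === morphyResult}`);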
}

compareMorphy('running', 'v');
compareMorphy('children', 'n');
compareMorphy('better', 'a');

Validating Lemmatization

import { loadWordNetMini } from 'bun_nltk';

const wordnet = loadWordNetMini();

function isValidLemma(word: string, pos?: 'n' | 'v' | 'a' | 'r'): boolean {
  // Check if word exists as-is in WordNet
  const synsets = wordnet.synsets(word, pos);
  if (synsets.length > 0) return true;
  
  // Check if morphy finds a base form
  const base = wordnet.morphy(word, pos);
  return base !== null;
}

console.log(isValidLemma('running', 'v'));     // true (→ run)
console.log(isValidLemma('xyzabc'));           // false (not in WordNet)
console.log(isValidLemma('dog', 'n'));         // true (base form)

Building Lemma Vocabulary

import { loadWordNetMini } from 'bun_nltk';

const wordnet = loadWordNetMini();

function buildLemmaMap(words: string[], pos?: 'n' | 'v' | 'a' | 'r'): Map<string, string> {
  const map = new Map<string, string>();
  
  for (const word of words) {
    const lemma = wordnet.morphy(word, pos);
    if (lemma) {
      map.set(word, lemma);
    }
  }
  
  return map;
}

const words = ['running', 'ran', 'runs', 'dogs', 'children', 'better'];
const lemmaMap = buildLemmaMap(words);

lemmaMap.forEach((lemma, word) => {
  console.log(`${word} → ${lemma}`);
});
// running → run
// ran → run
// runs → run
// dogs → dog
// children → child
// better → good

Performance Considerations

Native First: Always tries native morphy first for best performance
Caching: Consider caching morphy results for repeated lookups
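POS Filtering: Passing a POS limits candidate generation and the lemma check to one category, which reduces work and avoids matches from other parts of speech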
Validation: Unlike native function, morphy() validates against WordNet entries

import { loadWordNetMini } from 'bun_nltk';

const wordnet = loadWordNetMini();
const morphyCache = new Map<string, string | null>();

function cachedMorphy(word: string, pos?: 'n' | 'v' | 'a' | 'r'): string | null {
  const key = `${word}:${pos || 'all'}`;
  
  if (morphyCache.has(key)) {
    return morphyCache.get(key) ?? null;  // cached misses are stored as null
  }
  
  const result = wordnet.morphy(word, pos);
  morphyCache.set(key, result);
  return result;
}

Related

Synsets - Query synsets (uses morphy internally)
Porter Stemmer - Alternative rule-based stemming
Loading - Load WordNet databases
