Stemming


porterStemAscii

Reduces a single word to its stem using the Porter stemming algorithm.

token (string, required): Single word token to stem (ASCII text).

Returns (string): Stemmed form of the input token.

import { porterStemAscii } from 'bun_nltk';

const stem1 = porterStemAscii("running");
// Returns: "run"

const stem2 = porterStemAscii("flies");
// Returns: "fli"

const stem3 = porterStemAscii("generalization");
// Returns: "gener"

const stem4 = porterStemAscii("connection");
// Returns: "connect"

The Porter stemmer is a rule-based algorithm that strips common morphological suffixes in a fixed sequence of steps. It is fast, but the stems it produces are not always dictionary words, as with "fli" for "flies" above.
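To make the rule-based idea concrete, here is a minimal sketch of just the first Porter step (step 1a, which handles plural endings). The helper name `step1a` is hypothetical and is not part of bun_nltk; the library's actual implementation is not shown in this page and may differ, and the full algorithm applies several further steps after this one.

```typescript
// Illustrative sketch of Porter step 1a only (plural suffixes).
// `step1a` is a hypothetical helper, not a bun_nltk export.
// Rules are tried in order; the first matching suffix wins.
function step1a(word: string): string {
  if (word.endsWith("sses")) return word.slice(0, -2); // SSES -> SS  (caresses -> caress)
  if (word.endsWith("ies")) return word.slice(0, -2);  // IES  -> I   (flies -> fli)
  if (word.endsWith("ss")) return word;                // SS   -> SS  (caress unchanged)
  if (word.endsWith("s")) return word.slice(0, -1);    // S    -> ""  (cats -> cat)
  return word;                                         // no plural suffix found
}

const pluralSamples = ["caresses", "flies", "caress", "cats", "run"];
const afterStep1a = pluralSamples.map(step1a);
// afterStep1a: ["caress", "fli", "caress", "cat", "run"]
```

The "ies" rule is why "flies" ends up as the non-word "fli": the rules operate on spelling alone, with no dictionary lookup.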

porterStemAsciiTokens

Applies Porter stemming to an array of tokens.

tokens (string[], required): Array of word tokens to stem.

Returns (string[]): Array of stemmed tokens in the same order.

import { porterStemAsciiTokens } from 'bun_nltk';

const tokens = ["running", "flies", "happily", "connected"];
const stems = porterStemAsciiTokens(tokens);
// Returns: ["run", "fli", "happili", "connect"]

Common Use Cases

Document preprocessing for search:

import { tokenizeAsciiNative, porterStemAsciiTokens } from 'bun_nltk';

const document = "The runners were running quickly through the connected pathways";
const tokens = tokenizeAsciiNative(document);
const stems = porterStemAsciiTokens(tokens);
// Process stems for indexing
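One way the indexing step might look is a small inverted index that maps each stem to the documents containing it. This is a sketch under stated assumptions: `StemmedDoc` and `buildInvertedIndex` are hypothetical names, not bun_nltk APIs, and the stems are hard-coded so the example runs on its own. In practice they would come from porterStemAsciiTokens.

```typescript
// Hypothetical shape for a document that has already been tokenized and stemmed.
interface StemmedDoc {
  id: string;
  stems: string[];
}

// Hypothetical helper: builds a map from each stem to the ids of the
// documents that contain it (each id listed once, in first-seen order).
function buildInvertedIndex(stemmedDocs: StemmedDoc[]): Map<string, string[]> {
  const postingsByStem = new Map<string, string[]>();
  for (const doc of stemmedDocs) {
    for (const stem of doc.stems) {
      const postings = postingsByStem.get(stem) ?? [];
      if (!postings.includes(doc.id)) postings.push(doc.id);
      postingsByStem.set(stem, postings);
    }
  }
  return postingsByStem;
}

// Hard-coded stems standing in for porterStemAsciiTokens output.
const stemmedDocs: StemmedDoc[] = [
  { id: "doc-a", stems: ["run", "connect", "run"] },
  { id: "doc-b", stems: ["fli", "run"] },
];
const stemIndex = buildInvertedIndex(stemmedDocs);
// stemIndex.get("run") -> ["doc-a", "doc-b"]
// stemIndex.get("fli") -> ["doc-b"]
```

At query time, the search terms would need to be stemmed the same way so that "running" and "run" look up the same index entry.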

Feature extraction for classification:

import { normalizeTokensAsciiNative, porterStemAsciiTokens } from 'bun_nltk';

const text = "Machine learning algorithms are learning from data";
const normalized = normalizeTokensAsciiNative(text);
const stemmed = porterStemAsciiTokens(normalized);
// Returns: ["machin", "learn", "algorithm", "learn", "data"]
// Note: both occurrences of "learning" reduce to the same stem, "learn"
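As a follow-on, repeated stems are often collapsed into counts before being passed to a classifier. The sketch below assumes a simple bag-of-words representation; `countStems` is a hypothetical helper rather than a bun_nltk function, and the input array is copied from the example output above.

```typescript
// Hypothetical helper: counts how often each stem occurs (a bag-of-words feature map).
function countStems(stems: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const stem of stems) {
    counts.set(stem, (counts.get(stem) ?? 0) + 1);
  }
  return counts;
}

// Stems copied from the example output above.
const stemCounts = countStems(["machin", "learn", "algorithm", "learn", "data"]);
// stemCounts.get("learn") -> 2
// stemCounts.get("data")  -> 1
```

Because stemming maps related surface forms to one key, the feature space is smaller than it would be with raw tokens.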

Porter stemming is aggressive: stems such as "machin" and "happili" are not valid English words. If your application needs valid words, consider lemmatization instead.
