NativeFreqDistStream
A streaming frequency distribution collector that efficiently tracks token, bigram, and conditional frequencies without storing all text in memory.
class NativeFreqDistStream {
constructor()
update(text: string): void
flush(): void
tokenUniqueCount(): number
bigramUniqueCount(): number
conditionalUniqueCount(): number
tokenFreqDistHash(): Map<bigint, number>
bigramFreqDistHash(): StreamBigramFreq[]
conditionalFreqDistHash(): StreamConditionalFreq[]
toJson(): string
dispose(): void
}
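For intuition, the streaming approach keeps only running counts in memory, never the accumulated text. The sketch below is plain TypeScript, not the native implementation: the tokenization rule is an assumption, and the native class stores token hashes rather than strings.

```typescript
// Plain-TypeScript sketch of the streaming idea (not the native implementation):
// only running counts are kept, never the full text.
class StreamingFreqSketch {
  private tokenCounts = new Map<string, number>();
  private bigramCounts = new Map<string, number>();

  update(text: string): void {
    // Naive whitespace tokenization; the native tokenizer likely differs.
    const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
    let prev: string | null = null;
    for (const tok of tokens) {
      this.tokenCounts.set(tok, (this.tokenCounts.get(tok) ?? 0) + 1);
      if (prev !== null) {
        // \u0000 separator avoids key collisions like ("a b","c") vs ("a","b c").
        const key = prev + "\u0000" + tok;
        this.bigramCounts.set(key, (this.bigramCounts.get(key) ?? 0) + 1);
      }
      prev = tok;
    }
  }

  tokenUniqueCount(): number {
    return this.tokenCounts.size;
  }

  bigramUniqueCount(): number {
    return this.bigramCounts.size;
  }
}
```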
Constructor
Creates a new frequency distribution stream.
const stream = new NativeFreqDistStream();
Methods
update
Processes text and updates frequency distributions.
update(text: string): void
text: Text to process and add to the distributions.
Example:
const stream = new NativeFreqDistStream();
stream.update("The quick brown fox");
stream.update("The lazy dog");
flush
Flushes internal buffers. Call this after the last update() before reading results.
flush(): void
Example:
stream.update("Final text");
stream.flush();
const counts = stream.tokenUniqueCount();
tokenUniqueCount
Returns the number of unique tokens observed.
tokenUniqueCount(): number
Returns: Count of unique tokens.
Example:
const uniqueTokens = stream.tokenUniqueCount();
console.log(`Found ${uniqueTokens} unique tokens`);
bigramUniqueCount
Returns the number of unique bigrams (token pairs) observed.
bigramUniqueCount(): number
Returns: Count of unique bigrams.
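A bigram here is an adjacent token pair, so the unique count is the number of distinct pairs observed. A small plain-TypeScript illustration of the idea, separate from the native class:

```typescript
// Count distinct adjacent token pairs in a token sequence.
function uniqueBigramCount(tokens: string[]): number {
  const seen = new Set<string>();
  for (let i = 0; i + 1 < tokens.length; i++) {
    // \u0000 separator avoids key collisions like ("a b","c") vs ("a","b c").
    seen.add(tokens[i] + "\u0000" + tokens[i + 1]);
  }
  return seen.size;
}
```

For example, "the lazy dog chases the lazy cat" has six adjacent pairs, but (the, lazy) repeats, so five are unique.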
conditionalUniqueCount
Returns the number of unique conditional frequencies (POS tag + token combinations).
conditionalUniqueCount(): number
Returns: Count of unique conditional pairs.
tokenFreqDistHash
Returns token frequencies as a map of token hashes to counts.
tokenFreqDistHash(): Map<bigint, number>
Returns: Map where keys are token hashes (bigint) and values are occurrence counts.
Example:
const freqs = stream.tokenFreqDistHash();
for (const [hash, count] of freqs) {
console.log(`Hash ${hash}: ${count} occurrences`);
}
bigramFreqDistHash
Returns bigram frequencies.
bigramFreqDistHash(): StreamBigramFreq[]
Returns: Array of bigram frequency objects.
StreamBigramFreq properties:
- leftHash: Hash of the first token.
- rightHash: Hash of the second token.
- count: Number of times this bigram occurred.
Example:
const bigrams = stream.bigramFreqDistHash();
for (const bg of bigrams) {
console.log(`Bigram (${bg.leftHash}, ${bg.rightHash}): ${bg.count}`);
}
conditionalFreqDistHash
Returns conditional frequencies (POS tags with associated tokens).
conditionalFreqDistHash(): StreamConditionalFreq[]
Returns: Array of conditional frequency objects.
StreamConditionalFreq properties:
- tagId: ID of the POS tag.
- tokenHash: Hash of the token.
- count: Frequency count for this tag-token pair.
Example:
const conditional = stream.conditionalFreqDistHash();
for (const cf of conditional) {
console.log(`Tag ${cf.tagId} + Token ${cf.tokenHash}: ${cf.count}`);
}
toJson
Serializes all distributions to JSON format.
toJson(): string
Returns: JSON string containing tokens, bigrams, and conditional tags.
Example:
const json = stream.toJson();
const data = JSON.parse(json);
console.log(data.tokens);
console.log(data.bigrams);
console.log(data.conditional_tags);
dispose
Frees native resources. Call this when done with the stream to prevent memory leaks.
dispose(): void
Example:
const stream = new NativeFreqDistStream();
try {
stream.update("Some text");
stream.flush();
const counts = stream.tokenFreqDistHash();
} finally {
stream.dispose();
}
Complete Example
import { NativeFreqDistStream } from 'bun_nltk';
const stream = new NativeFreqDistStream();
try {
// Process multiple documents
stream.update("The quick brown fox jumps over the lazy dog");
stream.update("The lazy dog sleeps all day");
stream.update("The quick fox runs fast");
// Flush before reading
stream.flush();
// Get statistics
console.log(`Unique tokens: ${stream.tokenUniqueCount()}`);
console.log(`Unique bigrams: ${stream.bigramUniqueCount()}`);
// Get frequency distributions
const tokenFreqs = stream.tokenFreqDistHash();
console.log(`Total distinct tokens tracked: ${tokenFreqs.size}`);
const bigrams = stream.bigramFreqDistHash();
console.log(`Total bigrams: ${bigrams.length}`);
// Export to JSON
const json = stream.toJson();
console.log("Serialized data:", json);
} finally {
// Always dispose to free native memory
stream.dispose();
}
tokenFreqDistIdsAscii
Computes token frequency distribution with ID mapping for ASCII text.
function tokenFreqDistIdsAscii(text: string): TokenFreqDistIds
Parameters
text: Input text to analyze. Must be ASCII-compatible.
Returns
Token frequency data with ID mappings.
TokenFreqDistIds properties:
- tokens: Array of unique tokens in order of first occurrence.
- counts: Frequency count for each token (parallel to the tokens array).
- tokenToId: Map from token string to its ID/index.
- totalTokens: Total number of tokens (including duplicates).
Example
import { tokenFreqDistIdsAscii } from 'bun_nltk';
const text = "the quick brown fox jumps over the lazy dog";
const dist = tokenFreqDistIdsAscii(text);
console.log(dist.tokens);
// ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
console.log(dist.counts);
// [2, 1, 1, 1, 1, 1, 1, 1]
console.log(dist.tokenToId.get("the"));
// 0
console.log(dist.totalTokens);
// 9
Use Cases
- Building vocabulary indices for ML models
- Efficient token-to-ID mapping for further processing
- Analyzing word frequency distributions
- Preparing data for n-gram analysis
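As a sketch of the vocabulary-index use case: a map shaped like tokenToId can encode a token stream into integer IDs for an ML pipeline. The encodeTokens helper and the unknown-token handling below are hypothetical, not part of bun_nltk.

```typescript
// Hypothetical helper (not part of bun_nltk): encode tokens to integer IDs
// using a tokenToId map like the one returned by tokenFreqDistIdsAscii.
function encodeTokens(tokens: string[], tokenToId: Map<string, number>): number[] {
  // Unknown tokens map to -1 here; a real pipeline would reserve an <unk> ID.
  return tokens.map((t) => tokenToId.get(t) ?? -1);
}
```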
ngramFreqDistHashAscii
Computes n-gram frequency distribution using hash-based representation.
function ngramFreqDistHashAscii(text: string, n: number): Map<bigint, number>
Parameters
text: Input text to analyze. Must be ASCII-compatible.
n: N-gram size (e.g., 2 for bigrams, 3 for trigrams). Must be a positive integer.
Returns
Map where keys are n-gram hashes and values are occurrence counts.
Example
import { ngramFreqDistHashAscii } from 'bun_nltk';
const text = "the quick brown fox jumps over the lazy dog";
// Bigram frequencies
const bigrams = ngramFreqDistHashAscii(text, 2);
console.log(`Found ${bigrams.size} unique bigrams`);
// Trigram frequencies
const trigrams = ngramFreqDistHashAscii(text, 3);
console.log(`Found ${trigrams.size} unique trigrams`);
for (const [hash, count] of bigrams) {
console.log(`N-gram hash ${hash}: ${count} occurrences`);
}
The hash-based representation allows average constant-time lookups and insertions, which keeps processing efficient for large texts.
Hashes do not preserve the original n-gram text. Use ngramsAsciiNative() if you need the actual token sequences.
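The exact hash function the native code uses is not specified here. Purely as an illustration of a hash-keyed n-gram map, the sketch below uses 64-bit FNV-1a over the joined n-gram in plain TypeScript; the native library's hashing almost certainly differs.

```typescript
// Illustrative only: 64-bit FNV-1a over a string, masked to 64 bits.
function fnv1a64(s: string): bigint {
  const PRIME = 0x100000001b3n;
  let h = 0xcbf29ce484222325n;
  for (let i = 0; i < s.length; i++) {
    h ^= BigInt(s.charCodeAt(i));
    h = (h * PRIME) & 0xffffffffffffffffn;
  }
  return h;
}

// Build a hash -> count map over all n-grams of a token sequence.
function ngramFreqSketch(tokens: string[], n: number): Map<bigint, number> {
  const freqs = new Map<bigint, number>();
  for (let i = 0; i + n <= tokens.length; i++) {
    // \u0000 separator avoids collisions between differently split token runs.
    const key = fnv1a64(tokens.slice(i, i + n).join("\u0000"));
    freqs.set(key, (freqs.get(key) ?? 0) + 1);
  }
  return freqs;
}
```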