posTagAsciiNative

Performs rule-based part-of-speech tagging on ASCII text using a native implementation.
function posTagAsciiNative(text: string): PosTag[]

Parameters

text (string, required)
The input text to tag. Must be ASCII-compatible.

Returns

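PosTag[] (array)
Array of tagged tokens. As the example below shows, each entry carries the token text, its tag and numeric tag ID, and the token's start offset and length in the input.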
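
The page does not spell out the PosTag type itself. The sketch below infers a plausible shape from the fields shown in the example output on this page, so the interface is an assumption rather than the library's published declaration. It also assumes that start and length index directly into the input string, which holds for ASCII input where byte offsets and string indices coincide. sliceToken is a hypothetical helper, not part of bun_nltk.

```typescript
// Assumed shape, inferred from the documented example output.
// The real PosTag declaration in bun_nltk may differ.
interface PosTag {
  token: string;  // the token text
  tag: string;    // tag label, e.g. "DT"
  tagId: number;  // numeric tag identifier
  start: number;  // offset of the token in the input
  length: number; // length of the token
}

// Hypothetical helper: recover a token's text from its position fields.
function sliceToken(text: string, t: PosTag): string {
  return text.slice(t.start, t.start + t.length);
}

const text = "The quick brown fox jumps over the lazy dog";

// Entries copied from the documented example output.
const sample: PosTag[] = [
  { token: "The", tag: "DT", tagId: 6, start: 0, length: 3 },
  { token: "quick", tag: "RB", tagId: 5, start: 4, length: 5 },
  { token: "brown", tag: "NN", tagId: 0, start: 10, length: 5 },
];

// Every position should point back at the token it describes.
const allMatch = sample.every((t) => sliceToken(text, t) === t.token);
console.log(allMatch); // true
```

Working from start and length rather than token is useful when annotations need to be mapped back onto the original text, for example to highlight a span.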

POS Tag Types

Supported POS tags:
  • NN - Noun, singular or mass
  • NNP - Proper noun, singular
  • CD - Cardinal number
  • VBG - Verb, gerund or present participle
  • VBD - Verb, past tense
  • RB - Adverb
  • DT - Determiner
  • CC - Coordinating conjunction
  • PRP - Personal pronoun
  • VB - Verb, base form
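
The tagId values in the example on this page (NN = 0, RB = 5, DT = 6, VB = 9) line up with the order of the list above. Assuming that ordering holds for every tag, which this page does not state explicitly, a lookup table can be sketched as follows. TAG_NAMES and tagNameFromId are hypothetical names, not bun_nltk exports.

```typescript
// Assumption: tag IDs follow the order of the documented tag list.
// Only NN = 0, RB = 5, DT = 6, and VB = 9 are confirmed by the example output.
const TAG_NAMES = [
  "NN",  // 0: noun, singular or mass
  "NNP", // 1: proper noun, singular
  "CD",  // 2: cardinal number
  "VBG", // 3: verb, gerund or present participle
  "VBD", // 4: verb, past tense
  "RB",  // 5: adverb
  "DT",  // 6: determiner
  "CC",  // 7: coordinating conjunction
  "PRP", // 8: personal pronoun
  "VB",  // 9: verb, base form
] as const;

type TagName = (typeof TAG_NAMES)[number];

// Hypothetical helper: map a numeric ID to its label, or undefined if out of range.
function tagNameFromId(id: number): TagName | undefined {
  return TAG_NAMES[id];
}

console.log(tagNameFromId(6)); // "DT"
console.log(tagNameFromId(42)); // undefined
```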

Example

import { posTagAsciiNative } from 'bun_nltk';

const text = "The quick brown fox jumps over the lazy dog";
const tags = posTagAsciiNative(text);

console.log(tags);
// [
//   { token: "The", tag: "DT", tagId: 6, start: 0, length: 3 },
//   { token: "quick", tag: "RB", tagId: 5, start: 4, length: 5 },
//   { token: "brown", tag: "NN", tagId: 0, start: 10, length: 5 },
//   { token: "fox", tag: "NN", tagId: 0, start: 16, length: 3 },
//   { token: "jumps", tag: "VB", tagId: 9, start: 20, length: 5 },
//   ...
// ]
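
Once tagged, the result is a plain array and can be processed with ordinary array code. As a minimal sketch, the snippet below groups tokens by tag. It runs on entries copied from the output above instead of calling the library, and groupByTag is a hypothetical helper, not part of bun_nltk.

```typescript
// Only the two fields needed for grouping; real results carry more fields.
type Tagged = { token: string; tag: string };

// Hypothetical helper: collect tokens under their tag label, preserving order.
function groupByTag(tags: Tagged[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { token, tag } of tags) {
    const bucket = groups.get(tag);
    if (bucket) {
      bucket.push(token);
    } else {
      groups.set(tag, [token]);
    }
  }
  return groups;
}

// Entries copied from the documented example output.
const documented: Tagged[] = [
  { token: "The", tag: "DT" },
  { token: "quick", tag: "RB" },
  { token: "brown", tag: "NN" },
  { token: "fox", tag: "NN" },
  { token: "jumps", tag: "VB" },
];

const groups = groupByTag(documented);
console.log(groups.get("NN")); // ["brown", "fox"]
```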

posTagAscii

Alias for posTagAsciiNative. Provides the same functionality with a shorter name.
function posTagAscii(text: string): PosTag[]

Performance

The native implementation uses optimized C/Rust code for high-performance tagging:
  • Fast: Processes thousands of tokens per millisecond
  • Zero-copy: Operates directly on byte arrays
  • Low memory: Minimal allocation overhead

Usage Notes

This is a rule-based tagger with limited accuracy. For higher accuracy, use posTagPerceptronAscii which employs a machine learning model.
Input must be ASCII text. Non-ASCII characters may cause unexpected results or errors.
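
Because non-ASCII input is not guaranteed to work, it can help to validate text before tagging. The following is a minimal sketch of such a guard. firstNonAsciiIndex and isAscii are hypothetical helpers written for illustration and are not part of bun_nltk.

```typescript
// Hypothetical helper: index of the first code unit above 0x7F, or -1 if none.
function firstNonAsciiIndex(text: string): number {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0x7f) {
      return i;
    }
  }
  return -1;
}

// Hypothetical helper: true when every character is in the ASCII range.
function isAscii(text: string): boolean {
  return firstNonAsciiIndex(text) === -1;
}

console.log(isAscii("The quick brown fox")); // true
console.log(isAscii("caf\u00e9"));           // false
console.log(firstNonAsciiIndex("caf\u00e9")); // 3
```

A caller might check isAscii(text) first and then decide whether to reject the input, clean it, or route it to a different tagger.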
