
posTagPerceptronAscii

Tags each token in the input with a part of speech using a trained averaged perceptron model. This is typically more accurate than rule-based tagging.
function posTagPerceptronAscii(
  text: string,
  options?: PerceptronTaggerOptions
): PerceptronTaggedToken[]

Parameters

text
string
required
The input text to tag. Must be ASCII-compatible.
options
PerceptronTaggerOptions
Optional configuration for the tagger, such as a custom model (see Custom Model).

Returns

PerceptronTaggedToken[]
array
Array of tokens with their predicted POS tags.

Example

import { posTagPerceptronAscii } from 'bun_nltk';

const text = "The quick brown fox jumps over the lazy dog";
const tags = posTagPerceptronAscii(text);

console.log(tags);
// [
//   { token: "The", tag: "DT", tagId: 4, start: 0, length: 3 },
//   { token: "quick", tag: "JJ", tagId: 7, start: 4, length: 5 },
//   { token: "brown", tag: "JJ", tagId: 7, start: 10, length: 5 },
//   { token: "fox", tag: "NN", tagId: 12, start: 16, length: 3 },
//   { token: "jumps", tag: "VBZ", tagId: 38, start: 20, length: 5 },
//   ...
// ]
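
The start and length fields in the output above look like character offsets into the input string. The following is a minimal sketch under that assumption. It declares a local TaggedToken type that mirrors the printed shape (it is not imported from the library, whose PerceptronTaggedToken type may differ) and uses two hypothetical helpers, sliceToken and tokensWithTag, to recover each token's span and to filter by tag.

```typescript
// Hypothetical local type mirroring the example output shape above;
// the library's own PerceptronTaggedToken type may differ.
interface TaggedToken {
  token: string;
  tag: string;
  tagId: number;
  start: number;
  length: number;
}

// Assumption: `start` and `length` are character offsets into the ASCII input.
function sliceToken(text: string, t: TaggedToken): string {
  return text.slice(t.start, t.start + t.length);
}

// Collect the surface forms of all tokens carrying a given tag.
function tokensWithTag(tokens: TaggedToken[], tag: string): string[] {
  return tokens.filter((t) => t.tag === tag).map((t) => t.token);
}

const sampleText = "The quick brown fox jumps over the lazy dog";

// Values copied from the example output above (first five tokens only).
const sampleTags: TaggedToken[] = [
  { token: "The", tag: "DT", tagId: 4, start: 0, length: 3 },
  { token: "quick", tag: "JJ", tagId: 7, start: 4, length: 5 },
  { token: "brown", tag: "JJ", tagId: 7, start: 10, length: 5 },
  { token: "fox", tag: "NN", tagId: 12, start: 16, length: 3 },
  { token: "jumps", tag: "VBZ", tagId: 38, start: 20, length: 5 },
];

const adjectives = tokensWithTag(sampleTags, "JJ");
const offsetsMatch = sampleTags.every(
  (t) => sliceToken(sampleText, t) === t.token
);

console.log(adjectives);   // [ "quick", "brown" ]
console.log(offsetsMatch); // true
```

Slicing by offset rather than re-tokenizing keeps downstream code, such as highlighting, aligned with the original input.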

Custom Model

import { posTagPerceptronAscii, loadPerceptronTaggerModel } from 'bun_nltk';

// Load a custom trained model
const customModel = loadPerceptronTaggerModel('./my_model.json');

const tags = posTagPerceptronAscii("Hello world", {
  model: customModel
});

loadPerceptronTaggerModel

Loads a perceptron tagger model from a JSON file.
function loadPerceptronTaggerModel(path?: string): PerceptronTaggerModel

Parameters

path
string
Path to the model JSON file. If not provided, loads the default bundled model.

Returns

PerceptronTaggerModel
object
The loaded and prepared model ready for inference.

Example

import { loadPerceptronTaggerModel } from 'bun_nltk';

const model = loadPerceptronTaggerModel();
console.log(`Loaded model with ${model.featureCount} features and ${model.tagCount} tags`);

preparePerceptronTaggerModel

Prepares a serialized model for use. Useful when loading from non-standard sources.
function preparePerceptronTaggerModel(
  payload: PerceptronTaggerModelSerialized
): PerceptronTaggerModel

Parameters

payload
PerceptronTaggerModelSerialized
required
Serialized model data.

Returns

PerceptronTaggerModel
object
Prepared model with optimized weight storage.

Example

import { preparePerceptronTaggerModel } from 'bun_nltk';

// Load from a database or API.
// fetchModelFromAPI is a placeholder for your own loader (it is not imported from bun_nltk).
const serialized = await fetchModelFromAPI();
const model = preparePerceptronTaggerModel(serialized);

Model Features

The perceptron tagger uses the following features for each token:
  • bias - Bias term
  • w - Token text (lowercased)
  • p1, p2, p3 - Token prefixes (1-3 characters)
  • s1, s2, s3 - Token suffixes (1-3 characters)
  • prev - Previous token
  • next - Next token
  • is_upper - Whether token is all uppercase
  • is_title - Whether token is title-cased
  • has_digit - Whether token contains digits
  • has_hyphen - Whether token contains hyphens
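
The feature list above can be sketched as a small extraction function. This is an illustration only, not the library's internal code. The extractFeatures name, the "name=value" key format, the <START>/<END> boundary markers, and the choice to lowercase prefixes, suffixes, and neighbouring tokens are all assumptions.

```typescript
// Illustrative only: key format and boundary markers are assumptions,
// not the library's internal feature representation.
function extractFeatures(tokens: string[], i: number): string[] {
  const token = tokens[i];
  const lower = token.toLowerCase();
  const feats: string[] = ["bias", `w=${lower}`];

  // Prefixes and suffixes of 1-3 characters (skipped when the token is shorter).
  for (let n = 1; n <= 3; n++) {
    if (lower.length >= n) {
      feats.push(`p${n}=${lower.slice(0, n)}`);
      feats.push(`s${n}=${lower.slice(-n)}`);
    }
  }

  // Neighbouring tokens, with hypothetical markers at the sentence edges.
  const prev = i > 0 ? tokens[i - 1].toLowerCase() : "<START>";
  const next = i < tokens.length - 1 ? tokens[i + 1].toLowerCase() : "<END>";
  feats.push(`prev=${prev}`, `next=${next}`);

  // Shape features, using ASCII-only checks to match the ASCII input requirement.
  if (/[A-Z]/.test(token) && token === token.toUpperCase()) feats.push("is_upper");
  if (/^[A-Z][a-z]+$/.test(token)) feats.push("is_title");
  if (/[0-9]/.test(token)) feats.push("has_digit");
  if (token.includes("-")) feats.push("has_hyphen");

  return feats;
}

const features = extractFeatures(["The", "quick", "fox"], 0);
console.log(features);
```

In a perceptron tagger, each active feature typically contributes a learned weight to every candidate tag, and the tag with the highest total score is chosen.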

Performance

The native implementation is 10-100x faster than a pure JavaScript implementation, with the largest gains on large texts.

Accuracy

The perceptron tagger typically achieves 95-97% accuracy on standard English text when trained on Penn Treebank data.
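
To check accuracy on your own data, one common approach is to compare predicted tags against a gold-standard sequence token by token. The sketch below assumes the figure above refers to token-level accuracy. The tagAccuracy helper and the sample tag sequences are hypothetical and are not part of bun_nltk.

```typescript
// Hypothetical helper: fraction of positions where the predicted tag
// equals the gold tag. Returns 0 for empty input.
function tagAccuracy(predicted: string[], gold: string[]): number {
  if (predicted.length !== gold.length) {
    throw new Error("predicted and gold tag sequences must have the same length");
  }
  if (gold.length === 0) return 0;

  let correct = 0;
  for (let i = 0; i < gold.length; i++) {
    if (predicted[i] === gold[i]) correct++;
  }
  return correct / gold.length;
}

// Made-up sequences for illustration: one of five tags is wrong.
const goldTags = ["DT", "JJ", "JJ", "NN", "VBZ"];
const predictedTags = ["DT", "JJ", "NN", "NN", "VBZ"];

const accuracy = tagAccuracy(predictedTags, goldTags);
console.log(accuracy); // 0.8
```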
