Perceptron Tagger

posTagPerceptronAscii

Performs part-of-speech tagging with a trained averaged perceptron model. This is typically more accurate than rule-based tagging.

function posTagPerceptronAscii(
  text: string,
  options?: PerceptronTaggerOptions
): PerceptronTaggedToken[]

Parameters

text

string

required

The input text to tag. Must be ASCII-compatible.

options

PerceptronTaggerOptions

Optional configuration for the tagger.

PerceptronTaggerOptions properties

model

PerceptronTaggerModel

Pre-loaded perceptron model. If not provided, loads the default model.

wasm

WasmNltk

WebAssembly runtime instance for WASM-based inference.

useWasm

boolean

Forces use of the WebAssembly implementation. Requires the wasm option to be set.

useNative

boolean

default: true

Use native implementation. Set to false to fall back to JavaScript.

Returns

PerceptronTaggedToken[]

array

Array of tokens with their predicted POS tags.

PerceptronTaggedToken properties

token

string

The original token text.

tag

string

The predicted POS tag (e.g., “NN”, “VBD”, “JJ”).

tagId

number

Numeric identifier for the tag in the model’s tag set.

start

number

Character offset of the token in the original text.

length

number

Length of the token in characters.

Example

import { posTagPerceptronAscii } from 'bun_nltk';

const text = "The quick brown fox jumps over the lazy dog";
const tags = posTagPerceptronAscii(text);

console.log(tags);
// [
//   { token: "The", tag: "DT", tagId: 4, start: 0, length: 3 },
//   { token: "quick", tag: "JJ", tagId: 7, start: 4, length: 5 },
//   { token: "brown", tag: "JJ", tagId: 7, start: 10, length: 5 },
//   { token: "fox", tag: "NN", tagId: 12, start: 16, length: 3 },
//   { token: "jumps", tag: "VBZ", tagId: 38, start: 20, length: 5 },
//   ...
// ]
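The start and length fields make it possible to map each tagged token back to the original string. The following is a minimal sketch of consuming the result, not part of the library: the TaggedToken interface is a local mirror of the documented PerceptronTaggedToken shape, sliceToken and filterByTagPrefix are hypothetical helper names, and the sample data is copied by hand from the example output above rather than produced by running the tagger.

```typescript
// Local mirror of the documented PerceptronTaggedToken fields, so this sketch
// runs without importing the library.
interface TaggedToken {
  token: string;
  tag: string;
  tagId: number;
  start: number;
  length: number;
}

// Recover a token's surface text from its character offsets.
function sliceToken(text: string, t: TaggedToken): string {
  return text.slice(t.start, t.start + t.length);
}

// Keep tokens whose tag starts with a prefix, e.g. "NN" also matches "NNS" and "NNP".
function filterByTagPrefix(tokens: TaggedToken[], prefix: string): TaggedToken[] {
  return tokens.filter((t) => t.tag.startsWith(prefix));
}

const sampleText = "The quick brown fox jumps over the lazy dog";

// Hand-copied from the example output above (first five tokens only).
const sampleTags: TaggedToken[] = [
  { token: "The", tag: "DT", tagId: 4, start: 0, length: 3 },
  { token: "quick", tag: "JJ", tagId: 7, start: 4, length: 5 },
  { token: "brown", tag: "JJ", tagId: 7, start: 10, length: 5 },
  { token: "fox", tag: "NN", tagId: 12, start: 16, length: 3 },
  { token: "jumps", tag: "VBZ", tagId: 38, start: 20, length: 5 },
];

const nouns = filterByTagPrefix(sampleTags, "NN").map((t) => t.token);
const offsetsMatch = sampleTags.every((t) => sliceToken(sampleText, t) === t.token);

console.log(nouns, offsetsMatch);
```

Filtering on a tag prefix rather than an exact tag is a design choice for Penn Treebank style tag sets, where related categories share their leading letters.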

Custom Model

import { posTagPerceptronAscii, loadPerceptronTaggerModel } from 'bun_nltk';

// Load a custom trained model
const customModel = loadPerceptronTaggerModel('./my_model.json');

const tags = posTagPerceptronAscii("Hello world", {
  model: customModel
});

loadPerceptronTaggerModel

Loads a perceptron tagger model from a JSON file.

function loadPerceptronTaggerModel(path?: string): PerceptronTaggerModel

Parameters

path

string

Path to the model JSON file. If not provided, loads the default bundled model.

Returns

PerceptronTaggerModel

object

The loaded and prepared model ready for inference.

PerceptronTaggerModel properties

version

number

Model format version.

Example

import { loadPerceptronTaggerModel } from 'bun_nltk';

const model = loadPerceptronTaggerModel();
console.log(`Loaded model with ${model.featureCount} features and ${model.tagCount} tags`);

preparePerceptronTaggerModel

Prepares a serialized model for use. Useful when loading from non-standard sources.

function preparePerceptronTaggerModel(
  payload: PerceptronTaggerModelSerialized
): PerceptronTaggerModel

Parameters

payload

PerceptronTaggerModelSerialized

required

Serialized model data.

PerceptronTaggerModelSerialized properties

version

number

required

Model version number.

type

string

required

Model type identifier (e.g., “perceptron_tagger”).

Returns

PerceptronTaggerModel

object

Prepared model with optimized weight storage.

Example

import { preparePerceptronTaggerModel } from 'bun_nltk';

// Load from database or API.
// fetchModelFromAPI is a placeholder for your own loader; it is not defined in this snippet.
const serialized = await fetchModelFromAPI();
const model = preparePerceptronTaggerModel(serialized);
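When the payload comes from an external source, it may be worth checking the two documented required fields before preparing the model. The guard below is a sketch under assumptions: SerializedHeader, hasTaggerHeader, and assertTaggerPayload are hypothetical names, only version and type are checked because those are the only fields documented here, and "perceptron_tagger" is the example type value given above rather than a confirmed constant.

```typescript
// Only the two documented fields are modeled; a real payload carries more data.
interface SerializedHeader {
  version: number;
  type: string;
}

// Type guard: true when the value has a numeric version and a string type.
function hasTaggerHeader(value: unknown): value is SerializedHeader {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["version"] === "number" && typeof record["type"] === "string";
}

// Throws a descriptive error instead of passing a malformed payload further on.
// The default expected type is an assumption taken from the example value in the docs.
function assertTaggerPayload(
  value: unknown,
  expectedType: string = "perceptron_tagger"
): SerializedHeader {
  if (!hasTaggerHeader(value)) {
    throw new Error("payload is missing a numeric version or a string type");
  }
  if (value.type !== expectedType) {
    throw new Error(`unexpected model type: ${value.type}`);
  }
  return value;
}

const header = assertTaggerPayload({ version: 1, type: "perceptron_tagger" });
console.log(`payload header ok: version ${header.version}`);
```

A payload that passes this check would then be handed to preparePerceptronTaggerModel as in the example above.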

Model Features

The perceptron tagger uses the following features for each token:

bias - Bias term
w - Token text (lowercased)
p1, p2, p3 - Token prefixes (1-3 characters)
s1, s2, s3 - Token suffixes (1-3 characters)
prev - Previous token
next - Next token
is_upper - Whether token is all uppercase
is_title - Whether token is title-cased
has_digit - Whether token contains digits
has_hyphen - Whether token contains hyphens
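The feature list above can be illustrated with a small extraction function. This is a sketch, not the library's implementation: extractFeatures is a hypothetical name, and the "name=value" key format, the lowercasing of prefixes, suffixes, and neighbouring tokens, and the `<s>` and `</s>` boundary markers are all assumptions made for the example.

```typescript
// Build feature strings for tokens[i], following the feature list above.
// Key format and boundary markers are assumptions of this sketch.
function extractFeatures(tokens: string[], i: number): string[] {
  const token = tokens[i];
  if (token === undefined) {
    throw new RangeError(`index ${i} is out of range`);
  }
  const lower = token.toLowerCase();
  const features: string[] = ["bias", `w=${lower}`];

  // p1..p3 and s1..s3: prefixes and suffixes of 1 to 3 characters.
  for (let n = 1; n <= 3; n++) {
    if (lower.length >= n) {
      features.push(`p${n}=${lower.slice(0, n)}`);
      features.push(`s${n}=${lower.slice(-n)}`);
    }
  }

  // Neighbouring tokens; out-of-range neighbours become boundary markers.
  const prev = tokens[i - 1];
  const next = tokens[i + 1];
  features.push(`prev=${prev !== undefined ? prev.toLowerCase() : "<s>"}`);
  features.push(`next=${next !== undefined ? next.toLowerCase() : "</s>"}`);

  // Shape features are emitted only when they hold.
  const hasLetter = /[A-Za-z]/.test(token);
  if (hasLetter && token === token.toUpperCase()) features.push("is_upper");
  if (/^[A-Z][a-z]*$/.test(token)) features.push("is_title");
  if (/[0-9]/.test(token)) features.push("has_digit");
  if (token.includes("-")) features.push("has_hyphen");

  return features;
}

console.log(extractFeatures(["The", "quick", "fox"], 0));
```

In a perceptron tagger, each feature string typically indexes a row of per-tag weights, and the predicted tag is the one with the highest summed weight over the active features.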

Performance

The native implementation provides a 10-100x speedup over the pure JavaScript implementation, with the largest gains on large texts.
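The actual speedup depends on the input and the machine, so it may be worth measuring on your own data. The helper below is a generic timing sketch and not a bun_nltk API: medianTimeMs is a hypothetical name, and it relies on the global performance.now() available in Bun and recent Node versions. The commented lines show how it might be pointed at the tagger, using the documented useNative option.

```typescript
// Middle sample (upper median for even counts) of wall-clock time in
// milliseconds for `fn` over `runs` calls, after one warm-up call.
function medianTimeMs(fn: () => void, runs: number = 5): number {
  if (runs < 1) {
    throw new RangeError("runs must be at least 1");
  }
  fn(); // warm-up, so one-time setup such as model loading is not measured
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)] ?? 0;
}

// Stand-in workload so the sketch runs on its own.
const elapsed = medianTimeMs(() => {
  let sum = 0;
  for (let i = 0; i < 100_000; i++) sum += i;
  return void sum;
}, 3);
console.log(`median: ${elapsed.toFixed(3)} ms`);

// With the library installed, the comparison could look like this:
// const nativeMs = medianTimeMs(() => posTagPerceptronAscii(text));
// const jsMs = medianTimeMs(() => posTagPerceptronAscii(text, { useNative: false }));
// console.log(`speedup: ${(jsMs / nativeMs).toFixed(1)}x`);
```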

Accuracy

The perceptron tagger typically achieves 95-97% accuracy on standard English text when trained on Penn Treebank data.
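To check a figure like this against your own annotated data, token-level accuracy can be computed as the fraction of tokens whose predicted tag equals the gold tag. This sketch assumes token-level accuracy is the metric meant here, which is the usual convention for POS tagging; tagAccuracy is a hypothetical helper, and the tag sequences are made up for illustration.

```typescript
// Fraction of positions where the predicted tag equals the gold tag.
// Both arrays must be aligned token by token.
function tagAccuracy(predicted: string[], gold: string[]): number {
  if (predicted.length !== gold.length) {
    throw new Error("predicted and gold tag sequences must have the same length");
  }
  if (gold.length === 0) return 0;
  let correct = 0;
  for (let i = 0; i < gold.length; i++) {
    if (predicted[i] === gold[i]) correct++;
  }
  return correct / gold.length;
}

// Illustrative data only: three of four tags agree.
const predictedTags = ["DT", "JJ", "NN", "VBZ"];
const goldTags = ["DT", "JJ", "NN", "VBD"];
console.log(tagAccuracy(predictedTags, goldTags)); // 0.75
```

With the library, the predicted sequence would come from mapping the tagger output to its tag field, provided the tokenization matches the gold annotation.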

Related APIs

POS Tagging - Rule-based tagging
Frequency Distributions - Analyze tag distributions
