posTagPerceptronAscii
Performs part-of-speech tagging using a trained averaged perceptron model. Provides higher accuracy than rule-based tagging.
function posTagPerceptronAscii(
  text: string,
  options?: PerceptronTaggerOptions
): PerceptronTaggedToken[]
Parameters
text
string
required
The input text to tag. Must be ASCII-compatible.
options
PerceptronTaggerOptions
Optional configuration for the tagger. Properties:
model - Pre-loaded perceptron model. If not provided, the default model is loaded.
wasm - WebAssembly runtime instance for WASM-based inference.
Force use of the WebAssembly implementation. Requires the wasm option.
Use native implementation. Set to false to fall back to JavaScript.
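Because the input must be ASCII-compatible, callers may want to check or sanitize text before tagging. The sketch below shows one way to do that. isAscii and toAsciiLossy are hypothetical helpers, not part of bun_nltk, and how the tagger itself reacts to non-ASCII input is not specified here.

```typescript
// Hypothetical helpers (not part of bun_nltk) for checking input before tagging.

/** Returns true when every UTF-16 code unit is in the 7-bit ASCII range. */
function isAscii(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0x7f) return false;
  }
  return true;
}

/**
 * Lossy fallback: decomposes accented characters (NFKD), drops the combining
 * marks, and replaces anything still outside ASCII with "?".
 */
function toAsciiLossy(text: string): string {
  return text
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^\x00-\x7f]/g, "?");
}
```

Note that sanitizing can change the length of the text, so offsets reported for the sanitized string may not line up with the original input.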
Returns
Array of tokens with their predicted POS tags. Properties of each PerceptronTaggedToken include:
tag - The predicted POS tag (e.g., “NN”, “VBD”, “JJ”).
tagId - Numeric identifier for the tag in the model’s tag set.
start - Character offset of the token in the original text.
length - Length of the token in characters.
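The offset and length fields make it possible to map tags back onto the original string. The following is a minimal sketch of that idea. The TaggedTokenLike interface mirrors the fields shown in this section's example output and is declared locally so the code runs without the library; sliceToken and highlightTag are hypothetical helpers, and the code assumes tokens arrive in order of increasing offset and do not overlap.

```typescript
// Local stand-in for the returned token shape (fields as shown in the example output).
interface TaggedTokenLike {
  token: string;
  tag: string;
  tagId: number;
  start: number;
  length: number;
}

/** Recovers a token's surface text from the original input. */
function sliceToken(text: string, t: TaggedTokenLike): string {
  return text.slice(t.start, t.start + t.length);
}

/** Wraps every token carrying `tag` in square brackets, leaving other text untouched. */
function highlightTag(text: string, tokens: TaggedTokenLike[], tag: string): string {
  let out = "";
  let cursor = 0;
  for (const t of tokens) {
    if (t.tag !== tag) continue;
    out += text.slice(cursor, t.start) + "[" + sliceToken(text, t) + "]";
    cursor = t.start + t.length;
  }
  return out + text.slice(cursor);
}
```

Since the input is ASCII, character offsets coincide with JavaScript string indices, so String.prototype.slice can be used directly.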
Example
import { posTagPerceptronAscii } from 'bun_nltk';
const text = "The quick brown fox jumps over the lazy dog";
const tags = posTagPerceptronAscii(text);
console.log(tags);
// [
// { token: "The", tag: "DT", tagId: 4, start: 0, length: 3 },
// { token: "quick", tag: "JJ", tagId: 7, start: 4, length: 5 },
// { token: "brown", tag: "JJ", tagId: 7, start: 10, length: 5 },
// { token: "fox", tag: "NN", tagId: 12, start: 16, length: 3 },
// { token: "jumps", tag: "VBZ", tagId: 38, start: 20, length: 5 },
// ...
// ]
Custom Model
import { posTagPerceptronAscii, loadPerceptronTaggerModel } from 'bun_nltk';
// Load a custom trained model
const customModel = loadPerceptronTaggerModel('./my_model.json');
const tags = posTagPerceptronAscii("Hello world", {
  model: customModel
});
loadPerceptronTaggerModel
Loads a perceptron tagger model from a JSON file.
function loadPerceptronTaggerModel(path?: string): PerceptronTaggerModel
Parameters
path
string
Path to the model JSON file. If not provided, loads the default bundled model.
Returns
The loaded and prepared model, ready for inference. Properties:
featureCount - Number of features in the model.
Map from feature names to feature IDs.
Model weight matrix (featureCount × tagCount).
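To make the role of the feature map and the weight matrix concrete, here is a hedged sketch of how a linear perceptron turns active features into a predicted tag ID. It is not the library's implementation. ToyModel, scoreTags, and argmax are hypothetical names, the featureIndex and weights field names are assumptions, and so is the row-major Float32Array layout (one row of tagCount weights per feature).

```typescript
// Toy stand-in for a prepared model. Only featureCount and tagCount appear in
// this page's examples; the other field names and the layout are assumptions.
interface ToyModel {
  featureCount: number;
  tagCount: number;
  featureIndex: Map<string, number>; // feature name -> feature ID
  weights: Float32Array;             // featureCount * tagCount values, row-major
}

/** Sums the weight rows of all active features into one score per tag. */
function scoreTags(model: ToyModel, activeFeatures: string[]): Float32Array {
  const scores = new Float32Array(model.tagCount);
  for (const name of activeFeatures) {
    const id = model.featureIndex.get(name);
    if (id === undefined) continue; // features unseen in training contribute nothing
    const row = id * model.tagCount;
    for (let t = 0; t < model.tagCount; t++) {
      scores[t] += model.weights[row + t];
    }
  }
  return scores;
}

/** Index of the highest score, i.e. the predicted tag ID. */
function argmax(scores: Float32Array): number {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Two features, three tags: "bias" slightly favors tag 0, "w=fox" strongly favors tag 1.
const toy: ToyModel = {
  featureCount: 2,
  tagCount: 3,
  featureIndex: new Map([["bias", 0], ["w=fox", 1]]),
  weights: new Float32Array([0.5, 0, 0.25, 0, 2, 0]),
};
```

With this toy model, argmax(scoreTags(toy, ["bias", "w=fox"])) selects tag 1, because the per-tag sums are 0.5, 2, and 0.25.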
Example
import { loadPerceptronTaggerModel } from 'bun_nltk';
const model = loadPerceptronTaggerModel();
console.log(`Loaded model with ${model.featureCount} features and ${model.tagCount} tags`);
preparePerceptronTaggerModel
Prepares a serialized model for use. Useful when loading from non-standard sources.
function preparePerceptronTaggerModel(
  payload: PerceptronTaggerModelSerialized
): PerceptronTaggerModel
Parameters
payload
PerceptronTaggerModelSerialized
required
Serialized model data. Properties:
Model type identifier (e.g., “perceptron_tagger”).
Total number of features.
feature_index
Record<string, number>
required
Feature name to ID mapping.
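When a payload comes from a non-standard source, a quick sanity check before preparing it can catch corrupted data early. The sketch below covers only feature_index, the one field whose name and type are given here. checkFeatureIndex is a hypothetical helper, and the rule it enforces (IDs are unique, non-negative integers) is an assumption about the format rather than a documented requirement.

```typescript
/**
 * Hypothetical pre-flight check (not part of bun_nltk) for a feature_index
 * mapping. Returns a list of problems; an empty list means nothing was found.
 */
function checkFeatureIndex(featureIndex: Record<string, number>): string[] {
  const problems: string[] = [];
  const seen = new Map<number, string>(); // id -> first feature name using it
  for (const [name, id] of Object.entries(featureIndex)) {
    if (!Number.isInteger(id) || id < 0) {
      problems.push(`feature "${name}" has invalid id ${id}`);
      continue;
    }
    const other = seen.get(id);
    if (other !== undefined) {
      problems.push(`features "${other}" and "${name}" share id ${id}`);
    } else {
      seen.set(id, name);
    }
  }
  return problems;
}
```

A caller could run this on payload.feature_index and refuse to continue if the returned list is non-empty.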
Returns
Prepared model with optimized weight storage.
Example
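import { preparePerceptronTaggerModel } from 'bun_nltk';
// Load from a database or API. fetchModelFromAPI is a placeholder for your own
// loader; it must resolve to a PerceptronTaggerModelSerialized payload.
const serialized = await fetchModelFromAPI();
const model = preparePerceptronTaggerModel(serialized);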
Model Features
The perceptron tagger uses the following features for each token:
bias - Bias term
w - Token text (lowercased)
p1, p2, p3 - Token prefixes (1-3 characters)
s1, s2, s3 - Token suffixes (1-3 characters)
prev - Previous token
next - Next token
is_upper - Whether token is all uppercase
is_title - Whether token is title-cased
has_digit - Whether token contains digits
has_hyphen - Whether token contains hyphens
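The feature list above can be sketched as a small extraction function. This is an illustration only. extractFeatures is a hypothetical name, and the "name=value" string format, the lowercasing of prefixes, suffixes, and neighbors, and the sentence boundary markers are all assumptions, since the exact encoding used by the model is not described here.

```typescript
// Illustrative feature extractor. Feature names follow the list above; the
// "name=value" encoding and the "<s>" / "</s>" boundary markers are assumptions.
function extractFeatures(tokens: string[], i: number): string[] {
  const tok = tokens[i];
  const lower = tok.toLowerCase();
  const feats: string[] = ["bias", `w=${lower}`];

  // Prefixes and suffixes of 1 to 3 characters, skipped when the token is shorter.
  for (let n = 1; n <= 3 && n <= lower.length; n++) {
    feats.push(`p${n}=${lower.slice(0, n)}`);
    feats.push(`s${n}=${lower.slice(-n)}`);
  }

  // Neighboring tokens, with markers at the sequence boundaries.
  feats.push(`prev=${i > 0 ? tokens[i - 1].toLowerCase() : "<s>"}`);
  feats.push(`next=${i + 1 < tokens.length ? tokens[i + 1].toLowerCase() : "</s>"}`);

  // Boolean shape features are emitted only when they fire.
  if (/[A-Z]/.test(tok) && tok === tok.toUpperCase()) feats.push("is_upper");
  if (/^[A-Z][a-z]*$/.test(tok)) feats.push("is_title");
  if (/[0-9]/.test(tok)) feats.push("has_digit");
  if (tok.includes("-")) feats.push("has_hyphen");
  return feats;
}
```

Each returned string would then be looked up in the model's feature map; names the model has never seen are typically ignored.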
The native implementation provides 10-100x speedup compared to pure JavaScript, especially for large texts.
Accuracy
The perceptron tagger typically achieves 95-97% accuracy on standard English text when trained on Penn Treebank data.
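To check accuracy on your own data, compare predicted tags against a hand-annotated reference. The sketch below assumes the figure above is token-level accuracy (correctly tagged tokens divided by total tokens), which is the usual convention for POS tagging. tokenAccuracy is a hypothetical helper, not part of bun_nltk.

```typescript
/** Fraction of positions where the predicted tag matches the gold tag. */
function tokenAccuracy(predicted: string[], gold: string[]): number {
  if (predicted.length !== gold.length) {
    throw new Error("predicted and gold must have the same length");
  }
  if (gold.length === 0) return 0; // avoid dividing by zero on empty input
  let correct = 0;
  for (let i = 0; i < gold.length; i++) {
    if (predicted[i] === gold[i]) correct++;
  }
  return correct / gold.length;
}
```

In practice, predicted would come from mapping the tagger's output to its tag field, and both arrays must use the same tokenization for the comparison to be meaningful.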