NgramLanguageModel
The NgramLanguageModel class implements statistical n-gram language models with a choice of estimators: unsmoothed maximum likelihood estimation (MLE), Lidstone smoothing, and interpolated Kneser-Ney smoothing.
Constructor
sentences (string[][]) - Training corpus as an array of tokenized sentences
options (NgramLanguageModelOptions) - Configuration options (see below)
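The sentences argument must already be tokenized. The following minimal sketch assumes simple lowercase whitespace tokenization, since this page does not prescribe a tokenizer. It shows how a hypothetical tokenizeCorpus helper (not part of the library) could produce the expected string[][] shape:

```typescript
// Hypothetical helper, not part of the library: converts raw lines of text into
// the string[][] shape expected by the `sentences` constructor parameter.
// Assumes lowercase whitespace tokenization; swap in a real tokenizer as needed.
function tokenizeCorpus(lines: string[]): string[][] {
  return lines
    .map((line) =>
      line
        .toLowerCase()
        .split(/\s+/)
        .filter((token) => token.length > 0),
    )
    .filter((tokens) => tokens.length > 0); // drop empty lines
}

const sentences: string[][] = tokenizeCorpus(["The cat sat", "the cat ran", ""]);
console.log(JSON.stringify(sentences));
// [["the","cat","sat"],["the","cat","ran"]]
```

The resulting array would be passed as sentences, alongside an options object such as { order: 2 }.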
Model Types
The LanguageModelType, selected with the model option, determines how probabilities are estimated: unsmoothed maximum likelihood or one of the smoothing methods below.
MLE (Maximum Likelihood Estimation)
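MLE scores a word by its relative frequency after the given context, P(w | h) = count(h, w) / count(h). It applies no smoothing, so any n-gram unseen in training gets probability 0. The sketch below illustrates that estimator only and is not the library's code. The names buildCounts and mleScore and the hand-padded corpus are hypothetical, and returning 0 for an unseen context is an assumed convention.

```typescript
// Separator assumed never to occur inside tokens, used to build map keys.
const SEP = "\u0000";

interface NgramCounts {
  ngrams: Map<string, number>; // count(h, w) for each full n-gram
  contexts: Map<string, number>; // count(h): how often h is followed by some word
}

// Hypothetical helper: counts n-grams of the given order and their contexts.
function buildCounts(sentences: string[][], order: number): NgramCounts {
  const ngrams = new Map<string, number>();
  const contexts = new Map<string, number>();
  for (const s of sentences) {
    for (let i = 0; i + order <= s.length; i++) {
      const gram = s.slice(i, i + order);
      const gramKey = gram.join(SEP);
      const contextKey = gram.slice(0, -1).join(SEP);
      ngrams.set(gramKey, (ngrams.get(gramKey) ?? 0) + 1);
      contexts.set(contextKey, (contexts.get(contextKey) ?? 0) + 1);
    }
  }
  return { ngrams, contexts };
}

// MLE: relative frequency of `word` after `context` (no smoothing).
function mleScore(counts: NgramCounts, context: string[], word: string): number {
  const contextCount = counts.contexts.get(context.join(SEP)) ?? 0;
  if (contextCount === 0) return 0; // unseen context; assumed convention
  const gramCount = counts.ngrams.get([...context, word].join(SEP)) ?? 0;
  return gramCount / contextCount;
}

// Corpus padded by hand with <s> and </s> for illustration.
const corpus = [
  ["<s>", "the", "cat", "sat", "</s>"],
  ["<s>", "the", "cat", "ran", "</s>"],
];
const counts = buildCounts(corpus, 2);
console.log(mleScore(counts, ["cat"], "sat")); // 0.5
console.log(mleScore(counts, ["cat"], "dog")); // 0
```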
Lidstone Smoothing
gamma (number, default: 0.1) - Smoothing parameter: the pseudo-count added to every n-gram count
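Lidstone smoothing adds gamma to every n-gram count before normalizing: P(w | h) = (count(h, w) + gamma) / (count(h) + gamma * |V|), where |V| is the vocabulary size. Setting gamma = 1 gives Laplace (add-one) smoothing, and as gamma approaches 0 the estimate approaches MLE. The sketch below is illustrative only. lidstoneModel is a hypothetical name. Taking |V| to be the set of distinct training tokens (padding symbols included) is an assumption, because this page does not say how the library sizes its vocabulary (for example, whether it reserves an unknown-word token).

```typescript
const SEP = "\u0000"; // key separator assumed absent from tokens

// Hypothetical Lidstone-smoothed n-gram scorer (illustrative sketch).
function lidstoneModel(sentences: string[][], order: number, gamma = 0.1) {
  const ngrams = new Map<string, number>();
  const contexts = new Map<string, number>();
  const vocab = new Set<string>(); // assumption: |V| = distinct training tokens

  for (const s of sentences) {
    for (const token of s) vocab.add(token);
    for (let i = 0; i + order <= s.length; i++) {
      const gram = s.slice(i, i + order);
      const gramKey = gram.join(SEP);
      const contextKey = gram.slice(0, -1).join(SEP);
      ngrams.set(gramKey, (ngrams.get(gramKey) ?? 0) + 1);
      contexts.set(contextKey, (contexts.get(contextKey) ?? 0) + 1);
    }
  }

  // Returns P(word | context) with gamma added to every count.
  return (context: string[], word: string): number => {
    const gramCount = ngrams.get([...context, word].join(SEP)) ?? 0;
    const contextCount = contexts.get(context.join(SEP)) ?? 0;
    return (gramCount + gamma) / (contextCount + gamma * vocab.size);
  };
}

const corpus = [
  ["<s>", "the", "cat", "sat", "</s>"],
  ["<s>", "the", "cat", "ran", "</s>"],
];
const score = lidstoneModel(corpus, 2, 0.1);
console.log(score(["cat"], "sat").toFixed(4)); // 0.4231
// "cat the" never occurs in training but still receives a small share of the mass.
console.log(score(["cat"], "the").toFixed(4)); // 0.0385
```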
Kneser-Ney Interpolated
discount (number, default: 0.75) - Discount value for absolute discounting
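Interpolated Kneser-Ney subtracts the discount d from every observed count. It then gives the freed probability mass to a lower-order continuation distribution, which favours words that follow many different contexts rather than words that are merely frequent. In the standard bigram formulation, P(w | v) = max(c(v, w) - d, 0) / c(v) + lambda(v) * P_cont(w). The two auxiliary terms are lambda(v) = d * N1+(v .) / c(v) and P_cont(w) = N1+(. w) / N1+(. .). Here N1+(v .) counts distinct words seen after v, N1+(. w) counts distinct words seen before w, and N1+(. .) counts distinct bigram types. The sketch below covers only this bigram case, and kneserNeyBigram is a hypothetical name. The library presumably supports arbitrary order, possibly by recursing through lower orders, and may treat unseen contexts and words differently.

```typescript
const SEP = "\u0000"; // key separator assumed absent from tokens

// Hypothetical bigram-only interpolated Kneser-Ney scorer (illustrative sketch).
function kneserNeyBigram(sentences: string[][], discount = 0.75) {
  const bigramCounts = new Map<string, number>(); // c(v, w)
  const contextCounts = new Map<string, number>(); // c(v): bigram tokens starting with v
  const followerTypes = new Map<string, number>(); // N1+(v .): distinct words after v
  const precederTypes = new Map<string, number>(); // N1+(. w): distinct words before w
  let bigramTypes = 0; // N1+(. .): distinct bigram types

  for (const s of sentences) {
    for (let i = 0; i + 1 < s.length; i++) {
      const v = s[i];
      const w = s[i + 1];
      const key = v + SEP + w;
      const seen = bigramCounts.get(key) ?? 0;
      if (seen === 0) {
        // New bigram type: update the type-based (continuation) counts.
        followerTypes.set(v, (followerTypes.get(v) ?? 0) + 1);
        precederTypes.set(w, (precederTypes.get(w) ?? 0) + 1);
        bigramTypes++;
      }
      bigramCounts.set(key, seen + 1);
      contextCounts.set(v, (contextCounts.get(v) ?? 0) + 1);
    }
  }

  // Continuation probability: fraction of bigram types that end in w.
  const pCont = (w: string): number =>
    bigramTypes === 0 ? 0 : (precederTypes.get(w) ?? 0) / bigramTypes;

  // Returns P(word | context); only the last context token is used.
  return (context: string[], word: string): number => {
    if (context.length === 0) return pCont(word);
    const v = context[context.length - 1];
    const cv = contextCounts.get(v) ?? 0;
    if (cv === 0) return pCont(word); // unseen context: continuation only (assumption)
    const cvw = bigramCounts.get(v + SEP + word) ?? 0;
    const lambda = (discount * (followerTypes.get(v) ?? 0)) / cv;
    return Math.max(cvw - discount, 0) / cv + lambda * pCont(word);
  };
}

const corpus = [
  ["<s>", "the", "cat", "sat", "</s>"],
  ["<s>", "the", "cat", "ran", "</s>"],
];
const score = kneserNeyBigram(corpus, 0.75);
console.log(score(["cat"], "sat").toFixed(4)); // 0.2500
console.log(score(["cat"], "the").toFixed(4)); // 0.1250 (unseen bigram, mass from P_cont)
```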
Options
NgramLanguageModelOptions
order (number, required) - N-gram order (e.g., 2 for bigrams, 3 for trigrams)
model (LanguageModelType, optional) - Model type: 'mle', 'lidstone', or 'kneser_ney_interpolated' (default: 'mle')
gamma (number, optional) - Lidstone smoothing parameter (default: 0.1)
discount (number, optional) - Kneser-Ney discount parameter (default: 0.75)
padLeft (boolean, optional) - Add start tokens to the beginning of each sentence (default: true)
padRight (boolean, optional) - Add an end token to the end of each sentence (default: true)
startToken (string, optional) - Token for sentence start (default: '<s>')
endToken (string, optional) - Token for sentence end (default: '</s>')
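The options listed above can be summarized as a TypeScript type. The sketch below mirrors the documented names and defaults, but the string-literal union for LanguageModelType and the helpers withDefaults and padSentence are hypothetical. The padding rule is also an assumption, suggested by the plural/singular wording of the two options: padLeft prepends order - 1 start tokens, enough to give the first word a full-length context, and padRight appends a single end token. The library's actual declarations and padding behaviour may differ.

```typescript
// String values accepted by the `model` option, as listed above.
type LanguageModelType = "mle" | "lidstone" | "kneser_ney_interpolated";

interface NgramLanguageModelOptions {
  order: number; // required: 2 = bigrams, 3 = trigrams, ...
  model?: LanguageModelType;
  gamma?: number;
  discount?: number;
  padLeft?: boolean;
  padRight?: boolean;
  startToken?: string;
  endToken?: string;
}

// Hypothetical helper: fills in the documented defaults.
function withDefaults(o: NgramLanguageModelOptions): Required<NgramLanguageModelOptions> {
  return {
    order: o.order,
    model: o.model ?? "mle",
    gamma: o.gamma ?? 0.1,
    discount: o.discount ?? 0.75,
    padLeft: o.padLeft ?? true,
    padRight: o.padRight ?? true,
    startToken: o.startToken ?? "<s>",
    endToken: o.endToken ?? "</s>",
  };
}

// Hypothetical padding step. Assumption: (order - 1) start tokens on the left,
// one end token on the right.
function padSentence(tokens: string[], o: Required<NgramLanguageModelOptions>): string[] {
  const left = o.padLeft ? new Array<string>(Math.max(o.order - 1, 0)).fill(o.startToken) : [];
  const right = o.padRight ? [o.endToken] : [];
  return [...left, ...tokens, ...right];
}

const opts = withDefaults({ order: 3 });
console.log(padSentence(["the", "cat"], opts).join(" "));
// <s> <s> the cat </s>
```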
Properties
The NgramLanguageModel instance exposes the following read-only properties: