Overview
bun_nltk provides high-performance alternatives to Python NLTK APIs with similar interfaces. This guide helps you migrate existing NLTK code to bun_nltk.
Installation
Python NLTK
bun_nltk
Basic Tokenization
Python NLTK
bun_nltk
Sentence Tokenization
Python NLTK
bun_nltk
Training Custom Punkt Models
Python NLTK
bun_nltk
Frequency Distributions
Python NLTK
bun_nltk
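When checking migrated frequency counts, a library-independent baseline can be useful. NLTK's `FreqDist` is a subclass of `collections.Counter`, so the sketch below uses a plain `Counter` as a stand-in. The sample tokens are invented for illustration, and nothing here reflects bun_nltk's own API.

```python
from collections import Counter

# Hypothetical sample tokens, used only for illustration.
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "end"]

# Count occurrences of each token, as a frequency distribution would.
freq = Counter(tokens)

# The two most frequent tokens with their counts.
top_two = freq.most_common(2)

# Relative frequency of a token: its count divided by the total count.
relative_the = freq["the"] / sum(freq.values())
```

Comparing these counts against the output of the migrated code on the same tokens is a quick way to spot tokenization or casing differences.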
Streaming Frequency Distributions
bun_nltk
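The idea behind a streaming frequency distribution can be sketched in plain Python: counts are accumulated chunk by chunk, so the full token list is never held in memory. This is a conceptual sketch only. The helper names `iter_chunks` and `streaming_freq` are hypothetical and are not part of bun_nltk or NLTK.

```python
from collections import Counter
from typing import Iterable, Iterator


def iter_chunks(words: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` tokens."""
    chunk: list[str] = []
    for word in words:
        chunk.append(word)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def streaming_freq(chunks: Iterable[Iterable[str]]) -> Counter:
    """Accumulate counts one chunk at a time."""
    counts: Counter = Counter()
    for chunk in chunks:
        counts.update(chunk)
    return counts


# A one-pass iterator stands in for a large input stream.
stream = iter("a b a c b a".split())
counts = streaming_freq(iter_chunks(stream, size=2))
```

The result matches what a single pass over the whole token list would produce, which makes it a convenient reference when validating a streaming implementation.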
N-grams
Python NLTK
bun_nltk
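For cross-checking results during migration, the sliding-window behavior of an n-gram function can be written in a few lines of standard-library Python. The function name `reference_ngrams` is hypothetical. It mirrors the unpadded behavior of `nltk.util.ngrams` and makes no claim about bun_nltk's signatures or return types.

```python
from itertools import islice
from typing import Iterator, Sequence


def reference_ngrams(tokens: Sequence[str], n: int) -> Iterator[tuple[str, ...]]:
    """Yield each contiguous window of n tokens as a tuple (no padding)."""
    # Zip n views of the sequence, each shifted one position further.
    return zip(*(islice(tokens, i, None) for i in range(n)))


tokens = ["a", "b", "c", "d"]
bigrams = list(reference_ngrams(tokens, 2))
trigrams = list(reference_ngrams(tokens, 3))
```

A sequence shorter than `n` yields no n-grams, which is worth testing explicitly when comparing implementations.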
Everygrams and Skipgrams
Python NLTK
bun_nltk
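As a reference for what these two functions compute, here is a minimal standard-library sketch. The names `reference_everygrams` and `reference_skipgrams` are hypothetical. The output order of the everygrams helper (grouped by length) is a choice made here and may differ from the order either library returns, so compare results as sets or sorted lists.

```python
from itertools import combinations


def reference_everygrams(tokens, min_len=1, max_len=None):
    """Return all contiguous n-grams with min_len <= n <= max_len, grouped by n."""
    max_len = len(tokens) if max_len is None else max_len
    return [
        tuple(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def reference_skipgrams(tokens, n, k):
    """Return n-token skip-grams that may skip up to k tokens in total."""
    result = []
    for i, head in enumerate(tokens):
        # Tokens that can follow the head within a window of n + k tokens.
        window = tokens[i + 1:i + n + k]
        for tail in combinations(window, n - 1):
            result.append((head, *tail))
    return result


every = reference_everygrams(["a", "b", "c"], 1, 2)
skips = reference_skipgrams(["a", "b", "c", "d"], 2, 1)
```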
Collocations
Python NLTK
bun_nltk
Stemming
Python NLTK
bun_nltk
Text Normalization
Python NLTK
bun_nltk
POS Tagging
Python NLTK
bun_nltk
WordNet
Python NLTK
bun_nltk
Using Packed WordNet
Language Models
Python NLTK
bun_nltk
Chunk Parsing
Python NLTK
bun_nltk
CFG Parsing
Python NLTK
bun_nltk
Earley Parser
Text Classification
Python NLTK (Naive Bayes)
bun_nltk (Naive Bayes)
Decision Tree Classifier
Logistic Regression
Corpora
Python NLTK
bun_nltk
Loading External Corpora
Performance Comparison
See detailed benchmarks:
- Native vs Python - Up to 840x faster
- WASM Performance - 3-7x faster than Python
API Mapping Quick Reference
| Python NLTK | bun_nltk | Speedup |
|---|---|---|
| word_tokenize() | wordTokenizeSubset() | ~4x |
| sent_tokenize() | sentenceTokenizePunkt() | ~10-16x |
| FreqDist | tokenFreqDistIdsAscii() | ~6x |
| ngrams() | ngramsAsciiNative() | ~4x |
| PorterStemmer.stem() | porterStemAscii() | ~10x |
| pos_tag() | posTagPerceptronAscii() | ~4x |
| wordnet.synsets() | wordnet.synsets() | ~92x |
| wordnet.morphy() | wordnet.morphy() | ~92x |
| KneserNeyInterpolated | trainNgramLanguageModel() | ~22x |
| RegexpParser | regexpChunkParse() | ~643x |
| ChartParser | parseTextWithCfg() | ~38x |
| EarleyChartParser | parseTextWithEarley() | ~40x |
| NaiveBayesClassifier | trainNaiveBayesTextClassifier() | ~1.4x |
| DecisionTreeClassifier | trainDecisionTreeTextClassifier() | ~8x |
Key Differences
ASCII Focus
bun_nltk optimizes for ASCII text with fast paths:
Native vs WASM vs JS
bun_nltk provides multiple runtime options:
Prebuilt Binaries
No build step required:
- linux-x64
- win32-x64
- WASM (all platforms)
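Given the ASCII focus noted above, it may be worth auditing an existing corpus before migrating. The sketch below uses Python's built-in `str.isascii()` to separate ASCII-only texts from the rest. The helper name `split_by_ascii` and the sample strings are hypothetical, and how bun_nltk handles non-ASCII input is not covered by this sketch.

```python
def split_by_ascii(texts):
    """Partition texts into (ascii_only, non_ascii) lists, preserving order."""
    ascii_only, non_ascii = [], []
    for text in texts:
        (ascii_only if text.isascii() else non_ascii).append(text)
    return ascii_only, non_ascii


# Hypothetical sample inputs.
samples = ["plain text", "naïve café", "tokens: 1, 2, 3"]
ascii_only, non_ascii = split_by_ascii(samples)
```

Texts that land in `non_ascii` are the ones to test most carefully when comparing output between the two libraries.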
Next Steps
API Reference
Explore full API documentation
Benchmarks
See detailed performance comparisons
Quick Start
Get started with bun_nltk
Examples
Browse code examples