Benchmark Methodology
bun_nltk uses a comprehensive benchmark suite to measure performance across NLP operations, comparing native Zig implementations against Python NLTK baselines.
Test Environments
Datasets
Benchmarks use two primary synthetic datasets:
- 64MB synthetic dataset (bench/datasets/synthetic.txt) - Used for core token/n-gram operations
- 8MB gate dataset - Used for specialized workloads (Punkt, language models, parsers)
Benchmark Types
Native vs Python Comparison
Direct performance comparison between bun_nltk’s Zig implementation and Python NLTK.
WASM Performance Testing
Compares three runtimes:
- Native Zig (via Bun FFI)
- WASM (compiled from Zig)
- Python NLTK baseline
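To make the runtime comparison concrete, here is a minimal timing sketch. The `timeRuns` helper, its warmup count, and the way a workload is passed in are all assumptions for illustration; the actual harness in bun_nltk may be structured differently. The same function could wrap a native FFI call, a WASM export, or any other callable.

```typescript
// Hypothetical helper (not part of bun_nltk): time a workload several times.
// Returns one wall-clock sample per measured run, in seconds.
function timeRuns(fn: () => void, runs: number, warmup: number = 2): number[] {
  // Warmup runs let JIT compilation and caches settle; they are not recorded.
  for (let i = 0; i < warmup; i++) {
    fn();
  }
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now(); // milliseconds, high resolution
    fn();
    samples.push((performance.now() - start) / 1000);
  }
  return samples;
}

// Usage: a stand-in workload; a real benchmark would call the runtime under test.
const demoSamples = timeRuns(() => {
  let total = 0;
  for (let i = 0; i < 10_000; i++) total += i;
  if (total < 0) throw new Error("unreachable");
}, 5);
```

Collecting raw samples rather than a single total keeps the median and percentile calculations separate from the measurement loop.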
Browser WASM Benchmarks
Tests WASM performance in actual browser environments (Chromium/Firefox).
Comparison Approach
Performance Metrics
Execution Time
- Median execution time in seconds
- Speedup ratio (Python time / bun_nltk time)
- Percent improvement
- Memory delta tracking
- SLA gate enforcement for memory limits
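The timing metrics above can be sketched as follows. The function and field names are hypothetical, and "percent improvement" is assumed here to mean the reduction in median time relative to the Python baseline; the project may define it differently. The sample values in the usage lines are made up.

```typescript
// Illustrative helpers; names are assumptions, not bun_nltk's API.
function median(samples: readonly number[]): number {
  if (samples.length === 0) throw new Error("median of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd length: middle element. Even length: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

interface Comparison {
  pythonMedian: number;   // seconds
  nativeMedian: number;   // seconds
  speedup: number;        // Python time / bun_nltk time
  percentFaster: number;  // assumed definition: time saved relative to Python
}

function compareRuns(
  pythonSeconds: readonly number[],
  nativeSeconds: readonly number[],
): Comparison {
  const pythonMedian = median(pythonSeconds);
  const nativeMedian = median(nativeSeconds);
  return {
    pythonMedian,
    nativeMedian,
    speedup: pythonMedian / nativeMedian,
    percentFaster: ((pythonMedian - nativeMedian) / pythonMedian) * 100,
  };
}

// Usage with made-up timings (seconds):
const demoComparison = compareRuns([2, 4, 6], [1, 2, 3]);
```

Using the median rather than the mean keeps a single slow run (for example, one hit by garbage collection) from skewing the reported speedup.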
Python Parity Testing
All benchmarks include parity harnesses to ensure functional equivalence:
- Tokenizers
- Punkt sentence splitting
- Language models
- Chunk parsers
- WordNet lookups
- POS taggers
- Classifiers
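A parity check of this kind can be as simple as comparing two output sequences element by element. The sketch below assumes the Python NLTK output has already been exported (for example as a JSON array of strings) and loaded into memory; that workflow and the `findMismatch` name are assumptions, not the project's actual harness.

```typescript
// Hypothetical parity helper: report the first position where outputs differ.
interface Mismatch {
  index: number;
  expected: string | null; // null when the baseline is shorter
  actual: string | null;   // null when the candidate is shorter
}

function findMismatch(
  expected: readonly string[],
  actual: readonly string[],
): Mismatch | null {
  const length = Math.max(expected.length, actual.length);
  for (let i = 0; i < length; i++) {
    const e = expected[i] ?? null;
    const a = actual[i] ?? null;
    if (e !== a) {
      return { index: i, expected: e, actual: a };
    }
  }
  return null; // sequences are identical
}

// Usage with made-up token lists:
const baselineTokens = ["Hello", ",", "world", "!"];
const candidateTokens = ["Hello", ",", "world!"];
const demoMismatch = findMismatch(baselineTokens, candidateTokens);
```

Reporting the first differing index, rather than a plain boolean, makes a parity failure much easier to trace back to the input that caused it.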
Performance Gates
bun_nltk enforces performance regression gates in CI:
- Median execution time thresholds
- p95 latency limits
- Memory usage boundaries
- WASM binary size budget
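A gate check over these four limits might look like the sketch below. The interface names, the nearest-rank percentile method, and every threshold value are assumptions for illustration; the real CI configuration is not shown in this page.

```typescript
// Illustrative regression gate; names and limits are hypothetical.
interface GateLimits {
  maxMedianSeconds: number;
  maxP95Seconds: number;
  maxMemoryDeltaBytes: number;
  maxWasmBytes: number;
}

interface GateInput {
  timingsSeconds: readonly number[];
  memoryDeltaBytes: number;
  wasmBytes: number;
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1]!;
}

function medianOf(samples: readonly number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Returns a list of human-readable failures; an empty list means the gate passes.
function checkGates(input: GateInput, limits: GateLimits): string[] {
  const failures: string[] = [];
  const med = medianOf(input.timingsSeconds);
  const p95 = percentile(input.timingsSeconds, 95);
  if (med > limits.maxMedianSeconds) failures.push(`median ${med}s over limit`);
  if (p95 > limits.maxP95Seconds) failures.push(`p95 ${p95}s over limit`);
  if (input.memoryDeltaBytes > limits.maxMemoryDeltaBytes) failures.push("memory delta over limit");
  if (input.wasmBytes > limits.maxWasmBytes) failures.push("WASM binary over size budget");
  return failures;
}

// Usage with made-up numbers: one slow outlier trips only the p95 gate.
const demoFailures = checkGates(
  { timingsSeconds: [0.10, 0.11, 0.12, 0.30], memoryDeltaBytes: 1_000, wasmBytes: 50_000 },
  { maxMedianSeconds: 0.2, maxP95Seconds: 0.25, maxMemoryDeltaBytes: 2_000, maxWasmBytes: 100_000 },
);
```

Gating on both median and p95 catches two different regressions: a uniform slowdown shifts the median, while occasional stalls show up only in the tail.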
Trend Tracking
Benchmark results are tracked over time.
Workload Categories
Core Token Operations
- Token counting (ASCII with SIMD fast path)
- Unique token counting
- N-gram generation and counting
- Frequency distributions
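For orientation, the core token workloads correspond to operations like the ones sketched below in plain TypeScript. This is a reference-style illustration only: the token definition (lowercased ASCII letters, digits, and apostrophes) and all function names are assumptions, and it does not reflect the SIMD fast path of the native Zig implementation.

```typescript
// Illustrative reference versions of the core token workloads.
// Assumed token definition: runs of ASCII letters, digits, or apostrophes.
function tokenizeAscii(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// All contiguous n-token windows, in order of appearance.
function ngrams(tokens: readonly string[], n: number): string[][] {
  if (n < 1) throw new RangeError("n must be at least 1");
  const out: string[][] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n));
  }
  return out;
}

// Frequency distribution: item -> number of occurrences.
function freqDist(items: readonly string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    counts.set(item, (counts.get(item) ?? 0) + 1);
  }
  return counts;
}

// Usage on a tiny made-up input:
const demoTokens = tokenizeAscii("The cat saw the dog");
const tokenCount = demoTokens.length;
const uniqueCount = new Set(demoTokens).size;
const bigramCounts = freqDist(ngrams(demoTokens, 2).map((g) => g.join(" ")));
```

Joining each n-gram into a single string key is a convenience for the `Map`; a performance-oriented implementation would more likely hash token IDs instead.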
Text Processing
- Sentence tokenization (Punkt)
- Porter stemming
- Normalization pipelines
- Stopword filtering
Linguistic Analysis
- POS tagging (Perceptron)
- Chunk parsing (Regexp/IOB)
- CFG/Chart parsing
- Earley parsing
- Dependency parsing
Advanced NLP
- Language models (MLE, Lidstone, Kneser-Ney)
- Text classification (Naive Bayes, Decision Tree, Logistic, Linear SVM)
- Collocation detection (PMI scoring)
- WordNet lookups and morphology
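As an illustration of the collocation workload, PMI (pointwise mutual information) for a word pair can be computed from raw counts. The sketch below uses a common count-based estimate, with probabilities taken as count divided by a single corpus total; the function name and signature are hypothetical and not taken from bun_nltk.

```typescript
// Illustrative PMI over bigram counts; not bun_nltk's API.
// PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), each probability estimated
// as count / total, which simplifies to the expression below.
function pmi(
  pairCount: number,   // occurrences of the bigram (x, y)
  leftCount: number,   // occurrences of x
  rightCount: number,  // occurrences of y
  total: number,       // corpus size used for all three estimates (assumption)
): number {
  if (pairCount <= 0 || leftCount <= 0 || rightCount <= 0 || total <= 0) {
    throw new RangeError("all counts must be positive");
  }
  return Math.log2((pairCount * total) / (leftCount * rightCount));
}

// Usage with made-up counts: words that only ever appear together score high,
// while statistically independent words score zero.
const alwaysTogether = pmi(8, 8, 8, 64);
const independent = pmi(1, 2, 2, 4);
```

Because PMI favors rare pairs, collocation finders usually combine it with a minimum frequency filter before ranking.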
Benchmark Commands Reference
Next Steps
Native vs Python
See detailed benchmark results and speedup tables
WASM Performance
Explore WASM vs native performance characteristics