Overview
bun_nltk compiles to WebAssembly for browser and edge runtime environments. The WASM build retains a significant performance advantage over Python while running within roughly 2.4x of native speed.

WASM vs Native vs Python
Three-way comparison on the 64MB synthetic dataset:

| Runtime | Token/N-gram Operations (sec) | Speedup vs Python |
|---|---|---|
| Zig Native (via Bun FFI) | 1.719 | 7.70x |
| Zig WASM | 4.150 | 3.19x |
| Python NLTK | 13.241 | 1.00x (baseline) |
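The speedup column is the Python baseline time divided by each runtime's time, which can be checked directly:

```typescript
// Verify the speedup column: speedup = python_time / runtime_time,
// rounded to two decimal places as in the table.
const pythonSec = 13.241;
const speedup = (sec: number) => Math.round((pythonSec / sec) * 100) / 100;

console.log(speedup(1.719));  // native: 7.7
console.log(speedup(4.150));  // WASM: 3.19
console.log(speedup(13.241)); // baseline: 1
```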
Key Insights
WASM Performance (3.19x faster than Python):
- Still significantly faster than the Python baseline
- Overhead from the WASM runtime is manageable
- Good choice for browser/edge deployments

Native Performance (7.70x faster than Python):
- Best performance for server-side workloads
- Direct memory access via Bun FFI
- SIMD optimizations enabled

WASM Overhead:
- Sandboxing and linear-memory access add overhead
- No SIMD in the WASM build (uses a scalar fallback)
- Still provides excellent absolute performance
Browser WASM Benchmarks
bun_nltk includes automated browser benchmarks in CI:

Test Environment
Browsers Tested:
- Chromium (headless)
- Firefox (headless)

Workloads Tested:
- Token counting and n-gram operations
- Punkt sentence tokenization
- Language model evaluation
- Chunk parsing (IOB)
- WordNet morphology
Browser Performance
Browser WASM benchmarks run in CI with strict mode enforcement; each workload has per-browser thresholds to catch performance regressions.

Memory Management:
- WASM memory pool reuse via the WasmNltk wrapper
- Reduced allocation overhead for repeated operations
- Explicit disposal for memory cleanup
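The exact WasmNltk wrapper API is not reproduced here, but the pool-reuse pattern it describes can be sketched with illustrative names:

```typescript
// Sketch of the buffer-reuse pattern: grow one scratch buffer as needed
// and hand out views of it, instead of allocating per operation.
// (WasmNltkPool and its methods are illustrative names, not the real API.)
class WasmNltkPool {
  private scratch = new Uint8Array(0);

  // Return a view of at least `size` bytes, reusing prior capacity.
  acquire(size: number): Uint8Array {
    if (this.scratch.length < size) {
      this.scratch = new Uint8Array(size); // grow only when needed
    }
    return this.scratch.subarray(0, size);
  }

  // Explicit disposal releases the backing buffer for GC.
  dispose(): void {
    this.scratch = new Uint8Array(0);
  }
}

const pool = new WasmNltkPool();
const a = pool.acquire(1024);
const b = pool.acquire(512); // reuses the same backing buffer, no new allocation
```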
WASM API Usage
Initialization
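As a minimal sketch, initialization follows the standard WebAssembly pattern; since bun_nltk's actual loader and module path are not shown here, this uses the smallest valid (empty) module for illustration:

```typescript
// Generic WASM initialization pattern. The 8 bytes below are the smallest
// valid (empty) WebAssembly module; in practice you would load bun_nltk's
// .wasm binary instead and pass any required imports.
const bytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

const module = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(module, {});
```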
Token Operations
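A pure-TypeScript reference for the semantics the token operations expose (whether countNgramsAscii counts n-gram windows or distinct n-grams is an assumption here; the real calls run inside WASM):

```typescript
// Reference semantics for ASCII token and n-gram counting:
// tokens are maximal runs of non-whitespace characters.
function countTokensAscii(text: string): number {
  return (text.match(/\S+/g) ?? []).length;
}

// Counts sliding n-gram windows over the token sequence.
function countNgramsAscii(text: string, n: number): number {
  const tokens = text.match(/\S+/g) ?? [];
  return Math.max(tokens.length - n + 1, 0);
}

console.log(countTokensAscii("the quick brown fox"));    // 4
console.log(countNgramsAscii("the quick brown fox", 2)); // 3
```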
Text Processing
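A sketch of the tokenize-then-normalize pipeline; lowercasing and punctuation stripping are assumed normalization rules, not confirmed details of the WASM API:

```typescript
// Tokenize ASCII text into whitespace-separated tokens.
function tokenizeAscii(text: string): string[] {
  return text.match(/\S+/g) ?? [];
}

// Normalize: lowercase, strip non-alphanumeric characters, drop empties.
function normalizeTokensAscii(tokens: string[]): string[] {
  return tokens
    .map((t) => t.toLowerCase().replace(/[^a-z0-9]/g, ""))
    .filter((t) => t.length > 0);
}

const tokens = normalizeTokensAscii(tokenizeAscii("Hello, WASM world!"));
console.log(tokens); // ["hello", "wasm", "world"]
```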
WordNet Morphology
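A minimal sketch of WordNet-style morphy for nouns. Real morphy also validates candidates against the WordNet lexicon and consults exception lists; this only applies the suffix detachment rules:

```typescript
// WordNet noun detachment rules: [suffix, replacement].
// Ordered so longer suffixes are tried before the bare "s" rule.
const nounRules: Array<[string, string]> = [
  ["ches", "ch"], ["shes", "sh"], ["ses", "s"], ["xes", "x"],
  ["zes", "z"], ["ies", "y"], ["men", "man"], ["s", ""],
];

function morphyNoun(word: string): string {
  for (const [suffix, repl] of nounRules) {
    if (word.endsWith(suffix)) {
      return word.slice(0, word.length - suffix.length) + repl;
    }
  }
  return word; // no rule applies: return unchanged
}

console.log(morphyNoun("dogs"));     // "dog"
console.log(morphyNoun("churches")); // "church"
console.log(morphyNoun("ponies"));   // "pony"
```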
Advanced Operations
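The chunkIobIds operation in the WASM API works on integer ids; as a sketch of the same grouping logic, here is the string-tag version of IOB chunk extraction:

```typescript
// Group B-/I- tagged tokens into chunks; "O" ends any open chunk.
// A stray I- tag without a preceding B- is treated as O in this sketch.
function chunkIob(tokens: string[], tags: string[]): string[][] {
  const chunks: string[][] = [];
  let current: string[] | null = null;
  for (let i = 0; i < tokens.length; i++) {
    const tag = tags[i];
    if (tag.startsWith("B-")) {
      current = [tokens[i]];     // open a new chunk
      chunks.push(current);
    } else if (tag.startsWith("I-") && current !== null) {
      current.push(tokens[i]);   // extend the open chunk
    } else {
      current = null;            // "O" (or stray I-) closes the chunk
    }
  }
  return chunks;
}

const chunks = chunkIob(
  ["He", "saw", "the", "big", "dog"],
  ["O", "O", "B-NP", "I-NP", "I-NP"],
);
console.log(chunks); // [["the", "big", "dog"]]
```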
Cleanup
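The explicit-disposal pattern mentioned under Memory Management can be sketched as follows; the class name and use-after-dispose behavior are illustrative, not the documented API:

```typescript
// A handle keeps WASM-side allocations alive until dispose() is called.
class WasmHandle {
  private freed = false;

  runOp(): number {
    if (this.freed) throw new Error("handle already disposed");
    return 42; // placeholder for a real WASM call
  }

  dispose(): void {
    this.freed = true; // a real impl would free WASM linear-memory buffers here
  }

  get disposed(): boolean {
    return this.freed;
  }
}

const handle = new WasmHandle();
handle.runOp();
handle.dispose(); // after this, further calls throw
```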
WASM Binary Size
The WASM build is optimized for browser delivery:
- ReleaseSmall optimization mode
- Stripped debug symbols
- Minimal runtime overhead
Browser Performance Tips
1. Reuse WASM Instance
2. Batch Operations
3. Lazy Initialization
4. Preload WASM Module
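Tips 1-3 can be sketched together; initWasm() below is a pure-JS stand-in for the real module loader, not the actual bun_nltk API:

```typescript
// Tip 3: lazy initialization — the instance is created on first use only.
let instance: { countTokensAscii(s: string): number } | null = null;

function initWasm() {
  // Stand-in for loading and instantiating the real WASM module.
  return { countTokensAscii: (s: string) => (s.match(/\S+/g) ?? []).length };
}

function getInstance() {
  if (!instance) instance = initWasm();
  return instance; // Tip 1: every caller reuses the same instance
}

// Tip 2: batch many small strings into one call so the JS<->WASM
// boundary cost is paid once, not per string.
function countTokensBatched(texts: string[]): number {
  return getInstance().countTokensAscii(texts.join("\n"));
}

console.log(countTokensBatched(["one two", "three"])); // 3
```

For tip 4, browsers can fetch the .wasm binary early with a preload hint (for example `<link rel="preload" as="fetch" crossorigin>`), so instantiation does not wait on the network.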
WASM vs Native Trade-offs
When to Use WASM
Browser/Edge Runtimes:
- Client-side text processing
- Edge computing (Cloudflare Workers, Deno Deploy)
- Offline-capable web applications
- Platform-agnostic deployment
- No native binary dependencies
- Consistent behavior across environments
- Sandboxed execution environment
- Memory safety guarantees
- Limited system access
When to Use Native
Server-Side Workloads:
- Maximum throughput required
- Bun/Node.js backend services
- Batch processing pipelines
- Large text corpora
- Token-heavy operations
- High-frequency operations
- Lower memory overhead
- Direct memory management
- Better cache utilization
WASM Feature Parity
The following operations have WASM equivalents:

| Feature | Native API | WASM API |
|---|---|---|
| Token counting | countTokensAscii | wasm.countTokensAscii |
| N-gram counting | countNgramsAscii | wasm.countNgramsAscii |
| Tokenization | tokenizeAsciiNative | wasm.tokenizeAscii |
| Normalization | normalizeTokensAsciiNative | wasm.normalizeTokensAscii |
| Punkt sentence split | sentenceTokenizePunktAsciiNative | wasm.sentenceTokenizePunktAscii |
| WordNet morphy | wordnetMorphyAsciiNative | wasm.wordnetMorphyAscii |
| Perceptron inference | perceptronPredictBatchNative | wasm.perceptronPredictBatch |
| LM evaluation | evaluateLanguageModelIdsNative | wasm.evaluateLanguageModelIds |
| Chunk IOB parsing | chunkIobIdsNative | wasm.chunkIobIds |
Performance Regression Testing
Browser WASM benchmarks run in CI for every PR:
- Per-workload performance thresholds
- Cross-browser consistency checks
- WASM size budget enforcement
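A per-workload threshold check of the kind CI enforces can be sketched as follows; the threshold values and function names are made up for illustration:

```typescript
// Map of workload name -> maximum allowed time in milliseconds.
type Thresholds = Record<string, number>;

// Return the names of workloads that exceeded their threshold.
function checkRegressions(
  results: Record<string, number>,
  limits: Thresholds,
): string[] {
  return Object.entries(results)
    .filter(([name, ms]) => ms > (limits[name] ?? Infinity))
    .map(([name]) => name);
}

const failures = checkRegressions(
  { tokenize: 120, ngrams: 95 },  // measured times
  { tokenize: 100, ngrams: 150 }, // per-workload budgets
);
console.log(failures); // ["tokenize"]
```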
Next Steps
Native Benchmarks
See the detailed native vs Python comparison
API Reference
Explore WASM API documentation