Overview
bun_nltk provides three methods to load WordNet lexical databases with different trade-offs between size and coverage:
- Mini - Compact subset with common words
- Extended - Larger vocabulary with more synsets
- Packed - Full WordNet in binary format for maximum coverage
Loading Functions
loadWordNetMini
Loads the compact WordNet mini database. Calls without a path are cached after the first load.
Parameter: path?: string - Optional custom path to a WordNet mini JSON file. If omitted, the bundled model at models/wordnet_mini.json is used.
Returns: WordNet - Loaded WordNet instance
- Automatically cached on first call when no path is provided
- Subsequent calls with no path return cached instance
- Uses JSON format for storage
- Best for applications with size constraints
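The caching behavior described above can be illustrated with a small self-contained sketch. It does not call bun_nltk: `makeCachedLoader` and `WordNetLike` are hypothetical names, and the sketch assumes that calls with an explicit path are loaded fresh each time, which the notes above imply but do not state outright.

```typescript
// Hypothetical stand-in for the library's WordNet type.
interface WordNetLike {
  source: string;
}

// Wraps a loader so that calls without a path share one cached instance,
// while calls with an explicit path always load again (assumption).
function makeCachedLoader(
  defaultPath: string,
  load: (path: string) => WordNetLike,
): (path?: string) => WordNetLike {
  let cached: WordNetLike | undefined;
  return (path?: string) => {
    if (path !== undefined) {
      return load(path); // custom path: bypass the cache
    }
    if (cached === undefined) {
      cached = load(defaultPath); // first default call: load and remember
    }
    return cached;
  };
}

// Count how often the underlying load actually runs.
let loadCount = 0;
const loadMiniSketch = makeCachedLoader("models/wordnet_mini.json", (path) => {
  loadCount += 1;
  return { source: path };
});

const firstDefault = loadMiniSketch();
const secondDefault = loadMiniSketch();
const customLoad = loadMiniSketch("custom/wordnet.json");
console.log(firstDefault === secondDefault, loadCount); // prints: true 2
```

The same pattern would apply to the extended and packed loaders, each with its own default path.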
loadWordNetExtended
Loads the extended WordNet database with larger vocabulary coverage.
Parameter: path?: string - Optional custom path to a WordNet extended JSON file. If omitted, the bundled model at models/wordnet_extended.json is used.
Returns: WordNet - Loaded WordNet instance
- Automatically cached on first call when no path is provided
- Subsequent calls with no path return cached instance
- Uses JSON format for storage
- Balances size and coverage for most applications
loadWordNetPacked
Loads the full WordNet database from packed binary format.
Parameter: path?: string - Optional custom path to a packed WordNet binary file. If omitted, the bundled model at models/wordnet_full.bin is used.
Returns: WordNet - Loaded WordNet instance
- Automatically cached on first call when no path is provided
- Subsequent calls with no path return cached instance
- Uses binary format with magic header BNWN1 for validation
- Binary format structure:
  - 5 bytes: Magic string “BNWN1”
  - 4 bytes: Payload length (little-endian uint32)
  - N bytes: JSON payload
- Best for applications requiring maximum vocabulary coverage
- Throws error if magic header is invalid or file is corrupted
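To make the byte layout concrete, here is a minimal sketch that writes a buffer in the structure listed above. `packWordNetJson` is a hypothetical helper, not a bun_nltk function, and the `{ synsets: [] }` payload is a placeholder, since the JSON schema of the payload is not described here.

```typescript
const MAGIC = "BNWN1";

// Builds a packed buffer: 5-byte magic, 4-byte little-endian length, JSON payload.
function packWordNetJson(data: unknown): Uint8Array {
  const encoder = new TextEncoder();
  const magic = encoder.encode(MAGIC); // 5 ASCII bytes
  const payload = encoder.encode(JSON.stringify(data)); // N bytes of UTF-8 JSON
  const out = new Uint8Array(magic.length + 4 + payload.length);
  out.set(magic, 0);
  // The third argument `true` selects little-endian byte order.
  new DataView(out.buffer).setUint32(magic.length, payload.length, true);
  out.set(payload, magic.length + 4);
  return out;
}

// Placeholder payload; the real schema is an unknown here.
const packedBytes = packWordNetJson({ synsets: [] });
console.log(packedBytes.length); // prints: 23 (5 + 4 + 14 payload bytes)
```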
Binary Format Details
The packed WordNet format uses a custom binary structure. When reading it, the loader:
- Verifies magic header matches BNWN1
- Checks payload length does not exceed file bounds
- Throws descriptive errors for format violations
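The validation steps above might look roughly like the following reader. This is a sketch under stated assumptions rather than the library's actual code: `parsePackedWordNet`, the constant names, and the error messages are all hypothetical.

```typescript
const PACKED_MAGIC = "BNWN1";
const HEADER_BYTES = PACKED_MAGIC.length + 4; // magic + uint32 length

// Validates the header and returns the parsed JSON payload.
function parsePackedWordNet(bytes: Uint8Array): unknown {
  if (bytes.length < HEADER_BYTES) {
    throw new Error(`packed WordNet too short: ${bytes.length} bytes`);
  }
  const magic = new TextDecoder().decode(bytes.subarray(0, PACKED_MAGIC.length));
  if (magic !== PACKED_MAGIC) {
    throw new Error(`invalid magic header: expected ${PACKED_MAGIC}`);
  }
  // Respect byteOffset so views into a larger buffer are read correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const payloadLength = view.getUint32(PACKED_MAGIC.length, true); // little-endian
  if (HEADER_BYTES + payloadLength > bytes.length) {
    throw new Error(`payload length ${payloadLength} exceeds file bounds`);
  }
  const payload = bytes.subarray(HEADER_BYTES, HEADER_BYTES + payloadLength);
  return JSON.parse(new TextDecoder().decode(payload));
}

// Build a tiny valid sample by hand and read it back.
const sampleJson = new TextEncoder().encode('{"ok":true}');
const sampleBytes = new Uint8Array(HEADER_BYTES + sampleJson.length);
sampleBytes.set(new TextEncoder().encode(PACKED_MAGIC), 0);
new DataView(sampleBytes.buffer).setUint32(PACKED_MAGIC.length, sampleJson.length, true);
sampleBytes.set(sampleJson, HEADER_BYTES);
console.log(JSON.stringify(parsePackedWordNet(sampleBytes))); // prints: {"ok":true}
```

Checking the declared length against the buffer size before slicing means a truncated file fails with a clear message rather than a JSON parse error.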
Choosing a Version
| Version | Relative File Size | Synset Coverage | Use Case |
|---|---|---|---|
| Mini | Smallest | Basic vocabulary | Embedded systems, mobile apps |
| Extended | Medium | Common + specialized | Most applications |
| Packed | Largest | Full WordNet | Research, comprehensive NLP |
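One way to act on this table is to try the largest database first and fall back to smaller ones if loading fails. The sketch below is self-contained and uses stub loaders: `loadFirstAvailable` and `LoadedDb` are hypothetical names, and in real code the stubs would be replaced by calls to loadWordNetPacked, loadWordNetExtended, and loadWordNetMini.

```typescript
// Hypothetical stand-in for a loaded WordNet instance.
interface LoadedDb {
  label: string;
}
type DbLoader = () => LoadedDb;

// Tries each loader in order and returns the first one that succeeds.
function loadFirstAvailable(loaders: Array<[string, DbLoader]>): LoadedDb {
  const failures: string[] = [];
  for (const [name, load] of loaders) {
    try {
      return load();
    } catch (err) {
      // Remember why this candidate failed, then try the next one.
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`no WordNet database could be loaded (${failures.join("; ")})`);
}

// Stub loaders: the packed one simulates a corrupted file.
const chosenDb = loadFirstAvailable([
  ["packed", () => { throw new Error("invalid magic header"); }],
  ["extended", () => ({ label: "extended" })],
  ["mini", () => ({ label: "mini" })],
]);
console.log(chosenDb.label); // prints: extended
```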