- Reduces index size by 20-40%
- Improves search relevance
- Speeds up query processing
- Reduces false positive matches
How Stopwords Work
During tokenization, Orama checks each token against the stopwords list and removes any that match.
By default, Orama initializes with an empty stopwords array. You must explicitly provide stopwords to enable filtering.
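The filtering step can be pictured with a small self-contained sketch. It is not Orama's internal tokenizer: the `tokenize` helper and its splitting rule are assumptions made for illustration, and only the check against the stopwords list mirrors the behaviour described in this section.

```typescript
// Hypothetical stand-in for a tokenizer: lowercases the input, splits on
// anything that is not a letter or digit, then drops tokens found in the
// stopwords list.
function tokenize(text: string, stopWords: string[] = []): string[] {
  const blocked = new Set(stopWords) // Set gives constant-time membership checks
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0 && !blocked.has(token))
}

// With the default (empty) list, nothing is filtered.
const unfiltered = tokenize('The quick brown fox jumps over the lazy dog')

// With an explicit list, matching tokens are removed.
const filtered = tokenize('The quick brown fox jumps over the lazy dog', ['the', 'over'])

console.log(unfiltered.length, filtered.length) // 9 6
```

The default argument is an empty array on purpose, so the sketch behaves like the default described here: no filtering until a list is supplied.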
Using Built-in Stopwords
Orama provides stopword lists for 30 languages through the @orama/stopwords package.
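As a rough sketch of how a built-in list might be wired in, the list is imported from a language subpath and passed as the tokenizer's `stopWords` option when the database is created. The `create` function, the `components.tokenizer.stopWords` path, and the `stopwords` export name are assumptions to verify against the Orama version in use. A short stand-in list replaces the real import so the snippet runs on its own.

```typescript
// In a real project the list would come from the package, for example:
//   import { create } from '@orama/orama'
//   import { stopwords as englishStopwords } from '@orama/stopwords/english'
// The stand-in below is NOT the real English list.
const englishStopwordsSample: string[] = ['a', 'an', 'and', 'of', 'the', 'to']

// Assumed shape of the options object passed to `create`.
const createOptions = {
  schema: { title: 'string', description: 'string' } as const,
  components: {
    tokenizer: {
      stopWords: englishStopwordsSample,
    },
  },
}

// Pass `createOptions` to `create` once @orama/orama is installed.
console.log(createOptions.components.tokenizer.stopWords.length) // 6
```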
Supported Languages
Orama provides stopword lists for 30 languages:
- Arabic: @orama/stopwords/arabic
- Armenian: @orama/stopwords/armenian
- Bulgarian: @orama/stopwords/bulgarian
- Chinese: @orama/stopwords/chinese
- Danish: @orama/stopwords/danish
- Dutch: @orama/stopwords/dutch
- English: @orama/stopwords/english
- Finnish: @orama/stopwords/finnish
- French: @orama/stopwords/french
- German: @orama/stopwords/german
- Greek: @orama/stopwords/greek
- Hungarian: @orama/stopwords/hungarian
- Indian: @orama/stopwords/indian
- Indonesian: @orama/stopwords/indonesian
- Irish: @orama/stopwords/irish
- Italian: @orama/stopwords/italian
- Japanese: @orama/stopwords/japanese
- Nepali: @orama/stopwords/nepali
- Norwegian: @orama/stopwords/norwegian
- Portuguese: @orama/stopwords/portuguese
- Romanian: @orama/stopwords/romanian
- Russian: @orama/stopwords/russian
- Sanskrit: @orama/stopwords/sanskrit
- Serbian: @orama/stopwords/serbian
- Slovenian: @orama/stopwords/slovenian
- Spanish: @orama/stopwords/spanish
- Swedish: @orama/stopwords/swedish
- Tamil: @orama/stopwords/tamil
- Turkish: @orama/stopwords/turkish
- Ukrainian: @orama/stopwords/ukrainian
Custom Stopwords
Provide your own stopwords as an array.
Extending Built-in Stopwords
Combine built-in stopwords with custom additions by supplying a function instead of an array. The function receives an empty array by default, since Orama doesn’t have default stopwords. This pattern allows you to chain stopword modifications.
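To make the two forms concrete, the sketch below models how an array and a function could both resolve to a final list. `resolveStopWords` is a hypothetical helper written for illustration, not an Orama export. It only mirrors the behaviour described here, where a function is handed an empty default list and returns the list to use.

```typescript
type StopWordsOption = string[] | ((defaults: string[]) => string[])

// Hypothetical helper: turns either form of the option into a plain list.
function resolveStopWords(option: StopWordsOption): string[] {
  const defaults: string[] = [] // no built-in defaults, so functions start from an empty list
  return typeof option === 'function' ? option(defaults) : option
}

// Stand-in for a list imported from @orama/stopwords/english (not the real list).
const builtInSample = ['a', 'an', 'the']

// Custom list given directly as an array.
const customOnly = resolveStopWords(['foo', 'bar'])

// Built-in list extended with custom additions through a function.
const extended = resolveStopWords((defaults) => [...defaults, ...builtInSample, 'acme', 'inc'])

console.log(customOnly) // [ 'foo', 'bar' ]
console.log(extended) // [ 'a', 'an', 'the', 'acme', 'inc' ]
```

Spreading `defaults` first keeps the function composable: if a default list were ever supplied, the custom entries would be appended rather than replacing it.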
Disabling Stopwords
Explicitly disable stopword filtering.
English Stopwords List
The English stopwords package includes 204 common words.
When to Use Stopwords
Good Use Cases
- Long-form content: Articles, blog posts, documentation
- General search: When searching natural language text
- Large datasets: To reduce index size significantly
- E-commerce: Product descriptions with common filler words
Avoid Stopwords For
- Short content: Tweets, headlines, titles (stopwords may be significant)
- Technical content: Code, commands, where “to”, “in”, “at” may be important
- Phrase search: When exact phrases like “to be or not to be” matter
- Small datasets: Limited benefits if you have fewer than 1000 documents
Performance Impact
Index Size Reduction
Stopwords typically reduce index size by 20-40%, depending on content type.
Search Performance
Fewer tokens mean:
- Faster searches (10-30% improvement)
- Lower memory usage
- Reduced disk I/O for persistent indexes
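Because these figures vary with content, it can help to measure the effect on a sample of your own text. The sketch below counts tokens before and after filtering. `tokenReduction` and its simple splitting rule are illustrative assumptions rather than Orama's tokenizer, and the ratio it reports covers token counts only, which is a proxy for index size rather than a measurement of it.

```typescript
// Hypothetical helper: fraction of tokens removed by a stopwords list.
function tokenReduction(text: string, stopWords: string[]): number {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0)
  if (tokens.length === 0) return 0 // avoid dividing by zero on empty input
  const blocked = new Set(stopWords)
  const kept = tokens.filter((t) => !blocked.has(t))
  return (tokens.length - kept.length) / tokens.length
}

// A deliberately stopword-heavy sentence; real documents will differ.
const sampleText = 'The battery of the phone lasts for a day and a half'
const sampleStopWords = ['the', 'of', 'for', 'a', 'and']

const ratio = tokenReduction(sampleText, sampleStopWords)
console.log(`${Math.round(ratio * 100)}% of tokens removed`) // 58% of tokens removed
```

Running the same helper over a representative batch of documents gives a quick first estimate before committing to a stopwords list.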
Insert Performance
Minimal overhead (~2-5%) for checking tokens against the stopwords list.
Stopwords with Stemming
Stopwords are removed before stemming in the normalization pipeline.
Validation
Orama validates stopwords configuration.
Domain-Specific Example
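For an e-commerce site, you might want to filter brand-specific filler words.
Installation
The stopword lists ship in the @orama/stopwords package, which is installed separately from Orama, for example with npm install @orama/stopwords.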
Related
- Stemming: Combine stopwords with stemming for optimal search
- Tokenization: Learn how tokenization and stopwords work together
- Languages: See stopwords support for all 30+ languages
- Search: How stopwords affect search results