- “running”, “runs” → “run”
- “beautiful”, “beautifully” → “beauti”
- “connection”, “connected”, “connecting” → “connect”
How Stemming Works
When stemming is enabled, Orama applies the stemming algorithm during both indexing and searching, so document tokens and query terms are reduced to the same stems.
Stemming is disabled by default. You must explicitly enable it in your tokenizer configuration.
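To illustrate why the same stemmer has to run at both index time and query time, here is a minimal, self-contained sketch. It does not use Orama's internals: `naiveStem` and `createTinyIndex` are hypothetical names, and the suffix rule is a deliberately crude stand-in for a real stemming algorithm.

```typescript
type Stemmer = (word: string) => string

// Crude stand-in for a real stemmer: strips one common suffix (assumption, not Porter).
const naiveStem: Stemmer = (word) => word.toLowerCase().replace(/(ion|ing|ed|s)$/, '')

// Hypothetical toy inverted index: maps each stem to the set of document ids containing it.
function createTinyIndex(stem: Stemmer) {
  const postings = new Map<string, Set<number>>()
  return {
    // Index time: every token is stemmed before it is stored.
    insert(docId: number, text: string): void {
      for (const token of text.split(/\s+/).filter((t) => t.length > 0)) {
        const key = stem(token)
        let ids = postings.get(key)
        if (!ids) {
          ids = new Set<number>()
          postings.set(key, ids)
        }
        ids.add(docId)
      }
    },
    // Query time: the search term goes through the same stemmer before lookup.
    search(term: string): number[] {
      const ids = postings.get(stem(term))
      return ids ? Array.from(ids).sort((a, b) => a - b) : []
    },
  }
}

const tinyIndex = createTinyIndex(naiveStem)
tinyIndex.insert(1, 'connection established')
tinyIndex.insert(2, 'connecting peers')
tinyIndex.insert(3, 'unrelated text')

// "connected" never appears in any document, but its stem matches documents 1 and 2.
const hits = tinyIndex.search('connected')
console.log(hits)
```

If the query term were looked up without stemming, "connected" would find nothing, because only the stem "connect" is stored in the index.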
Enabling Stemming
English Stemming
For English, Orama includes a built-in Porter stemmer.
Other Languages
For other languages, import the stemmer from @orama/stemmers.
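As a rough sketch of the two cases, the tokenizer options might look like the objects below. The `TokenizerOptions` type is an assumption written for this example rather than the library's own type definition. The placeholder stemmer stands in for what a language package such as @orama/stemmers/italian is assumed to export (a `stemmer` function and a `language` string).

```typescript
// Assumed shape of the stemming-related tokenizer options (illustrative, not Orama's typings).
type Stemmer = (word: string) => string

interface TokenizerOptions {
  stemming: boolean
  language: string
  stemmer?: Stemmer
}

// English: the stemmer is built in, so enabling the flag is assumed to be enough.
const englishTokenizer: TokenizerOptions = {
  stemming: true,
  language: 'english',
}

// Other languages: in a real project, `stemmer` and `language` would be imported
// from the matching package, e.g. '@orama/stemmers/italian' (assumed exports).
// A pass-through placeholder keeps this sketch self-contained.
const placeholderItalianStemmer: Stemmer = (word) => word

const italianTokenizer: TokenizerOptions = {
  stemming: true,
  language: 'italian',
  stemmer: placeholderItalianStemmer,
}

console.log(englishTokenizer, italianTokenizer.language)
```

In either case the object is assumed to be supplied as the tokenizer component when the database is created. Check the tokenizer reference for the exact option names.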
Supported Languages
Orama provides stemmers for 30 languages:
- Arabic: @orama/stemmers/arabic
- Armenian: @orama/stemmers/armenian
- Bulgarian: @orama/stemmers/bulgarian
- Czech: @orama/stemmers/czech
- Danish: @orama/stemmers/danish
- Dutch: @orama/stemmers/dutch
- English: Built-in
- Finnish: @orama/stemmers/finnish
- French: @orama/stemmers/french
- German: @orama/stemmers/german
- Greek: @orama/stemmers/greek
- Hungarian: @orama/stemmers/hungarian
- Indian: @orama/stemmers/indian
- Indonesian: @orama/stemmers/indonesian
- Irish: @orama/stemmers/irish
- Italian: @orama/stemmers/italian
- Lithuanian: @orama/stemmers/lithuanian
- Nepali: @orama/stemmers/nepali
- Norwegian: @orama/stemmers/norwegian
- Portuguese: @orama/stemmers/portuguese
- Romanian: @orama/stemmers/romanian
- Russian: @orama/stemmers/russian
- Sanskrit: @orama/stemmers/sanskrit
- Serbian: @orama/stemmers/serbian
- Slovenian: @orama/stemmers/slovenian
- Spanish: @orama/stemmers/spanish
- Swedish: @orama/stemmers/swedish
- Tamil: @orama/stemmers/tamil
- Turkish: @orama/stemmers/turkish
- Ukrainian: @orama/stemmers/ukrainian
Skipping Stemming for Specific Properties
You may want to disable stemming for certain fields.
Use stemmerSkipProperties for fields where exact word forms matter, such as product names, brand names, or technical identifiers.
English Porter Stemmer
Orama’s built-in English stemmer implements the Porter stemming algorithm with multiple transformation steps:
- Plural forms (“cats” → “cat”)
- Past tense (“walked” → “walk”)
- Continuous forms (“running” → “run”)
- Adverbs (“quickly” → “quick”)
- Adjectives (“beautiful” → “beauti”)
- Nouns (“nationalism” → “nation”)
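To give a feel for how suffix-stripping rules produce a few of the results listed above, here is a heavily simplified sketch. It is not the Porter algorithm and not Orama's implementation. `toyStem` and `undouble` are hypothetical helpers that cover only plurals, past tense, continuous forms, and "-ly" adverbs.

```typescript
// Collapse a doubled final consonant ("runn" -> "run"), leaving l, s, z and vowels alone.
function undouble(stem: string): string {
  const n = stem.length
  const last = stem.charAt(n - 1)
  const doubled = n >= 2 && last === stem.charAt(n - 2)
  return doubled && !'aeioulsz'.includes(last) ? stem.slice(0, -1) : stem
}

// Toy suffix stripper: applies at most one rule per word (simplifying assumption).
function toyStem(word: string): string {
  const w = word.toLowerCase()
  if (w.endsWith('ly') && w.length > 4) return w.slice(0, -2) // adverbs
  if (w.endsWith('ing') && w.length > 5) return undouble(w.slice(0, -3)) // continuous forms
  if (w.endsWith('ed') && w.length > 4) return undouble(w.slice(0, -2)) // past tense
  if (w.endsWith('s') && !w.endsWith('ss') && w.length > 3) return w.slice(0, -1) // plurals
  return w
}

const toyResults = ['cats', 'walked', 'running', 'quickly'].map(toyStem)
console.log(toyResults) // [ 'cat', 'walk', 'run', 'quick' ]
```

A real Porter implementation applies several ordered steps with conditions on the remaining stem, which is how it reaches results such as “nationalism” → “nation” that a single-rule sketch like this cannot.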
Custom Stemmer
Implement a custom stemmer for specialized vocabularies.
When to Use Stemming
Good Use Cases
- General content search: Blog posts, articles, documentation
- E-commerce descriptions: Product descriptions with natural language
- Support systems: Help articles, FAQs, knowledge bases
- Social media: Posts, comments, messages
Avoid Stemming For
- Technical terms: API names, function names, code identifiers
- Product codes: SKUs, model numbers, part identifiers
- Brand names: Company names, product names
- Proper nouns: Person names, place names
- Legal/medical text: Where exact terminology matters
Performance Impact
Memory Usage
Stemming typically reduces index size by 15-30% because different word forms map to the same stem.
Search Speed
Stemming has minimal impact on search performance (typically less than 5% overhead) but can significantly improve recall.
Indexing Speed
Each token requires stemming during insertion, adding approximately 10-20% overhead to indexing time.
Stemming + Stopwords + Diacritics
The normalization pipeline combines multiple text processing steps:
- Tokenization (split text)
- Lowercase conversion
- Stopwords removal
- Stemming
- Diacritics removal
- Caching
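The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions rather than Orama's actual code. The stopword list, the `stripSuffix` rule, and the per-token `Map` cache are all hypothetical, and the order simply follows the list.

```typescript
// Hypothetical, tiny stopword list for the example.
const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'of', 'in'])

// Cache of already-normalized tokens, so repeated words skip stemming and diacritics removal.
const tokenCache = new Map<string, string>()

// Crude stand-in for a real stemmer: strips one suffix if at least 3 characters remain.
function stripSuffix(token: string): string {
  for (const suffix of ['ing', 'ed', 's']) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, -suffix.length)
    }
  }
  return token
}

// Decompose accented characters, then drop the combining marks.
function removeDiacritics(token: string): string {
  return token.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
}

function normalizeToken(token: string): string {
  const cached = tokenCache.get(token)
  if (cached !== undefined) return cached
  const result = removeDiacritics(stripSuffix(token)) // stemming, then diacritics removal
  tokenCache.set(token, result) // caching
  return result
}

function normalizeText(text: string): string[] {
  return text
    .split(/[\s.,;:!?()"']+/) // tokenization
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase()) // lowercase conversion
    .filter((t) => !STOPWORDS.has(t)) // stopwords removal
    .map(normalizeToken)
}

const normalized = normalizeText('The cafés opened in Málaga')
console.log(normalized) // [ 'cafe', 'open', 'malaga' ]
```

Removing stopwords before stemming means the more expensive steps run on fewer tokens, and the cache avoids repeating them for words that occur many times.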
Installation
Related
- Stopwords: Remove common words that don’t add meaning
- Languages: See all supported languages and their features
- Tokenization: Learn how text is split into tokens
- Search: Use stemming to improve search results