Bitaboom provides functions for matching Arabic text regardless of diacritics, character variants, and decorative elements such as tatweel.

Basic diacritic-insensitive matching

The makeDiacriticInsensitiveRegex function creates regex patterns that match Arabic text with or without diacritics.
import { makeDiacriticInsensitiveRegex } from 'bitaboom';

const regex = makeDiacriticInsensitiveRegex('السلام عليكم');

// Matches text with diacritics
regex.test('اَلسَّلَامُ عَلَيْكُمْ'); // true

// Matches text without diacritics
regex.test('السلام عليكم'); // true
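
To make the idea concrete, here is a minimal sketch of one way such a pattern could be built: strip diacritics from the needle, then allow any run of diacritics after each remaining character. The helper name sketchDiacriticInsensitive and the chosen code point range are assumptions for illustration; this is not Bitaboom's implementation and it covers none of the other features described on this page.

```typescript
// Illustrative sketch only; this is not Bitaboom's implementation.
// Tanwin, harakat, shadda, and sukun (U+064B..U+0652) plus superscript alif (U+0670).
const DIACRITIC_RANGE = '\\u064B-\\u0652\\u0670';

// Hypothetical helper. Assumes the needle holds only letters and spaces,
// so no regex escaping is performed here.
function sketchDiacriticInsensitive(needle: string): RegExp {
  // Drop diacritics from the needle, then let any run of them follow each character.
  const bare = needle.replace(new RegExp(`[${DIACRITIC_RANGE}]`, 'gu'), '');
  const source = Array.from(bare)
    .map(ch => `${ch}[${DIACRITIC_RANGE}]*`)
    .join('');
  return new RegExp(source, 'u');
}

const sketchRx = sketchDiacriticInsensitive('السلام عليكم');
console.log(sketchRx.test('اَلسَّلَامُ عَلَيْكُمْ')); // true
console.log(sketchRx.test('السلام عليكم')); // true
console.log(sketchRx.test('السلم عليكم')); // false (a letter is missing)
```

The generated source grows by a fixed amount per input character, which is one reason long inputs produce large patterns.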

Character equivalences

The function automatically handles common Arabic character variants:

Alif variants

All forms of alif (ا, آ, أ, إ) are treated as equivalent:
const rx = makeDiacriticInsensitiveRegex('أنا إلى الآفاق');

rx.test('انا الى الافاق'); // true
rx.test('أنا إلى الآفاق'); // true
rx.test('اَنا إلى الآفاق'); // true

Ta marbuta and ha

Ta marbuta (ة) and ha (ه) are interchangeable:
const rx = makeDiacriticInsensitiveRegex('مدرسة');

rx.test('مدرسه'); // true
rx.test('مدرسة'); // true

Alif maqsurah and ya

Alif maqsurah (ى) and ya (ي) match each other:
const rx = makeDiacriticInsensitiveRegex('على');

rx.test('علي'); // true
rx.test('على'); // true
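
The three equivalences above can be pictured as character classes substituted for single letters. The sketch below uses a hypothetical table and helper names (EQUIVALENCE_CLASSES, sketchCharClass, sketchEquivalenceRegex); it is an illustration of the technique, not Bitaboom's internals, and it assumes the input contains only plain letters.

```typescript
// Illustrative sketch only; a hypothetical equivalence table, not Bitaboom's internals.
const EQUIVALENCE_CLASSES: string[] = [
  'اآأإ', // alif variants
  'ةه', // ta marbuta and ha
  'ىي', // alif maqsurah and ya
];

// Map a character to a regex character class covering its equivalents,
// or return it unchanged when it belongs to no group.
function sketchCharClass(ch: string): string {
  const group = EQUIVALENCE_CLASSES.find(g => g.includes(ch));
  return group ? `[${group}]` : ch;
}

// Assumes the word holds only letters (nothing that needs regex escaping).
function sketchEquivalenceRegex(word: string): RegExp {
  return new RegExp(Array.from(word).map(sketchCharClass).join(''), 'u');
}

console.log(sketchEquivalenceRegex('مدرسة').test('مدرسه')); // true
console.log(sketchEquivalenceRegex('على').test('علي')); // true
console.log(sketchEquivalenceRegex('أنا').test('انا')); // true
console.log(sketchEquivalenceRegex('مدرسة').test('مدرست')); // false
```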

Tatweel tolerance

The function tolerates tatweel (kashida), the decorative elongation character, between letters:
const rx = makeDiacriticInsensitiveRegex('أبتكة');

// Matches with tatweel between letters
rx.test('أبـــتِـــكَةُ'); // true
rx.test('أبتكة'); // true
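
As a rough illustration of tatweel tolerance, one approach is to allow an optional run of tatweel (U+0640) and diacritics between consecutive letters. The names FILLER and sketchTatweelTolerant are hypothetical, and the sketch assumes a needle made of plain letters; Bitaboom may do this differently.

```typescript
// Illustrative sketch only; not Bitaboom's implementation.
const TATWEEL = '\u0640';
// Optional filler between letters: tatweel and common diacritics (U+064B..U+0652).
const FILLER = '[\\u0640\\u064B-\\u0652]*';

// Hypothetical helper. Assumes the word holds only plain letters (nothing to escape).
function sketchTatweelTolerant(word: string): RegExp {
  return new RegExp(Array.from(word).join(FILLER), 'u');
}

// Build a stretched spelling explicitly so the tatweel characters are visible in code.
const stretched = 'ك' + TATWEEL.repeat(3) + 'ت' + TATWEEL.repeat(2) + 'اب';
const tatweelRx = sketchTatweelTolerant('كتاب');
console.log(tatweelRx.test(stretched)); // true
console.log(tatweelRx.test('كتاب')); // true
console.log(tatweelRx.test('كاتب')); // false (letters in a different order)
```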

Building complex patterns

You can compose multiple patterns using the .source property:
import { makeDiacriticInsensitiveRegex, escapeRegex } from 'bitaboom';

const words = ['أنا', 'الى'];
const pieces = words.map(w => makeDiacriticInsensitiveRegex(w).source);
const rx = new RegExp(`^(?:${pieces.join('|')})` + escapeRegex(' الافاق') + '.*$', 'mu');

rx.test('انا الافاق'); // true
rx.test('إِلى الافاق'); // true
rx.test('آنا الافاق'); // true
rx.test('هو الافاق'); // false
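
Two details of this composition are easy to miss: .source carries only the pattern text, so flags must be passed again, and the alternation needs a non-capturing group so that the anchors apply to every alternative. The sketch below shows both with standard RegExp only; the two hand-written patterns stand in for generated ones and are not produced by Bitaboom.

```typescript
// Two hand-written stand-ins for generated patterns (not produced by Bitaboom).
const parts: RegExp[] = [/[اآأإ]نا/u, /[اآأإ]ل[ىي]/u];

// .source is only the pattern text; flags such as 'u' must be supplied again.
const alternation = parts.map(p => p.source).join('|');

// With the group, '^' and the trailing text apply to both alternatives.
const grouped = new RegExp(`^(?:${alternation}) الافاق$`, 'u');
// Without it, '^' binds to the first alternative and the tail to the second.
const ungrouped = new RegExp(`^${alternation} الافاق$`, 'u');

console.log(grouped.test('انا الافاق')); // true
console.log(grouped.test('إلي الافاق')); // true
console.log(grouped.test('انا وحدي')); // false
console.log(ungrouped.test('انا وحدي')); // true, matched by the bare first alternative
```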

Simple pattern generation

For simpler use cases, use makeDiacriticInsensitive to generate a regex pattern string:
import { makeDiacriticInsensitive } from 'bitaboom';

const pattern = makeDiacriticInsensitive('مرحبا');
const regex = new RegExp(pattern);

// Matches with different alif variant
regex.test('مرحبأ'); // true

// Matches with diacritics
regex.test('مَرْحَبَا'); // true

// Matches original text
regex.test('مرحبا'); // true

Advanced options

Customize matching behavior with options:
const rx = makeDiacriticInsensitiveRegex('مدرسة', {
  equivalences: {
    alif: true,              // Match all alif variants (default: true)
    taMarbutahHa: true,      // Match ة and ه (default: true)
    alifMaqsurahYa: true     // Match ى and ي (default: true)
  },
  allowTatweel: true,        // Tolerate tatweel (default: true)
  ignoreDiacritics: true,    // Ignore diacritics (default: true)
  flexWhitespace: true,      // Match flexible whitespace (default: true)
  flags: 'u'                 // RegExp flags (default: 'u')
});

Common use cases

Search in Arabic text

import { makeDiacriticInsensitiveRegex } from 'bitaboom';

function searchArabic(haystack: string, needle: string): boolean {
  const rx = makeDiacriticInsensitiveRegex(needle);
  return rx.test(haystack);
}

const text = 'قَالَ رَسُولُ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ';
searchArabic(text, 'رسول الله'); // true

Find and highlight matches

import { makeDiacriticInsensitiveRegex } from 'bitaboom';

function highlightMatches(text: string, search: string): string {
  const rx = makeDiacriticInsensitiveRegex(search, { flags: 'gu' });
  return text.replace(rx, match => `<mark>${match}</mark>`);
}

const result = highlightMatches(
  'الحمد لله رب العالمين',
  'لله'
);
// Result: 'الحمد <mark>لله</mark> رب العالمين'
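
highlightMatches interpolates the matched text directly into HTML, which is fine for trusted input. If the text may contain markup, a variant that escapes both the matches and the text around them is safer. The sketch below is one possible shape; escapeHtml and highlightEscaped are hypothetical helpers, and the function accepts any global RegExp (for example, one built with flags 'gu') so that it runs on its own.

```typescript
// Hypothetical helpers for illustration; not part of Bitaboom.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// rx must carry the 'g' flag; String.prototype.matchAll throws a TypeError otherwise.
function highlightEscaped(text: string, rx: RegExp): string {
  let out = '';
  let last = 0;
  for (const m of text.matchAll(rx)) {
    if (m[0].length === 0) continue; // ignore empty matches
    const start = m.index ?? 0;
    // Escape the gap before the match and the match itself, then wrap the match.
    out += escapeHtml(text.slice(last, start)) + `<mark>${escapeHtml(m[0])}</mark>`;
    last = start + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

console.log(highlightEscaped('<b>لله</b>', /لله/gu));
// '&lt;b&gt;<mark>لله</mark>&lt;/b&gt;'
```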
makeDiacriticInsensitiveRegex enforces a safety limit of 5000 characters so that generated patterns do not grow large enough to hurt performance.
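
When the needle comes from user input, it can help to check it before building a pattern. The sketch below is a caller-side guard under two stated assumptions: that the 5000-character figure applies to the input string, and that rejecting oversized or empty input is the desired behavior. The page does not say what the library itself does when the limit is exceeded, and guardNeedle is a hypothetical name.

```typescript
// Figure taken from the note above. Assumption: it applies to the input length.
const ASSUMED_MAX_LENGTH = 5000;

// Hypothetical caller-side guard; returns null for input that should not be compiled.
function guardNeedle(needle: string, max: number = ASSUMED_MAX_LENGTH): string | null {
  const trimmed = needle.trim();
  if (trimmed.length === 0 || trimmed.length > max) return null;
  return trimmed;
}

console.log(guardNeedle('  رسول الله ')); // 'رسول الله'
console.log(guardNeedle('ا'.repeat(5001))); // null
console.log(guardNeedle('   ')); // null
```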

Edge cases

Empty strings

const result = makeDiacriticInsensitive('');
// Returns: ''

Non-Arabic characters

Non-Arabic characters are escaped and have diacritic matchers applied:
const result = makeDiacriticInsensitive('hello مرحبا');
// Pattern includes both Latin and Arabic character handling

Special regex characters

Special regex characters are automatically escaped:
const result = makeDiacriticInsensitive('test.+*?');
// Special chars are escaped: 'test\\.\\+\\*\\?'
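
To see why this escaping matters, the sketch below uses a common stand-alone escaping helper (sketchEscapeRegex is a hypothetical name, not Bitaboom's escapeRegex) and compares an escaped pattern with an unescaped one.

```typescript
// Hypothetical stand-in for an escaping helper; not Bitaboom's escapeRegex.
function sketchEscapeRegex(input: string): string {
  // Prefix every regex syntax character with a backslash.
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const escaped = sketchEscapeRegex('test.+*?');
console.log(escaped); // test\.\+\*\?
console.log(new RegExp(escaped, 'u').test('test.+*?')); // true
console.log(new RegExp(escaped, 'u').test('testing')); // false
console.log(new RegExp('test.+', 'u').test('testing')); // true, '.' and '+' act as operators
```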
