Symbol and reference removal
cleanSymbolsAndPartReferences
Removes various symbols, part references, and numerical markers from the text.
The input text to apply the rule to
string - The modified text with symbols and part references removed.
cleanTrailingPageNumbers
Removes trailing page numbers formatted as '-[46]-' from the text.
The input text with trailing page numbers
string - The modified text with page numbers removed.
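As a rough illustration of the rule described above, the following sketch strips a marker such as -[46]- from the end of the text. The function name and the regular expression are assumptions made for this example, not the library's actual implementation; in particular, it assumes "trailing" means the very end of the input.

```typescript
// Hypothetical sketch; the library's real pattern may differ.
function cleanTrailingPageNumbersSketch(text: string): string {
  // Match "-[digits]-" with optional surrounding whitespace at the end of the text.
  return text.replace(/\s*-\[\d+\]-\s*$/, "");
}

console.log(cleanTrailingPageNumbersSketch("Some text -[46]-")); // "Some text"
```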
removeSingleDigitReferences
Removes single digit references like (1), «2», [3] from the text.
The input text containing single digit references
string - The modified text with single digit references removed.
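One possible way to express this rule is a single regular expression that matches one digit wrapped in any of the three bracket styles. The name below is hypothetical, and the sketch assumes ASCII digits and does not check that the opening and closing brackets are of the same kind.

```typescript
// Hypothetical sketch; not the library's actual implementation.
function removeSingleDigitReferencesSketch(text: string): string {
  // One ASCII digit wrapped in (), «», or []; multi-digit references are left alone.
  return text.replace(/[(«\[]\d[)»\]]/g, "");
}

console.log(removeSingleDigitReferencesSketch("see(1) and [12]")); // "see and [12]"
```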
Digit removal
stripAllDigits
Removes all numeric digits from the text.
The input text containing digits
string - The modified text with digits removed.
removeNumbersAndDashes
Removes numeric digits and dashes from the text.
The input text containing digits and dashes
string - The modified text with numbers and dashes removed.
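The two digit rules above could be sketched as follows. Both function names are hypothetical, and the sketch assumes "digits" means ASCII 0-9 and "dashes" means the hyphen-minus character; the library may also cover Arabic-Indic digits or other dash characters.

```typescript
// Hypothetical sketches; character coverage is an assumption.
function stripAllDigitsSketch(text: string): string {
  return text.replace(/\d/g, ""); // \d matches ASCII digits only in JavaScript
}

function removeNumbersAndDashesSketch(text: string): string {
  return text.replace(/[\d-]/g, ""); // digits plus hyphen-minus
}

console.log(stripAllDigitsSketch("abc123"));             // "abc"
console.log(removeNumbersAndDashesSketch("12-34 text")); // " text"
```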
removeDeathYear
Removes death year references like “(d. 390H)” and “[d. 100h]” from the text.
The input text containing death year references
string - The modified text with death years removed.
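Based only on the two examples given above, a death-year pattern might look like the sketch below. The function name and the exact pattern (optional spaces, case-insensitive "h", either bracket style) are assumptions for illustration.

```typescript
// Hypothetical sketch derived from the "(d. 390H)" and "[d. 100h]" examples.
function removeDeathYearSketch(text: string): string {
  // "(" or "[", then "d.", a run of digits, "h" or "H", then ")" or "]".
  return text.replace(/[(\[]d\.\s*\d+\s*h[)\]]/gi, "");
}

console.log(removeDeathYearSketch("Author [d. 100h]")); // "Author "
```

Note that this sketch leaves the surrounding spaces in place, so a reference in the middle of a sentence leaves a double space behind.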
URL and markdown removal
removeUrls
Removes URLs from the text.
The input text containing URLs
string - The modified text with URLs removed.
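A minimal sketch of URL removal, assuming a URL starts with http:// or https:// and runs until the next whitespace character. The function name is hypothetical, and the real rule may recognize other schemes or bare domains.

```typescript
// Hypothetical sketch; only http(s) URLs delimited by whitespace are handled.
function removeUrlsSketch(text: string): string {
  return text.replace(/https?:\/\/\S+/g, "");
}

console.log(removeUrlsSketch("Visit https://example.com/page now")); // "Visit  now"
```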
removeMarkdownFormatting
Removes common Markdown formatting syntax from text.
The input text containing Markdown formatting
string - Text with Markdown formatting removed (bold, italics, headers, lists, backticks).
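The sketch below shows one way the listed constructs (bold, italics, headers, lists, backticks) could be stripped with a chain of replacements. It is an assumption-laden illustration under a hypothetical name, not the library's code: it handles unordered list markers only, and the underscore rule would also affect identifiers such as snake_case_words.

```typescript
// Hypothetical sketch; order matters (bold must run before italics).
function removeMarkdownFormattingSketch(text: string): string {
  return text
    .replace(/^#{1,6}\s+/gm, "")          // header markers at line start
    .replace(/^\s*[-*+]\s+/gm, "")        // unordered list markers
    .replace(/(\*\*|__)(.+?)\1/g, "$2")   // bold
    .replace(/(\*|_)(.+?)\1/g, "$2")      // italics
    .replace(/`+/g, "");                  // inline code and fence backticks
}

console.log(removeMarkdownFormattingSketch("**bold** and *italic* and `code`"));
// "bold and italic and code"
```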
Whitespace normalization
replaceLineBreaksWithSpaces
Replaces consecutive line breaks and whitespace characters with a single space.
The input text containing line breaks or multiple spaces
string - The modified text with each run of line breaks and whitespace collapsed to a single space.
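This kind of normalization is commonly done with a single whitespace-run replacement, as in the sketch below. The function name is hypothetical, and the sketch does not trim leading or trailing spaces, which the actual function may or may not do.

```typescript
// Hypothetical sketch; collapses any run of whitespace (including \n, \r, \t) to one space.
function replaceLineBreaksWithSpacesSketch(text: string): string {
  return text.replace(/\s+/g, " ");
}

console.log(replaceLineBreaksWithSpacesSketch("a\n\nb   c\t d")); // "a b c d"
```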
Text truncation
truncate
Truncates a string to a specified length, adding an ellipsis if truncated.
The string to truncate
Maximum length of the string
string - The truncated string with ellipsis if needed, otherwise the original string.
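A minimal sketch of end truncation follows. The parameter names are hypothetical, and it assumes the ellipsis is the single character "…" and counts toward the maximum length; the source does not state either point.

```typescript
// Hypothetical sketch; the result never exceeds maxLength characters.
function truncateSketch(value: string, maxLength: number): string {
  if (value.length <= maxLength) return value; // short enough, return unchanged
  // Reserve one character for the ellipsis.
  return value.slice(0, Math.max(0, maxLength - 1)) + "…";
}

console.log(truncateSketch("hello world", 8)); // "hello w…"
console.log(truncateSketch("short", 10));      // "short"
```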
truncateMiddle
Truncates a string from the middle, preserving both the beginning and end portions.
The string to truncate
Maximum length of the resulting string
Number of characters to preserve at the end (default: 1/3 of maxLength, minimum 3)
string - The truncated string with ellipsis in the middle if needed, otherwise the original string.
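The default described above (one third of the maximum length, but at least 3 characters) can be sketched as shown here. Parameter names are hypothetical, and the sketch assumes a single "…" character that counts toward the maximum length and that the one-third value is rounded down.

```typescript
// Hypothetical sketch of middle truncation under the stated assumptions.
function truncateMiddleSketch(value: string, maxLength: number, endLength?: number): string {
  if (value.length <= maxLength) return value;
  // Default tail: 1/3 of maxLength (rounded down), but never fewer than 3 characters.
  const tail = endLength ?? Math.max(3, Math.floor(maxLength / 3));
  // Whatever remains after the tail and one ellipsis character goes to the head.
  const head = Math.max(0, maxLength - tail - 1);
  return value.slice(0, head) + "…" + value.slice(value.length - tail);
}

console.log(truncateMiddleSketch("abcdefghijklmnopqrstuvwxyz", 12));    // "abcdefg…wxyz"
console.log(truncateMiddleSketch("abcdefghijklmnopqrstuvwxyz", 12, 2)); // "abcdefghi…yz"
```

Keeping the tail is useful for file names and paths, where the extension or final segment often carries the most information.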
Path utilities
unescapeSpaces
Unescapes backslash-escaped spaces and trims whitespace from both ends. Commonly used to clean file paths that have been escaped when pasted into terminals.
The string to unescape and clean
string - The cleaned string with escaped spaces converted to regular spaces and trimmed.
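The behavior described above amounts to a replace followed by a trim, as in this sketch. The function name and the sample path are made up for illustration.

```typescript
// Hypothetical sketch; turns each "\ " into a plain space, then trims both ends.
function unescapeSpacesSketch(input: string): string {
  return input.replace(/\\ /g, " ").trim();
}

// In the string literal below, "\\ " is one backslash followed by a space.
console.log(unescapeSpacesSketch("  /Users/me/My\\ Documents/file\\ name.txt "));
// "/Users/me/My Documents/file name.txt"
```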
Diacritic-insensitive pattern matching
makeDiacriticInsensitive
Creates a diacritic-insensitive regex pattern for Arabic text matching. Normalizes text, handles character equivalences (ا/آ/أ/إ, ة/ه, ى/ي), and makes each character tolerant of Arabic diacritics (Tashkeel/Harakat).
Input Arabic text to make diacritic-insensitive
string - Regex pattern string that matches the text with or without diacritics and character variants.
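To make the three steps above concrete (normalize, expand equivalent letters, tolerate diacritics), here is a minimal sketch. Everything in it is an assumption for illustration: the function and constant names, the use of NFC normalization, and the diacritic range U+064B to U+0652. The library may normalize differently or cover more marks.

```typescript
// Hypothetical sketch; code points are written as escapes to avoid bidi display issues.
const TASHKEEL = "[\\u064B-\\u0652]*"; // zero or more marks, fathatan through sukun (assumed range)
const ALEF = "[\\u0627\\u0622\\u0623\\u0625]"; // alef, alef madda, alef hamza above/below
const TEH_HEH = "[\\u0629\\u0647]";            // teh marbuta, heh
const YEH = "[\\u0649\\u064A]";                // alef maksura, yeh

const EQUIVALENTS: Record<string, string> = {
  "\u0627": ALEF, "\u0622": ALEF, "\u0623": ALEF, "\u0625": ALEF,
  "\u0629": TEH_HEH, "\u0647": TEH_HEH,
  "\u0649": YEH, "\u064A": YEH,
};

function makeDiacriticInsensitiveSketch(text: string): string {
  // Normalize, then drop any diacritics already present in the input.
  const stripped = text.normalize("NFC").replace(/[\u064B-\u0652]/g, "");
  return Array.from(stripped)
    .map((ch) => {
      // Use the equivalence class if one exists, otherwise escape regex metacharacters.
      const base = EQUIVALENTS[ch] ?? ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      return base + TASHKEEL; // allow diacritics after every character
    })
    .join("");
}

// Undiacritized "kitab" pattern matched against a diacritized form.
const pattern = new RegExp(makeDiacriticInsensitiveSketch("\u0643\u062A\u0627\u0628"));
console.log(pattern.test("\u0643\u0650\u062A\u064E\u0627\u0628")); // true
```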