
Symbol and reference removal

cleanSymbolsAndPartReferences

Removes various symbols, part references, and numerical markers from the text.
text (string): The input text to apply the rule to.
Returns: string - The modified text with symbols and part references removed.
cleanSymbolsAndPartReferences("(1) (2/3)"); // ""

cleanTrailingPageNumbers

Removes trailing page numbers formatted as '-[46]-' from the text.
text (string): The input text with trailing page numbers.
Returns: string - The modified text with page numbers removed.
cleanTrailingPageNumbers("This is some -[46]- text"); // "This is some text"
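A marker of this shape can be stripped with a single regular expression. The sketch below is an assumption about the behaviour, not the library's source: `cleanTrailingPageNumbersSketch` is a hypothetical name, and swallowing the whitespace in front of the marker is a guess made so that the result matches the example above.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function cleanTrailingPageNumbersSketch(text: string): string {
  // Optional leading whitespace, then "-[", one or more digits, "]-".
  return text.replace(/\s*-\[\d+\]-/g, "");
}

console.log(cleanTrailingPageNumbersSketch("This is some -[46]- text"));
// "This is some text"
```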

removeSingleDigitReferences

Removes single digit references like (1), «2», [3] from the text.
text (string): The input text containing single digit references.
Returns: string - The modified text with single digit references removed.
removeSingleDigitReferences("Ref (1), Ref «2», Ref [3]"); // "Ref , Ref , Ref "
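As a rough illustration, the three bracket styles listed above can be matched with one alternation. This is a sketch under assumptions, not the library's code: `removeSingleDigitReferencesSketch` is a hypothetical name, and only ASCII digits are considered.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function removeSingleDigitReferencesSketch(text: string): string {
  // One alternative per bracket pair, so mismatched pairs like "(1]" are left alone.
  return text.replace(/\(\d\)|«\d»|\[\d\]/g, "");
}

console.log(removeSingleDigitReferencesSketch("Ref (1), Ref «2», Ref [3]"));
// "Ref , Ref , Ref "
```

Because each alternative allows exactly one digit, a multi-digit reference such as `(12)` is left untouched by this sketch.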

Digit removal

stripAllDigits

Removes all numeric digits from the text.
text (string): The input text containing digits.
Returns: string - The modified text with digits removed.
stripAllDigits("abc123"); // "abc"

removeNumbersAndDashes

Removes numeric digits and dashes from the text.
text (string): The input text containing digits and dashes.
Returns: string - The modified text with numbers and dashes removed.
removeNumbersAndDashes("ABC 123-Xyz"); // "ABC Xyz"
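Both digit-removal helpers in this section can be approximated with a one-line character-class replacement. The sketch below uses hypothetical names and is not taken from the library. It assumes ASCII digits and the ASCII hyphen only. Whether the real functions also handle Arabic-Indic digits or other dash characters is not stated here.

```typescript
// Hypothetical stand-ins; the real implementations may differ.
function stripAllDigitsSketch(text: string): string {
  // Without the "u" flag, \d matches ASCII 0-9 only.
  return text.replace(/\d/g, "");
}

function removeNumbersAndDashesSketch(text: string): string {
  // Character class covering ASCII digits and the hyphen-minus.
  return text.replace(/[\d-]/g, "");
}

console.log(stripAllDigitsSketch("abc123")); // "abc"
console.log(removeNumbersAndDashesSketch("ABC 123-Xyz")); // "ABC Xyz"
```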

removeDeathYear

Removes death year references like “(d. 390H)” and “[d. 100h]” from the text.
text (string): The input text containing death year references.
Returns: string - The modified text with death years removed.
removeDeathYear("Sufyān ibn 'Uyaynah (d. 198h)"); // "Sufyān ibn 'Uyaynah"
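A pattern for the two formats named above might look like the following. This is a minimal sketch with a hypothetical function name. Removing the whitespace in front of the reference and accepting one to four digits are assumptions chosen to reproduce the example, not documented behaviour.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function removeDeathYearSketch(text: string): string {
  // Optional leading whitespace, "(" or "[", "d.", 1-4 digits, "h" or "H", ")" or "]".
  return text.replace(/\s*[(\[]d\.\s*\d{1,4}\s*h[)\]]/gi, "");
}

console.log(removeDeathYearSketch("Sufyān ibn 'Uyaynah (d. 198h)"));
// "Sufyān ibn 'Uyaynah"
```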

URL and markdown removal

removeUrls

Removes URLs from the text.
text (string): The input text containing URLs.
Returns: string - The modified text with URLs removed.
removeUrls("Visit https://example.com"); // "Visit "
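For illustration, a simple URL matcher can be built from a scheme prefix followed by any run of non-whitespace characters. The sketch below is an assumption, not the library's pattern: `removeUrlsSketch` is a hypothetical name, and it only recognises `http://` and `https://` links.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function removeUrlsSketch(text: string): string {
  // "http://" or "https://" followed by everything up to the next whitespace.
  return text.replace(/https?:\/\/\S+/g, "");
}

console.log(removeUrlsSketch("Visit https://example.com")); // "Visit "
```

Note that surrounding whitespace is left in place, so a URL in the middle of a sentence leaves two adjacent spaces behind, and punctuation attached to the URL is removed with it.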

removeMarkdownFormatting

Removes common Markdown formatting syntax from text.
text (string): The input text containing Markdown formatting.
Returns: string - Text with Markdown formatting removed (bold, italics, headers, lists, backticks).
removeMarkdownFormatting("**bold** and *italic*"); // "bold and italic"
removeMarkdownFormatting("# Header\n- List item"); // "Header\nList item"
removeMarkdownFormatting("[link](url)"); // "link"
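The order in which Markdown constructs are stripped matters, since `*` marks both list items and emphasis. The following sketch shows one possible ordering. It is an assumption about how such a function could work, uses a hypothetical name, and covers only the constructs listed above.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function removeMarkdownFormattingSketch(text: string): string {
  return text
    .replace(/^#{1,6}[ \t]+/gm, "")          // ATX headers at line start
    .replace(/^[ \t]*[-*+][ \t]+/gm, "")     // bullet markers at line start
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [label](url) -> label
    .replace(/\*\*(.+?)\*\*/g, "$1")         // **bold** -> bold
    .replace(/\*(.+?)\*/g, "$1")             // *italic* -> italic
    .replace(/`/g, "");                      // stray backticks
}

console.log(removeMarkdownFormattingSketch("**bold** and *italic*")); // "bold and italic"
console.log(removeMarkdownFormattingSketch("[link](url)")); // "link"
```

Bullets are handled before emphasis so that a leading `* ` is treated as a list marker, and bold is handled before italic so that `**` is not consumed as two single asterisks.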

Whitespace normalization

replaceLineBreaksWithSpaces

Replaces consecutive line breaks and whitespace characters with a single space.
text (string): The input text containing line breaks or multiple spaces.
Returns: string - The modified text with each run of line breaks and whitespace collapsed to a single space.
replaceLineBreaksWithSpaces("a\nb"); // "a b"
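Collapsing whitespace runs is commonly done with the `\s+` pattern. The sketch below is a minimal version under that assumption, with a hypothetical name. Whether the real function also trims the ends of the string is not stated here, so this version does not.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function replaceLineBreaksWithSpacesSketch(text: string): string {
  // Any run of spaces, tabs, or line breaks becomes one space.
  return text.replace(/\s+/g, " ");
}

console.log(replaceLineBreaksWithSpacesSketch("a\nb")); // "a b"
console.log(replaceLineBreaksWithSpacesSketch("a \n\n  b")); // "a b"
```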

Text truncation

truncate

Truncates a string to a specified length, adding an ellipsis if truncated.
val (string): The string to truncate.
n (number, default: 150): Maximum length of the string.
Returns: string - The truncated string with ellipsis if needed, otherwise the original string.
truncate('The quick brown fox jumps over the lazy dog', 20);
// Output: 'The quick brown fox…'

truncate('Short text', 50);
// Output: 'Short text'
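The examples above suggest that the ellipsis counts toward the maximum length: the 20-character result consists of 19 characters of text plus `…`. A sketch consistent with that reading follows. The function name is hypothetical and the logic is inferred from the examples, not taken from the library.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function truncateSketch(val: string, n: number = 150): string {
  if (val.length <= n) {
    return val; // short enough, return unchanged
  }
  // Keep n - 1 characters so that the ellipsis brings the total to n.
  return val.slice(0, Math.max(0, n - 1)) + "…";
}

console.log(truncateSketch("The quick brown fox jumps over the lazy dog", 20));
// "The quick brown fox…"
```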

truncateMiddle

Truncates a string from the middle, preserving both the beginning and end portions.
text (string): The string to truncate.
maxLength (number, default: 50): Maximum length of the resulting string.
endLength (number): Number of characters to preserve at the end (default: 1/3 of maxLength, minimum 3).
Returns: string - The truncated string with ellipsis in the middle if needed, otherwise the original string.
truncateMiddle('The quick brown fox jumps right over the lazy dog', 20);
// Output: 'The quick bro…zy dog'

truncateMiddle('The quick brown fox jumps right over the lazy dog', 25, 8);
// Output: 'The quick brown …lazy dog'

truncateMiddle('Short text', 50);
// Output: 'Short text'
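The arithmetic behind the examples above can be reconstructed as follows: with `maxLength` 20, the default tail is floor(20 / 3) = 6 characters, one character goes to the ellipsis, and the remaining 13 form the head. The sketch below implements that reading. The name is hypothetical, and applying the minimum of 3 to an explicitly passed `endLength` is an assumption.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function truncateMiddleSketch(text: string, maxLength: number = 50, endLength?: number): string {
  if (text.length <= maxLength) {
    return text; // short enough, return unchanged
  }
  // Tail defaults to a third of maxLength, never fewer than 3 characters (assumed).
  const tail = Math.max(3, endLength ?? Math.floor(maxLength / 3));
  // Head takes whatever remains after the tail and the one-character ellipsis.
  const head = Math.max(0, maxLength - tail - 1);
  return text.slice(0, head) + "…" + text.slice(-tail);
}

const sample = "The quick brown fox jumps right over the lazy dog";
console.log(truncateMiddleSketch(sample, 20)); // "The quick bro…zy dog"
console.log(truncateMiddleSketch(sample, 25, 8)); // "The quick brown …lazy dog"
```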

Path utilities

unescapeSpaces

Unescapes backslash-escaped spaces and trims whitespace from both ends. Commonly used to clean file paths that have been escaped when pasted into terminals.
input (string): The string to unescape and clean.
Returns: string - The cleaned string with escaped spaces converted to regular spaces and trimmed.
unescapeSpaces('My\\ Folder\\ Name');
// Output: 'My Folder Name'

unescapeSpaces('  /path/to/My\\ Document.txt  ');
// Output: '/path/to/My Document.txt'

unescapeSpaces('regular text');
// Output: 'regular text'
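The two steps described above, unescaping and trimming, can be sketched in a single expression. The function name below is hypothetical and the code is an illustration of the described behaviour, not the library's source.

```typescript
// Hypothetical stand-in; the real implementation may differ.
function unescapeSpacesSketch(input: string): string {
  // Replace each backslash-space pair with a plain space, then trim both ends.
  return input.replace(/\\ /g, " ").trim();
}

console.log(unescapeSpacesSketch("  /path/to/My\\ Document.txt  "));
// "/path/to/My Document.txt"
```

In the string literal, `\\` is a single backslash at runtime, which is why the examples write `My\\ Folder` for the shell-escaped path `My\ Folder`.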

Diacritic-insensitive pattern matching

makeDiacriticInsensitive

Creates a diacritic-insensitive regex pattern for Arabic text matching. Normalizes text, handles character equivalences (ا/آ/أ/إ, ة/ه, ى/ي), and makes each character tolerant of Arabic diacritics (Tashkeel/Harakat).
text (string): Input Arabic text to make diacritic-insensitive.
Returns: string - Regex pattern string that matches the text with or without diacritics and character variants.
const pattern = makeDiacriticInsensitive("الكتاب");
const regex = new RegExp(pattern);
regex.test("اَلْكِتَاب"); // true (with diacritics)
regex.test("الكتاب"); // true (without diacritics)
regex.test("ألكتاب"); // true (different alif variant)
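To show how such a pattern could be assembled, the sketch below strips diacritics from the input, swaps each letter for a character class of its listed equivalents, and allows optional diacritics after every letter. It is an illustration under assumptions, not the library's algorithm: the function name is hypothetical, and the diacritic range used (U+064B to U+0652, plus U+0670 and the tatweel U+0640) is a guess at what "Tashkeel/Harakat" covers.

```typescript
// Optional run of Arabic diacritics (and tatweel) after each letter; assumed range.
const OPTIONAL_MARKS = "[\\u064B-\\u0652\\u0670\\u0640]*";

// Equivalence groups from the description: alif forms, ta marbuta/ha, alif maqsura/ya.
const EQUIVALENCE_GROUPS = [
  "\u0627\u0622\u0623\u0625", // ا آ أ إ
  "\u0629\u0647",             // ة ه
  "\u0649\u064A",             // ى ي
];

// Hypothetical stand-in; the real implementation may differ.
function makeDiacriticInsensitiveSketch(text: string): string {
  // Normalize, then drop any diacritics already present in the input.
  const bare = text.normalize("NFC").replace(/[\u064B-\u0652\u0670\u0640]/g, "");
  return Array.from(bare)
    .map((ch) => {
      const group = EQUIVALENCE_GROUPS.find((g) => g.includes(ch));
      // Use the whole group as a character class, or escape the literal character.
      const base = group ? `[${group}]` : ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      return base + OPTIONAL_MARKS;
    })
    .join("");
}

const sketchRegex = new RegExp(makeDiacriticInsensitiveSketch("الكتاب"));
console.log(sketchRegex.test("ألكتاب")); // true
```

The resulting pattern is unanchored, so it matches the word anywhere inside a longer string. Callers who need whole-string matches would have to add `^` and `$` themselves.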
