Line breaks and newlines
insertLineBreaksAfterPunctuation
Adds line breaks after punctuation marks such as periods, exclamation points, and question marks. For the full preformatting pipeline in one pass (significantly faster and more memory-friendly on very large inputs), use preformatArabicText from the preformat module.
The input text containing punctuation
string - The modified text with line breaks added after punctuation.
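As a rough illustration of the behaviour described above, the following standalone sketch replaces the spaces after sentence-ending punctuation with a newline. The function name, the set of punctuation marks, and the choice to consume the following spaces are assumptions made for this example and may differ from the library's actual implementation.

```typescript
// Hypothetical sketch, not the library's implementation.
// Assumed sentence-ending marks: ".", "!", "?" and the Arabic question mark "؟".
function insertLineBreaksSketch(text: string): string {
    // Replace the spaces or tabs that follow a run of ending marks with one newline.
    return text.replace(/([.!?؟]+)[ \t]+/g, '$1\n');
}

console.log(insertLineBreaksSketch('Hello. World! Done?'));
```

Punctuation at the very end of the input is left alone in this sketch, since nothing follows it.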
cleanLiteralNewLines
Replaces literal newline characters (\n) and carriage returns (\r) with actual line breaks.
The input text containing literal new lines
string - The modified text with actual line breaks.
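The sketch below shows one way such a rule could work, assuming "literal" means the two-character sequences backslash + n and backslash + r written out in the text. The function name and the decision to map every variant to a single "\n" are assumptions for illustration.

```typescript
// Hypothetical sketch, not the library's implementation.
function cleanLiteralNewLinesSketch(text: string): string {
    // Try the combined "\r\n" sequence first so it yields one break, not two.
    return text.replace(/\\r\\n|\\n|\\r/g, '\n');
}

// The input here contains backslash characters, not real line breaks.
console.log(cleanLiteralNewLinesSketch('first\\nsecond\\r\\nthird'));
```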
cleanMultilines
Removes horizontal whitespace (spaces, tabs, non-breaking spaces) from the beginning and end of each line, while preserving line breaks (\n and \r). Handles various types of horizontal whitespace:
- Regular spaces (U+0020)
- Tabs (U+0009)
- Non-breaking spaces (U+00A0)
- Other Unicode horizontal whitespace characters
The input text to apply the rule to
string - The modified text with horizontal whitespace trimmed from each line.
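A per-line trim of this kind can be sketched with a single multiline regular expression. The function name is hypothetical, and the character class used here (every character JavaScript's \s matches except \r and \n) is an assumption about what counts as horizontal whitespace.

```typescript
// Hypothetical sketch, not the library's implementation.
function trimLinesSketch(text: string): string {
    // [^\S\r\n] matches whitespace other than CR and LF, which covers
    // spaces, tabs, and U+00A0. With the "m" flag, ^ and $ anchor at each line.
    return text.replace(/^[^\S\r\n]+|[^\S\r\n]+$/gm, '');
}

console.log(trimLinesSketch('  first \n\t second\u00A0\n third'));
```

Because \r and \n are excluded from the character class, the line breaks themselves are never removed.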
reduceMultilineBreaksToDouble
Reduces multiple consecutive line breaks (3 or more) to exactly 2 line breaks.
The input text to apply the rule to
string - The modified text with condensed line breaks.
reduceMultilineBreaksToSingle
Reduces multiple consecutive line breaks (2 or more) to exactly 1 line break.
The input text to apply the rule to
string - The modified text with condensed line breaks.
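The two line-break reducers differ only in their threshold and replacement. The following sketch shows both side by side; the function names are hypothetical, and treating "\r\n" as one line break is an assumption.

```typescript
// Hypothetical sketches, not the library's implementation.
function reduceBreaksToDoubleSketch(text: string): string {
    // Three or more consecutive breaks collapse to exactly two.
    return text.replace(/(?:\r?\n){3,}/g, '\n\n');
}

function reduceBreaksToSingleSketch(text: string): string {
    // Two or more consecutive breaks collapse to exactly one.
    return text.replace(/(?:\r?\n){2,}/g, '\n');
}

console.log(reduceBreaksToDoubleSketch('a\n\n\n\nb'));
console.log(reduceBreaksToSingleSketch('a\n\n\nb'));
```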
Punctuation and spacing
addSpaceBeforeAndAfterPunctuation
Adds spaces before and after punctuation, except for certain cases like quoted text or ayah references.
The input text containing punctuation
string - The modified text with spaces added before and after punctuation.
cleanSpacesBeforePeriod
Removes unnecessary spaces before punctuation marks such as periods, commas, and question marks.
The input text to apply the rule to
string - The modified text with cleaned spaces before punctuation.
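A minimal sketch of this kind of cleanup is shown below. The function name and the exact set of punctuation marks (Latin and Arabic periods, commas, and question marks) are assumptions chosen for the example.

```typescript
// Hypothetical sketch, not the library's implementation.
function cleanSpacesBeforePunctuationSketch(text: string): string {
    // Drop spaces or tabs that sit directly before ".", ",", "?", "،" or "؟".
    return text.replace(/[ \t]+([.,?،؟])/g, '$1');
}

console.log(cleanSpacesBeforePunctuationSketch('Hello , world .'));
```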
removeRedundantPunctuation
Removes redundant punctuation marks that follow Arabic question marks or exclamation marks. This function cleans up text by removing periods (.) or Arabic commas (،) that immediately follow Arabic question marks (؟) or exclamation marks (!), as they are considered redundant in proper Arabic punctuation.
The Arabic text to clean up
string - The text with redundant punctuation removed.
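The rule described above maps naturally onto one regular expression. This sketch uses a hypothetical function name and assumes that a run of several redundant marks is removed as a whole.

```typescript
// Hypothetical sketch, not the library's implementation.
function removeRedundantPunctuationSketch(text: string): string {
    // Keep the "؟" or "!" and drop any "." or "،" that directly follows it.
    return text.replace(/([؟!])[.،]+/g, '$1');
}

console.log(removeRedundantPunctuationSketch('ماذا؟. نعم!،'));
```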
normalizeSpaces
Reduces multiple spaces or tabs to a single space.
The input text containing extra spaces
string - The modified text with reduced spaces.
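For illustration, a space-collapsing rule can be written as a one-line replacement. The function name is hypothetical, and leaving line breaks untouched is an assumption.

```typescript
// Hypothetical sketch, not the library's implementation.
function normalizeSpacesSketch(text: string): string {
    // Any run of spaces and/or tabs becomes one space; newlines are not matched.
    return text.replace(/[ \t]+/g, ' ');
}

console.log(normalizeSpacesSketch('a  \t b'));
```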
normalizeSlashInReferences
Removes unnecessary spaces around slashes in references.
The input text containing references
string - The modified text with spaces removed around slashes.
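The sketch below assumes a "reference" is a pair of numbers separated by a slash (for example a volume and page), written with either ASCII or Arabic-Indic digits. Both that definition and the function name are assumptions for this example; the library may recognise references differently.

```typescript
// Hypothetical sketch, not the library's implementation.
function normalizeSlashSketch(text: string): string {
    // Only slashes between two digit groups are touched.
    return text.replace(/([0-9٠-٩]+)\s*\/\s*([0-9٠-٩]+)/g, '$1/$2');
}

console.log(normalizeSlashSketch('See 3 / 45'));
```

Restricting the match to digits keeps ordinary prose such as "and / or" unchanged.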
Condensing and cleaning
condenseAsterisks
Condenses multiple asterisks (*) into a single one.
The input text to apply the rule to
string - The modified text with condensed asterisks.
condenseColons
Replaces occurrences of colons surrounded by periods (e.g., '.:.' or ':') with a single colon.
The input text to apply the rule to
string - The modified text with condensed colons.
condenseDashes
Condenses two or more dashes (--) into a single dash (-).
The input text to apply the rule to
string - The modified text with condensed dashes.
condenseEllipsis
Replaces sequences of two or more periods (e.g., '...') with an ellipsis character (…).
The input text to apply the rule to
string - The modified text with ellipses condensed.
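A minimal sketch of the ellipsis rule, using a hypothetical function name, could look like this:

```typescript
// Hypothetical sketch, not the library's implementation.
function condenseEllipsisSketch(text: string): string {
    // Two or more consecutive periods become the single character U+2026.
    return text.replace(/\.{2,}/g, '…');
}

console.log(condenseEllipsisSketch('wait.. what....'));
```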
condensePeriods
Condenses multiple periods separated by spaces (e.g., '. . .') into a single period.
The input text to apply the rule to
string - The modified text with condensed periods.
condenseUnderscores
Condenses multiple underscores (__) or Arabic Tatweel characters (ـــــ) into a single underscore or Tatweel.
The input text to apply the rule to
string - The modified text with condensed underscores.
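The following sketch handles the two characters separately so that a run of underscores stays an underscore and a run of Tatweel (U+0640) stays a Tatweel. The function name is hypothetical.

```typescript
// Hypothetical sketch, not the library's implementation.
function condenseUnderscoresSketch(text: string): string {
    return text
        .replace(/_{2,}/g, '_')            // runs of underscores
        .replace(/\u0640{2,}/g, '\u0640'); // runs of Arabic Tatweel
}

console.log(condenseUnderscoresSketch('name____value'));
```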
Quotes and brackets
applySmartQuotes
Turns regular double quotes surrounding a body of text into smart quotes. Also fixes incorrect starting quotes by ensuring the string starts with an opening quote if needed.
The input text to apply the rule to
string - The modified text with smart quotes applied.
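The first half of this behaviour (pairing straight quotes into curly ones) can be sketched as below. The function name is hypothetical, and the sketch deliberately leaves out the starting-quote fix, whose exact rules are not described here.

```typescript
// Hypothetical sketch, not the library's implementation.
// Covers only paired straight quotes; unmatched quotes are left as they are.
function applySmartQuotesSketch(text: string): string {
    return text.replace(/"([^"]*)"/g, '“$1”');
}

console.log(applySmartQuotesSketch('He said "hi" and "bye"'));
```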
doubleToSingleBrackets
Replaces double parentheses or brackets with single ones.
The input text to apply the rule to
string - The modified text with condensed brackets.
ensureSpaceBeforeBrackets
Ensures exactly one space separates a word from the bracket that follows it. Adds a space if there isn’t one, or reduces multiple spaces to one.
The input text to modify
string - The modified text with proper spacing before brackets.
ensureSpaceBeforeQuotes
Ensures exactly one space separates a word from the Arabic quotation mark that follows it. Adds a space if there isn’t one, or reduces multiple spaces to one.
The input text to modify
string - The modified text with proper spacing before Arabic quotes.
fixBracketTypos
Fixes common bracket and quotation mark typos in text. Corrects malformed patterns like "(«", "»)", and misplaced digits in brackets.
Input text that may contain bracket typos
string - Text with corrected bracket and quotation mark combinations.
fixCurlyBraces
Fixes mismatched curly braces by converting incorrect bracket/brace combinations to proper curly braces.
Input text that may contain mismatched curly braces
string - Text with corrected curly brace pairs.
fixMismatchedQuotationMarks
Fixes mismatched quotation marks in Arabic text by converting various incorrect bracket/quote combinations to proper Arabic quotation marks (« »).
Input text that may contain mismatched quotation marks
string - Text with corrected Arabic quotation marks.
removeSpaceInsideBrackets
Removes spaces inside brackets, parentheses, or square brackets.
The input text with spaces inside brackets
string - The modified text with spaces removed inside brackets.
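One way to picture this rule is as two passes, one for the space after an opening bracket and one for the space before a closing bracket. The function name is hypothetical, and limiting the sketch to parentheses and square brackets is an assumption.

```typescript
// Hypothetical sketch, not the library's implementation.
function removeSpaceInsideBracketsSketch(text: string): string {
    return text
        .replace(/([(\[])\s+/g, '$1')  // "( text" becomes "(text"
        .replace(/\s+([)\]])/g, '$1'); // "text )" becomes "text)"
}

console.log(removeSpaceInsideBracketsSketch('( text ) and [ 12 ]'));
```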
replaceDoubleBracketsWithArrows
Replaces double parentheses with Arabic quotation marks (arrows).
The input text to apply the rule to
string - The modified text with double parentheses replaced by Arabic quotation marks.
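As an illustration, the replacement of "((...))" with "«...»" can be sketched with a lazy match. The function name is hypothetical, and the sketch assumes the double parentheses open and close on the same line.

```typescript
// Hypothetical sketch, not the library's implementation.
function doubleParensToArrowsSketch(text: string): string {
    // Lazy ".*?" stops at the first closing "))" so separate pairs stay separate.
    return text.replace(/\(\((.*?)\)\)/g, '«$1»');
}

console.log(doubleParensToArrowsSketch('قال ((نص)) ثم ((آخر))'));
```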
trimSpaceInsideQuotes
Removes unnecessary spaces inside quotes.
The input text with spaces inside quotes
string - The modified text with spaces removed inside quotes.
Text detection and analysis
hasWordInSingleLine
Detects whether any line consists of a single word.
The text to check
boolean - True if at least one line in the text contains exactly one word, false otherwise.
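A check of this kind can be sketched by splitting on line breaks and testing each line. The function name is hypothetical, and defining a "word" as a run of non-whitespace characters is an assumption.

```typescript
// Hypothetical sketch, not the library's implementation.
function hasWordInSingleLineSketch(text: string): boolean {
    return text
        .split(/\r?\n/)
        // A line qualifies when, after trimming, it holds no whitespace at all.
        .some((line: string) => /^\S+$/.test(line.trim()));
}

console.log(hasWordInSingleLineSketch('first line\nword\nanother line'));
```

Empty lines do not count in this sketch, because the pattern requires at least one character.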
isOnlyPunctuation
Checks if the input string consists of only punctuation characters.
The input text to check
boolean - Returns true if the string contains only punctuation, false otherwise.
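The sketch below uses an explicit character set: the ASCII punctuation and symbol ranges plus a few Arabic marks. That set, the function name, and returning false for an empty string are all assumptions; the library may define punctuation more broadly.

```typescript
// Hypothetical sketch, not the library's implementation.
// ASCII ranges: ! to /, : to @, [ to `, { to ~, plus a few Arabic marks.
const ONLY_PUNCTUATION_SKETCH = /^[!-\/:-@\[-`{-~،؛؟«»…]+$/;

function isOnlyPunctuationSketch(text: string): boolean {
    return ONLY_PUNCTUATION_SKETCH.test(text);
}

console.log(isOnlyPunctuationSketch('?!…'), isOnlyPunctuationSketch('hi!'));
```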
isAllUppercase
Detects if text is entirely in uppercase letters.
The text to check
boolean - True if all alphabetic characters are uppercase, false otherwise.
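One compact way to express "all alphabetic characters are uppercase" is to compare the text against its own case conversions. The function name is hypothetical, and returning false for input with no cased letters at all (digits only, for instance) is an assumption of this sketch.

```typescript
// Hypothetical sketch, not the library's implementation.
function isAllUppercaseSketch(text: string): boolean {
    const hasNoLowercase = text === text.toUpperCase();
    const hasCasedLetter = text !== text.toLowerCase();
    return hasNoLowercase && hasCasedLetter;
}

console.log(isAllUppercaseSketch('HELLO WORLD 123'), isAllUppercaseSketch('Hello'));
```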
Sentence and paragraph formatting
formatStringBySentence
Formats a multiline string by joining sentences and maintaining footnotes on their own lines. Footnotes are identified by Arabic and English numerals.
The input text containing sentences and footnotes
string - The formatted text.
Styling and case conversion
stripBoldStyling
Removes bold styling from text by normalizing the string and removing stylistic characters.
The input text containing bold characters
string - The modified text with bold styling removed.
stripItalicsStyling
Removes italicized characters by replacing italic Unicode characters with their normal counterparts.
The input text containing italicized characters
string - The modified text with italics removed.
stripStyling
Removes all bold and italic styling from the input text.
The input text to remove styling from
string - The modified text with all styling removed.
toTitleCase
Converts a string to title case (first letter of each word capitalized).
The input string to convert
string - String with each word’s first letter capitalized.
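A simple title-casing routine is sketched below. The function name is hypothetical, and lowercasing the rest of each word before capitalizing its first letter is an assumption; the description above only specifies the first letter.

```typescript
// Hypothetical sketch, not the library's implementation.
function toTitleCaseSketch(text: string): string {
    return text
        .toLowerCase()
        // Capitalize the first non-space character at the start or after whitespace.
        .replace(/(^|\s)(\S)/g, (_match: string, sep: string, ch: string) => sep + ch.toUpperCase());
}

console.log(toTitleCaseSketch('hello wORLD'));
```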