Bitaboom provides over 30 formatting and typography utilities designed for cleaning up scanned manuscripts, OCR output, and general text normalization.
Punctuation normalization
Insert line breaks after punctuation
Add line breaks after sentence-ending punctuation:
import { insertLineBreaksAfterPunctuation } from 'bitaboom';
const text = 'First sentence. Second sentence! Third sentence?';
insertLineBreaksAfterPunctuation(text);
// Returns:
// "First sentence.\nSecond sentence!\nThird sentence?"
Supports: . ! ? ؟ (Arabic question mark)
For comprehensive preformatting in a single pass, use preformatArabicText, which is significantly faster on large inputs.
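For intuition, the core of this transformation can be approximated with one regular expression. The sketch below is hypothetical: breakAfterSentenceEnd is not part of bitaboom, and it assumes that horizontal whitespace after `.`, `!`, `?` or `؟` should become a newline. The library's own rules may differ, for example around abbreviations or decimal numbers.

```typescript
// Hypothetical sketch, not bitaboom's implementation.
// Replaces the spaces/tabs that follow a sentence-ending mark with one newline.
function breakAfterSentenceEnd(text: string): string {
  // Captures ., !, ? or the Arabic question mark ؟, then drops the whitespace after it.
  return text.replace(/([.!?؟])[ \t]+/g, '$1\n');
}

// Prints the three sentences on separate lines.
console.log(breakAfterSentenceEnd('First sentence. Second sentence! Third sentence?'));
```

Only horizontal whitespace is matched, so existing paragraph breaks are left untouched.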
Add spacing around punctuation
Normalize spacing around punctuation marks:
import { addSpaceBeforeAndAfterPunctuation } from 'bitaboom';
addSpaceBeforeAndAfterPunctuation('Text,word'); // returns 'Text, word'
addSpaceBeforeAndAfterPunctuation('Text , word'); // returns 'Text, word'
Handles special cases:
- Preserves spacing for quotes and ayah references
- Normalizes colons in verse references (e.g., 12:34)
- Respects closing brackets and quotation marks
Clean spaces before punctuation
import { cleanSpacesBeforePeriod } from 'bitaboom';
cleanSpacesBeforePeriod('This is a sentence , with extra space .');
// returns 'This is a sentence, with extra space.'
Supports: . ؟ ! , ، ؛ : ?
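A minimal sketch of the same idea, assuming the rule is simply "delete horizontal whitespace directly before one of the supported marks". removeSpaceBeforePunctuation is a hypothetical name, and the choice to leave newlines alone is an assumption rather than documented bitaboom behavior.

```typescript
// Hypothetical sketch, not bitaboom's source.
// [^\S\r\n] = any whitespace except CR/LF, so line breaks before punctuation survive.
function removeSpaceBeforePunctuation(text: string): string {
  return text.replace(/[^\S\r\n]+([.؟!,،؛:?])/g, '$1');
}

console.log(removeSpaceBeforePunctuation('This is a sentence , with extra space .'));
// "This is a sentence, with extra space."
```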
Remove redundant punctuation
import { removeRedundantPunctuation } from 'bitaboom';
removeRedundantPunctuation('كيف حالك؟.'); // returns 'كيف حالك؟'
removeRedundantPunctuation('ممتاز!،'); // returns 'ممتاز!'
removeRedundantPunctuation('هذا جيد.'); // returns 'هذا جيد.' (unchanged)
Smart quotes and quotation marks
Apply smart quotes
Convert straight quotes to smart quotes:
import { applySmartQuotes } from 'bitaboom';
applySmartQuotes('The "quick brown" fox');
// returns 'The “quick brown” fox'
applySmartQuotes('"Start of text');
// returns '“Start of text'
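One way to implement straight-to-curly conversion is to pair quotes first and then decide each leftover quote from its right-hand neighbor. This is a hypothetical sketch (toSmartQuotes is not a bitaboom export), and it assumes that a quote followed by a non-space character opens a quotation.

```typescript
// Hypothetical sketch, not bitaboom's implementation.
function toSmartQuotes(text: string): string {
  return (
    text
      // Pair up quotes first: "..." becomes \u201C...\u201D.
      .replace(/"([^"]*)"/g, '\u201C$1\u201D')
      // A leftover quote before a non-space character opens a quotation...
      .replace(/"(?=\S)/g, '\u201C')
      // ...and any remaining quote closes one.
      .replace(/"/g, '\u201D')
  );
}

console.log(toSmartQuotes('The "quick brown" fox')); // The “quick brown” fox
```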
Fix mismatched quotation marks
Correct various incorrect bracket/quote combinations:
import { fixMismatchedQuotationMarks } from 'bitaboom';
fixMismatchedQuotationMarks('«النص)'); // returns '«النص»'
fixMismatchedQuotationMarks('(النص»'); // returns '«النص»'
fixMismatchedQuotationMarks('«النص'); // returns '«النص»' (auto-closes)
Trim space inside quotes
import { trimSpaceInsideQuotes } from 'bitaboom';
trimSpaceInsideQuotes('" Text "'); // returns '"Text"'
trimSpaceInsideQuotes('« النص »'); // returns '«النص»'
Bracket normalization
Double to single brackets
import { doubleToSingleBrackets } from 'bitaboom';
doubleToSingleBrackets('((text))'); // returns '(text)'
doubleToSingleBrackets('[[note]]'); // returns '[note]'
Replace double brackets with Arabic guillemets
import { replaceDoubleBracketsWithArrows } from 'bitaboom';
replaceDoubleBracketsWithArrows('((text))'); // returns '«text»'
replaceDoubleBracketsWithArrows('(( spaced ))'); // returns '«spaced»'
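The behavior shown above (convert `((...))` to `«...»` and trim spaces just inside the brackets) can be sketched with a single lazy capture. doubleParensToGuillemets is a hypothetical helper, and unlike a full implementation it does not handle parentheses nested inside the span.

```typescript
// Hypothetical sketch, not bitaboom's source.
// \s* on both sides absorbs padding; the lazy group keeps the inner text only.
function doubleParensToGuillemets(text: string): string {
  return text.replace(/\(\(\s*([^()]*?)\s*\)\)/g, '«$1»');
}

console.log(doubleParensToGuillemets('a (( spaced )) b')); // a «spaced» b
```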
Fix bracket typos
import { fixBracketTypos } from 'bitaboom';
fixBracketTypos('(«content»)'); // returns '«content»'
fixBracketTypos(')123)'); // returns '(123)'
fixBracketTypos(')456('); // returns '(456)'
Fix curly braces
import { fixCurlyBraces } from 'bitaboom';
fixCurlyBraces('(content}'); // returns '{content}'
fixCurlyBraces('{content)'); // returns '{content}'
Ensure spacing before brackets
import { ensureSpaceBeforeBrackets, ensureSpaceBeforeQuotes } from 'bitaboom';
ensureSpaceBeforeBrackets('text(note)'); // returns 'text (note)'
ensureSpaceBeforeQuotes('text«note»'); // returns 'text «note»'
Remove space inside brackets
import { removeSpaceInsideBrackets } from 'bitaboom';
removeSpaceInsideBrackets('( a b )'); // returns '(a b)'
removeSpaceInsideBrackets('[ text ]'); // returns '[text]'
Character repetition and condensation
Condense ellipsis
import { condenseEllipsis } from 'bitaboom';
condenseEllipsis('This is a test...'); // returns 'This is a test…'
condenseEllipsis('Wait..'); // returns 'Wait…'
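A minimal sketch of ellipsis condensation, assuming any run of two or more full stops should become the single character U+2026. condenseDots is a hypothetical name, not a bitaboom export.

```typescript
// Hypothetical sketch: collapse runs of 2+ periods into one ellipsis character.
function condenseDots(text: string): string {
  return text.replace(/\.{2,}/g, '\u2026');
}

console.log(condenseDots('Wait..')); // Wait…
```

A rule this broad also rewrites sequences such as `../` in file paths, so it is best applied to prose only.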
Condense asterisks
import { condenseAsterisks } from 'bitaboom';
condenseAsterisks('***'); // returns '*'
condenseAsterisks('* * *'); // returns '*'
Condense colons
import { condenseColons } from 'bitaboom';
condenseColons('This.:. is a test'); // returns 'This: is a test'
condenseColons('.-:-.'); // returns ':'
Condense dashes
import { condenseDashes } from 'bitaboom';
condenseDashes('This is some ---- text'); // returns 'This is some - text'
condenseDashes('--'); // returns '-'
Condense underscores and tatweel
import { condenseUnderscores } from 'bitaboom';
condenseUnderscores('This is ـــ some text __');
// returns 'This is ـ some text _'
Condense periods
import { condensePeriods } from 'bitaboom';
condensePeriods('This . . . is a test'); // returns 'This. is a test'
Whitespace normalization
Normalize spaces
Collapse runs of spaces and tabs into a single space:
import { normalizeSpaces } from 'bitaboom';
normalizeSpaces('This   is   a   text'); // returns 'This is a text'
normalizeSpaces('Tab\t\tseparated'); // returns 'Tab separated'
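The effect can be approximated with one character class. This hypothetical sketch assumes only spaces and tabs are collapsed and newlines are preserved; bitaboom may treat other whitespace characters differently.

```typescript
// Hypothetical sketch: collapse runs of spaces/tabs into one space, keep newlines.
function collapseHorizontalSpace(text: string): string {
  return text.replace(/[ \t]+/g, ' ');
}

console.log(collapseHorizontalSpace('Tab\t\tseparated')); // Tab separated
```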
Clean multilines
Remove horizontal whitespace from line edges:
import { cleanMultilines } from 'bitaboom';
cleanMultilines(' line1 \n line2 '); // returns 'line1\nline2'
cleanMultilines('\t\tindented\t\t'); // returns 'indented'
cleanMultilines('text\n \n \n'); // returns 'text\n\n\n'
This function recognizes Unicode horizontal whitespace in general, including regular spaces, tabs, and non-breaking spaces.
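A per-line trim reproduces the examples above. This is a hypothetical sketch (trimLineEdges is not a bitaboom export), relying on the fact that String.prototype.trim already removes tabs, no-break spaces, and other Unicode whitespace.

```typescript
// Hypothetical sketch: trim whitespace from both ends of every line.
// Because the text is split on '\n' first, trim() also drops a trailing '\r',
// so CRLF input comes back with plain LF line endings.
function trimLineEdges(text: string): string {
  return text
    .split('\n')
    .map((line) => line.trim())
    .join('\n');
}

console.log(JSON.stringify(trimLineEdges(' line1 \n line2 '))); // "line1\nline2"
```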
Reduce multiple line breaks
import { reduceMultilineBreaksToDouble } from 'bitaboom';
const text = 'This is line 1\n\n\n\nThis is line 2';
reduceMultilineBreaksToDouble(text);
// returns 'This is line 1\n\nThis is line 2'
import { reduceMultilineBreaksToSingle } from 'bitaboom';
const text = 'This is line 1\n\nThis is line 2';
reduceMultilineBreaksToSingle(text);
// returns 'This is line 1\nThis is line 2'
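Both reductions come down to capping runs of newline characters. The helpers below are hypothetical sketches, not bitaboom's source.

```typescript
// Hypothetical sketches: cap runs of newlines at two (one blank line) or at one.
function capBlankLinesAtOne(text: string): string {
  return text.replace(/\n{3,}/g, '\n\n');
}

function removeBlankLines(text: string): string {
  return text.replace(/\n{2,}/g, '\n');
}

console.log(JSON.stringify(capBlankLinesAtOne('This is line 1\n\n\n\nThis is line 2')));
// "This is line 1\n\nThis is line 2"
```

Lines that contain only spaces are not treated as empty by these patterns, so running a line-trimming step such as cleanMultilines first gives more predictable results.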
Clean literal newlines
Replace literal \n and \r strings with actual newlines:
import { cleanLiteralNewLines } from 'bitaboom';
cleanLiteralNewLines('A\\nB'); // returns 'A\nB'
cleanLiteralNewLines('Text\\rMore'); // returns 'Text\nMore'
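Here "literal" means the two-character sequence backslash plus letter, as it appears in escaped exports. A hypothetical sketch of the replacement follows; treating an escaped `\r\n` pair as a single newline is an assumption added for illustration, not documented bitaboom behavior.

```typescript
// Hypothetical sketch: turn escaped "\r\n", "\n" or "\r" sequences into real newlines.
// The first alternative runs first, so an escaped CRLF yields one newline, not two.
function unescapeNewlines(text: string): string {
  return text.replace(/\\r\\n|\\[nr]/g, '\n');
}

console.log(unescapeNewlines('A\\nB')); // prints A and B on separate lines
```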
Normalize slashes in references
import { normalizeSlashInReferences } from 'bitaboom';
normalizeSlashInReferences('127 / 11'); // returns '127/11'
normalizeSlashInReferences('Page 5 / 6'); // returns 'Page 5/6'
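A sketch of the same rule, assuming a slash is tightened only when digits sit on both sides. tightenNumericSlash is hypothetical; it uses `\p{Nd}` so that Arabic-Indic digits are matched as well, which may or may not match bitaboom's behavior.

```typescript
// Hypothetical sketch: remove spaces around "/" between two numbers.
// \p{Nd} (with the u flag) matches any decimal digit, including ١٢٣.
function tightenNumericSlash(text: string): string {
  return text.replace(/(\p{Nd}+)\s*\/\s*(\p{Nd}+)/gu, '$1/$2');
}

console.log(tightenNumericSlash('Page 5 / 6')); // Page 5/6
```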
Text case utilities
Title case conversion
import { toTitleCase } from 'bitaboom';
toTitleCase('hello world'); // returns 'Hello World'
toTitleCase('SHOUTING TEXT'); // returns 'Shouting Text'
toTitleCase('mixed CASE text'); // returns 'Mixed Case Text'
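The examples are consistent with "lowercase everything, then capitalize the first character of each whitespace-separated word". The sketch below is hypothetical and ignores style-guide rules (such as keeping "of" lowercase) that bitaboom may or may not apply.

```typescript
// Hypothetical sketch of title casing.
// Splitting on whitespace (not \b) keeps contractions intact: "don't" -> "Don't".
function titleCaseSketch(text: string): string {
  return text
    .toLowerCase()
    .replace(/(^|\s)(\S)/g, (_match: string, space: string, first: string) => space + first.toUpperCase());
}

console.log(titleCaseSketch('mixed CASE text')); // Mixed Case Text
```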
Detect all uppercase
import { isAllUppercase } from 'bitaboom';
isAllUppercase('HELLO WORLD'); // returns true
isAllUppercase('Hello World'); // returns false
isAllUppercase('TITLE123'); // returns true (ignores numbers)
isAllUppercase(' '); // returns false (no letters)
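The four results above follow from two checks: at least one uppercase letter, and no lowercase letters. The hypothetical sketch below uses Unicode property escapes, so digits, spaces, and punctuation are ignored. One consequence of this design is that caseless scripts such as Arabic return false.

```typescript
// Hypothetical sketch: \p{Lu} = uppercase letter, \p{Ll} = lowercase letter.
function isAllUppercaseSketch(text: string): boolean {
  return /\p{Lu}/u.test(text) && !/\p{Ll}/u.test(text);
}

console.log(isAllUppercaseSketch('TITLE123')); // true
```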
Styling removal
Strip bold styling
import { stripBoldStyling } from 'bitaboom';
const boldText = '\u{1D5D4}\u{1D5D5}\u{1D5D6}'; // Mathematical sans-serif bold ABC
stripBoldStyling(boldText); // returns 'ABC'
Strip italic styling
import { stripItalicsStyling } from 'bitaboom';
const italicText = '𝘼𝘽𝘾';
stripItalicsStyling(italicText); // returns 'ABC'
Strip all styling
import { stripStyling } from 'bitaboom';
// Removes both bold and italic Unicode styling
const styledText = '\u{1D5D4}\u{1D5D5}\u{1D5D6} 𝘼𝘽𝘾'; // mixed bold and italic letters
stripStyling(styledText); // returns plain text: 'ABC ABC'
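Styled "bold" and "italic" letters are code points in Unicode's Mathematical Alphanumeric Symbols block, where each style occupies 52 consecutive positions (A-Z, then a-z). A hypothetical sketch of mapping them back to ASCII follows. It covers only the six styles listed, and it misses the few letters Unicode places elsewhere (for example, italic small h is U+210E).

```typescript
// Hypothetical sketch, not bitaboom's source.
// First code point of each 52-letter style block (capital A).
const STYLE_BLOCK_STARTS = [
  0x1d400, // bold
  0x1d434, // italic
  0x1d468, // bold italic
  0x1d5d4, // sans-serif bold
  0x1d608, // sans-serif italic
  0x1d63c, // sans-serif bold italic
];

function unstyleLetters(text: string): string {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0)!;
    for (const start of STYLE_BLOCK_STARTS) {
      const offset = cp - start;
      if (offset >= 0 && offset < 52) {
        // Offsets 0-25 are capitals, 26-51 are small letters.
        return offset < 26
          ? String.fromCharCode(0x41 + offset)
          : String.fromCharCode(0x61 + offset - 26);
      }
    }
    return ch;
  }).join('');
}

console.log(unstyleLetters('\u{1D5D4}\u{1D5D5}\u{1D5D6}')); // ABC
```

A shorter alternative is `text.normalize('NFKC')`, which also folds these letters to ASCII, but it rewrites other compatibility characters too (for example, it expands `…` back into `...`), which would undo condenseEllipsis.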
Sentence formatting
Join sentences while keeping footnotes on separate lines:
import { formatStringBySentence } from 'bitaboom';
const input = `First sentence.
(1) A footnote.
Second sentence.`;
formatStringBySentence(input);
// Returns:
// "First sentence.\n(1) A footnote.\nSecond sentence."
Footnotes are identified by:
- Arabic numerals: (1), (2), etc.
- Eastern Arabic numerals: (۱), (۲), etc.
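A line-level check for such markers might look like the hypothetical sketch below. It assumes a footnote line starts with a parenthesized number, and it accepts both Unicode ranges used for Eastern Arabic digits (U+0660-U+0669 and U+06F0-U+06F9), since either may appear in source texts.

```typescript
// Hypothetical sketch, not bitaboom's detection logic.
const FOOTNOTE_MARKER = /^\s*\((?:[0-9]+|[\u0660-\u0669]+|[\u06F0-\u06F9]+)\)/;

function startsWithFootnoteMarker(line: string): boolean {
  // No g flag, so test() keeps no state between calls.
  return FOOTNOTE_MARKER.test(line);
}

console.log(startsWithFootnoteMarker('(1) A footnote.')); // true
```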
Detection utilities
Detect single-word lines
import { hasWordInSingleLine } from 'bitaboom';
hasWordInSingleLine('word'); // true
hasWordInSingleLine('two words'); // false
hasWordInSingleLine('line1\nword\nline3'); // true
Detect punctuation-only
import { isOnlyPunctuation } from 'bitaboom';
isOnlyPunctuation('...'); // true
isOnlyPunctuation('123'); // true (digits count as punctuation)
isOnlyPunctuation('hello'); // false
Real-world patterns
Clean OCR manuscript output
import {
cleanSpacesBeforePeriod,
condenseEllipsis,
normalizeSpaces,
reduceMultilineBreaksToDouble,
fixMismatchedQuotationMarks,
applySmartQuotes
} from 'bitaboom';
function cleanManuscript(text: string): string {
let result = text;
// Fix punctuation spacing
result = cleanSpacesBeforePeriod(result);
// Normalize repetitions
result = condenseEllipsis(result);
result = normalizeSpaces(result);
// Fix line breaks
result = reduceMultilineBreaksToDouble(result);
// Fix quotes
result = fixMismatchedQuotationMarks(result);
result = applySmartQuotes(result);
return result;
}
Normalize footnotes
import {
formatStringBySentence,
ensureSpaceBeforeBrackets,
removeSpaceInsideBrackets
} from 'bitaboom';
function normalizeFootnotes(text: string): string {
let result = text;
// Ensure proper bracketing
result = ensureSpaceBeforeBrackets(result);
result = removeSpaceInsideBrackets(result);
// Format sentences and footnotes
result = formatStringBySentence(result);
return result;
}
Create display-ready text
import {
normalizeSpaces,
cleanMultilines,
toTitleCase,
stripStyling
} from 'bitaboom';
function prepareForDisplay(rawText: string, titleCase = false): string {
let result = stripStyling(rawText);
result = normalizeSpaces(result);
result = cleanMultilines(result);
if (titleCase) {
result = toTitleCase(result);
}
return result.trim();
}
Combining with Arabic processing
import {
replaceEnglishPunctuationWithArabic,
fixTrailingWow,
cleanSpacesBeforePeriod,
normalizeSpaces,
fixMismatchedQuotationMarks
} from 'bitaboom';
function normalizeArabicText(text: string): string {
let result = text;
// Arabic-specific
result = replaceEnglishPunctuationWithArabic(result);
result = fixTrailingWow(result);
// General formatting
result = cleanSpacesBeforePeriod(result);
result = normalizeSpaces(result);
result = fixMismatchedQuotationMarks(result);
return result;
}
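Each pipeline above reassigns result one step at a time. The same chain can be written with a small generic pipe helper; pipe and the two stand-in steps below are hypothetical and only make the sketch runnable without bitaboom installed.

```typescript
// Hypothetical helper, not part of bitaboom: compose string transforms left to right.
type Transform = (text: string) => string;

function pipe(...steps: Transform[]): Transform {
  return (text) => steps.reduce((acc, step) => step(acc), text);
}

// Stand-in steps for illustration.
const trimEdges: Transform = (t) => t.trim();
const collapseSpaces: Transform = (t) => t.replace(/[ \t]+/g, ' ');

const tidy = pipe(trimEdges, collapseSpaces);
console.log(tidy('  hello    world  ')); // hello world
```

Because the bitaboom functions used above each take a string and return a string, they fit the Transform type and could be passed to pipe in the same order as in normalizeArabicText.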
Performance considerations
When processing large texts or batches, chaining many individual formatting functions can be slow, because each call makes its own pass over the whole string. For comprehensive Arabic text cleanup, use the optimized preformatArabicText function, which combines many of these operations in a single pass.
For single transformations
Use individual functions when you only need specific formatting:
const result = normalizeSpaces(text);
For multiple transformations on small text
Chain functions for readability:
const result = normalizeSpaces(cleanSpacesBeforePeriod(text));
For comprehensive Arabic formatting
Use the preformatting pipeline:
import { preformatArabicText } from 'bitaboom';
const result = preformatArabicText(text);
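To see why a combined pass helps, compare several replace() calls in sequence with one alternation regex that picks the replacement per match. This is a hypothetical illustration of the general technique, not how preformatArabicText is implemented.

```typescript
// Hypothetical illustration of multi-pass vs. single-pass cleanup.

// Multi-pass: each replace() scans the whole string again.
function cleanMultiPass(text: string): string {
  return text
    .replace(/\.{2,}/g, '\u2026') // runs of dots -> ellipsis
    .replace(/-{2,}/g, '-') // runs of dashes -> one dash
    .replace(/[ \t]{2,}/g, ' '); // runs of spaces/tabs -> one space
}

// Single pass: one alternation; the replacement is chosen from the matched text.
const COMBINED = /\.{2,}|-{2,}|[ \t]{2,}/g;

function cleanSinglePass(text: string): string {
  return text.replace(COMBINED, (match) => {
    if (match[0] === '.') return '\u2026';
    if (match[0] === '-') return '-';
    return ' ';
  });
}

console.log(cleanSinglePass('Wait...  this -- that')); // Wait… this - that
```

The two versions agree here only because the three patterns never overlap. When one step's output feeds the next, a single-pass rewrite needs more careful design.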
All formatting functions handle edge cases gracefully, including empty strings, null values, and unusual Unicode characters.