Intl.Segmenter for word segmentation and canvas measureText for shaping, which means it inherits the browser’s font engine as ground truth. This gives it broad script support without a separate Unicode shaping library.
The supported CSS target is:
- white-space: normal (or pre-wrap; see Whitespace modes)
- word-break: normal
- overflow-wrap: break-word
- line-break: auto
CJK (Chinese, Japanese, Korean)
CJK text allows a line break between almost any two adjacent characters. Pretext splits CJK words from Intl.Segmenter into individual grapheme units and measures each one, matching the browser's per-character breaking behavior.
Kinsoku rules are applied during segmentation: punctuation characters that are prohibited at line starts or line ends are merged with their adjacent grapheme to prevent orphaned punctuation. This includes common Japanese and Chinese punctuation such as closing brackets, commas, and periods, which must stay attached to the preceding character.
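The grapheme splitting and kinsoku merging described above can be sketched roughly as follows. This is an illustration, not Pretext's actual implementation: splitCjkUnits and both punctuation sets are hypothetical, and the sets are deliberately tiny compared with a real kinsoku table.

```typescript
// Hypothetical, deliberately small kinsoku sets (real rule sets are larger).
const NO_LINE_START = new Set(["、", "。", "，", "」", "）", "』"]);
const NO_LINE_END = new Set(["「", "（", "『"]);

// Split text into breakable units: one grapheme each, except that
// prohibited punctuation is glued to its neighbor.
function splitCjkUnits(text: string, locale: string = "ja"): string[] {
  const graphemes = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const units: string[] = [];
  let pendingOpener = "";
  for (const { segment } of graphemes.segment(text)) {
    if (NO_LINE_END.has(segment)) {
      // Opening punctuation must not end a line: hold it for the next grapheme.
      pendingOpener += segment;
    } else if (
      NO_LINE_START.has(segment) &&
      pendingOpener === "" &&
      units.length > 0
    ) {
      // Closing punctuation must not start a line: attach it to the previous unit.
      units[units.length - 1] += segment;
    } else {
      units.push(pendingOpener + segment);
      pendingOpener = "";
    }
  }
  // Text that ends in opening punctuation still yields a final unit.
  if (pendingOpener !== "") units.push(pendingOpener);
  return units;
}
```

With these sets, splitCjkUnits("「東京」、大阪。") yields four units: "「東", "京」、", "大", and "阪。". Each unit would then be measured on its own, and a line break is only allowed between units.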
Arabic and bidirectional text
On the simple prepare() + layout() path, Pretext handles Arabic and mixed LTR/RTL text for height and line count without paying for bidi metadata.
On the rich path (prepareWithSegments()), bidi level metadata is computed per segment. This lets custom renderers reorder runs visually within each line for correct Arabic display.
Arabic punctuation clusters and mark sequences are handled in preprocessing so that punctuation stays attached to adjacent word units, as the browser expects.
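A custom renderer that receives per-segment bidi levels could reorder the runs of one line using rule L2 of the Unicode Bidirectional Algorithm (UAX #9). The sketch below assumes a hypothetical LeveledRun shape; the actual structure returned by prepareWithSegments() is not shown in this section and may differ.

```typescript
// Hypothetical shape: one run of text on a line plus its resolved bidi level.
interface LeveledRun {
  text: string;
  level: number; // even = left-to-right, odd = right-to-left
}

// UAX #9 rule L2: from the highest level down to the lowest odd level,
// reverse every contiguous sequence of runs at that level or higher.
function reorderVisually(runs: LeveledRun[]): LeveledRun[] {
  const out = runs.slice();
  let maxLevel = 0;
  let minOddLevel = Infinity;
  for (const run of out) {
    maxLevel = Math.max(maxLevel, run.level);
    if (run.level % 2 === 1) minOddLevel = Math.min(minOddLevel, run.level);
  }
  // If no run has an odd level, the loop body never executes.
  for (let level = maxLevel; level >= minOddLevel; level--) {
    let i = 0;
    while (i < out.length) {
      if (out[i].level < level) {
        i++;
        continue;
      }
      let j = i;
      while (j + 1 < out.length && out[j + 1].level >= level) j++;
      // Reverse out[i..j] in place.
      for (let a = i, b = j; a < b; a++, b--) {
        const tmp = out[a];
        out[a] = out[b];
        out[b] = tmp;
      }
      i = j + 1;
    }
  }
  return out;
}
```

This only reorders whole runs. Drawing the glyphs inside each right-to-left run in the correct direction is left to the text rendering API in use.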
Thai, Khmer, Myanmar, Hindi, Urdu
These scripts are covered by Intl.Segmenter word boundaries and are included in the corpus test suite. Line breaking follows the segmenter's word boundaries, and the canvas engine provides accurate per-segment widths for whichever font is in use.
For Southeast Asian scripts, Pretext uses range-based corpus diagnostics to validate line breaks against actual browser output. Thai, Khmer, and Myanmar are regularly tested.
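As a rough illustration of breaking lines at segmenter word boundaries, the sketch below greedily fills lines from word segments. breakLines and its measure parameter are hypothetical. In a browser, measure would typically wrap CanvasRenderingContext2D.measureText(text).width. The sketch ignores trailing-whitespace handling and any kerning across segment boundaries.

```typescript
// Width of one segment; injected so the sketch does not depend on a canvas.
type MeasureFn = (segment: string) => number;

// Greedy line filling: break only at Intl.Segmenter word boundaries.
function breakLines(
  text: string,
  maxWidth: number,
  measure: MeasureFn,
  locale?: string,
): string[] {
  const words = new Intl.Segmenter(locale, { granularity: "word" });
  const lines: string[] = [];
  let line = "";
  let width = 0;
  for (const { segment } of words.segment(text)) {
    const segmentWidth = measure(segment);
    // Start a new line when the next segment would overflow a non-empty line.
    if (line !== "" && width + segmentWidth > maxWidth) {
      lines.push(line);
      line = "";
      width = 0;
    }
    line += segment;
    width += segmentWidth;
  }
  if (line !== "") lines.push(line);
  return lines;
}
```

For scripts written without spaces, such as Thai, the same loop applies unchanged: the segmenter supplies the word boundaries, so the break opportunities are whatever the runtime's segmentation data provides for that locale.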
Emoji and emoji sequences
On macOS, Chrome and Firefox inflate canvas emoji widths when the font size is below 24px: the canvas measures Apple Color Emoji glyphs wider than the DOM renders them. Pretext auto-detects this inflation by comparing the canvas width against a single cached DOM read per font size, then applies a correction factor to every emoji grapheme.
In Safari, canvas and DOM agree on emoji width (both are wider than the font size), so no correction is applied there.
Emoji ZWJ sequences (e.g. family emoji) and flag sequences are treated as single grapheme units by Intl.Segmenter and measured as one unit.
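One way to picture the correction step is a per-font cache that performs a single DOM measurement and reuses the resulting ratio. The sketch below is an assumption about the general shape of such a mechanism, not Pretext's code: makeEmojiCorrector, the probe emoji, and both injected measurement functions are hypothetical. The small graphemes helper shows that multi-code-point emoji come back from Intl.Segmenter as one unit each.

```typescript
// Measures a string in a given CSS font; injected so the sketch runs anywhere.
type FontMeasure = (text: string, font: string) => number;

// Returns a function that gives the emoji width correction factor for a font.
// A factor of 1 means canvas and DOM agree, so widths are left unchanged.
function makeEmojiCorrector(
  canvasMeasure: FontMeasure,
  domMeasure: FontMeasure,
  probe: string = "😀",
): (font: string) => number {
  const cache = new Map<string, number>();
  return (font: string): number => {
    let factor = cache.get(font);
    if (factor === undefined) {
      const canvasWidth = canvasMeasure(probe, font);
      const domWidth = domMeasure(probe, font); // the only DOM read for this font
      factor = canvasWidth > 0 && domWidth > 0 ? domWidth / canvasWidth : 1;
      cache.set(font, factor);
    }
    return factor;
  };
}

// Split text into grapheme clusters, so an emoji sequence stays one unit.
function graphemes(text: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}
```

A corrected emoji width is then the canvas width of the grapheme multiplied by the factor for the current font. Because the font string includes the size, the cache holds one entry per font size.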
Mixed-script text
Pretext handles mixed-script text (Latin mixed with CJK, Arabic mixed with Latin, emoji sequences inside prose) as part of its normal processing. Intl.Segmenter finds word boundaries across script transitions, and the canvas measurement step handles whichever font resolves for each segment.
Locale targeting
By default, Pretext uses the runtime locale for Intl.Segmenter. If your app serves a specific locale or you want deterministic segmentation across environments, call setLocale() before preparing text.
setLocale() also clears the internal segment cache, so segments prepared under the previous locale are not mixed with segments prepared under the new one.
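To show why changing the locale and clearing the cache belong together, here is a self-contained sketch of a locale-scoped segment cache. It does not call Pretext itself: setLocaleSketch, segmentWords, and the cache layout are hypothetical stand-ins that only mirror the behavior described above.

```typescript
// undefined means "use the runtime's default locale".
let currentLocale: string | undefined = undefined;
let wordSegmenter = new Intl.Segmenter(currentLocale, { granularity: "word" });
const segmentCache = new Map<string, string[]>();

// Switch locale and drop every cached result produced under the old one.
function setLocaleSketch(locale?: string): void {
  currentLocale = locale;
  wordSegmenter = new Intl.Segmenter(locale, { granularity: "word" });
  segmentCache.clear();
}

// Segment text into words, reusing cached results for the current locale.
function segmentWords(text: string): string[] {
  let cached = segmentCache.get(text);
  if (cached === undefined) {
    cached = Array.from(wordSegmenter.segment(text), (s) => s.segment);
    segmentCache.set(text, cached);
  }
  return cached;
}
```

Because the cache is keyed by text alone, a stale entry would otherwise survive a locale change and return boundaries computed under the previous locale. Clearing on every locale change avoids that, at the cost of re-segmenting text that was already prepared.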