The tokenize() function parses JavaScript/TypeScript code and breaks it down into an array of tokens. Each token consists of a numeric type identifier and the token's text value.
Signature
Parameters
The JavaScript or TypeScript code to tokenize as a string.
Returns
An array of tokens, where each token is a tuple of [type: number, value: string]. The type is a numeric identifier corresponding to one of these token types:
- 0: identifier
- 1: keyword
- 2: string
- 3: class, number, null
- 4: property
- 5: entity (JSX component names)
- 6: JSX literals
- 7: sign (operators, punctuation)
- 8: comment
- 9: break (line break)
- 10: space
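The token shape and the type table above can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the `Token` alias, the `TOKEN_TYPE_NAMES` array, and the `tokenTypeName` helper are hypothetical names written out by hand from the list above, not exports of the library, and the sample tokens are hand-written rather than real tokenizer output.

```typescript
// A token as described above: [numeric type, text value].
type Token = [type: number, value: string];

// Names for type ids 0-10, in the order listed above.
// Written out by hand for illustration; not imported from the library.
const TOKEN_TYPE_NAMES: readonly string[] = [
  "identifier",   // 0
  "keyword",      // 1
  "string",       // 2
  "class",        // 3 (also number and null)
  "property",     // 4
  "entity",       // 5 (JSX component names)
  "jsx literals", // 6
  "sign",         // 7 (operators, punctuation)
  "comment",      // 8
  "break",        // 9 (line break)
  "space",        // 10
];

// Hypothetical helper: turn a numeric type into a readable label.
function tokenTypeName(type: number): string {
  return TOKEN_TYPE_NAMES[type] ?? "unknown";
}

// Hand-written sample tokens (not real tokenizer output).
const sample: Token[] = [[1, "const"], [10, " "], [0, "a"]];

// Pair each label with its JSON-quoted text so whitespace stays visible.
const labeled = sample.map(
  ([type, value]) => `${tokenTypeName(type)}:${JSON.stringify(value)}`
);
```

A lookup like this is mainly useful for debugging, since raw numeric types are hard to read in logs.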
Example
Example with JSX
Example with Property Access
Token Types Reference
The numeric token type constants are exported from the library.
Notes
The tokenizer handles JavaScript/TypeScript syntax, including:
- String literals (single quotes, double quotes, template literals)
- Regular expressions
- JSX/TSX syntax
- Comments (single-line and multi-line)
- Keywords and identifiers
- Property access (e.g., obj.catch treats catch as an identifier, not a keyword)
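The property-access rule in the last bullet can be illustrated with a small sketch. It assumes the decision depends only on whether the previous non-whitespace token is a dot. The `classifyWord` helper and the reduced `KEYWORDS` set are hypothetical and are not the library's actual implementation.

```typescript
// Token type ids taken from the list in the Returns section.
const T_IDENTIFIER = 0;
const T_KEYWORD = 1;

// A deliberately tiny keyword set for this sketch; a real tokenizer knows many more.
const KEYWORDS = new Set(["catch", "class", "const", "if", "return", "try"]);

// Hypothetical helper: classify a word given the text of the previous
// non-whitespace token (undefined at the start of the input).
function classifyWord(word: string, previousValue: string | undefined): number {
  // A reserved word directly after "." is a property name, so it is not
  // treated as a keyword in that position.
  if (KEYWORDS.has(word) && previousValue !== ".") {
    return T_KEYWORD;
  }
  return T_IDENTIFIER;
}

const afterDot = classifyWord("catch", ".");   // as in obj.catch
const standalone = classifyWord("catch", "}"); // as in try {} catch
```

Here `afterDot` is 0 (identifier) and `standalone` is 1 (keyword), matching the behavior described for obj.catch.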
JSX attribute values are always tokenized as strings, even if they look like keywords or numbers (e.g., <svg height="24"> treats 24 as a string).
The tokenizer maintains state to correctly handle nested contexts like template literal expressions (${...}) and JSX expressions ({...}).
This is a low-level API. Most users should use the highlight() function instead, which calls tokenize() internally.
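To show how a higher-level function can build on the token array, here is a minimal rendering sketch. It is not the library's highlight() implementation: `renderTokens`, `escapeHtml`, and the `tok-N` class names are hypothetical, and the input is a hand-written token array rather than real tokenizer output.

```typescript
// A token as described in the Returns section: [numeric type, text value].
type Token = [type: number, value: string];

// Escape characters that are special in HTML text content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical renderer: wrap each token in a span whose class encodes its type.
function renderTokens(tokens: Token[]): string {
  return tokens
    .map(([type, value]) => {
      // Type 9 is a line break; emit a newline instead of a span.
      if (type === 9) return "\n";
      return `<span class="tok-${type}">${escapeHtml(value)}</span>`;
    })
    .join("");
}

// Hand-written sample tokens (not real tokenizer output).
const html = renderTokens([[1, "if"], [10, " "], [7, "<"], [9, "\n"]]);
```

Working directly with tokens like this only makes sense when custom output is needed; otherwise highlight() remains the simpler entry point.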