Overview
The Scanner performs lexical analysis (tokenization) of TypeScript source code. It breaks the source text into a stream of tokens that can be used by the parser or for direct analysis.
Creating a Scanner
ts.createScanner()
Creates a new Scanner instance.
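languageVersion (ts.ScriptTarget, required): The ECMAScript target version (ES5, ES2015, ES2020, etc.)
skipTrivia (boolean, required): Whether to skip whitespace and comments
languageVariant (ts.LanguageVariant, optional): Standard or JSX variant (defaults to Standard)
textInitial (string, optional): The text to scan
onError (ErrorCallback, optional): Callback invoked when the scanner reports an error
start (number, optional): Starting position in the text
length (number, optional): Number of characters to scan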
Example
import * as ts from 'typescript';
const sourceCode = `
function add(a: number, b: number): number {
return a + b;
}
`;
const scanner = ts.createScanner(
  ts.ScriptTarget.ES2020,
  false, // skipTrivia
  ts.LanguageVariant.Standard,
  sourceCode
);
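The effect of skipTrivia is easiest to see by scanning the same text twice. The following is a minimal sketch; scanKinds is a hypothetical helper written for this illustration, not part of the TypeScript API.

```typescript
import * as ts from 'typescript';

// Hypothetical helper: collects the kind of every token up to end of file.
function scanKinds(text: string, skipTrivia: boolean): ts.SyntaxKind[] {
  const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    skipTrivia,
    ts.LanguageVariant.Standard,
    text
  );
  const kinds: ts.SyntaxKind[] = [];
  let kind = scanner.scan();
  while (kind !== ts.SyntaxKind.EndOfFileToken) {
    kinds.push(kind);
    kind = scanner.scan();
  }
  return kinds;
}

const withTrivia = scanKinds('let x = 1;', false);
const withoutTrivia = scanKinds('let x = 1;', true);
console.log(withTrivia.length);    // 8: five tokens plus three whitespace tokens
console.log(withoutTrivia.length); // 5: let, x, =, 1, ;
```

Passing true is usually the simpler choice when only the significant tokens matter; passing false suits formatters and highlighters that need every character accounted for.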
Scanner Interface Methods
Scanning Methods
scan()
Scans the next token from the input.
The syntax kind of the scanned token
let token: ts.SyntaxKind;
while ((token = scanner.scan()) !== ts.SyntaxKind.EndOfFileToken) {
  console.log(`Token: ${ts.SyntaxKind[token]}`);
  console.log(`Text: ${scanner.getTokenText()}`);
  console.log(`Position: ${scanner.getTokenStart()} - ${scanner.getTokenEnd()}`);
}
getText()
Returns the full text being scanned.
const text = scanner.getText();
console.log(`Scanning: ${text}`);
setText()
Sets new text for the scanner to scan.
text (string | undefined, required): The text to scan
start (number, optional): Starting position (defaults to 0)
length (number, optional): Number of characters to scan from start (defaults to the end of the text)
scanner.setText('const x = 42;');
let token = scanner.scan();
// First token will be 'const'
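The optional start and length arguments restrict scanning to a slice of the text. A small sketch, assuming a scanner that skips trivia; the offsets are counted by hand for this particular string.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
const source = 'let a = 1; let b = 2;';

// Scan only the second statement: 10 characters starting at offset 11.
scanner.setText(source, 11, 10);

const texts: string[] = [];
while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
  texts.push(scanner.getTokenText());
}
console.log(texts); // [ 'let', 'b', '=', '2', ';' ]
```

Token positions reported while scanning the slice are still offsets into the full text, not into the slice.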
getToken()
Returns the current token’s syntax kind.
The current token’s syntax kind
scanner.scan();
const currentToken = scanner.getToken();
console.log(`Current token: ${ts.SyntaxKind[currentToken]}`);
getTokenText()
Returns the text of the current token.
scanner.scan();
const text = scanner.getTokenText();
console.log(`Token text: ${text}`);
getTokenValue()
Returns the processed value of the current token (for strings and numbers).
The token’s processed value
// For a string literal like "hello\nworld"
scanner.scan();
const text = scanner.getTokenText(); // "hello\nworld" (as written, with quotes)
const value = scanner.getTokenValue(); // hello<newline>world (without quotes, with an actual newline)
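The difference between the raw text and the processed value can be checked directly. A minimal sketch; the doubled backslash in the TypeScript string ensures the scanner sees the two characters \ and n rather than a real newline.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('"hello\\nworld"');
scanner.scan();

const raw = scanner.getTokenText();    // quotes and escape sequence exactly as written
const value = scanner.getTokenValue(); // quotes removed, \n turned into a newline
console.log(raw.length, value.length); // 14 11
```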
getTokenStart()
Returns the starting position of the current token (excluding leading trivia).
The token’s start position
scanner.scan();
const start = scanner.getTokenStart();
console.log(`Token starts at position: ${start}`);
getTokenEnd()
Returns the ending position of the current token.
scanner.scan();
const start = scanner.getTokenStart();
const end = scanner.getTokenEnd();
console.log(`Token span: ${start} - ${end}`);
getTokenFullStart()
Returns the starting position of the current token (including leading trivia).
The token’s full start position
scanner.scan();
const fullStart = scanner.getTokenFullStart();
const start = scanner.getTokenStart();
const leadingTriviaLength = start - fullStart;
console.log(`Leading trivia length: ${leadingTriviaLength}`);
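Leading trivia only accumulates in front of a token when the scanner skips trivia; when trivia is returned as separate tokens, the two positions normally coincide. The sketch below assumes skipTrivia = true, with offsets counted by hand for this string.

```typescript
import * as ts from 'typescript';

// skipTrivia = true, so whitespace and comments are folded into the next token.
const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
const source = 'a   /* note */ b';
scanner.setText(source);

scanner.scan(); // a
scanner.scan(); // b

const fullStart = scanner.getTokenFullStart(); // 1, right after 'a'
const start = scanner.getTokenStart();         // 15, where 'b' begins
const leadingTrivia = source.slice(fullStart, start);
console.log(JSON.stringify(leadingTrivia)); // "   /* note */ "
```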
Token State Methods
isIdentifier()
Returns true if the current token is an identifier.
True if the token is an identifier
scanner.scan();
if (scanner.isIdentifier()) {
  console.log(`Found identifier: ${scanner.getTokenText()}`);
}
isReservedWord()
Returns true if the current token is a reserved keyword.
True if the token is a reserved word
scanner.scan();
if (scanner.isReservedWord()) {
  console.log(`Found keyword: ${scanner.getTokenText()}`);
}
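The two checks are not exact opposites. In the TypeScript builds this sketch was written against, isIdentifier() also reports true for keyword tokens that are not reserved words (contextual keywords such as type), because those can still be used as names. The rows array is a hypothetical record shape used only for this illustration.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('if type foo');

// Hypothetical record shape, used only for this illustration.
const rows: Array<{ text: string; identifier: boolean; reserved: boolean }> = [];
while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
  rows.push({
    text: scanner.getTokenText(),
    identifier: scanner.isIdentifier(),
    reserved: scanner.isReservedWord(),
  });
}
console.log(rows);
// if   -> identifier: false, reserved: true
// type -> identifier: true,  reserved: false (contextual keyword)
// foo  -> identifier: true,  reserved: false
```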
isUnterminated()
Returns true if the current token is unterminated (e.g., unterminated string).
True if the token is unterminated
scanner.setText('"unterminated string');
scanner.scan();
if (scanner.isUnterminated()) {
  console.log('Warning: Unterminated string literal');
}
hasPrecedingLineBreak()
Returns true if there’s a line break before the current token.
True if there’s a preceding line break
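// Assumes a scanner created with skipTrivia = true; otherwise whitespace and
// newlines come back as separate trivia tokens and the scan() calls below do not line up.
const code = `const x = 1;
const y = 2;`;
scanner.setText(code);
scanner.scan(); // 'const'
scanner.scan(); // 'x'
scanner.scan(); // '='
scanner.scan(); // '1'
scanner.scan(); // ';'
scanner.scan(); // 'const'
if (scanner.hasPrecedingLineBreak()) {
  console.log('This token is on a new line');
}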
hasUnicodeEscape()
Returns true if the current identifier contains a Unicode escape sequence.
True if the token has Unicode escapes
hasExtendedUnicodeEscape()
Returns true if the current identifier contains an extended Unicode escape sequence.
True if the token has extended Unicode escapes
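Neither method has an example above, so here is a hedged sketch of both escape forms. The classic form uses four hex digits (\u0062) and the extended form uses braces (\u{62}). Only the classic result is annotated with an expected value; the extended flag is printed without a claim about its output.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);

// Classic four-digit escape: the source text is a\u0062c, which names abc.
scanner.setText('a\\u0062c');
scanner.scan();
const classicValue = scanner.getTokenValue();   // 'abc'
const classicFlag = scanner.hasUnicodeEscape(); // true

// Braced (extended) escape: the source text is \u{62}c, which names bc.
scanner.setText('\\u{62}c');
scanner.scan();
const extendedValue = scanner.getTokenValue();  // 'bc'
const extendedFlag = scanner.hasExtendedUnicodeEscape();

console.log(classicValue, classicFlag);
console.log(extendedValue, extendedFlag);
```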
Advanced Scanning Methods
reScanGreaterToken()
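Re-scans a > token so that it can merge with the characters that follow into a longer operator (>=, >>, >>>, >>=, or >>>=). scan() on its own always returns a single GreaterThanToken for >, which lets a parser close nested type argument lists such as Array<Array<number>> one bracket at a time.
The re-scanned token kind
// Called where an operator is expected, for example while parsing a binary expression
if (scanner.getToken() === ts.SyntaxKind.GreaterThanToken) {
  const newToken = scanner.reScanGreaterToken();
  // Might become GreaterThanGreaterThanToken (>>)
}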
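A runnable sketch of the merge, assuming a scanner that skips trivia. Kinds are compared numerically rather than printed by name.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('a >> b');

scanner.scan();                              // a
const first = scanner.scan();                // GreaterThanToken: only the first '>'
const firstText = scanner.getTokenText();    // '>'
const merged = scanner.reScanGreaterToken(); // GreaterThanGreaterThanToken
const mergedText = scanner.getTokenText();   // '>>'

console.log(firstText, mergedText);
console.log(merged === ts.SyntaxKind.GreaterThanGreaterThanToken); // true
```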
reScanSlashToken()
Re-scans a / or /= token as a regular expression literal. scan() always treats a slash as the division operator, so the caller decides from context when a regex is possible and re-scans.
The re-scanned token kind (SlashToken or RegularExpressionLiteral)
// Context-dependent scanning
if (scanner.getToken() === ts.SyntaxKind.SlashToken) {
  const newToken = scanner.reScanSlashToken();
  if (newToken === ts.SyntaxKind.RegularExpressionLiteral) {
    console.log('This is a regex literal');
  }
}
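The following sketch shows the token before and after re-scanning a complete regex literal, including its flags.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('/ab+c/gi');

const first = scanner.scan();                 // SlashToken: just the opening '/'
const rescanned = scanner.reScanSlashToken(); // RegularExpressionLiteral
const literal = scanner.getTokenText();       // '/ab+c/gi'

console.log(first === ts.SyntaxKind.SlashToken);                   // true
console.log(rescanned === ts.SyntaxKind.RegularExpressionLiteral); // true
console.log(literal);                                              // /ab+c/gi
```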
reScanTemplateToken()
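Re-scans the current } token as the continuation of a template literal, producing a TemplateMiddle or TemplateTail token.
isTaggedTemplate (boolean, required): Whether this is a tagged template literal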
The re-scanned template token kind
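A sketch of how a caller might walk through a template with one substitution. After the expression inside ${...} has been scanned, the closing brace arrives as an ordinary CloseBraceToken and is re-scanned to pick up the rest of the template.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
// Single quotes keep this a plain string whose content is the template `a${b}c`.
scanner.setText('`a${b}c`');

const head = scanner.scan();                     // TemplateHead, covers `a${
const headValue = scanner.getTokenValue();       // 'a'
scanner.scan();                                  // Identifier b
const brace = scanner.scan();                    // CloseBraceToken
const tail = scanner.reScanTemplateToken(false); // TemplateTail, covers }c`
const tailValue = scanner.getTokenValue();       // 'c'

console.log(headValue, tailValue); // a c
```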
scanJsxIdentifier()
Scans a JSX identifier (allows hyphens).
scanJsxToken()
Scans the next token in JSX mode.
scanJsxAttributeValue()
Scans a JSX attribute value.
The attribute value token
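The three JSX methods are meant to be called by something that already knows where it is inside an element, which is normally the parser. The sketch below plays that role by hand for one small element; the call order is an assumption about typical use, not a prescribed protocol.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(
  ts.ScriptTarget.Latest,
  true,
  ts.LanguageVariant.JSX,
  '<my-tag data-id="x">hi</my-tag>'
);

scanner.scan();                                    // <
scanner.scan();                                    // identifier 'my', stops at the hyphen
scanner.scanJsxIdentifier();                       // extends the token across hyphens
const tagName = scanner.getTokenValue();           // 'my-tag'

scanner.scan();                                    // identifier 'data'
scanner.scanJsxIdentifier();
const attrName = scanner.getTokenValue();          // 'data-id'

scanner.scan();                                    // =
const valueKind = scanner.scanJsxAttributeValue(); // StringLiteral
const attrValue = scanner.getTokenValue();         // 'x'

scanner.scan();                                    // >
const childKind = scanner.scanJsxToken();          // JsxText
const childText = scanner.getTokenValue();         // 'hi'

console.log(tagName, attrName, attrValue, childText);
```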
State Management Methods
resetTokenState()
Moves the scanner to the given position and clears the current token, so that the next scan() starts from there.
const pos = scanner.getTokenEnd();
// ... scan more tokens
// Reset to previous position
scanner.resetTokenState(pos);
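A complete version of the pattern above, as a sketch: remember a position, scan ahead, then rewind and scan again.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('let a = 1');

scanner.scan();                     // let
const mark = scanner.getTokenEnd(); // 3, just past 'let'
scanner.scan();                     // a
scanner.scan();                     // =

scanner.resetTokenState(mark);      // rewind to just after 'let'
scanner.scan();
const textAfterReset = scanner.getTokenText(); // 'a'
console.log(mark, textAfterReset);             // 3 a
```

For speculative scanning, lookAhead() and tryScan() are usually more convenient because they save and restore the full token state automatically.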
lookAhead()
Invokes a callback, then restores the scanner state whatever the callback returns.
callback (() => T, required): Function to call with lookahead
The result of the callback
const nextTokenIsColon = scanner.lookAhead(() => {
  scanner.scan(); // Look at next token
  return scanner.getToken() === ts.SyntaxKind.ColonToken;
});
// Scanner state is restored after lookAhead
tryScan()
Tries a scan operation, only committing if the callback returns truthy.
The result of the callback
const result = scanner.tryScan(() => {
  scanner.scan();
  if (scanner.getToken() === ts.SyntaxKind.FunctionKeyword) {
    return true; // Commit the scan
  }
  return false; // Rollback the scan
});
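The difference between lookAhead() and tryScan() shows up in where the scanner is left afterwards. A minimal sketch on a three-token input:

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('x: number');
scanner.scan(); // x

// lookAhead: the callback sees ':', but the scanner is rewound afterwards.
const sawColon = scanner.lookAhead(() => scanner.scan() === ts.SyntaxKind.ColonToken);
const afterLookAhead = scanner.getTokenText(); // 'x'

// tryScan with a truthy result: the scanner stays on ':'.
const consumedColon = scanner.tryScan(() => scanner.scan() === ts.SyntaxKind.ColonToken);
const afterTryScan = scanner.getTokenText(); // ':'

// tryScan with a falsy result: the next token is 'number', so this rolls back.
const consumedString = scanner.tryScan(() => scanner.scan() === ts.SyntaxKind.StringLiteral);
const afterFailedTry = scanner.getTokenText(); // ':'

console.log(sawColon, afterLookAhead);       // true x
console.log(consumedColon, afterTryScan);    // true :
console.log(consumedString, afterFailedTry); // false :
```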
scanRange()
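Scans a specific range of the current text, then restores the previous scanner state.
start (number, required): Position where the range begins
length (number, required): Number of characters in the range
callback (() => T, required): Function to call while scanning the range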
The result of the callback
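A sketch of scanRange() used in the middle of an ordinary scan. The callback tokenizes the second statement only, and the outer scan then continues from where it was; offsets are counted by hand for this string.

```typescript
import * as ts from 'typescript';

const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
scanner.setText('let a = 1; let b = 2;');
scanner.scan(); // let (first statement)

// Tokenize 10 characters starting at offset 11, i.e. 'let b = 2;'.
const inner = scanner.scanRange(11, 10, () => {
  const texts: string[] = [];
  while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
    texts.push(scanner.getTokenText());
  }
  return texts;
});

scanner.scan(); // the outer scan resumes after the first 'let'
const resumed = scanner.getTokenText();

console.log(inner);   // [ 'let', 'b', '=', '2', ';' ]
console.log(resumed); // a
```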
Configuration Methods
setScriptTarget()
Sets the ECMAScript target version.
scanner.setScriptTarget(ts.ScriptTarget.ES2020);
setLanguageVariant()
Sets the language variant (Standard or JSX).
scanner.setLanguageVariant(ts.LanguageVariant.JSX);
setScriptKind()
Sets the script kind (TS, JS, JSX, etc.).
scanner.setScriptKind(ts.ScriptKind.TSX);
setOnError()
Sets the error callback function.
onError
ErrorCallback | undefined
required
The error callback
scanner.setOnError((message, length) => {
  // message is a DiagnosticMessage object; its text is in message.message
  console.error(`Scanner error: ${message.message}`);
});
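A runnable sketch that collects scanner errors into an array instead of logging them. The messages array and the output format are choices made for this illustration.

```typescript
import * as ts from 'typescript';

const messages: string[] = [];
const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);

// The callback receives a DiagnosticMessage object, not a plain string.
scanner.setOnError((message) => {
  messages.push(`TS${message.code}: ${message.message}`);
});

scanner.setText('"abc'); // string literal with no closing quote
scanner.scan();

const unterminated = scanner.isUnterminated();
console.log(unterminated); // true
console.log(messages);
```

Passing undefined to setOnError() removes the callback again.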
Complete Example
import * as ts from 'typescript';
function tokenizeSourceCode(sourceCode: string) {
  const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    false, // don't skip trivia
    ts.LanguageVariant.Standard,
    sourceCode
  );
  const tokens: Array<{
    kind: string;
    text: string;
    start: number;
    end: number;
    hasLineBreak: boolean;
  }> = [];
  let token: ts.SyntaxKind;
  while ((token = scanner.scan()) !== ts.SyntaxKind.EndOfFileToken) {
    tokens.push({
      kind: ts.SyntaxKind[token],
      text: scanner.getTokenText(),
      start: scanner.getTokenStart(),
      end: scanner.getTokenEnd(),
      hasLineBreak: scanner.hasPrecedingLineBreak()
    });
  }
  return tokens;
}
// Example usage
const code = `
function greet(name: string): string {
return "Hello, " + name;
}
`;
const tokens = tokenizeSourceCode(code);
console.log('Tokens:');
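tokens.forEach((token, index) => {
  const prefix = token.hasLineBreak ? '\n' : '';
  console.log(
    // JSON.stringify keeps newline characters in the token text visible
    `${prefix}[${index}] ${token.kind.padEnd(25)} ${JSON.stringify(token.text)} (${token.start}-${token.end})`
  );
});
// Output (abridged; blank lines produced by the prefix are omitted).
// Trivia is not skipped, so the newline that opens the template literal comes first:
// Tokens:
// [0] NewLineTrivia             "\n" (0-1)
// [1] FunctionKeyword           "function" (1-9)
// [2] WhitespaceTrivia          " " (9-10)
// [3] Identifier                "greet" (10-15)
// [4] OpenParenToken            "(" (15-16)
// [5] Identifier                "name" (16-20)
// [6] ColonToken                ":" (20-21)
// [7] WhitespaceTrivia          " " (21-22)
// [8] StringKeyword             "string" (22-28)
// ...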
Syntax Kinds
Common token types (SyntaxKind enum):
// Keywords
ts.SyntaxKind.FunctionKeyword
ts.SyntaxKind.ConstKeyword
ts.SyntaxKind.LetKeyword
ts.SyntaxKind.VarKeyword
ts.SyntaxKind.IfKeyword
ts.SyntaxKind.ElseKeyword
ts.SyntaxKind.ReturnKeyword
// Literals
ts.SyntaxKind.NumericLiteral
ts.SyntaxKind.StringLiteral
ts.SyntaxKind.TrueKeyword
ts.SyntaxKind.FalseKeyword
// Punctuation
ts.SyntaxKind.OpenBraceToken // {
ts.SyntaxKind.CloseBraceToken // }
ts.SyntaxKind.OpenParenToken // (
ts.SyntaxKind.CloseParenToken // )
ts.SyntaxKind.OpenBracketToken // [
ts.SyntaxKind.CloseBracketToken // ]
ts.SyntaxKind.SemicolonToken // ;
ts.SyntaxKind.CommaToken // ,
ts.SyntaxKind.ColonToken // :
ts.SyntaxKind.DotToken // .
// Operators
ts.SyntaxKind.PlusToken // +
ts.SyntaxKind.MinusToken // -
ts.SyntaxKind.AsteriskToken // *
ts.SyntaxKind.SlashToken // /
ts.SyntaxKind.EqualsToken // =
ts.SyntaxKind.EqualsEqualsToken // ==
ts.SyntaxKind.EqualsEqualsEqualsToken // ===
ts.SyntaxKind.GreaterThanToken // >
ts.SyntaxKind.LessThanToken // <
// Special
ts.SyntaxKind.Identifier
ts.SyntaxKind.EndOfFileToken
ts.SyntaxKind.WhitespaceTrivia
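When printing kinds, note that ts.SyntaxKind[kind] is a plain reverse enum lookup, so a kind that shares its numeric value with a marker member may print the marker name (for example FirstAssignment instead of EqualsToken). One possible workaround, sketched below, is to prefer ts.tokenToString(), which returns the source text for keywords and punctuation; describeKind is a hypothetical helper, not part of the TypeScript API.

```typescript
import * as ts from 'typescript';

// Hypothetical helper: prefer the token's source text, fall back to the enum name.
function describeKind(kind: ts.SyntaxKind): string {
  return ts.tokenToString(kind) ?? ts.SyntaxKind[kind];
}

console.log(describeKind(ts.SyntaxKind.EqualsEqualsEqualsToken)); // ===
console.log(describeKind(ts.SyntaxKind.FunctionKeyword));         // function
console.log(describeKind(ts.SyntaxKind.Identifier));              // Identifier
```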
See Also