
Overview

The Scanner performs lexical analysis (tokenization) of TypeScript source code. It breaks the source text into a stream of tokens that can be used by the parser or for direct analysis.

Creating a Scanner

ts.createScanner()

Creates a new Scanner instance.
languageVersion
ScriptTarget
required
The ECMAScript target version (ES5, ES2015, ES2020, etc.)
skipTrivia
boolean
required
Whether to skip whitespace and comments
languageVariant
LanguageVariant
Standard or JSX variant (defaults to Standard)
textInitial
string
Initial text to scan
onError
ErrorCallback
Error callback function
start
number
Starting position in the text
length
number
Length of text to scan
return
Scanner
A Scanner instance

Example

import * as ts from 'typescript';

const sourceCode = `
function add(a: number, b: number): number {
  return a + b;
}
`;

const scanner = ts.createScanner(
  ts.ScriptTarget.ES2020,
  false, // skipTrivia
  ts.LanguageVariant.Standard,
  sourceCode
);
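Scanning the same text both ways makes the effect of `skipTrivia` easy to see. The sketch below assumes TypeScript 5.x. `tokenKinds` is a hypothetical helper written for this illustration, not part of the API.

```typescript
import * as ts from "typescript";

// Hypothetical helper: collect the kind of every token in `text`.
function tokenKinds(text: string, skipTrivia: boolean): ts.SyntaxKind[] {
  const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    skipTrivia,
    ts.LanguageVariant.Standard,
    text
  );
  const kinds: ts.SyntaxKind[] = [];
  for (let kind = scanner.scan(); kind !== ts.SyntaxKind.EndOfFileToken; kind = scanner.scan()) {
    kinds.push(kind);
  }
  return kinds;
}

const withTrivia = tokenKinds("const x = 1;", false);  // includes WhitespaceTrivia tokens
const withoutTrivia = tokenKinds("const x = 1;", true); // const, x, =, 1, ;

console.log(withTrivia.length, withoutTrivia.length); // 8 5
```

Keeping trivia suits tools that must reproduce the source exactly, such as formatters and syntax highlighters. The TypeScript parser itself creates its scanner with `skipTrivia` set to true.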

Scanner Interface Methods

Scanning Methods

scan()

Scans the next token from the input.
return
SyntaxKind
The syntax kind of the scanned token
let token: ts.SyntaxKind;
while ((token = scanner.scan()) !== ts.SyntaxKind.EndOfFileToken) {
  console.log(`Token: ${ts.SyntaxKind[token]}`);
  console.log(`Text: ${scanner.getTokenText()}`);
  console.log(`Position: ${scanner.getTokenStart()} - ${scanner.getTokenEnd()}`);
}

getText()

Returns the full text being scanned.
return
string
The full source text
const text = scanner.getText();
console.log(`Scanning: ${text}`);

setText()

Sets new text for the scanner to scan.
text
string | undefined
required
The text to scan
start
number
Starting position (defaults to 0)
length
number
Length to scan (defaults to entire text)
scanner.setText('const x = 42;');
let token = scanner.scan();
// First token will be 'const'

Token Information Methods

getToken()

Returns the current token’s syntax kind.
return
SyntaxKind
The current token’s syntax kind
scanner.scan();
const currentToken = scanner.getToken();
console.log(`Current token: ${ts.SyntaxKind[currentToken]}`);

getTokenText()

Returns the text of the current token.
return
string
The token’s text
scanner.scan();
const text = scanner.getTokenText();
console.log(`Token text: ${text}`);

getTokenValue()

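Returns the processed value of the current token. For string and template literals, this is the contents without quotes and with escape sequences decoded. For numeric literals, it is the value in simplified decimal form. For identifiers, it is the name with any Unicode escapes decoded.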
return
string
The token’s processed value
// For the string literal "hello\nworld" (a backslash escape in the source)
scanner.setText('"hello\\nworld"');
scanner.scan();
const text = scanner.getTokenText(); // "hello\nworld" (raw source, with quotes and the backslash)
const value = scanner.getTokenValue(); // hello, a newline, world (quotes removed, escape decoded)

getTokenStart()

Returns the starting position of the current token (excluding leading trivia).
return
number
The token’s start position
scanner.scan();
const start = scanner.getTokenStart();
console.log(`Token starts at position: ${start}`);

getTokenEnd()

Returns the ending position of the current token.
return
number
The token’s end position
scanner.scan();
const start = scanner.getTokenStart();
const end = scanner.getTokenEnd();
console.log(`Token span: ${start} - ${end}`);

getTokenFullStart()

Returns the starting position of the current token (including leading trivia).
return
number
The token’s full start position
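// Leading trivia is folded into a token only when the scanner skips trivia
// (skipTrivia = true); otherwise trivia are separate tokens and this prints 0.
scanner.scan();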
const fullStart = scanner.getTokenFullStart();
const start = scanner.getTokenStart();
const leadingTriviaLength = start - fullStart;
console.log(`Leading trivia length: ${leadingTriviaLength}`);
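With `skipTrivia = true`, the two positions differ whenever comments or whitespace precede a token. A minimal sketch, assuming TypeScript 5.x:

```typescript
import * as ts from "typescript";

// skipTrivia = true: the comment and the space are leading trivia of `x`.
const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("/* c */ x");

const kind = scanner.scan();
const fullStart = scanner.getTokenFullStart(); // where the leading trivia begins
const start = scanner.getTokenStart();         // where `x` itself begins

console.log(ts.SyntaxKind[kind], fullStart, start); // Identifier 0 8
```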

Token State Methods

isIdentifier()

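Returns true if the current token is an identifier, or a keyword that is not a reserved word and can therefore serve as a name (for example the contextual keywords `type` and `string`).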
return
boolean
True if the token is an identifier
scanner.scan();
if (scanner.isIdentifier()) {
  console.log(`Found identifier: ${scanner.getTokenText()}`);
}

isReservedWord()

Returns true if the current token is a reserved word such as `const`, `function`, or `return`. Contextual keywords such as `type` are not reserved words.
return
boolean
True if the token is a reserved word
scanner.scan();
if (scanner.isReservedWord()) {
  console.log(`Found keyword: ${scanner.getTokenText()}`);
}
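Contextual keywords show how the two predicates relate: `type` scans as TypeKeyword, yet isIdentifier() returns true because it can still be used as a name. A short sketch, assuming TypeScript 5.x:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("type const foo");

// For each token, record [text, isIdentifier(), isReservedWord()].
const results: Array<[string, boolean, boolean]> = [];
while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
  results.push([scanner.getTokenText(), scanner.isIdentifier(), scanner.isReservedWord()]);
}

// type  -> contextual keyword: usable as an identifier, not reserved
// const -> reserved word: not an identifier
// foo   -> plain identifier
console.log(results);
```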

isUnterminated()

Returns true if the current token is unterminated (e.g., unterminated string).
return
boolean
True if the token is unterminated
scanner.setText('"unterminated string');
scanner.scan();
if (scanner.isUnterminated()) {
  console.log('Warning: Unterminated string literal');
}

hasPrecedingLineBreak()

Returns true if there’s a line break before the current token.
return
boolean
True if there’s a preceding line break
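// Assumes a scanner created with skipTrivia = true. With skipTrivia = false,
// spaces and newlines come back as trivia tokens, and the line break is
// reported on the NewLineTrivia token instead of on the second 'const'.
const code = `const x = 1;
const y = 2;`;
scanner.setText(code);

scanner.scan(); // 'const'
scanner.scan(); // 'x'
scanner.scan(); // '='
scanner.scan(); // '1'
scanner.scan(); // ';'
scanner.scan(); // 'const' (second line)

if (scanner.hasPrecedingLineBreak()) {
  console.log('This token is on a new line');
}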

hasUnicodeEscape()

Returns true if the current identifier contains a Unicode escape sequence.
return
boolean
True if the token has Unicode escapes

hasExtendedUnicodeEscape()

Returns true if the current identifier contains an extended Unicode escape sequence.
return
boolean
True if the token has extended Unicode escapes
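A short sketch, assuming TypeScript 5.x, that scans the identifier `abc` written with each escape form and reads the raw text, the decoded value, and the flags:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);

// `\u0061bc` spells the identifier `abc` with a classic \uXXXX escape.
scanner.setText("\\u0061bc");
scanner.scan();
const basic = {
  text: scanner.getTokenText(),   // raw source: \u0061bc
  value: scanner.getTokenValue(), // decoded name: abc
  escaped: scanner.hasUnicodeEscape(),
};

// `\u{61}bc` uses the ES2015 extended (code point) escape form.
scanner.setText("\\u{61}bc");
scanner.scan();
const extended = {
  value: scanner.getTokenValue(), // decoded name: abc
  extendedEscaped: scanner.hasExtendedUnicodeEscape(),
};

console.log(basic, extended);
```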

Advanced Scanning Methods

reScanGreaterToken()

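Re-scans a `>` token as a longer operator. scan() always returns a single GreaterThanToken so that nested type arguments such as `Array<Array<number>>` can be closed one `>` at a time; in expression contexts, the parser calls reScanGreaterToken() to combine it with the following characters into `>=`, `>>`, `>>=`, `>>>`, or `>>>=`.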
return
SyntaxKind
The re-scanned token kind
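// The parser calls this in expression context, e.g. for the shift operator in `a >> b`;
// type arguments such as `Array<Array<number>>` rely on scan() returning one `>` at a time
if (scanner.getToken() === ts.SyntaxKind.GreaterThanToken) {
  const newToken = scanner.reScanGreaterToken();
  // Might become >>, >>>, >=, >>=, or >>>= depending on the characters that follow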
}
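A self-contained version of the `>>` case, assuming TypeScript 5.x. It shows that scan() stops after one `>` and that reScanGreaterToken() widens that same token:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("a >> b");

scanner.scan();                            // Identifier `a`
const first = scanner.scan();              // scan() returns a single `>`
const firstText = scanner.getTokenText();

// In expression context, widen the token with the characters that follow it.
const merged = scanner.reScanGreaterToken();
const mergedText = scanner.getTokenText();

const next = scanner.scan();               // scanning resumes after `>>`

console.log(ts.SyntaxKind[first], firstText);   // GreaterThanToken >
console.log(ts.SyntaxKind[merged], mergedText); // GreaterThanGreaterThanToken >>
console.log(ts.SyntaxKind[next]);               // Identifier
```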

reScanSlashToken()

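Re-scans a slash token as a regular expression literal. scan() cannot tell division from the start of a regex, so it returns SlashToken (or SlashEqualsToken for `/=`). Where an expression is expected, the parser calls reScanSlashToken(), and the scanner re-reads the text from the slash as a RegularExpressionLiteral.
return
SyntaxKind
The re-scanned token kind (RegularExpressionLiteral when the current token is `/` or `/=`; any other token is returned unchanged)
// Call only where an expression is expected, e.g. after `=` or `(`
if (scanner.getToken() === ts.SyntaxKind.SlashToken || scanner.getToken() === ts.SyntaxKind.SlashEqualsToken) {
  const newToken = scanner.reScanSlashToken(); // RegularExpressionLiteral
  console.log(`${ts.SyntaxKind[newToken]}: ${scanner.getTokenText()}`);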
}
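A self-contained sketch of the regex case, assuming TypeScript 5.x. The scanner is driven by hand up to the `/`, playing the role the parser normally plays:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("const re = /ab+c/g;");

scanner.scan(); // const
scanner.scan(); // re
scanner.scan(); // =

const slash = scanner.scan();              // scan() alone only sees `/`

// After `=` an expression is expected, so re-read from the slash as a regex.
const regex = scanner.reScanSlashToken();
const regexText = scanner.getTokenText();  // /ab+c/g

const after = scanner.scan();              // `;`
```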

reScanTemplateToken()

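Re-scans the current token as part of a template literal. After the expression inside a `${...}` substitution, the parser reaches the closing `}` and calls this method, which re-reads that CloseBraceToken as a TemplateMiddle or TemplateTail token.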
isTaggedTemplate
boolean
required
Whether the template is tagged; tagged templates do not report errors for invalid escape sequences
return
SyntaxKind
The re-scanned template token kind
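The sketch below, assuming TypeScript 5.x, walks through the untagged template `` `a${x}b` `` by hand: scan() produces the TemplateHead and the tokens of the expression, and reScanTemplateToken(false) turns the closing `}` into the TemplateTail.

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("`a${x}b`");

const head = scanner.scan();             // TemplateHead
const headText = scanner.getTokenText(); // `a${
scanner.scan();                          // Identifier x
const brace = scanner.scan();            // plain scan() sees only `}`

// Re-read from the `}` as the continuation of the (untagged) template.
const tail = scanner.reScanTemplateToken(/* isTaggedTemplate */ false);
const tailText = scanner.getTokenText();   // }b`
const tailValue = scanner.getTokenValue(); // b
```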

scanJsxIdentifier()

Extends the current identifier or keyword token into a JSX identifier, which may contain hyphens (for example `data-id`).
return
SyntaxKind
The JSX identifier token
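A minimal sketch, assuming TypeScript 5.x, showing how scanJsxIdentifier() widens an attribute name such as `data-id` in place rather than producing a new token:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(
  ts.ScriptTarget.Latest,
  /* skipTrivia */ true,
  ts.LanguageVariant.JSX
);
scanner.setText("data-id");

scanner.scan();                           // Identifier `data`: scan() stops at `-`
const before = scanner.getTokenText();    // data

const kind = scanner.scanJsxIdentifier(); // extend the same token across the hyphen
const after = scanner.getTokenText();     // data-id
```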

scanJsxToken()

Scans the next token as JSX child content, for example a run of text (JsxText) or one of `<`, `</`, and `{`.
return
JsxTokenSyntaxKind
The JSX token kind
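The sketch below, assuming TypeScript 5.x, scans the children of a hypothetical `<p>` element. Child content goes through scanJsxToken(), while the embedded `{name}` expression uses ordinary scan() calls, roughly as the parser alternates between the two:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(
  ts.ScriptTarget.Latest,
  /* skipTrivia */ true,
  ts.LanguageVariant.JSX
);
// Children of a hypothetical `<p>` element, starting just after the opening tag.
scanner.setText("Hello {name}</p>");

const seen: Array<[ts.SyntaxKind, string]> = [];
function record(kind: ts.SyntaxKind): void {
  seen.push([kind, scanner.getTokenText()]);
}

record(scanner.scanJsxToken()); // JsxText "Hello "
record(scanner.scanJsxToken()); // OpenBraceToken "{"
record(scanner.scan());         // Identifier "name": inside braces, ordinary scanning
record(scanner.scan());         // CloseBraceToken "}"
record(scanner.scanJsxToken()); // LessThanSlashToken "</"
```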

scanJsxAttributeValue()

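Scans a JSX attribute value. A quoted value is returned as a StringLiteral whose escape sequences are not decoded (backslashes are kept as written); any other value is scanned with scan().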
return
SyntaxKind
The attribute value token
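This sketch, assuming TypeScript 5.x, shows the difference in escape handling by scanning the same quoted text as a JSX attribute value and as an ordinary string literal:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(
  ts.ScriptTarget.Latest,
  /* skipTrivia */ true,
  ts.LanguageVariant.JSX
);
// The source text is "a\nb" with a literal backslash between `a` and `n`.
const source = '"a\\nb"';

// As a JSX attribute value, the backslash is kept as written.
scanner.setText(source);
const jsxKind = scanner.scanJsxAttributeValue();
const jsxValue = scanner.getTokenValue(); // 4 characters: a, \, n, b

// As an ordinary string literal, `\n` is decoded to a newline.
scanner.setText(source);
scanner.scan();
const tsValue = scanner.getTokenValue();  // 3 characters: a, newline, b
```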

State Management Methods

resetTokenState()

Moves the scanner to the given position and clears the current token state; the next scan() starts from that position.
pos
number
required
The position to reset to
const pos = scanner.getTokenEnd();
// ... scan more tokens
// Reset to previous position
scanner.resetTokenState(pos);
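Here is a runnable version of the rewind pattern, assuming TypeScript 5.x, where resetTokenState() replaces the deprecated setTextPos():

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("a b c");

scanner.scan();                       // a
const afterA = scanner.getTokenEnd(); // 1
scanner.scan();                       // b
scanner.scan();                       // c

// Rewind to just after `a`. The current token is cleared until the next scan().
scanner.resetTokenState(afterA);
const cleared = scanner.getToken();   // SyntaxKind.Unknown
scanner.scan();
const again = scanner.getTokenText(); // b
```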

lookAhead()

Runs the callback, then restores the scanner state saved before the call (position, token, value, and flags), regardless of what the callback returns.
callback
() => T
required
Function to call with lookahead
return
T
The result of the callback
const nextTokenIsColon = scanner.lookAhead(() => {
  scanner.scan(); // Look at next token
  return scanner.getToken() === ts.SyntaxKind.ColonToken;
});
// Scanner state is restored after lookAhead

tryScan()

Tries a scan operation, only committing if the callback returns truthy.
callback
() => T
required
Function to try
return
T
The result of the callback
const result = scanner.tryScan(() => {
  scanner.scan();
  if (scanner.getToken() === ts.SyntaxKind.FunctionKeyword) {
    return true; // Commit the scan
  }
  return false; // Rollback the scan
});
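The two helpers differ only in when they restore state. This sketch, assuming TypeScript 5.x, runs both on the same input:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setText("function f() {}");

// lookAhead always rewinds, even when the callback returns true.
const startsWithFunction = scanner.lookAhead(
  () => scanner.scan() === ts.SyntaxKind.FunctionKeyword
);
const afterLookAhead = scanner.getTokenEnd(); // still 0

// tryScan keeps the scanned token because the callback returns true...
scanner.tryScan(() => scanner.scan() === ts.SyntaxKind.FunctionKeyword);
const afterCommit = scanner.getToken(); // FunctionKeyword

// ...and rewinds here because the next token is `f`, not `class`.
scanner.tryScan(() => scanner.scan() === ts.SyntaxKind.ClassKeyword);
const afterRollback = scanner.getToken(); // still FunctionKeyword
const next = scanner.scan();              // Identifier `f`
```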

scanRange()

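Restricts scanning to the range from start to start + length, runs the callback, and then restores the previous text range and scanner state.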
start
number
required
Start position
length
number
required
Length to scan
callback
() => T
required
Function to call while scanning the range
return
T
The result of the callback
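Here is a sketch of scanRange(), assuming TypeScript 5.x. The callback tokenizes only the second statement, and afterwards the scanner resumes from where it was before the call:

```typescript
import * as ts from "typescript";

const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
const source = "let a = 1; let b = 2;";
scanner.setText(source);

// Tokenize only the second statement; inside the range, its end acts as end of file.
const start = source.indexOf("let b");
const inner = scanner.scanRange(start, source.length - start, () => {
  const texts: string[] = [];
  while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
    texts.push(scanner.getTokenText());
  }
  return texts;
});

// Outside the callback the original range and position are back in effect.
scanner.scan();
const firstOuter = scanner.getTokenText(); // let (the first statement's)
```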

Configuration Methods

setScriptTarget()

Sets the ECMAScript target version.
scriptTarget
ScriptTarget
required
The target version
scanner.setScriptTarget(ts.ScriptTarget.ES2020);

setLanguageVariant()

Sets the language variant (Standard or JSX).
variant
LanguageVariant
required
The language variant
scanner.setLanguageVariant(ts.LanguageVariant.JSX);
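The variant changes how some character sequences are tokenized. In this sketch (TypeScript 5.x assumed; `kindsFor` is a hypothetical helper), `</` becomes a single LessThanSlashToken only in the JSX variant:

```typescript
import * as ts from "typescript";

// Hypothetical helper: kinds of the tokens in `text` for a given variant.
function kindsFor(text: string, variant: ts.LanguageVariant): ts.SyntaxKind[] {
  const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true, variant, text);
  const kinds: ts.SyntaxKind[] = [];
  while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
    kinds.push(scanner.getToken());
  }
  return kinds;
}

const standard = kindsFor("</p>", ts.LanguageVariant.Standard); // <, /, p, >
const jsx = kindsFor("</p>", ts.LanguageVariant.JSX);           // </, p, >
```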

setScriptKind()

Sets the script kind (TS, JS, JSX, etc.).
scriptKind
ScriptKind
required
The script kind
scanner.setScriptKind(ts.ScriptKind.TSX);

setOnError()

Sets the error callback function.
onError
ErrorCallback | undefined
required
The error callback
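// `message` is a DiagnosticMessage object; its text is in message.message
scanner.setOnError((message, length) => {
  console.error(`Scanner error TS${message.code}: ${message.message} (length ${length})`);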
});
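The callback receives a DiagnosticMessage and a length, but no position. TypeScript's own parser reads the position from getTokenEnd() inside the callback. This sketch, assuming TypeScript 5.x, does the same and collects errors into an array. The `ScanError` shape is hypothetical.

```typescript
import * as ts from "typescript";

// Hypothetical record type for collected scanner errors.
interface ScanError {
  code: number;    // diagnostic code, e.g. 1002
  message: string; // English message text
  pos: number;     // scanner position while the error is reported
  length: number;
}

const errors: ScanError[] = [];
const scanner = ts.createScanner(ts.ScriptTarget.Latest, /* skipTrivia */ true);
scanner.setOnError((message, length) => {
  errors.push({ code: message.code, message: message.message, pos: scanner.getTokenEnd(), length });
});

scanner.setText('"abc'); // missing closing quote
while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
  // Scan everything; errors are collected by the callback.
}
console.log(errors);
```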

Complete Example

import * as ts from 'typescript';

function tokenizeSourceCode(sourceCode: string) {
  const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    false, // don't skip trivia
    ts.LanguageVariant.Standard,
    sourceCode
  );
  
  const tokens: Array<{
    kind: string;
    text: string;
    start: number;
    end: number;
    hasLineBreak: boolean;
  }> = [];
  
  let token: ts.SyntaxKind;
  while ((token = scanner.scan()) !== ts.SyntaxKind.EndOfFileToken) {
    tokens.push({
      kind: ts.SyntaxKind[token],
      text: scanner.getTokenText(),
      start: scanner.getTokenStart(),
      end: scanner.getTokenEnd(),
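      // With skipTrivia = false, this is true on the trivia token that holds the line break, not on the next token
      hasLineBreak: scanner.hasPrecedingLineBreak()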
    });
  }
  
  return tokens;
}

// Example usage
const code = `
function greet(name: string): string {
  return "Hello, " + name;
}
`;

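const tokens = tokenizeSourceCode(code);

console.log('Tokens:');
tokens.forEach((token, index) => {
  const prefix = token.hasLineBreak ? '\n' : '';
  // JSON.stringify shows newline trivia as "\n" instead of a raw line break
  console.log(
    `${prefix}[${index}] ${token.kind.padEnd(25)} ${JSON.stringify(token.text)} (${token.start}-${token.end})`
  );
});

// Output (trivia is kept, so the source's leading newline is token [0]):
// Tokens:
//
// [0] NewLineTrivia             "\n" (0-1)
// [1] FunctionKeyword           "function" (1-9)
// [2] WhitespaceTrivia          " " (9-10)
// [3] Identifier                "greet" (10-15)
// [4] OpenParenToken            "(" (15-16)
// [5] Identifier                "name" (16-20)
// [6] ColonToken                ":" (20-21)
// [7] WhitespaceTrivia          " " (21-22)
// [8] StringKeyword             "string" (22-28)
// ... (some later kinds, such as `{`, may print as marker aliases like FirstPunctuation,
// because ts.SyntaxKind[kind] returns the last enum name declared for a value)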

Syntax Kinds

Common token types (SyntaxKind enum):
// Keywords
ts.SyntaxKind.FunctionKeyword
ts.SyntaxKind.ConstKeyword
ts.SyntaxKind.LetKeyword
ts.SyntaxKind.VarKeyword
ts.SyntaxKind.IfKeyword
ts.SyntaxKind.ElseKeyword
ts.SyntaxKind.ReturnKeyword

// Literals
ts.SyntaxKind.NumericLiteral
ts.SyntaxKind.StringLiteral
ts.SyntaxKind.TrueKeyword
ts.SyntaxKind.FalseKeyword

// Punctuation
ts.SyntaxKind.OpenBraceToken        // {
ts.SyntaxKind.CloseBraceToken       // }
ts.SyntaxKind.OpenParenToken        // (
ts.SyntaxKind.CloseParenToken       // )
ts.SyntaxKind.OpenBracketToken      // [
ts.SyntaxKind.CloseBracketToken     // ]
ts.SyntaxKind.SemicolonToken        // ;
ts.SyntaxKind.CommaToken            // ,
ts.SyntaxKind.ColonToken            // :
ts.SyntaxKind.DotToken              // .

// Operators
ts.SyntaxKind.PlusToken             // +
ts.SyntaxKind.MinusToken            // -
ts.SyntaxKind.AsteriskToken         // *
ts.SyntaxKind.SlashToken            // /
ts.SyntaxKind.EqualsToken           // =
ts.SyntaxKind.EqualsEqualsToken     // ==
ts.SyntaxKind.EqualsEqualsEqualsToken // ===
ts.SyntaxKind.GreaterThanToken      // >
ts.SyntaxKind.LessThanToken         // <

// Special
ts.SyntaxKind.Identifier
ts.SyntaxKind.EndOfFileToken
ts.SyntaxKind.WhitespaceTrivia
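Printing a kind with `ts.SyntaxKind[kind]` uses the enum's reverse mapping, which returns the last member declared with a given value. SyntaxKind declares marker aliases (for example `FirstPunctuation = OpenBraceToken`) after the real members, so some tokens print under the marker name. A hypothetical `kindName` helper can prefer the first declared name instead. This is a sketch, assuming TypeScript 5.x, not part of the TypeScript API.

```typescript
import * as ts from "typescript";

// Hypothetical helper: map each SyntaxKind value to the first member name declared
// for it. Object.keys lists non-numeric keys in insertion order, which for a compiled
// enum is declaration order, and the marker aliases come after all real members.
const members = ts.SyntaxKind as unknown as Record<string, string | number>;
const kindNames: Record<number, string> = {};
for (const name of Object.keys(members)) {
  const value = members[name];
  if (typeof value === "number" && !(value in kindNames)) {
    kindNames[value] = name;
  }
}

function kindName(kind: ts.SyntaxKind): string {
  return kindNames[kind] ?? String(kind);
}

console.log(ts.SyntaxKind[ts.SyntaxKind.OpenBraceToken]); // may print a marker alias
console.log(kindName(ts.SyntaxKind.OpenBraceToken));      // OpenBraceToken
```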
