Scanner API - TypeScript

Overview

The Scanner performs lexical analysis (tokenization) of TypeScript source code. It breaks the source text into a stream of tokens that can be used by the parser or for direct analysis.

Creating a Scanner

ts.createScanner()

Creates a new Scanner instance.

languageVersion

ScriptTarget

required

The ECMAScript target version (ES5, ES2015, ES2020, etc.)

skipTrivia

boolean

required

Whether to skip whitespace and comments

languageVariant

LanguageVariant

Standard or JSX variant (defaults to Standard)

textInitial

string

Initial text to scan

onError

ErrorCallback

Error callback function

start

number

Starting position in the text

length

number

Length of text to scan

return

Scanner

A Scanner instance

Example

import * as ts from 'typescript';

const sourceCode = `
function add(a: number, b: number): number {
  return a + b;
}
`;

const scanner = ts.createScanner(
  ts.ScriptTarget.ES2020,
  false, // skipTrivia
  ts.LanguageVariant.Standard,
  sourceCode
);
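The effect of the skipTrivia flag is easiest to see by scanning the same text twice. The following is a minimal sketch, not part of the original reference; tokenKinds is a hypothetical helper introduced only for illustration.

```typescript
import * as ts from 'typescript';

// Hypothetical helper: collect the kind of every token in `text`.
function tokenKinds(text: string, skipTrivia: boolean): ts.SyntaxKind[] {
  const scanner = ts.createScanner(
    ts.ScriptTarget.ES2020,
    skipTrivia,
    ts.LanguageVariant.Standard,
    text
  );
  const kinds: ts.SyntaxKind[] = [];
  let token = scanner.scan();
  while (token !== ts.SyntaxKind.EndOfFileToken) {
    kinds.push(token);
    token = scanner.scan();
  }
  return kinds;
}

// With skipTrivia = false the two spaces come back as WhitespaceTrivia tokens.
const withTrivia = tokenKinds('a + b', false);
// With skipTrivia = true only the identifier and operator tokens remain.
const withoutTrivia = tokenKinds('a + b', true);

console.log(withTrivia.map((kind) => ts.SyntaxKind[kind]));
console.log(withoutTrivia.map((kind) => ts.SyntaxKind[kind]));
```

Passing true is usually the more convenient choice when only the meaningful tokens matter; passing false is useful for tools such as formatters or highlighters that need every character accounted for.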

Scanner Interface Methods

Scanning Methods

scan()

Scans the next token from the input.

return

SyntaxKind

The syntax kind of the scanned token

let token: ts.SyntaxKind;
while ((token = scanner.scan()) !== ts.SyntaxKind.EndOfFileToken) {
  console.log(`Token: ${ts.SyntaxKind[token]}`);
  console.log(`Text: ${scanner.getTokenText()}`);
  console.log(`Position: ${scanner.getTokenStart()} - ${scanner.getTokenEnd()}`);
}

getText()

Returns the full text being scanned.

return

string

The full source text

const text = scanner.getText();
console.log(`Scanning: ${text}`);

setText()

Sets new text for the scanner to scan.

text

string | undefined

required

The text to scan

start

number

Starting position (defaults to 0)

length

number

Length to scan (defaults to entire text)

scanner.setText('const x = 42;');
let token = scanner.scan();
// First token will be 'const'
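The optional start and length arguments restrict scanning to part of the text. The sketch below assumes a scanner created with skipTrivia set to true; the variable names are illustrative only.

```typescript
import * as ts from 'typescript';

const subrangeScanner = ts.createScanner(ts.ScriptTarget.ES2020, /* skipTrivia */ true);

const fullText = 'const x = 42; const y = 7;';
const secondStatement = 'const y = 7;';
const secondStart = fullText.indexOf(secondStatement);

// Scan only the second statement. Positions still refer to the full text.
subrangeScanner.setText(fullText, secondStart, secondStatement.length);

const subrangeTokens: string[] = [];
let firstTokenStart = -1;
while (subrangeScanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
  if (firstTokenStart < 0) {
    firstTokenStart = subrangeScanner.getTokenStart();
  }
  subrangeTokens.push(subrangeScanner.getTokenText());
}

console.log(subrangeTokens, firstTokenStart);
```

The scanner reports EndOfFileToken once it reaches start + length, even though more text follows.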

Token Information Methods

getToken()

Returns the current token’s syntax kind.

return

SyntaxKind

The current token’s syntax kind

scanner.scan();
const currentToken = scanner.getToken();
console.log(`Current token: ${ts.SyntaxKind[currentToken]}`);

getTokenText()

Returns the text of the current token.

return

string

The token’s text

scanner.scan();
const text = scanner.getTokenText();
console.log(`Token text: ${text}`);

getTokenValue()

Returns the processed value of the current token (for strings and numbers).

return

string

The token’s processed value

// For a string literal like "hello\nworld"
scanner.scan();
const text = scanner.getTokenText();   // "hello\nworld" (raw source text, quotes and backslash escape included)
const value = scanner.getTokenValue(); // hello<newline>world (no quotes, escape decoded to an actual newline)

getTokenStart()

Returns the starting position of the current token (excluding leading trivia).

return

number

The token’s start position

scanner.scan();
const start = scanner.getTokenStart();
console.log(`Token starts at position: ${start}`);

getTokenEnd()

Returns the ending position of the current token.

return

number

The token’s end position

scanner.scan();
const start = scanner.getTokenStart();
const end = scanner.getTokenEnd();
console.log(`Token span: ${start} - ${end}`);

getTokenFullStart()

Returns the starting position of the current token (including leading trivia).

return

number

The token’s full start position

scanner.scan();
const fullStart = scanner.getTokenFullStart();
const start = scanner.getTokenStart();
const leadingTriviaLength = start - fullStart;
console.log(`Leading trivia length: ${leadingTriviaLength}`);

Token State Methods

isIdentifier()

Returns true if the current token is an identifier.

return

boolean

True if the token is an identifier

scanner.scan();
if (scanner.isIdentifier()) {
  console.log(`Found identifier: ${scanner.getTokenText()}`);
}

isReservedWord()

Returns true if the current token is a reserved keyword.

return

boolean

True if the token is a reserved word

scanner.scan();
if (scanner.isReservedWord()) {
  console.log(`Found keyword: ${scanner.getTokenText()}`);
}
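The two checks above are not simple opposites: contextual keywords such as type are not reserved words, and isIdentifier() reports true for them. The sketch below illustrates this under that assumption; classify is a hypothetical helper, not part of the Scanner API.

```typescript
import * as ts from 'typescript';

// Hypothetical helper: scan a single word and report both checks.
function classify(word: string): { identifier: boolean; reserved: boolean } {
  const scanner = ts.createScanner(
    ts.ScriptTarget.ES2020,
    true, // skipTrivia
    ts.LanguageVariant.Standard,
    word
  );
  scanner.scan();
  return {
    identifier: scanner.isIdentifier(),
    reserved: scanner.isReservedWord(),
  };
}

const plainWord = classify('total');      // ordinary identifier
const contextualWord = classify('type');  // contextual keyword, usable as a name
const reservedWord = classify('while');   // reserved word

console.log({ plainWord, contextualWord, reservedWord });
```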

isUnterminated()

Returns true if the current token is unterminated (e.g., unterminated string).

return

boolean

True if the token is unterminated

scanner.setText('"unterminated string');
scanner.scan();
if (scanner.isUnterminated()) {
  console.log('Warning: Unterminated string literal');
}

hasPrecedingLineBreak()

Returns true if there’s a line break before the current token.

return

boolean

True if there’s a preceding line break

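// The token comments below assume a scanner created with skipTrivia = true.
// With skipTrivia = false, whitespace and newlines are returned as separate trivia tokens.
const code = `const x = 1;
const y = 2;`;
scanner.setText(code);

scanner.scan(); // 'const'
scanner.scan(); // 'x'
scanner.scan(); // '='
scanner.scan(); // '1'
scanner.scan(); // ';'
scanner.scan(); // 'const' (first token on the second line)

if (scanner.hasPrecedingLineBreak()) {
  console.log('This token is on a new line');
}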

hasUnicodeEscape()

Returns true if the current identifier contains a Unicode escape sequence.

return

boolean

True if the token has Unicode escapes

hasExtendedUnicodeEscape()

Returns true if the current identifier contains an extended Unicode escape sequence.

return

boolean

True if the token has extended Unicode escapes
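Neither Unicode escape check has an example above, so here is a minimal sketch covering both. It assumes an ES2015 or later target, since the braced \u{...} form is an ES2015 feature; the variable names are illustrative.

```typescript
import * as ts from 'typescript';

const escapeScanner = ts.createScanner(ts.ScriptTarget.ES2020, /* skipTrivia */ true);

// Source text: \u0061bc  (four-digit escape for "a", followed by "bc")
escapeScanner.setText('\\u0061bc');
const classicKind = escapeScanner.scan();
const classicHasEscape = escapeScanner.hasUnicodeEscape();
const classicValue = escapeScanner.getTokenValue();

// Source text: \u{61}bc  (braced, "extended" escape for "a", followed by "bc")
escapeScanner.setText('\\u{61}bc');
const extendedKind = escapeScanner.scan();
const extendedHasEscape = escapeScanner.hasExtendedUnicodeEscape();
const extendedValue = escapeScanner.getTokenValue();

console.log(classicHasEscape, classicValue, extendedHasEscape, extendedValue);
```

In both cases getTokenValue() returns the decoded identifier name, while getTokenText() would still return the escaped source text.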

Advanced Scanning Methods

reScanGreaterToken()

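Re-scans a greater-than token so it can be combined with the characters that follow into a longer operator (>=, >>, >>>, >>=, >>>=). scan() always returns a single > so that nested type argument lists such as Array<Array<number>> close correctly; call this method in expression contexts to recover the compound operator.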

return

SyntaxKind

The re-scanned token kind

// scan() yields a single '>' so nested generics can close; re-scan when an operator is expected
if (scanner.getToken() === ts.SyntaxKind.GreaterThanToken) {
  const newToken = scanner.reScanGreaterToken();
  // Might become GreaterThanGreaterThanToken (>>)
}

reScanSlashToken()

Re-scans a slash token (could be division or regex).

return

SyntaxKind

The re-scanned token kind (SlashToken or RegularExpressionLiteral)

// Context-dependent scanning
if (scanner.getToken() === ts.SyntaxKind.SlashToken) {
  const newToken = scanner.reScanSlashToken();
  if (newToken === ts.SyntaxKind.RegularExpressionLiteral) {
    console.log('This is a regex literal');
  }
}

reScanTemplateToken()

Re-scans template literal tokens.

isTaggedTemplate

boolean

required

Whether this is a tagged template literal

return

SyntaxKind

The re-scanned template token kind
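A typical use is continuing a template literal after a substitution: the closing brace is first scanned as an ordinary CloseBraceToken and then re-scanned as the next template part. The following is a sketch of that sequence for an untagged template; the variable names are illustrative.

```typescript
import * as ts from 'typescript';

const templateScanner = ts.createScanner(
  ts.ScriptTarget.ES2020,
  true, // skipTrivia
  ts.LanguageVariant.Standard,
  '`a${x}b`'
);

const headKind = templateScanner.scan();   // TemplateHead, covering `a${
const exprKind = templateScanner.scan();   // Identifier, covering x
const braceKind = templateScanner.scan();  // CloseBraceToken, covering }

// Re-scan from the closing brace so the rest of the template is one token.
const tailKind = templateScanner.reScanTemplateToken(/* isTaggedTemplate */ false);
const tailValue = templateScanner.getTokenValue(); // literal text after the substitution

console.log(ts.SyntaxKind[tailKind], tailValue);
```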

scanJsxIdentifier()

Scans a JSX identifier (allows hyphens).

return

SyntaxKind

The JSX identifier token

scanJsxToken()

Scans the next token in JSX mode.

return

JsxTokenSyntaxKind

The JSX token kind

scanJsxAttributeValue()

Scans a JSX attribute value.

return

SyntaxKind

The attribute value token
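The three JSX methods are normally driven by the parser, which knows when it is inside an element. The sketch below walks a small element by hand to show where each call fits; the call order is an assumption about typical usage, not a prescribed protocol.

```typescript
import * as ts from 'typescript';

const jsxScanner = ts.createScanner(
  ts.ScriptTarget.ES2020,
  true, // skipTrivia
  ts.LanguageVariant.JSX,
  '<a data-id="42">hi</a>'
);

jsxScanner.scan(); // <
jsxScanner.scan(); // a
jsxScanner.scan(); // data (scan() stops at the hyphen)

// Extend the identifier across the hyphen.
const attrKind = jsxScanner.scanJsxIdentifier();
const attrName = jsxScanner.getTokenValue();

jsxScanner.scan(); // =

// Scan the quoted attribute value as a string literal.
const valueKind = jsxScanner.scanJsxAttributeValue();
const attrValue = jsxScanner.getTokenValue();

jsxScanner.scan(); // >

// Inside the element body, scan in JSX mode to get the text child.
const childKind = jsxScanner.scanJsxToken();
const childText = jsxScanner.getTokenValue();
const closeKind = jsxScanner.scanJsxToken(); // </

console.log(attrName, attrValue, childText);
```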

State Management Methods

resetTokenState()

Resets the scanner to a specific position.

pos

number

required

The position to reset to

const pos = scanner.getTokenEnd();
// ... scan more tokens
// Reset to previous position
scanner.resetTokenState(pos);

lookAhead()

Invokes a callback while saving/restoring scanner state.

callback

() => T

required

Function to call with lookahead

return

T

The result of the callback

const nextTokenIsColon = scanner.lookAhead(() => {
  scanner.scan(); // Look at next token
  return scanner.getToken() === ts.SyntaxKind.ColonToken;
});
// Scanner state is restored after lookAhead

tryScan()

Tries a scan operation, only committing if the callback returns truthy.

callback

() => T

required

Function to try

return

T

The result of the callback

const result = scanner.tryScan(() => {
  scanner.scan();
  if (scanner.getToken() === ts.SyntaxKind.FunctionKeyword) {
    return true; // Commit the scan
  }
  return false; // Rollback the scan
});

scanRange()

Scans a specific range of text.

start

number

required

Start position

length

number

required

Length to scan

callback

() => T

required

Function to call while scanning the range

return

T

The result of the callback
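scanRange() has no example above, so here is a minimal sketch. It scans only the second statement of a short text inside the callback, then checks that the scanner continues from its earlier position afterwards; the variable names are illustrative.

```typescript
import * as ts from 'typescript';

const rangeText = 'let a = 1; let b = 2;';
const rangeScanner = ts.createScanner(
  ts.ScriptTarget.ES2020,
  true, // skipTrivia
  ts.LanguageVariant.Standard,
  rangeText
);

const secondStatement = 'let b = 2;';
const rangeStart = rangeText.indexOf(secondStatement);

// Inside the callback the scanner sees only the requested range.
const rangeTokens = rangeScanner.scanRange(rangeStart, secondStatement.length, () => {
  const texts: string[] = [];
  while (rangeScanner.scan() !== ts.SyntaxKind.EndOfFileToken) {
    texts.push(rangeScanner.getTokenText());
  }
  return texts;
});

// The previous state is restored, so scanning resumes at the start of the text.
rangeScanner.scan();
const resumedAt = rangeScanner.getTokenStart();

console.log(rangeTokens, resumedAt);
```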

Configuration Methods

setScriptTarget()

Sets the ECMAScript target version.

scriptTarget

ScriptTarget

required

The target version

scanner.setScriptTarget(ts.ScriptTarget.ES2020);

setLanguageVariant()

Sets the language variant (Standard or JSX).

variant

LanguageVariant

required

The language variant

scanner.setLanguageVariant(ts.LanguageVariant.JSX);

setScriptKind()

Sets the script kind (TS, JS, JSX, etc.).

scriptKind

ScriptKind

required

The script kind

scanner.setScriptKind(ts.ScriptKind.TSX);

setOnError()

Sets the error callback function.

onError

ErrorCallback | undefined

required

The error callback

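scanner.setOnError((message, length) => {
  // message is a ts.DiagnosticMessage object; the readable text is in message.message
  console.error(`Scanner error: ${message.message} (length ${length})`);
});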

Complete Example

import * as ts from 'typescript';

function tokenizeSourceCode(sourceCode: string) {
  const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    false, // don't skip trivia
    ts.LanguageVariant.Standard,
    sourceCode
  );
  
  const tokens: Array<{
    kind: string;
    text: string;
    start: number;
    end: number;
    hasLineBreak: boolean;
  }> = [];
  
  let token: ts.SyntaxKind;
  while ((token = scanner.scan()) !== ts.SyntaxKind.EndOfFileToken) {
    tokens.push({
      kind: ts.SyntaxKind[token],
      text: scanner.getTokenText(),
      start: scanner.getTokenStart(),
      end: scanner.getTokenEnd(),
      hasLineBreak: scanner.hasPrecedingLineBreak()
    });
  }
  
  return tokens;
}

// Example usage
const code = `
function greet(name: string): string {
  return "Hello, " + name;
}
`;

const tokens = tokenizeSourceCode(code);

console.log('Tokens:');
tokens.forEach((token, index) => {
  const prefix = token.hasLineBreak ? '\n' : '';
  console.log(
    `${prefix}[${index}] ${token.kind.padEnd(25)} "${token.text}" (${token.start}-${token.end})`
  );
});

// Output (abridged). Token [0] is the NewLineTrivia for the line break that
// opens the template string, so the function keyword is token [1]:
// Tokens:
// ...
// [1] FunctionKeyword           "function" (1-9)
// [2] WhitespaceTrivia          " " (9-10)
// [3] Identifier                "greet" (10-15)
// [4] OpenParenToken            "(" (15-16)
// [5] Identifier                "name" (16-20)
// [6] ColonToken                ":" (20-21)
// [7] WhitespaceTrivia          " " (21-22)
// [8] StringKeyword             "string" (22-28)
// ...

Syntax Kinds

Common token types (SyntaxKind enum):

// Keywords
ts.SyntaxKind.FunctionKeyword
ts.SyntaxKind.ConstKeyword
ts.SyntaxKind.LetKeyword
ts.SyntaxKind.VarKeyword
ts.SyntaxKind.IfKeyword
ts.SyntaxKind.ElseKeyword
ts.SyntaxKind.ReturnKeyword

// Literals
ts.SyntaxKind.NumericLiteral
ts.SyntaxKind.StringLiteral
ts.SyntaxKind.TrueKeyword
ts.SyntaxKind.FalseKeyword

// Punctuation
ts.SyntaxKind.OpenBraceToken        // {
ts.SyntaxKind.CloseBraceToken       // }
ts.SyntaxKind.OpenParenToken        // (
ts.SyntaxKind.CloseParenToken       // )
ts.SyntaxKind.OpenBracketToken      // [
ts.SyntaxKind.CloseBracketToken     // ]
ts.SyntaxKind.SemicolonToken        // ;
ts.SyntaxKind.CommaToken            // ,
ts.SyntaxKind.ColonToken            // :
ts.SyntaxKind.DotToken              // .

// Operators
ts.SyntaxKind.PlusToken             // +
ts.SyntaxKind.MinusToken            // -
ts.SyntaxKind.AsteriskToken         // *
ts.SyntaxKind.SlashToken            // /
ts.SyntaxKind.EqualsToken           // =
ts.SyntaxKind.EqualsEqualsToken     // ==
ts.SyntaxKind.EqualsEqualsEqualsToken // ===
ts.SyntaxKind.GreaterThanToken      // >
ts.SyntaxKind.LessThanToken         // <

// Special
ts.SyntaxKind.Identifier
ts.SyntaxKind.EndOfFileToken
ts.SyntaxKind.WhitespaceTrivia

See Also

Compiler API

Language Service API

Types and Interfaces