Learn how to transform raw JavaScript source code into an Abstract Syntax Tree (AST) through lexical analysis, tokenization, and parsing. This is the foundational step in building a JavaScript engine.
Overview
Parsing is the process of converting JavaScript source code into a structured representation that the engine can understand and execute. This involves multiple stages working together to analyze the syntax and build an AST.
The parser is the first major component of a JavaScript engine, sitting between the raw source code and the interpreter/compiler.
Lexical analysis and tokenization
Lexical analysis is the process of breaking down source code into meaningful tokens. Each token represents a fundamental building block of the language.
Scan characters
Read the source code character by character, identifying patterns that form tokens.
Classify tokens
Categorize each token by type: keywords, identifiers, operators, literals, punctuation.
Generate token stream
Produce a sequence of tokens that will be consumed by the parser.
Token types
A robust tokenizer must recognize and classify different token types:
Keywords
Identifiers
Operators
Literals
// Reserved words in JavaScript
function , const , let , var , if , else , while , for , return , class , extends
Proper handling of edge cases like Unicode identifiers, regex literals, and template strings is crucial for a production-ready tokenizer.
Recursive descent parser
The recursive descent parser is a top-down parsing technique where each non-terminal in the grammar is implemented as a function.
Parser structure
The parser consists of mutually recursive functions that mirror the grammar rules:
Basic structure
Parse expression
Parse statement
class Parser {
private:
Lexer & lexer;
Token currentToken;
void advance () {
currentToken = lexer . nextToken ();
}
bool match ( TokenType type ) {
if ( currentToken . type == type) {
advance ();
return true ;
}
return false ;
}
public:
ASTNode * parseProgram ();
ASTNode * parseStatement ();
ASTNode * parseExpression ();
ASTNode * parsePrimary ();
};
Operator precedence
Handling operator precedence correctly is essential for parsing expressions:
Operator precedence table
Precedence Operators Associativity 15 (), [], .Left-to-right 14 ++, --, !, typeofRight-to-left 13 **Right-to-left 12 *, /, %Left-to-right 11 +, -Left-to-right 10 <, <=, >, >=Left-to-right 9 ==, ===, !=, !==Left-to-right 5 &&Left-to-right 4 ` ` Left-to-right 3 =, +=, -=, etc.Right-to-left
AST construction and validation
The Abstract Syntax Tree is a hierarchical representation of the program structure. Each node represents a syntactic construct.
AST node types
Different AST nodes represent different language constructs:
Declarations
Function declarations
Variable declarations
Class declarations
Statements
Expression statements
If/else statements
Loop statements
Return statements
Expressions
Binary expressions
Call expressions
Member expressions
Literals
Special
Program (root node)
Block statements
Function parameters
AST node base class
Binary expression node
Function declaration node
class ASTNode {
public:
enum class Type {
Program ,
FunctionDeclaration ,
VariableDeclaration ,
BinaryExpression ,
CallExpression ,
Identifier ,
Literal
};
virtual ~ASTNode () = default ;
virtual Type getType () const = 0 ;
virtual void accept ( ASTVisitor & visitor ) = 0 ;
};
AST validation
After construction, the AST should be validated for semantic correctness:
Type checking
Verify that operations are performed on compatible types.
Reference validation
Ensure all variables and functions are declared before use.
Control flow analysis
Check for unreachable code, missing return statements, and other control flow issues.
Scope analysis and symbol resolution
Scope analysis determines which declarations are visible at each point in the program and resolves variable references to their declarations.
Scope types
Global scope
Function scope
Block scope
Lexical scope
The outermost scope containing global variables and functions. Accessible from anywhere in the program. // Global scope
const globalVar = 42 ;
function globalFunction () {
return globalVar ; // Can access global scope
}
Variables declared with var are function-scoped. They are visible throughout the entire function. function example () {
if ( true ) {
var x = 10 ; // Function-scoped
}
console . log ( x ); // 10 - accessible outside if block
}
Variables declared with let and const are block-scoped. They are only visible within their block. function example () {
if ( true ) {
let x = 10 ; // Block-scoped
const y = 20 ; // Block-scoped
}
console . log ( x ); // ReferenceError
}
Inner functions have access to variables in outer scopes through lexical scoping. function outer () {
const x = 10 ;
function inner () {
console . log ( x ); // Can access outer scope
}
return inner ;
}
Symbol table implementation
A symbol table tracks variable declarations and their scopes:
Symbol table
Scope analyzer
class SymbolTable {
private:
struct Symbol {
std ::string name;
enum class Kind { Variable , Function , Parameter };
Kind kind;
int scopeLevel;
};
std ::vector < std ::unordered_map < std ::string, Symbol >> scopes;
int currentLevel = 0 ;
public:
void enterScope () {
scopes . push_back ({});
currentLevel ++ ;
}
void exitScope () {
scopes . pop_back ();
currentLevel -- ;
}
void define ( const std :: string & name , Symbol :: Kind kind ) {
Symbol sym{name, kind, currentLevel};
scopes . back ()[name] = sym;
}
Symbol * resolve ( const std :: string & name ) {
// Search from innermost to outermost scope
for ( auto it = scopes . rbegin (); it != scopes . rend (); ++ it) {
auto found = it -> find (name);
if (found != it -> end ()) {
return & found -> second ;
}
}
return nullptr ; // Undefined variable
}
};
Scope analysis must handle hoisting for var declarations and functions, while maintaining temporal dead zones for let and const.
Variable hoisting
JavaScript’s hoisting behavior must be correctly implemented:
// This code:
console . log ( x ); // undefined (not ReferenceError)
var x = 5 ;
// Is interpreted as:
var x ; // Hoisted to top
console . log ( x );
x = 5 ;
// But let/const have temporal dead zone:
console . log ( y ); // ReferenceError
let y = 5 ;
Next steps
Bytecode and VM Learn how to compile AST to bytecode and execute it in a stack-based virtual machine
Closures and prototypes Implement closures with environment chains and prototype-based inheritance