Skip to main content
Learn how to transform raw JavaScript source code into an Abstract Syntax Tree (AST) through lexical analysis, tokenization, and parsing. This is the foundational step in building a JavaScript engine.

Overview

Parsing is the process of converting JavaScript source code into a structured representation that the engine can understand and execute. This involves multiple stages working together to analyze the syntax and build an AST.
The parser is the first major component of a JavaScript engine, sitting between the raw source code and the interpreter/compiler.

Lexical analysis and tokenization

Lexical analysis is the process of breaking down source code into meaningful tokens. Each token represents a fundamental building block of the language.
1

Scan characters

Read the source code character by character, identifying patterns that form tokens.
2

Classify tokens

Categorize each token by type: keywords, identifiers, operators, literals, punctuation.
3

Generate token stream

Produce a sequence of tokens that will be consumed by the parser.

Token types

A robust tokenizer must recognize and classify different token types:
// Reserved words in JavaScript
function, const, let, var, if, else, while, for, return, class, extends
Proper handling of edge cases like Unicode identifiers, regex literals, and template strings is crucial for a production-ready tokenizer.

Recursive descent parser

The recursive descent parser is a top-down parsing technique where each non-terminal in the grammar is implemented as a function.

Parser structure

The parser consists of mutually recursive functions that mirror the grammar rules:
class Parser {
private:
    Lexer& lexer;
    Token currentToken;
    
    void advance() {
        currentToken = lexer.nextToken();
    }
    
    bool match(TokenType type) {
        if (currentToken.type == type) {
            advance();
            return true;
        }
        return false;
    }
    
public:
    ASTNode* parseProgram();
    ASTNode* parseStatement();
    ASTNode* parseExpression();
    ASTNode* parsePrimary();
};

Operator precedence

Handling operator precedence correctly is essential for parsing expressions:
PrecedenceOperatorsAssociativity
15(), [], .Left-to-right
14++, --, !, typeofRight-to-left
13**Right-to-left
12*, /, %Left-to-right
11+, -Left-to-right
10<, <=, >, >=Left-to-right
9==, ===, !=, !==Left-to-right
5&&Left-to-right
4``Left-to-right
3=, +=, -=, etc.Right-to-left

AST construction and validation

The Abstract Syntax Tree is a hierarchical representation of the program structure. Each node represents a syntactic construct.

AST node types

Different AST nodes represent different language constructs:

Declarations

  • Function declarations
  • Variable declarations
  • Class declarations

Statements

  • Expression statements
  • If/else statements
  • Loop statements
  • Return statements

Expressions

  • Binary expressions
  • Call expressions
  • Member expressions
  • Literals

Special

  • Program (root node)
  • Block statements
  • Function parameters
class ASTNode {
public:
    enum class Type {
        Program,
        FunctionDeclaration,
        VariableDeclaration,
        BinaryExpression,
        CallExpression,
        Identifier,
        Literal
    };
    
    virtual ~ASTNode() = default;
    virtual Type getType() const = 0;
    virtual void accept(ASTVisitor& visitor) = 0;
};

AST validation

After construction, the AST should be validated for semantic correctness:
1

Type checking

Verify that operations are performed on compatible types.
2

Reference validation

Ensure all variables and functions are declared before use.
3

Control flow analysis

Check for unreachable code, missing return statements, and other control flow issues.

Scope analysis and symbol resolution

Scope analysis determines which declarations are visible at each point in the program and resolves variable references to their declarations.

Scope types

The outermost scope containing global variables and functions. Accessible from anywhere in the program.
// Global scope
const globalVar = 42;

function globalFunction() {
    return globalVar; // Can access global scope
}

Symbol table implementation

A symbol table tracks variable declarations and their scopes:
class SymbolTable {
private:
    struct Symbol {
        std::string name;
        enum class Kind { Variable, Function, Parameter };
        Kind kind;
        int scopeLevel;
    };
    
    std::vector<std::unordered_map<std::string, Symbol>> scopes;
    int currentLevel = 0;
    
public:
    void enterScope() {
        scopes.push_back({});
        currentLevel++;
    }
    
    void exitScope() {
        scopes.pop_back();
        currentLevel--;
    }
    
    void define(const std::string& name, Symbol::Kind kind) {
        Symbol sym{name, kind, currentLevel};
        scopes.back()[name] = sym;
    }
    
    Symbol* resolve(const std::string& name) {
        // Search from innermost to outermost scope
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) {
                return &found->second;
            }
        }
        return nullptr; // Undefined variable
    }
};
Scope analysis must handle hoisting for var declarations and functions, while maintaining temporal dead zones for let and const.

Variable hoisting

JavaScript’s hoisting behavior must be correctly implemented:
// This code:
console.log(x); // undefined (not ReferenceError)
var x = 5;

// Is interpreted as:
var x; // Hoisted to top
console.log(x);
x = 5;

// But let/const have temporal dead zone:
console.log(y); // ReferenceError
let y = 5;

Next steps

Bytecode and VM

Learn how to compile AST to bytecode and execute it in a stack-based virtual machine

Closures and prototypes

Implement closures with environment chains and prototype-based inheritance

Build docs developers (and LLMs) love