Bytecode and VM execution

Transform Abstract Syntax Trees into compact bytecode and execute them in a stack-based virtual machine. This is the core execution engine that brings JavaScript programs to life.

Overview

Bytecode is an intermediate representation between high-level source code and machine code. A virtual machine (VM) interprets bytecode instructions to execute programs.

Bytecode execution provides a balance between interpretation speed and implementation complexity, making it ideal for dynamic languages like JavaScript.

Bytecode design and instruction set

A well-designed instruction set is crucial for efficient VM execution. Each instruction should be simple, compact, and map directly to common operations.

Instruction format

Bytecode instructions typically use a fixed or variable-length format:

enum class Opcode : uint8_t {
    // Stack operations
    PUSH_CONSTANT,    // Push constant to stack
    POP,              // Pop value from stack
    
    // Arithmetic operations
    ADD,              // Pop two values, push sum
    SUB,              // Pop two values, push difference
    MUL,              // Pop two values, push product
    DIV,              // Pop two values, push quotient
    MOD,              // Pop two values, push remainder
    
    // Comparison operations
    EQ,               // Equal
    NEQ,              // Not equal
    LT,               // Less than
    GT,               // Greater than
    LTE,              // Less than or equal
    GTE,              // Greater than or equal
    
    // Logical operations
    AND,              // Logical AND
    OR,               // Logical OR
    NOT,              // Logical NOT
    
    // Variable operations
    LOAD_VAR,         // Load variable to stack
    STORE_VAR,        // Store stack top to variable
    LOAD_GLOBAL,      // Load global variable
    STORE_GLOBAL,     // Store to global variable
    
    // Control flow
    JUMP,             // Unconditional jump
    JUMP_IF_FALSE,    // Jump if stack top is false
    JUMP_IF_TRUE,     // Jump if stack top is true
    
    // Function operations
    CALL,             // Call function
    RETURN,           // Return from function
    
    // Object operations
    NEW_OBJECT,       // Create new object
    GET_PROPERTY,     // Get object property
    SET_PROPERTY,     // Set object property
    
    // Array operations
    NEW_ARRAY,        // Create new array
    GET_ELEMENT,      // Get array element
    SET_ELEMENT,      // Set array element
    
    // Special
    HALT              // Stop execution
};

struct Instruction {
    Opcode opcode;
    uint32_t operand;  // Optional operand
};

Instruction categories

Stack manipulation

Instructions for managing the operand stack:

PUSH_CONSTANT: Push a constant value onto the stack
POP: Remove the top value from the stack
DUP: Duplicate the top stack value
SWAP: Swap the top two stack values

Arithmetic and logic

Binary and unary operations:

ADD, SUB, MUL, DIV, MOD: Arithmetic operations
EQ, NEQ, LT, GT, LTE, GTE: Comparison operations
AND, OR, NOT: Logical operations
NEG: Unary negation

Variable access

Loading and storing variables:

LOAD_VAR, STORE_VAR: Local variable access
LOAD_GLOBAL, STORE_GLOBAL: Global variable access
LOAD_UPVALUE, STORE_UPVALUE: Closure variable access

Control flow

Branching and jumping:

JUMP: Unconditional jump to offset
JUMP_IF_FALSE: Conditional jump (for if statements, loops)
JUMP_IF_TRUE: Conditional jump (for logical OR)
LOOP: Jump backward (for loop constructs)

Function calls

Function invocation and returns:

CALL: Call function with N arguments
RETURN: Return from function with value
CLOSURE: Create closure with upvalues

Compiling AST to bytecode

The compiler traverses the AST and emits bytecode instructions:

class Compiler : public ASTVisitor {
private:
    BytecodeBuffer& bytecode;
    SymbolTable& symbols;
    
public:
    Compiler(BytecodeBuffer& bc, SymbolTable& syms)
        : bytecode(bc), symbols(syms) {}
    
    void visit(BinaryExpression& node) override {
        // Compile left operand
        node.left->accept(*this);
        
        // Compile right operand
        node.right->accept(*this);
        
        // Emit operator instruction
        if (node.op == "+") {
            bytecode.emitOpcode(Opcode::ADD);
        } else if (node.op == "-") {
            bytecode.emitOpcode(Opcode::SUB);
        } else if (node.op == "*") {
            bytecode.emitOpcode(Opcode::MUL);
        } else if (node.op == "/") {
            bytecode.emitOpcode(Opcode::DIV);
        }
        // ... more operators
    }
    
    void visit(Literal& node) override {
        // Add constant to constant pool
        uint32_t index = bytecode.addConstant(node.value);
        
        // Emit PUSH_CONSTANT instruction
        bytecode.emitOpcode(Opcode::PUSH_CONSTANT);
        bytecode.emitOperand(index);
    }
    
    void visit(Identifier& node) override {
        // Look up variable in symbol table
        Symbol* sym = symbols.resolve(node.name);
        
        if (sym->isGlobal()) {
            bytecode.emitOpcode(Opcode::LOAD_GLOBAL);
        } else {
            bytecode.emitOpcode(Opcode::LOAD_VAR);
        }
        bytecode.emitOperand(sym->index);
    }
};

Jump offsets must be carefully calculated and patched after emitting forward jumps, as the target location isn’t known until the code is generated.

Stack-based VM execution

The virtual machine maintains an operand stack and executes instructions sequentially.

VM architecture

Instruction pointer

Tracks the current position in bytecode, advancing after each instruction.

Operand stack

Holds intermediate values during computation. Operations pop operands and push results.

Call frames

Maintains function call context including return address and local variables.

Global environment

Stores global variables and functions accessible from anywhere.

VM implementation

class VirtualMachine {
private:
    std::vector<Value> stack;
    std::vector<CallFrame> frames;
    std::unordered_map<std::string, Value> globals;
    
    const uint8_t* ip;  // Instruction pointer
    
    void push(const Value& value) {
        stack.push_back(value);
    }
    
    Value pop() {
        Value value = stack.back();
        stack.pop_back();
        return value;
    }
    
    Value peek(int distance = 0) {
        return stack[stack.size() - 1 - distance];
    }
    
    uint32_t readOperand() {
        uint32_t operand = (ip[0] << 24) | (ip[1] << 16) | 
                          (ip[2] << 8) | ip[3];
        ip += 4;
        return operand;
    }
    
public:
    void execute(const BytecodeBuffer& bytecode);
};

Stack-based VMs are simpler to implement than register-based VMs but may require more instructions for complex operations.

Optimization techniques

Instruction dispatch optimization

Different techniques can speed up instruction dispatch:

Switch statement
Computed goto
Direct threading

Traditional approach using a switch statement. Simple but may not be optimally compiled.

switch (opcode) {
    case Opcode::ADD: /* ... */ break;
    case Opcode::SUB: /* ... */ break;
    // ...
}

GCC extension that provides faster dispatch using jump tables.

static void* dispatchTable[] = {
    &&OP_ADD, &&OP_SUB, &&OP_MUL, /* ... */
};

#define DISPATCH() goto *dispatchTable[*ip++]

OP_ADD:
    // Execute ADD
    DISPATCH();

OP_SUB:
    // Execute SUB
    DISPATCH();

Store instruction handlers directly in bytecode for fastest dispatch.

void** code = new void*[codeSize];
code[0] = &&OP_PUSH_CONSTANT;
code[1] = (void*)constantIndex;
code[2] = &&OP_ADD;

void** pc = code;
goto *(*pc++);

Value representation

Efficient value representation is critical for performance:

// Use IEEE 754 double NaN space to encode types
union Value {
    double number;
    uint64_t bits;
    
    // Use quiet NaN range for non-number types
    static constexpr uint64_t QNAN = 0x7ffc000000000000;
    static constexpr uint64_t SIGN_BIT = 0x8000000000000000;
    
    bool isNumber() const {
        return (bits & QNAN) != QNAN;
    }
    
    bool isObject() const {
        return (bits & (QNAN | SIGN_BIT)) == (QNAN | SIGN_BIT);
    }
    
    Object* asObject() const {
        return reinterpret_cast<Object*>(bits & ~(QNAN | SIGN_BIT));
    }
};

Performance considerations

Inline caching

Cache property lookups and method calls for faster repeated access

Constant folding

Evaluate constant expressions at compile time rather than runtime

Dead code elimination

Remove unreachable code to reduce bytecode size

Peephole optimization

Replace instruction sequences with more efficient equivalents

Premature optimization can lead to more complex code without significant benefits. Profile first, then optimize hot paths.

Parsing & Execution

Runtime

Overview

Bytecode design and instruction set

Instruction format

Instruction categories

Compiling AST to bytecode

Stack-based VM execution

VM architecture

VM implementation

Optimization techniques

Instruction dispatch optimization

Value representation

Performance considerations

Inline caching

Constant folding

Dead code elimination

Peephole optimization

Next steps

Closures and prototypes

Parsing and AST

Build docs developers (and LLMs) love

Parsing & Execution

Runtime

​Overview

​Bytecode design and instruction set

​Instruction format

​Instruction categories

​Compiling AST to bytecode

​Stack-based VM execution

​VM architecture

​VM implementation

​Optimization techniques

​Instruction dispatch optimization

​Value representation

​Performance considerations

Inline caching

Constant folding

Dead code elimination

Peephole optimization

​Next steps

Closures and prototypes

Parsing and AST

Build docs developers (and LLMs) love

Overview

Bytecode design and instruction set

Instruction format

Instruction categories

Compiling AST to bytecode

Stack-based VM execution

VM architecture

VM implementation

Optimization techniques

Instruction dispatch optimization

Value representation

Performance considerations

Next steps