
Overview

The Code Generator is the final phase of compilation. It transforms the Abstract Syntax Tree into x86 assembly code that can be assembled and run on EMU8086, a popular 8086 emulator.
Assembly language is the lowest-level human-readable representation of a program:
  • One-to-one mapping with machine instructions
  • Architecture-specific (x86 in this case)
  • Requires knowledge of registers, memory, and CPU operations

Target Architecture: x86 (8086)

The generator produces code for the Intel 8086 processor:

16-bit CPU

  • Registers are 16 bits (2 bytes)
  • Offsets within a segment are 16 bits (the 8086 forms 20-bit physical addresses from segment:offset pairs)
  • Integer range: -32,768 to 32,767 (signed)
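To show what the 16-bit limit means in practice, here is a minimal Python sketch that reinterprets an integer the way a 16-bit register stores it in two's complement. The helper name to_int16 is hypothetical and is not part of compfinal.py.

```python
def to_int16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed (two's complement) word."""
    value &= 0xFFFF            # keep only the bits that fit in a 16-bit register
    if value & 0x8000:         # sign bit set: the word represents a negative number
        value -= 0x10000
    return value


print(to_int16(32767))   # largest positive word
print(to_int16(32768))   # one past the limit wraps to the most negative word
print(to_int16(40000))   # the value from the Limitations section
```

Any result that leaves the -32,768 to 32,767 range silently wraps. The CPU does not raise an error.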

Segmented Memory

  • Code Segment (CS)
  • Data Segment (DS)
  • Stack Segment (SS)
  • Small memory model (.model small)

Limited Registers

  • AX - Accumulator (primary math register)
  • BX - Base (secondary operations)
  • CX - Counter (loops)
  • DX - Data (I/O, high word in division)

DOS Interrupts

  • INT 21h for system calls
  • AH register selects function
  • Used to output characters and strings and to exit to DOS

Assembly Structure

Generated code follows this template:
.model small        ; Memory model
.stack 100h         ; Stack size (256 bytes)

.data               ; Data segment
; Variable declarations here

.code               ; Code segment
main proc           ; Main procedure starts
mov ax, @data      ; Initialize data segment
mov ds, ax

; Your program here

mov ah,4ch         ; Exit to DOS
int 21h
main endp          ; Main procedure ends

; Utility procedures

end main           ; Program end
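The template is simple enough to build as a list of lines. The Python sketch below assumes the generator fills in only the variable declarations and the program body. The function name generar_plantilla is hypothetical, not the real _generar_encabezado, and utility procedures such as print_num are left out.

```python
def generar_plantilla(variables, cuerpo):
    """Hypothetical sketch: wrap a program body in the .model/.stack/.data/.code template."""
    lineas = [
        ".model small",
        ".stack 100h",
        "",
        ".data",
    ]
    lineas += [f"{var} dw 0" for var in sorted(variables)]  # one word per variable
    lineas += [
        "",
        ".code",
        "main proc",
        "mov ax, @data",   # DS must point at the data segment before
        "mov ds, ax",      # any variable is read or written
        "",
    ]
    lineas += cuerpo       # translated statements go here
    lineas += [
        "",
        "mov ah, 4ch",     # DOS function 4Ch: terminate the program
        "int 21h",
        "main endp",
        "",
        "end main",
    ]
    return "\n".join(lineas) + "\n"


print(generar_plantilla({"x"}, ["mov ax, 5", "mov x, ax"]))
```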

Code Generation Strategy

The generator uses a stack-based evaluation approach:
Step 1: Data Section

Declare all variables as 16-bit words:
.data
x dw 0
y dw 0
result dw 0
Step 2: Statement Translation

Convert each statement to assembly:
  • Variable declarations (let) → evaluate the expression, then store AX in the variable
  • Print statements → call to print procedure
Step 3: Expression Evaluation

Use AX as the accumulator and the stack for intermediate results:
  1. Evaluate left operand → result in AX
  2. Push AX (save left result)
  3. Evaluate right operand → result in AX
  4. Move AX to BX
  5. Pop AX (restore left result)
  6. Perform operation: AX op BX → AX
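The six steps above map naturally onto a recursive function. The following is a minimal sketch. The AST classes (Numero, Variable, Binaria) are hypothetical stand-ins for the real node types in compfinal.py. Only the + and * instructions shown elsewhere on this page are included.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types; the real classes in compfinal.py may differ.
@dataclass
class Numero:
    valor: int


@dataclass
class Variable:
    nombre: str


@dataclass
class Binaria:
    operador: str
    izquierda: "Expr"
    derecha: "Expr"


Expr = Union[Numero, Variable, Binaria]

# Instruction(s) that combine AX (left) and BX (right) into AX.
OPERACIONES = {
    "+": ["add ax, bx"],
    "*": ["imul bx"],
}


def generar_expresion(expr: Expr) -> list[str]:
    """Emit 8086 code that leaves the value of expr in AX."""
    if isinstance(expr, Numero):
        return [f"mov ax, {expr.valor}"]
    if isinstance(expr, Variable):
        return [f"mov ax, {expr.nombre}"]
    codigo = generar_expresion(expr.izquierda)  # 1. left operand -> AX
    codigo.append("push ax")                    # 2. save left result
    codigo += generar_expresion(expr.derecha)   # 3. right operand -> AX
    codigo.append("mov bx, ax")                 # 4. right operand -> BX
    codigo.append("pop ax")                     # 5. restore left result
    codigo += OPERACIONES[expr.operador]        # 6. AX op BX -> AX
    return codigo


# a + b * c parses as a + (b * c)
arbol = Binaria("+", Variable("a"), Binaria("*", Variable("b"), Variable("c")))
print("\n".join(generar_expresion(arbol)))
```

Because each nested operation pushes exactly once and pops exactly once, the stack is balanced when the expression finishes, whatever the depth of the tree.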
Step 4: Utility Procedures

Include helper functions:
  • print_num - Convert integer to decimal and print

Register Usage

AX (Accumulator) is the primary register for all expression results:
mov ax, 5       ; Load constant
mov ax, x       ; Load variable
add ax, bx      ; Addition result
imul bx         ; Multiplication result (low word in AX, high word in DX)
AX always holds the value of the current expression.

Expression Generation

Binary Expressions

For a + b * c, which parses as a + (b * c):
    +
   / \
  a   *
     / \
    b   c

Operators

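; AX = left + right
mov ax, left        ; Evaluate left operand into AX
push ax             ; Save it on the stack
mov ax, right       ; Evaluate right operand into AX
mov bx, ax          ; Move right operand to BX
pop ax              ; Restore left operand into AX
add ax, bx          ; AX = AX + BX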
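Only addition is shown above, and multiplication appears in the Register Usage section. The table below is a sketch that assumes the other operators follow the same AX/BX convention. The sub and division entries are assumptions, not taken from compfinal.py. On the 8086, IDIV divides the 32-bit value DX:AX, so AX must first be sign-extended into DX with CWD. The quotient lands in AX and the remainder in DX. The function name combinar is hypothetical.

```python
# Sketch (assumption): final instruction(s) per operator, emitted after
# "pop ax" so that AX holds the left operand and BX the right operand.
INSTRUCCIONES = {
    "+": ["add ax, bx"],       # AX = AX + BX
    "-": ["sub ax, bx"],       # AX = AX - BX (left - right)
    "*": ["imul bx"],          # DX:AX = AX * BX; low word kept in AX
    "/": ["cwd", "idiv bx"],   # sign-extend AX into DX, then AX = DX:AX / BX
}


def combinar(operador: str) -> list[str]:
    """Return the instructions that combine AX (left) and BX (right) into AX."""
    try:
        return list(INSTRUCCIONES[operador])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"unsupported operator: {operador!r}") from None


for op in "+-*/":
    print(op, "->", "; ".join(combinar(op)))
```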

Statement Generation

Source:
let x = 5 + 3;
Assembly:
; let x = 5 + 3;
mov ax, 5
push ax
mov ax, 3
mov bx, ax
pop ax
add ax, bx
mov x, ax           ; Store result in variable

print_num Procedure

Converts the integer in AX to a decimal string and prints it:
  1. Special case: If AX = 0, print '0' and return
  2. Extract digits: Repeatedly divide by 10
    • Remainder = digit (pushed on stack)
    • Quotient = remaining number
  3. Print digits: Pop and print each digit
Example: Print 123
123 / 10 = 12 remainder 3  → push '3'
 12 / 10 =  1 remainder 2  → push '2'
  1 / 10 =  0 remainder 1  → push '1'

Pop and print: '1', '2', '3' → "123"
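The same divide-push-pop algorithm can be mirrored in Python to check its logic. This is a minimal sketch for non-negative values only, because this page does not describe how print_num handles negative numbers. The function name print_num_digits is hypothetical. In the assembly version, each remainder comes from DIV in DX and is converted to ASCII by adding '0' (30h) before printing.

```python
def print_num_digits(valor: int) -> str:
    """Mirror of the print_num algorithm for a non-negative 16-bit value."""
    if valor == 0:                          # special case: the loop would emit nothing
        return "0"
    pila = []                               # plays the role of the 8086 stack
    while valor > 0:
        valor, resto = divmod(valor, 10)    # quotient continues, remainder is a digit
        pila.append(chr(ord("0") + resto))  # push the ASCII digit
    salida = ""
    while pila:
        salida += pila.pop()                # pop in reverse: most significant digit first
    return salida


print(print_num_digits(123))
```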

Complete Generation Example

let a = 5;
let b = 10;
let c = a + b * 2;
print c;
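A sketch of how this program could be translated end to end is shown below. It follows the strategy described on this page. It uses a hypothetical tuple-based AST instead of the real classes. It also assumes that print loads its value into AX and then calls print_num, and that step is not shown explicitly in the source.

```python
# Hypothetical tuple-based AST, used only for this sketch:
#   expressions: ("num", 5), ("var", "a"), ("bin", "+", left, right)
#   statements:  ("let", "a", expr), ("print", expr)

def expr(e):
    """Leave the value of e in AX using the push/pop strategy."""
    if e[0] in ("num", "var"):
        return [f"mov ax, {e[1]}"]
    _, op, izq, der = e
    final = {"+": ["add ax, bx"], "*": ["imul bx"]}[op]
    return expr(izq) + ["push ax"] + expr(der) + ["mov bx, ax", "pop ax"] + final


def programa(sentencias):
    """Return (data section lines, code section lines) for a list of statements."""
    variables = sorted({s[1] for s in sentencias if s[0] == "let"})
    datos = [".data"] + [f"{v} dw 0" for v in variables] + ["msg db 13,10,'$'"]
    codigo = []
    for s in sentencias:
        if s[0] == "let":
            codigo += expr(s[2]) + [f"mov {s[1]}, ax"]  # store result in variable
        else:  # "print": value in AX, then the utility procedure (assumed)
            codigo += expr(s[1]) + ["call print_num"]
    return datos, codigo


fuente = [
    ("let", "a", ("num", 5)),
    ("let", "b", ("num", 10)),
    ("let", "c", ("bin", "+", ("var", "a"), ("bin", "*", ("var", "b"), ("num", 2)))),
    ("print", ("var", "c")),
]
datos, codigo = programa(fuente)
print("\n".join(datos + [""] + codigo))
```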

Data Section Generation

Collects all declared variable names and emits one 16-bit word declaration for each:
def generar(self, programa):
    codigo = ""  # Accumulates the generated assembly text

    # Collect all variable names
    variables = set()
    for sentencia in programa.sentencias:
        if isinstance(sentencia, DeclaracionVariable):
            variables.add(sentencia.nombre.lexema)

    # Generate data section
    codigo += ".data\n"
    for var in sorted(variables):  # Sorted for deterministic output
        codigo += f"{var} dw 0\n"
    codigo += "msg db 13,10,'$'\n\n"  # Newline (CR, LF), '$'-terminated for DOS
Example:
let x = 5;
let y = 10;
let sum = x + y;
Generates:
.data
sum dw 0
x dw 0
y dw 0
msg db 13,10,'$'

DOS Interrupts

Print single character
mov dl, 'A'     ; Character to print
mov ah, 02h     ; Function: print character
int 21h         ; Call DOS
Output: A
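The generated programs rely on a small set of INT 21h services. This page shows AH=02h (print character) and AH=4Ch (exit). AH=09h is the standard DOS service for printing a '$'-terminated string such as msg. The helper functions below are hypothetical and only sketch how those sequences could be emitted. Whether the generator prints msg this way is an assumption.

```python
# Hypothetical helpers: emit the instruction sequence for each INT 21h service.
def dos_imprimir_caracter(caracter: str) -> list[str]:
    """AH=02h: write the character in DL to standard output."""
    return [f"mov dl, '{caracter}'", "mov ah, 02h", "int 21h"]


def dos_imprimir_cadena(etiqueta: str) -> list[str]:
    """AH=09h: write the '$'-terminated string at DS:DX."""
    return [f"lea dx, {etiqueta}", "mov ah, 09h", "int 21h"]


def dos_salir() -> list[str]:
    """AH=4Ch: terminate the program and return to DOS."""
    return ["mov ah, 4ch", "int 21h"]


print("\n".join(dos_imprimir_caracter("A") + dos_imprimir_cadena("msg") + dos_salir()))
```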

Optimization Opportunities

The current generator performs no optimizations. Possible improvements:

Constant Folding

Current:
mov ax, 5
push ax
mov ax, 3
mov bx, ax
pop ax
add ax, bx
mov x, ax
Optimized:
mov ax, 8       ; Computed at compile-time
mov x, ax
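Constant folding is usually done on the AST before any code is emitted. The following is a minimal sketch over a hypothetical tuple-based AST. Results are wrapped to 16 bits so that the folded value matches what the 8086 would compute at run time. Division is deliberately not folded, because Python's // floors while IDIV truncates toward zero.

```python
def a_int16(n: int) -> int:
    """Wrap n to a signed 16-bit value, as an 8086 register would."""
    n &= 0xFFFF
    return n - 0x10000 if n & 0x8000 else n


def plegar(e):
    """Fold ("bin", op, left, right) nodes whose operands are both literals."""
    if e[0] != "bin":
        return e                               # literals and variables stay as they are
    _, op, izq, der = e
    izq, der = plegar(izq), plegar(der)        # fold children first (bottom-up)
    if izq[0] == "num" and der[0] == "num":
        a, b = izq[1], der[1]
        if op == "+":
            return ("num", a_int16(a + b))
        if op == "-":
            return ("num", a_int16(a - b))
        if op == "*":
            return ("num", a_int16(a * b))
        # Division is left alone: Python's // floors, IDIV truncates.
    return ("bin", op, izq, der)


print(plegar(("bin", "+", ("num", 5), ("num", 3))))
```

Folding let x = 5 + 3; yields the literal 8, which the generator can then load with a single mov ax, 8.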
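
Register Allocation

Current: Always uses AX as the accumulator and the stack for intermediate results.
Optimized: Keep several values in registers at once (DX is overwritten by IMUL, so operands go in BX and CX):
mov bx, a       ; a in BX
mov ax, b       ; b in AX
mov cx, c       ; c in CX
imul cx         ; b * c in AX (high word in DX)
add ax, bx      ; a + (b*c) in AX
Avoids push/pop overhead. On the 8086, IMUL takes a single operand and always multiplies by AX; two-operand forms such as imul bx, cx are not available.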

Strength Reduction

Current:
mov ax, x
mov bx, 2
imul bx         ; Multiply
Optimized:
mov ax, x
shl ax, 1       ; Shift left = multiply by 2 (faster)
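Strength reduction generalizes to any power-of-two multiplier. The sketch below uses a hypothetical emitter name. One 8086 detail matters here: SHL accepts only an immediate count of 1 or a count held in CL. Immediate counts greater than 1 arrived with the 80186. Because the low 16 bits of a left shift equal the low word of IMUL, the result in AX is the same for signed values.

```python
def multiplicar_por_constante(k: int) -> list[str]:
    """Emit 8086 code for AX = AX * k, replacing IMUL with shifts when k = 2**n."""
    if k > 0 and k & (k - 1) == 0:      # exactly one bit set: power of two
        n = k.bit_length() - 1          # k == 2**n
        if n <= 2:
            # Small counts: repeat the 8086's only immediate form, "shl ax, 1".
            # n == 0 means k == 1, so nothing is emitted.
            return ["shl ax, 1"] * n
        # Larger counts must be placed in CL (this clobbers CX).
        return [f"mov cl, {n}", "shl ax, cl"]
    return [f"mov bx, {k}", "imul bx"]  # general case, as in the "Current" version


print(multiplicar_por_constante(2))
print(multiplicar_por_constante(8))
```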
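
Dead Code Elimination

Current:
mov ax, 5
mov x, ax
mov ax, 10      ; Reloads AX; the previous value was already stored in x
mov y, ax
Optimized:
mov ax, 5
mov x, ax
mov ax, 10
mov y, ax
No change is possible here because every instruction contributes to the result. In larger programs, a pass could remove computations whose results are never used, such as an assignment to a variable that is never read or printed.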
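The page does not describe a cleanup pass. As one hedged illustration, the sketch below implements a related peephole rule rather than full dead code elimination. A load mov ax, v that immediately follows mov v, ax is redundant because AX already holds v. With the documented strategy, let c = ...; followed by print c; would likely produce this pattern. The function name quitar_recargas is hypothetical. The rule is only safe here because the language has no labels or jumps that could enter between the two instructions.

```python
import re


def quitar_recargas(lineas: list[str]) -> list[str]:
    """Peephole pass: drop 'mov ax, v' right after 'mov v, ax' (AX already holds v)."""
    resultado = []
    for linea in lineas:
        if resultado:
            previa = re.fullmatch(r"mov (\w+), ax", resultado[-1])
            if previa and linea == f"mov ax, {previa.group(1)}":
                continue                 # redundant reload: skip it
        resultado.append(linea)
    return resultado


print(quitar_recargas(["mov c, ax", "mov ax, c", "call print_num"]))
```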

Limitations

What the code generator CANNOT handle:
  1. Floating-point: All arithmetic is integer
    let x = 7 / 2;  // Result: 3, not 3.5
    
  2. Large numbers: 16-bit limit (-32,768 to 32,767)
    let x = 40000;  // Overflow!
    
  3. Arrays/Strings: Only scalar integers supported
  4. Control flow: No if/while/for (language doesn’t support them)
  5. Functions: No user-defined functions (only main + print_num)
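The source shows that 7 / 2 yields 3 but does not say which division instruction is emitted. Assuming signed division (CWD followed by IDIV), the quotient truncates toward zero. For negative operands this differs from Python's floor division. The model below uses a hypothetical helper name. It also ignores quotient overflow such as -32768 / -1, which raises the same divide-error interrupt on real hardware.

```python
def idiv16(dividendo: int, divisor: int) -> tuple[int, int]:
    """Model 8086 signed division (CWD + IDIV): the quotient truncates toward zero."""
    if divisor == 0:
        raise ZeroDivisionError("the 8086 raises a divide-error interrupt (INT 0)")
    cociente = abs(dividendo) // abs(divisor)
    if (dividendo < 0) != (divisor < 0):
        cociente = -cociente
    resto = dividendo - cociente * divisor  # remainder takes the dividend's sign
    return cociente, resto


print(idiv16(7, 2))     # the example from the list
print(idiv16(-7, 2))    # truncates toward zero, unlike Python's -7 // 2
```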

Source Code Reference

Implementation

File: compfinal.py
Lines: 1221-1390

Key Class:
  • GeneradorASM - Assembly code generator
Main Methods:
  • generar(programa) - Entry point, returns complete .asm file as string
  • generar_sentencia(sentencia) - Generate code for one statement
  • generar_expresion(expr) - Generate code for expression (recursive)
Helper Methods:
  • _recolectar_variables(programa) - Find all variable names
  • _generar_encabezado() - Generate .model, .stack, .data sections
  • _generar_procedimiento_print() - Generate print_num procedure

Next Steps

Try It

  1. Write a program in the GUI
  2. Click Exportar ASM (Export ASM)
  3. Save to program.asm
  4. Open in EMU8086
  5. Compile and run

API Reference

Detailed API documentation for the GeneradorASM class

Examples

See more example programs and their generated assembly
