Overview
The Code Generator is the final phase of compilation. It transforms the Abstract Syntax Tree into x86 assembly code that can be assembled and run on EMU8086, a popular 8086 emulator.
Assembly Language is the lowest level of human-readable code:
One-to-one mapping with machine instructions
Architecture-specific (x86 in this case)
Requires knowledge of registers, memory, and CPU operations
Target Architecture: x86 (8086)
The generator produces code for the Intel 8086 processor:
16-bit CPU
Registers are 16 bits (2 bytes)
Offsets within a segment are 16 bits (physical addresses are 20 bits, formed from segment:offset)
Integer range: -32,768 to 32,767 (signed)
Segmented Memory
Code Segment (CS)
Data Segment (DS)
Stack Segment (SS)
Small memory model (.model small)
Limited Registers
AX - Accumulator (primary math register)
BX - Base (secondary operations)
CX - Counter (loops)
DX - Data (I/O, high word in division)
DOS Interrupts
INT 21h for system calls
AH register selects function
Output characters, strings, exit
Assembly Structure
Generated code follows this template:
.model small ; Memory model
.stack 100h ; Stack size (256 bytes)
.data ; Data segment
; Variable declarations here
.code ; Code segment
main proc ; Main procedure starts
mov ax , @data ; Initialize data segment
mov ds , ax
; Your program here
mov ah , 4ch ; Exit to DOS
int 21h
main endp ; Main procedure ends
; Utility procedures
end main ; Program end
Code Generation Strategy
The generator uses a stack-based evaluation approach:
Data Section
Declare all variables as 16-bit words:
.data
x dw 0
y dw 0
result dw 0
Statement Translation
Convert each statement to assembly:
Variable declarations → assignments
Print statements → call to print procedure
Expression Evaluation
Use AX as accumulator, stack for nested operations:
Evaluate left operand → result in AX
Push AX (save left result)
Evaluate right operand → result in AX
Move AX to BX
Pop AX (restore left result)
Perform operation: AX op BX → AX
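The six steps above can be sketched as a small recursive emitter. This is an illustrative sketch rather than the code in GeneradorASM: the tuple-based node shape and the names `emit_expr` and `OPS` are assumptions made for the example.

```python
# Hypothetical node shapes (not the real AST classes): an int literal,
# a variable name (str), or a tuple (op, left, right).
OPS = {
    "+": ["add ax, bx"],
    "-": ["sub ax, bx"],
    "*": ["imul bx"],
    "/": ["xor dx, dx", "idiv bx"],
}


def emit_expr(node):
    """Return assembly lines that leave the value of `node` in AX."""
    if isinstance(node, (int, str)):
        return [f"mov ax, {node}"]      # constant or variable load
    op, left, right = node
    lines = emit_expr(left)             # 1. left operand -> AX
    lines.append("push ax")             # 2. save left result
    lines += emit_expr(right)           # 3. right operand -> AX
    lines.append("mov bx, ax")          # 4. right operand -> BX
    lines.append("pop ax")              # 5. restore left result
    lines += OPS[op]                    # 6. AX op BX -> AX
    return lines


asm = emit_expr(("+", "a", ("*", "b", "c")))
print("\n".join(asm))
```

Because each nested operand pushes before it recurses and pops afterwards, the stack depth tracks the nesting depth of the expression.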
Utility Procedures
Include helper functions:
print_num - Convert integer to decimal and print
Register Usage
AX - Accumulator
Primary register for all expression results. Always holds the current expression value.
mov ax, 5 ; Load constant
mov ax, x ; Load variable
add ax, bx ; Addition result
imul bx ; Multiplication result
BX - Secondary
Holds the right operand during binary operations:
mov bx, ax ; Save right operand
pop ax ; Get left operand
add ax, bx ; Perform operation
CX - Counter
Used in print_num for digit counting:
mov cx, 0 ; Initialize counter
inc cx ; Count digits
loop print_loop ; Loop CX times
DX - Data/High Word
Multiple uses:
High word in division (div uses DX:AX)
DOS interrupt parameter
Digit extraction in print_num
xor dx, dx ; Clear before division
div bx ; DX:AX / BX
mov ah, 02h ; DOS function
int 21h ; Call DOS (uses DL)
Expression Generation
Binary Expressions
For a + b * c, which parses as a + (b * c):
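The AST root is the + node, with a as its left operand and the b * c node as its right operand. The generated assembly comes first, followed by a stack trace with a = 5, b = 10, c = 2.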
; Evaluate: a + (b * c)
; Load left operand (a)
mov ax , a
push ax ; Save 'a' on stack
; Evaluate right side (b * c)
mov ax , b
push ax ; Save 'b'
mov ax , c ; Load 'c'
mov bx , ax ; c → BX
pop ax ; b → AX
imul bx ; AX = b * c
; Perform addition
mov bx , ax ; (b*c) → BX
pop ax ; a → AX
add ax , bx ; AX = a + (b*c)
Stack: []
mov ax, a → AX=5
push ax → Stack: [5]
mov ax, b → AX=10
push ax → Stack: [5, 10]
mov ax, c → AX=2
mov bx, ax → BX=2
pop ax → AX=10, Stack: [5]
imul bx → AX=20 (10*2)
mov bx, ax → BX=20
pop ax → AX=5, Stack: []
add ax, bx → AX=25 (5+20)
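The trace above can be checked mechanically with a toy interpreter. The sketch below is only an illustration: `run` is a hypothetical helper that understands just the five instructions used in this example, and it does not model flags or segments.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run(program, memory):
    """Execute a list of 'mnemonic operands' strings on a toy 8086 model."""
    regs = {"ax": 0, "bx": 0, "dx": 0}
    stack = []

    def read(operand):
        return regs[operand] if operand in regs else memory[operand]

    for line in program:
        mnemonic, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",")] if rest else []
        if mnemonic == "mov":            # register destination only
            regs[args[0]] = read(args[1]) & 0xFFFF
        elif mnemonic == "push":
            stack.append(regs[args[0]])
        elif mnemonic == "pop":
            regs[args[0]] = stack.pop()
        elif mnemonic == "add":
            regs[args[0]] = (regs[args[0]] + read(args[1])) & 0xFFFF
        elif mnemonic == "imul":         # signed AX * operand -> DX:AX
            product = to_signed16(regs["ax"]) * to_signed16(read(args[0]))
            regs["ax"] = product & 0xFFFF
            regs["dx"] = (product >> 16) & 0xFFFF
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return regs, stack


program = [
    "mov ax, a", "push ax",
    "mov ax, b", "push ax",
    "mov ax, c", "mov bx, ax", "pop ax", "imul bx",
    "mov bx, ax", "pop ax", "add ax, bx",
]
regs, stack = run(program, {"a": 5, "b": 10, "c": 2})
print(regs["ax"], stack)
```

The final state matches the last line of the trace: AX holds 25 and the stack is empty again.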
Operators
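Addition, subtraction, multiplication, and division follow, in that order. All four evaluate their operands the same way and differ only in the final instructions.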
; AX = left + right
mov ax , left
push ax
mov ax , right
mov bx , ax
pop ax
add ax , bx ; AX = AX + BX
; AX = left - right
mov ax , left
push ax
mov ax , right
mov bx , ax
pop ax
sub ax , bx ; AX = AX - BX
; AX = left * right
mov ax , left
push ax
mov ax , right
mov bx , ax
pop ax
imul bx ; AX = AX * BX (signed)
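imul = signed multiply
mul = unsigned multiply
Both leave the 32-bit product in DX:AX, so DX is overwritten; only the low word in AX is used as the expression result.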
; AX = left / right
mov ax , left
push ax
mov ax , right
mov bx , ax
pop ax
xor dx , dx ; Clear DX (high word)
idiv bx ; AX = DX:AX / BX (signed)
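Division uses DX:AX (32-bit dividend) / BX (16-bit divisor). Clearing DX is only correct when the dividend in AX is non-negative. For a negative dividend, idiv needs DX:AX to be sign-extended (the cwd instruction does this); with DX = 0 the dividend is read as a large positive number and the quotient is wrong.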
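The effect of the high word on signed division can be seen with a small model of idiv. This is a sketch under assumptions: `idiv16` and `cwd` are hypothetical Python helpers that mimic the 8086 instructions of the same name (quotient truncated toward zero, remainder with the sign of the dividend), not part of the generator.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cwd(ax):
    """Model of cwd: return the DX value that sign-extends AX."""
    return 0xFFFF if ax & 0x8000 else 0x0000


def idiv16(dx, ax, divisor):
    """Model of `idiv r16`: signed DX:AX / divisor -> (AX quotient, DX remainder)."""
    dividend = to_signed((dx << 16) | ax, 32)
    d = to_signed(divisor, 16)
    if d == 0:
        raise ZeroDivisionError("divide error")
    quotient = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        quotient = -quotient
    remainder = dividend - quotient * d
    if not -32768 <= quotient <= 32767:
        raise OverflowError("quotient does not fit in AX")
    return quotient & 0xFFFF, remainder & 0xFFFF


ax = -7 & 0xFFFF                        # -7 stored as a 16-bit word (0xFFF9)

q_zeroed, _ = idiv16(0x0000, ax, 2)     # xor dx, dx / idiv bx
q_extended, _ = idiv16(cwd(ax), ax, 2)  # cwd / idiv bx

print(to_signed(q_zeroed, 16))          # 32764
print(to_signed(q_extended, 16))        # -3
```

With DX cleared, -7 / 2 is computed as 65529 / 2; with sign extension it gives -3 as expected.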
Statement Generation
Variable Declaration
Source:
let x = 5 + 3;
Assembly:
; let x = 5 + 3;
mov ax, 5
push ax
mov ax, 3
mov bx, ax
pop ax
add ax, bx
mov x, ax ; Store result in variable
Print Statement
Source:
print x;
Assembly:
; print x;
mov ax, x ; Load variable
call print_num ; Call print procedure
mov ah, 09h ; Print newline
lea dx, msg
int 21h
Print Number Procedure
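Converts the integer in AX to decimal digits and prints them. Because the conversion uses the unsigned div instruction, AX is treated as an unsigned value (a negative result such as -1 prints as 65535).
The algorithm comes first, followed by the implementation and a trace for AX = 25.
Special case: If AX = 0, print '0' and exit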
Extract digits: Repeatedly divide by 10
Remainder = digit (pushed on stack)
Quotient = remaining number
Print digits: Pop and print each digit
Example: Print 123
123 / 10 = 12 remainder 3 → push 3
12 / 10 = 1 remainder 2 → push 2
1 / 10 = 0 remainder 1 → push 1
Pop and print: 1, 2, 3, each converted to ASCII by adding '0' → "123"
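The same divide-and-push scheme can be written in a few lines of Python. This is a sketch that mirrors the algorithm described above; `print_num_digits` is a hypothetical name, and it returns the text instead of calling DOS.

```python
def print_num_digits(value):
    """Return the decimal text print_num would emit for a 16-bit AX value."""
    value &= 0xFFFF                 # AX is 16 bits; div treats it as unsigned
    if value == 0:                  # special case: print '0' and exit
        return "0"
    stack = []
    while value != 0:               # extract digits, least significant first
        value, remainder = divmod(value, 10)
        stack.append(remainder)     # push dx
    text = ""
    while stack:                    # pop in reverse order, convert to ASCII
        text += chr(stack.pop() + ord("0"))
    return text


print(print_num_digits(123))
print(print_num_digits(25))
```

The stack is what reverses the digits: division yields them least significant first, and popping returns them most significant first.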
print_num proc
push ax
push bx
push cx
push dx
mov cx , 0 ; Digit counter
mov bx , 10 ; Divisor
; Special case: zero
cmp ax , 0
jne convert
mov dl, '0'
mov ah , 02h
int 21h
jmp done
convert:
next_digit:
xor dx , dx ; Clear DX
div bx ; DX:AX / 10
push dx ; Save remainder (digit)
inc cx ; Count digit
cmp ax , 0 ; More digits?
jne next_digit
print_loop:
pop dx ; Get digit
add dl, '0' ; Convert to ASCII
mov ah , 02h ; DOS: print char
int 21h
loop print_loop ; Repeat CX times
done:
pop dx
pop cx
pop bx
pop ax
ret
print_num endp
Input: AX = 25
First iteration:
DX:AX = 0:25
div 10 → AX=2, DX=5
push 5 → Stack: [5]
cx = 1
Second iteration:
DX:AX = 0:2
div 10 → AX=0, DX=2
push 2 → Stack: [5, 2]
cx = 2
AX == 0, exit loop
Print loop (2 iterations):
pop dx → DX=2, Stack: [5]
add dl, '0' → DL='2' (ASCII 50)
int 21h → prints '2'
pop dx → DX=5, Stack: []
add dl, '0' → DL='5' (ASCII 53)
int 21h → prints '5'
Output: "25"
Complete Generation Example
The source program comes first, followed by the generated assembly and the steps to run it in EMU8086.
let a = 5 ;
let b = 10 ;
let c = a + b * 2 ;
print c ;
.model small
.stack 100h
.data
a dw 0
b dw 0
c dw 0
msg db 13 , 10 ,'$'
.code
main proc
mov ax , @data
mov ds , ax
; let a = 5;
mov ax , 5
mov a, ax
; let b = 10;
mov ax , 10
mov b, ax
; let c = a + b * 2;
mov ax , a
push ax
mov ax , b
push ax
mov ax , 2
mov bx , ax
pop ax
imul bx
mov bx , ax
pop ax
add ax , bx
mov c, ax
; print c;
mov ax , c
call print_num
mov ah , 09h
lea dx , msg
int 21h
mov ah , 4ch
int 21h
main endp
; [print_num procedure here]
end main
Save code to program.asm
Open in EMU8086
Click Compile (or F9)
Click Run (or F5)
Output window shows 25, the value of c (5 + 10 * 2).
Data Section Generation
Collects all variables and generates declarations:
def generar(self, programa):
    # Collect all variable names
    variables = set()
    for sentencia in programa.sentencias:
        if isinstance(sentencia, DeclaracionVariable):
            variables.add(sentencia.nombre.lexema)
    # Generate data section
    # (excerpt: codigo is assumed to have been initialized earlier in the method)
    codigo += ".data\n"
    for var in sorted(variables):  # Sorted for consistency
        codigo += f"{var} dw 0\n"
    codigo += "msg db 13,10,'$'\n\n"  # Newline message
Example:
let x = 5 ;
let y = 10 ;
let sum = x + y ;
Generates:
.data
sum dw 0
x dw 0
y dw 0
msg db 13 , 10 ,'$'
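To try the data-section logic in isolation, the excerpt can be wrapped in a standalone function. In this sketch `Token`, `Programa`, and `generar_seccion_datos` are stand-ins invented for the example, and `DeclaracionVariable` is reduced to the single field the excerpt reads; the real classes in compfinal.py will have more fields.

```python
from dataclasses import dataclass


@dataclass
class Token:                 # stand-in: only the lexeme is needed here
    lexema: str


@dataclass
class DeclaracionVariable:   # stand-in: the initializer expression is omitted
    nombre: Token


@dataclass
class Programa:              # stand-in for the parsed program
    sentencias: list


def generar_seccion_datos(programa):
    """Build the .data section for every declared variable."""
    variables = set()
    for sentencia in programa.sentencias:
        if isinstance(sentencia, DeclaracionVariable):
            variables.add(sentencia.nombre.lexema)
    codigo = ".data\n"
    for var in sorted(variables):        # sorted for stable output
        codigo += f"{var} dw 0\n"
    codigo += "msg db 13,10,'$'\n\n"     # CR, LF, DOS string terminator
    return codigo


programa = Programa([
    DeclaracionVariable(Token("x")),
    DeclaracionVariable(Token("y")),
    DeclaracionVariable(Token("sum")),
])
seccion = generar_seccion_datos(programa)
print(seccion)
```

Using a set means a variable declared more than once still gets a single dw entry.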
DOS Interrupts
INT 21h, AH=02h
Print single character
mov dl, 'A' ; Character to print
mov ah, 02h ; Function: print character
int 21h ; Call DOS
Output: A
INT 21h, AH=09h
Print string
msg db 'Hello$' ; String terminated with '$'
...
mov ah, 09h ; Function: print string
lea dx, msg ; DX = address of string
int 21h ; Call DOS
Output: Hello
Strings must end with the '$' character.
INT 21h, AH=4Ch
Exit program
mov ah, 4ch ; Function: terminate program
int 21h ; Call DOS (never returns)
Returns control to the operating system.
Optimization Opportunities
The current generator performs no optimization. Opportunities include:
Constant folding
Current:
mov ax, 5
push ax
mov ax, 3
mov bx, ax
pop ax
add ax, bx
mov x, ax
Optimized:
mov ax, 8 ; Computed at compile-time
mov x, ax
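Computing 5 + 3 at compile time amounts to a bottom-up pass over the expression tree. The sketch below shows one possible way to do it; the tuple node shape and the names `fold` and `FOLDABLE` are assumptions for illustration, and division is deliberately left unfolded to avoid dealing with division by zero at compile time.

```python
import operator

# Hypothetical node shapes: int literal, variable name (str),
# or a tuple (op, left, right).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def to_signed16(value):
    """Wrap a Python int to the signed 16-bit range used at run time."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fold(node):
    """Replace constant subexpressions with their value, bottom-up."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return to_signed16(FOLDABLE[op](left, right))
    return (op, left, right)


print(fold(("+", 5, 3)))
print(fold(("+", "a", ("*", 2, 3))))
```

Wrapping the folded value to 16 bits keeps the compile-time result identical to what the unoptimized code would leave in AX.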
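Register allocation
Current: Always uses AX as accumulator, stack for intermediates
Optimized: Keep operands in registers or read them directly from memory instead of pushing and popping. The 8086 only has the one-operand form of imul (AX times a register or memory operand), so the product still goes through AX:
mov ax, b ; b in AX
mov bx, c ; c in BX
imul bx ; AX = b * c (DX is overwritten)
add ax, a ; AX = a + (b*c), a read from memory
Avoids push/pop overhead.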
Strength reduction
Current:
mov ax, x
mov bx, 2
imul bx ; Multiply
Optimized:
mov ax, x
shl ax, 1 ; Shift left = multiply by 2 (faster)
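That the shift yields the same AX as the multiply, including for negative values, can be checked exhaustively over all 16-bit inputs. The helpers `imul_low_word` and `shl1` below are hypothetical models of the two instructions and only track the value left in AX, not the flags or DX.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def imul_low_word(ax, operand):
    """Low word (AX) of the signed product AX * operand."""
    return (to_signed16(ax) * to_signed16(operand)) & 0xFFFF


def shl1(ax):
    """AX after `shl ax, 1`."""
    return (ax << 1) & 0xFFFF


mismatches = [x for x in range(0x10000) if imul_low_word(x, 2) != shl1(x)]
print(len(mismatches))
```

One difference the model does not show: shl leaves DX untouched, whereas imul overwrites it with the high word of the product.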
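Dead code elimination
Current:
mov ax, 5
mov x, ax
mov ax, 10 ; Reloads AX; the previous value was already stored in x
mov y, ax
Optimized:
mov ax, 5
mov x, ax
mov ax, 10
mov y, ax
(No change here because every instruction has an effect, but in more complex cases instructions whose results are never used could be removed.)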
Limitations
What the code generator CANNOT handle:
Floating-point: All arithmetic is integer
let x = 7 / 2 ; // Result: 3, not 3.5
Large numbers: 16-bit limit (-32,768 to 32,767)
let x = 40000 ; // Overflow!
Arrays/Strings: Only scalar integers supported
Control flow: No if/while/for (language doesn’t support them)
Functions: No user-defined functions (only main + print_num)
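The first two limitations are plain 16-bit integer arithmetic and can be reproduced outside the emulator. The sketch below uses two hypothetical helpers, `to_signed16` and `truncating_div`, to show how 40000 reads as a signed word and how 7 / 2 truncates.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def truncating_div(a, b):
    """Integer division that truncates toward zero, as idiv does."""
    quotient = abs(a) // abs(b)
    return -quotient if (a < 0) != (b < 0) else quotient


print(to_signed16(40000))      # -25536
print(truncating_div(7, 2))    # 3
```

The bit pattern of 40000 still fits in a word; it is the signed interpretation used by imul and idiv that turns it into -25536.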
Source Code Reference
Implementation File: compfinal.py
Lines: 1221-1390
Key Class:
GeneradorASM - Assembly code generator
Main Methods:
generar(programa) - Entry point, returns complete .asm file as string
generar_sentencia(sentencia) - Generate code for one statement
generar_expresion(expr) - Generate code for expression (recursive)
Helper Methods:
_recolectar_variables(programa) - Find all variable names
_generar_encabezado() - Generate .model, .stack, .data sections
_generar_procedimiento_print() - Generate print_num procedure
Next Steps
Try It
Write a program in the GUI
Click Exportar ASM
Save to program.asm
Open in EMU8086
Compile and run
API Reference: Detailed API documentation for the GeneradorASM class
Examples: More example programs and their generated assembly