Overview
The Mini-Compilador Educativo transforms source code through six sequential phases, each building on the output of the previous one. This page provides an in-depth look at how a sample program flows through the entire compilation pipeline.Example Program
We’ll trace this simple program through all phases:Expected output:
30 (because b * 2 = 20, then a + 20 = 25… wait, let me recalculate: 10 * 2 = 20, 5 + 20 = 25. Actually the output should be 25.)Phase 1: Lexical Analysis (Scanner)
Processing
The
Scanner class reads character-by-character:- Recognizes keywords by dictionary lookup
- Identifies operators as single characters
- Accumulates digit sequences into numbers
- Collects alphanumeric sequences into identifiers
- Tracks line and column positions
Key Features
Position Tracking
Every token stores its line and column for precise error reporting.
Value Extraction
Numeric tokens carry both the original string (
lexema) and parsed integer (valor).Comment Handling
// comments are consumed without generating tokens (see compfinal.py:319-327).Error Collection
Invalid characters add to
Scanner.errores list but don’t stop scanning.Phase 2: Syntactic Analysis (Parser)
Processing
The
Parser uses recursive descent to match tokens against grammar rules:parsear()→ processes statements until EOF_sentencia()→ dispatches to_declaracion_variable()or_sentencia_print()_expresion()→ delegates to_suma_resta()→_multiplicacion_division()→_primario()- Operator precedence handled by nesting depth
AST Representation
- Tree View
- Python Objects
Precedence in Action
Expression:
a + b * 2Why it parses as a + (b * 2) and not (a + b) * 2:_expresion()calls_suma_resta()_suma_resta()calls_multiplicacion_division()for left side → getsa- Sees
+, so calls_multiplicacion_division()for right side _multiplicacion_division()processesb * 2completely (consuming both tokens) before returning- Result:
ExpresionBinaria(+, a, ExpresionBinaria(*, b, 2))
compfinal.py:802-843Phase 3: Semantic Analysis
Processing
The
AnalizadorSemantico traverses the AST with these checks:Variable Tracking:- Maintains
variables_declaradasset - On
DeclaracionVariable: analyze expression first, then add variable - On
Identificador: verify variable exists in set
- On
ExpresionBinariawith/operator - If right side is
NumeroLiteralwithvalor == 0, report error
Semantic Trace
compfinal.py:968-1085
Phase 4: Intermediate Code Generation
Processing
The
GeneradorIR creates Three Address Code (TAC):- Each operation becomes one TAC instruction
- Temporary variables (
t0,t1, …) store intermediate results - Variables use their original names
Generated IR Code
TAC Properties
Explicit Ordering
Operations occur in sequential order, making control flow obvious.
Single Operator
Each instruction has at most one operator, simplifying optimization and code generation.
Platform Independent
TAC is abstract; not tied to any specific CPU architecture.
Optimization Ready
Easy to apply optimizations like constant folding, dead code elimination, etc.
compfinal.py:1090-1149
Phase 5: Interpretation
Processing
The
Interprete directly executes the AST:- Maintains
variablesdictionary for runtime values - Recursively evaluates expressions
- Outputs print statement results to console
Execution Trace
The interpreter uses floor division for
/ operator: 7 / 2 evaluates to 3, not 3.5.Source: compfinal.py:1209compfinal.py:1156-1215
Phase 6: Assembly Code Generation
Processing
The
GeneradorASM produces x86 assembly for EMU8086:- Data section: Declare all variables as words (
dw) - Code section: Translate each statement:
- Use
AXregister as accumulator - Use stack for nested expressions
- Call
print_numprocedure for output
- Use
- Utility procedures: Include
print_numfor integer output
Generated Assembly
Assembly Generation Strategy
- Stack-Based Evaluation
- Register Usage
- DOS Interrupts
- Print Number Logic
Binary expressions use stack to preserve left operand:This handles nested expressions naturally through recursion.
compfinal.py:1221-1390
Phase Comparison
Here’s how the same expressiona + b * 2 is represented at each phase:
| Phase | Representation |
|---|---|
| 1. Lexer | [ID(a), SUMA, ID(b), MULT, NUM(2)] |
| 2. Parser | BinExpr(+, ID(a), BinExpr(*, ID(b), Lit(2))) |
| 3. Semantic | ✓ Variables exist, no errors |
| 4. IR | t0 = b * 2 t1 = a + t0 |
| 5. Interpreter | Evaluates to: 25 |
| 6. Assembly | mov ax, b imul 2 add ax, a (simplified) |
Error Handling Across Phases
Lexical Errors (Phase 1)
Lexical Errors (Phase 1)
Detected by:
Examples:
ScannerExamples:
- Invalid characters:
let x = 5@; - Unexpected symbols:
let x = $100;
- Error added to
Scanner.errores - Token type set to
ERROR - Scanning continues to find more errors
Syntax Errors (Phase 2)
Syntax Errors (Phase 2)
Detected by:
Examples:
ParserExamples:
- Missing semicolon:
let x = 5 - Wrong token order:
= x 5 let; - Unmatched parentheses:
let x = (5 + 3;
ErrorSintaxisexception raised- Error caught and added to
Parser.errores - Synchronization to next statement boundary
- Parsing continues
Semantic Errors (Phase 3)
Semantic Errors (Phase 3)
Detected by:
Examples:
AnalizadorSemanticoExamples:
- Undeclared variable:
print x;(withoutlet x = ...;) - Division by zero:
let x = 10 / 0;
- Error added to
AnalizadorSemantico.errores - Analysis continues to find more errors
No Runtime Errors (Phases 4-6)
No Runtime Errors (Phases 4-6)
Once semantic analysis passes, subsequent phases assume correctness:
- IR Generation: Always succeeds with valid AST
- Interpretation: May encounter runtime division by zero (not caught)
- Assembly Generation: Produces syntactically correct code
Complete Compilation Output
- Console Output
- Generated Assembly
Performance Characteristics
Time Complexity
- Lexer: O(n) where n = character count
- Parser: O(n) where n = token count
- Semantic: O(n) where n = AST node count
- IR/ASM: O(n) where n = AST node count
- Overall: O(n) linear in input size
Space Complexity
- Token list: O(n) tokens
- AST: O(n) nodes
- Symbol table: O(v) variables
- IR code: O(n) instructions
- Overall: O(n) linear in input size
The compiler completes in milliseconds for typical programs (< 100 lines). GUI shows compilation time in status bar.
Next Steps
Try It Yourself
Experiment with the compiler by:
- Modifying the example program
- Introducing intentional errors to see error messages
- Viewing the AST and IR output (enable in GUI checkboxes)
- Exporting and running the assembly code in EMU8086