Overview
The Mini-Compilador Educativo follows a classic multi-phase compiler architecture, transforming high-level source code through six distinct stages into executable x86 assembly code. Each phase is encapsulated in its own class, creating a modular and maintainable design.
Core Components
Phase 1: Scanner (Lexical Analysis)
Class: Scanner
Input: Raw source code string
Output: List of Token objects
- Character-by-character scanning with position tracking
- Comment handling (// single-line comments)
- Reserved word dictionary lookup
- Error collection with line/column information
compfinal.py:207-491
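The sketch below shows the scanning loop described above, tracking the line and column of each token as it goes. It makes several assumptions that are not taken from compfinal.py: the Token fields, the token type names, the single reserved word print, and the ';' statement terminator are all illustrative.

```python
from dataclasses import dataclass

@dataclass
class Token:
    tipo: str      # hypothetical categories: "NUMERO", "ID", "PRINT", symbol names
    valor: str     # lexeme exactly as written in the source
    linea: int
    columna: int

# Assumed reserved-word dictionary: lexeme -> token type.
PALABRAS_RESERVADAS = {"print": "PRINT"}
SIMBOLOS = {"+": "MAS", "-": "MENOS", "*": "MULT", "/": "DIV",
            "=": "ASIGNACION", "(": "PAR_IZQ", ")": "PAR_DER", ";": "PUNTO_COMA"}

class Scanner:
    def __init__(self, fuente: str):
        self.fuente = fuente
        self.pos = 0
        self.linea = 1
        self.columna = 1
        self.tokens: list[Token] = []
        self.errores: list[str] = []

    def _actual(self) -> str:
        return self.fuente[self.pos] if self.pos < len(self.fuente) else ""

    def _siguiente(self) -> str:
        return self.fuente[self.pos + 1] if self.pos + 1 < len(self.fuente) else ""

    def _avanzar(self) -> str:
        """Consume one character and update line/column tracking."""
        c = self.fuente[self.pos]
        self.pos += 1
        if c == "\n":
            self.linea += 1
            self.columna = 1
        else:
            self.columna += 1
        return c

    def escanear(self) -> list[Token]:
        while self.pos < len(self.fuente):
            c = self._actual()
            linea, columna = self.linea, self.columna  # start position of this token
            if c.isspace():
                self._avanzar()
            elif c == "/" and self._siguiente() == "/":
                # Single-line comment: skip up to (not including) the newline.
                while self.pos < len(self.fuente) and self._actual() != "\n":
                    self._avanzar()
            elif c.isdigit():
                lexema = ""
                while self._actual().isdigit():
                    lexema += self._avanzar()
                self.tokens.append(Token("NUMERO", lexema, linea, columna))
            elif c.isalpha() or c == "_":
                lexema = ""
                while self._actual().isalnum() or self._actual() == "_":
                    lexema += self._avanzar()
                tipo = PALABRAS_RESERVADAS.get(lexema, "ID")  # reserved word lookup
                self.tokens.append(Token(tipo, lexema, linea, columna))
            elif c in SIMBOLOS:
                self.tokens.append(Token(SIMBOLOS[c], self._avanzar(), linea, columna))
            else:
                # Collect the error with its position and keep scanning.
                self.errores.append(f"Caracter invalido '{c}' en linea {linea}, columna {columna}")
                self._avanzar()
        return self.tokens

s = Scanner("x = 2 + 3; // suma\nprint x;\n$")
tokens = s.escanear()
```

Errors are collected rather than raised. As a result, one pass reports every invalid character in the input.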
Phase 2: Parser (Syntactic Analysis)
Class: Parser
Input: List of tokens from Scanner
Output: Abstract Syntax Tree (AST) as a Programa object
AST Node Types
Expressions (produce values):
- NumeroLiteral - Integer constants
- Identificador - Variable references
- ExpresionBinaria - Binary operations (+, -, *, /)
- ExpresionAgrupada - Parenthesized expressions
Statements (perform actions):
- DeclaracionVariable - Variable assignments
- SentenciaPrint - Print statements
Root:
- Programa - Container for all statements
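The node hierarchy above can be sketched as Python dataclasses. Only the class names come from this page; the field names (valor, nombre, izquierda, operador, derecha, expresion, sentencias) are assumptions. Declaring the nodes with frozen=True and storing statements in a tuple are also choices made here, so that the tree cannot change after parsing.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

# Expressions (produce values)
@dataclass(frozen=True)
class NumeroLiteral:
    valor: int

@dataclass(frozen=True)
class Identificador:
    nombre: str

@dataclass(frozen=True)
class ExpresionBinaria:
    izquierda: Expresion
    operador: str          # one of "+", "-", "*", "/"
    derecha: Expresion

@dataclass(frozen=True)
class ExpresionAgrupada:
    expresion: Expresion   # the expression inside the parentheses

Expresion = Union[NumeroLiteral, Identificador, ExpresionBinaria, ExpresionAgrupada]

# Statements (perform actions)
@dataclass(frozen=True)
class DeclaracionVariable:
    nombre: str
    valor: Expresion

@dataclass(frozen=True)
class SentenciaPrint:
    expresion: Expresion

Sentencia = Union[DeclaracionVariable, SentenciaPrint]

# Root
@dataclass(frozen=True)
class Programa:
    sentencias: tuple[Sentencia, ...] = ()

# AST for a hypothetical program:  x = (2 + 3) * 4;  print x;
programa = Programa((
    DeclaracionVariable("x", ExpresionBinaria(
        ExpresionAgrupada(ExpresionBinaria(NumeroLiteral(2), "+", NumeroLiteral(3))),
        "*",
        NumeroLiteral(4),
    )),
    SentenciaPrint(Identificador("x")),
))
```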
Parsing strategy: recursive descent with operator precedence handling.
compfinal.py:653-950
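Recursive descent with precedence is usually written as one method per precedence level, each calling the next-tighter level. The sketch below covers expressions only. Its grammar (shown in the comments), the (tipo, valor) token pairs, and the small regex tokenizer standing in for the Scanner are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Minimal AST nodes; field names are hypothetical.
@dataclass
class NumeroLiteral:
    valor: int

@dataclass
class Identificador:
    nombre: str

@dataclass
class ExpresionBinaria:
    izquierda: object
    operador: str
    derecha: object

@dataclass
class ExpresionAgrupada:
    expresion: object

def tokenizar(fuente):
    """Tiny stand-in for the Scanner: (tipo, valor) pairs ending with EOF."""
    tokens = []
    for lexema in re.findall(r"\d+|[A-Za-z_]\w*|\S", fuente):
        if lexema.isdigit():
            tokens.append(("NUMERO", lexema))
        elif lexema[0].isalpha() or lexema[0] == "_":
            tokens.append(("ID", lexema))
        else:
            tokens.append((lexema, lexema))  # symbols use themselves as type
    tokens.append(("EOF", ""))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def _actual(self):
        return self.tokens[self.pos]

    def _consumir(self, tipo):
        token = self._actual()
        if token[0] != tipo:
            raise SyntaxError(f"se esperaba {tipo!r}, se encontro {token[1]!r}")
        self.pos += 1
        return token

    def parsear_expresion(self):
        nodo = self._expresion()
        self._consumir("EOF")
        return nodo

    # expresion -> termino (("+" | "-") termino)*      lowest precedence
    def _expresion(self):
        nodo = self._termino()
        while self._actual()[0] in ("+", "-"):
            op = self._consumir(self._actual()[0])[1]
            nodo = ExpresionBinaria(nodo, op, self._termino())  # loop => left associative
        return nodo

    # termino -> factor (("*" | "/") factor)*          binds tighter than + and -
    def _termino(self):
        nodo = self._factor()
        while self._actual()[0] in ("*", "/"):
            op = self._consumir(self._actual()[0])[1]
            nodo = ExpresionBinaria(nodo, op, self._factor())
        return nodo

    # factor -> NUMERO | ID | "(" expresion ")"
    def _factor(self):
        tipo, valor = self._actual()
        if tipo == "NUMERO":
            self.pos += 1
            return NumeroLiteral(int(valor))
        if tipo == "ID":
            self.pos += 1
            return Identificador(valor)
        if tipo == "(":
            self._consumir("(")
            interna = self._expresion()
            self._consumir(")")
            return ExpresionAgrupada(interna)
        raise SyntaxError(f"expresion inesperada: {valor!r}")

arbol = Parser(tokenizar("2 + 3 * 4")).parsear_expresion()
```

Precedence comes from the call structure: `_termino` finishes building `3 * 4` before `_expresion` can attach it to `2 +`.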
Phase 3: Semantic Analyzer
Class: AnalizadorSemantico
Input: AST from Parser
Output: Boolean success + error list
Variable Tracking
Maintains a set of declared variables. Detects use of undeclared identifiers.
Division by Zero
Catches literal division by zero (e.g., x / 0) at compile time.
Rules enforced:
- Variables must be declared before use
- No division by zero with numeric literals
- Warns on variable redeclaration
compfinal.py:968-1085
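The three rules above can be sketched as a single walk over the AST, with a set of declared names tracked along the way. The node field names are hypothetical. Keeping redeclaration warnings in a separate list from errors is also an assumption, chosen so that a warning does not make analysis fail.

```python
from dataclasses import dataclass

# Minimal AST nodes; field names are hypothetical.
@dataclass
class NumeroLiteral:
    valor: int

@dataclass
class Identificador:
    nombre: str

@dataclass
class ExpresionBinaria:
    izquierda: object
    operador: str
    derecha: object

@dataclass
class ExpresionAgrupada:
    expresion: object

@dataclass
class DeclaracionVariable:
    nombre: str
    valor: object

@dataclass
class SentenciaPrint:
    expresion: object

@dataclass
class Programa:
    sentencias: list

class AnalizadorSemantico:
    def __init__(self):
        self.declaradas: set[str] = set()
        self.errores: list[str] = []
        self.advertencias: list[str] = []  # assumption: warnings kept apart from errors

    def analizar(self, programa: Programa) -> bool:
        """Return True when no semantic errors were found."""
        for sentencia in programa.sentencias:
            if isinstance(sentencia, DeclaracionVariable):
                # Check the right-hand side before declaring, so that
                # "x = x + 1" with an undeclared x is reported.
                self._verificar(sentencia.valor)
                if sentencia.nombre in self.declaradas:
                    self.advertencias.append(f"variable '{sentencia.nombre}' redeclarada")
                self.declaradas.add(sentencia.nombre)
            elif isinstance(sentencia, SentenciaPrint):
                self._verificar(sentencia.expresion)
        return not self.errores

    def _verificar(self, expr) -> None:
        if isinstance(expr, Identificador):
            if expr.nombre not in self.declaradas:
                self.errores.append(f"variable '{expr.nombre}' no declarada")
        elif isinstance(expr, ExpresionBinaria):
            self._verificar(expr.izquierda)
            self._verificar(expr.derecha)
            # Only a literal zero divisor can be detected at compile time.
            if (expr.operador == "/" and isinstance(expr.derecha, NumeroLiteral)
                    and expr.derecha.valor == 0):
                self.errores.append("division por cero literal")
        elif isinstance(expr, ExpresionAgrupada):
            self._verificar(expr.expresion)
        # NumeroLiteral needs no checks.

programa = Programa([
    DeclaracionVariable("x", NumeroLiteral(10)),
    DeclaracionVariable("y", ExpresionBinaria(Identificador("x"), "/", NumeroLiteral(0))),
    SentenciaPrint(Identificador("z")),
    DeclaracionVariable("x", NumeroLiteral(1)),
])
analizador = AnalizadorSemantico()
ok = analizador.analizar(programa)
```

A divisor such as `x / y`, where y happens to hold 0, passes this check. It can only fail at run time.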
Phase 4: Intermediate Code Generator
Class: GeneradorIR
Input: Validated AST
Output: Three Address Code (TAC) as a list of strings
TAC format:
- Each instruction has at most one operator
- Temporary variables (t0, t1, …) hold intermediate results
- Platform-independent representation
- Suitable for optimization passes
compfinal.py:1090-1149
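The TAC format above can be produced by a post-order walk that allocates a fresh temporary for every binary operation. In this sketch the node field names are hypothetical, and the exact instruction spelling (including `print <operand>`) is a guess; the strings GeneradorIR actually emits may differ.

```python
from dataclasses import dataclass

# Minimal AST nodes; field names are hypothetical.
@dataclass
class NumeroLiteral:
    valor: int

@dataclass
class Identificador:
    nombre: str

@dataclass
class ExpresionBinaria:
    izquierda: object
    operador: str
    derecha: object

@dataclass
class ExpresionAgrupada:
    expresion: object

@dataclass
class DeclaracionVariable:
    nombre: str
    valor: object

@dataclass
class SentenciaPrint:
    expresion: object

@dataclass
class Programa:
    sentencias: list

class GeneradorIR:
    def __init__(self):
        self.codigo: list[str] = []
        self.contador = 0

    def _nuevo_temporal(self) -> str:
        temporal = f"t{self.contador}"
        self.contador += 1
        return temporal

    def generar(self, programa: Programa) -> list[str]:
        for sentencia in programa.sentencias:
            if isinstance(sentencia, DeclaracionVariable):
                self.codigo.append(f"{sentencia.nombre} = {self._expresion(sentencia.valor)}")
            elif isinstance(sentencia, SentenciaPrint):
                self.codigo.append(f"print {self._expresion(sentencia.expresion)}")
        return self.codigo

    def _expresion(self, expr) -> str:
        """Emit code for expr and return the operand that holds its value."""
        if isinstance(expr, NumeroLiteral):
            return str(expr.valor)
        if isinstance(expr, Identificador):
            return expr.nombre
        if isinstance(expr, ExpresionAgrupada):
            return self._expresion(expr.expresion)  # parentheses only shape the tree
        # Binary operation: operands first, then one instruction with one operator.
        izquierda = self._expresion(expr.izquierda)
        derecha = self._expresion(expr.derecha)
        temporal = self._nuevo_temporal()
        self.codigo.append(f"{temporal} = {izquierda} {expr.operador} {derecha}")
        return temporal

programa = Programa([
    DeclaracionVariable("x", ExpresionBinaria(
        ExpresionAgrupada(ExpresionBinaria(NumeroLiteral(2), "+", NumeroLiteral(3))),
        "*", NumeroLiteral(4))),
    SentenciaPrint(ExpresionBinaria(Identificador("x"), "-", NumeroLiteral(1))),
])
codigo = GeneradorIR().generar(programa)
print("\n".join(codigo))
```

For `x = (2 + 3) * 4; print x - 1;` this emits `t0 = 2 + 3`, `t1 = t0 * 4`, `x = t1`, `t2 = x - 1`, `print t2`.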
Phase 5: Interpreter
Class: Interprete
Input: AST
Output: Runtime execution results
The interpreter executes the program by evaluating the AST directly, providing immediate feedback without requiring assembly or machine code generation.
- Variable storage in a dictionary
- Direct expression evaluation
- Integer arithmetic (using floor division for /)
- Print output to console
compfinal.py:1156-1215
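The sketch below is a minimal tree-walking evaluator matching the list above: a dictionary for variables, `//` for `/`, and output printed to the console. Node field names are hypothetical. One consequence is worth noting. Python's floor division rounds toward negative infinity, so `(0 - 7) / 2` evaluates to -4 here, whereas x86 signed division (IDIV) truncates toward zero and gives -3. If the assembly back end uses IDIV, the two back ends can disagree on negative operands.

```python
import operator
from dataclasses import dataclass

# Minimal AST nodes; field names are hypothetical.
@dataclass
class NumeroLiteral:
    valor: int

@dataclass
class Identificador:
    nombre: str

@dataclass
class ExpresionBinaria:
    izquierda: object
    operador: str
    derecha: object

@dataclass
class ExpresionAgrupada:
    expresion: object

@dataclass
class DeclaracionVariable:
    nombre: str
    valor: object

@dataclass
class SentenciaPrint:
    expresion: object

@dataclass
class Programa:
    sentencias: list

OPERACIONES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # rounds toward -infinity; a zero divisor raises ZeroDivisionError
}

class Interprete:
    def __init__(self):
        self.variables: dict[str, int] = {}
        self.salida: list[str] = []  # printed values, kept for inspection

    def ejecutar(self, programa: Programa) -> None:
        for sentencia in programa.sentencias:
            if isinstance(sentencia, DeclaracionVariable):
                self.variables[sentencia.nombre] = self._evaluar(sentencia.valor)
            elif isinstance(sentencia, SentenciaPrint):
                valor = self._evaluar(sentencia.expresion)
                print(valor)
                self.salida.append(str(valor))

    def _evaluar(self, expr) -> int:
        if isinstance(expr, NumeroLiteral):
            return expr.valor
        if isinstance(expr, Identificador):
            # Semantic analysis is expected to have rejected undeclared names.
            return self.variables[expr.nombre]
        if isinstance(expr, ExpresionAgrupada):
            return self._evaluar(expr.expresion)
        if isinstance(expr, ExpresionBinaria):
            izquierda = self._evaluar(expr.izquierda)
            derecha = self._evaluar(expr.derecha)
            return OPERACIONES[expr.operador](izquierda, derecha)
        raise TypeError(f"nodo desconocido: {expr!r}")

programa = Programa([
    DeclaracionVariable("x", NumeroLiteral(7)),
    DeclaracionVariable("y", ExpresionBinaria(Identificador("x"), "/", NumeroLiteral(2))),
    SentenciaPrint(Identificador("y")),
    SentenciaPrint(ExpresionBinaria(
        ExpresionAgrupada(ExpresionBinaria(NumeroLiteral(0), "-", NumeroLiteral(7))),
        "/", NumeroLiteral(2))),
])
interprete = Interprete()
interprete.ejecutar(programa)
```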
Phase 6: Assembly Code Generator
Class: GeneradorASM
Input: AST
Output: x86 assembly code string (EMU8086 format)
Generated structure:
- Stack-based evaluation of expressions
- AX register for accumulator
- BX register for right operand
- Included print_num routine for output
compfinal.py:1221-1390
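The stack-based scheme above can be sketched as follows. Every expression leaves its value on the stack. A binary operation pops the right operand into BX and the left operand into AX, applies the operation to AX, and pushes the result. Several details are assumptions about the EMU8086 output rather than facts from compfinal.py: the instruction choices (IMUL, and CWD followed by IDIV for signed arithmetic), the `org 100h` program layout, one `DW` word per variable, and `CALL print_num`. The body of print_num is omitted.

```python
from dataclasses import dataclass

# Minimal AST nodes; field names are hypothetical.
@dataclass
class NumeroLiteral:
    valor: int

@dataclass
class Identificador:
    nombre: str

@dataclass
class ExpresionBinaria:
    izquierda: object
    operador: str
    derecha: object

@dataclass
class ExpresionAgrupada:
    expresion: object

@dataclass
class DeclaracionVariable:
    nombre: str
    valor: object

@dataclass
class SentenciaPrint:
    expresion: object

@dataclass
class Programa:
    sentencias: list

# Instructions applied after POP BX (right operand) and POP AX (left operand).
OPERACIONES = {
    "+": ["ADD AX, BX"],
    "-": ["SUB AX, BX"],
    "*": ["IMUL BX"],         # signed: DX:AX = AX * BX, low word stays in AX
    "/": ["CWD", "IDIV BX"],  # sign-extend AX into DX, then AX = DX:AX / BX
}

class GeneradorASM:
    def __init__(self):
        self.lineas: list[str] = []     # instructions, without indentation
        self.variables: list[str] = []  # each becomes a DW word after the code

    def generar(self, programa: Programa) -> str:
        for sentencia in programa.sentencias:
            if isinstance(sentencia, DeclaracionVariable):
                self._expresion(sentencia.valor)
                self.lineas += ["POP AX", f"MOV {sentencia.nombre}, AX"]
                if sentencia.nombre not in self.variables:
                    self.variables.append(sentencia.nombre)
            elif isinstance(sentencia, SentenciaPrint):
                self._expresion(sentencia.expresion)
                # print_num is assumed to print the signed value in AX;
                # its definition is omitted from this sketch.
                self.lineas += ["POP AX", "CALL print_num"]
        codigo = ["    " + instr for instr in self.lineas]
        datos = [f"{nombre} DW 0" for nombre in self.variables]
        return "\n".join(["org 100h", *codigo, "    RET", *datos])

    def _expresion(self, expr) -> None:
        """Emit code that leaves the value of expr on top of the stack."""
        if isinstance(expr, NumeroLiteral):
            self.lineas += [f"MOV AX, {expr.valor}", "PUSH AX"]
        elif isinstance(expr, Identificador):
            self.lineas += [f"MOV AX, {expr.nombre}", "PUSH AX"]
        elif isinstance(expr, ExpresionAgrupada):
            self._expresion(expr.expresion)
        elif isinstance(expr, ExpresionBinaria):
            self._expresion(expr.izquierda)
            self._expresion(expr.derecha)
            self.lineas += ["POP BX", "POP AX", *OPERACIONES[expr.operador], "PUSH AX"]

programa = Programa([
    DeclaracionVariable("x", ExpresionBinaria(NumeroLiteral(2), "+", NumeroLiteral(3))),
    SentenciaPrint(ExpresionBinaria(Identificador("x"), "*", NumeroLiteral(4))),
])
generador = GeneradorASM()
asm = generador.generar(programa)
print(asm)
```

Pushing every intermediate result keeps the generator simple, because no register allocation is needed. The cost is extra PUSH/POP pairs, which a later optimization pass could remove.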
Data Flow
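A program flows through the phases in order. The Scanner turns the source code into a list of Token objects. The Parser builds a Programa AST from those tokens, and AnalizadorSemantico validates it. The validated AST is then consumed by GeneradorIR (Three Address Code), Interprete (direct execution), and GeneradorASM (x86 assembly).
Error Handling Strategy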
Errors fall into three categories:
- Lexical errors
- Syntax errors
- Semantic errors
Lexical errors are collected in the Scanner.errores list:
- Cause: invalid characters
- Reported with line and column numbers
- Compilation halts if any errors are present
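The halt-on-error strategy can be sketched as a driver that runs the phases in order and stops at the first phase that reports errors. Everything in this sketch is hypothetical: the ErrorCompilacion class, the (result, errors) return convention, and the toy phases. The real phase classes expose their errors in their own ways (for example, the Scanner.errores list).

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ErrorCompilacion:
    fase: str
    mensaje: str
    linea: int
    columna: int

    def __str__(self) -> str:
        return f"[{self.fase}] linea {self.linea}, columna {self.columna}: {self.mensaje}"

# Hypothetical phase contract: take the previous output, return (result, errors).
Fase = Callable[[Any], tuple[Any, list[ErrorCompilacion]]]

def compilar(fuente: str, fases: list[tuple[str, Fase]]):
    """Run phases in order; stop at the first phase that reports errors."""
    dato: Any = fuente
    for nombre, fase in fases:
        dato, errores = fase(dato)
        if errores:
            return None, nombre, errores  # later phases never run
    return dato, None, []

def fase_lexica(fuente: str):
    """Toy lexical phase: flags characters outside a small allowed set."""
    errores = []
    for num_linea, texto in enumerate(fuente.splitlines(), start=1):
        for num_columna, c in enumerate(texto, start=1):
            if not (c.isalnum() or c.isspace() or c in "+-*/=();_"):
                errores.append(ErrorCompilacion("lexico", f"caracter invalido {c!r}",
                                                num_linea, num_columna))
    return fuente.split(), errores

fases_ejecutadas = []

def fase_sintactica(tokens):
    fases_ejecutadas.append("sintactico")  # records that this phase actually ran
    return tokens, []

FASES = [("lexico", fase_lexica), ("sintactico", fase_sintactica)]
resultado, fase_fallida, errores = compilar("x = 1;\ny = $;", FASES)
for error in errores:
    print(error)  # [lexico] linea 2, columna 5: caracter invalido '$'
```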
Key Design Principles
Separation of Concerns
Each phase handles one responsibility, making the codebase easy to understand and extend.
Visitor Pattern
AST traversal uses an implicit visitor pattern, dispatching on node type with isinstance() checks.
Error Recovery
Parser synchronization allows reporting multiple errors in one pass.
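A common way to implement parser synchronization is panic-mode recovery. On a syntax error, the parser records the error, discards tokens up to a statement boundary, and resumes. This sketch assumes that ';' ends a statement and uses a deliberately tiny hypothetical grammar (`ID = NUMERO ;` and `print ID ;`). The synchronization points used in compfinal.py may differ.

```python
class ErrorSintactico(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + [("EOF", "")]  # tokens are (tipo, valor) pairs
        self.pos = 0
        self.errores: list[str] = []

    def _actual(self):
        return self.tokens[self.pos]

    def _consumir(self, tipo):
        token = self._actual()
        if token[0] != tipo:
            raise ErrorSintactico(f"se esperaba {tipo}, se encontro {token[1]!r}")
        self.pos += 1
        return token

    def _sincronizar(self):
        # Panic mode: skip tokens until just past the next ';' (or stop at EOF).
        while self._actual()[0] not in (";", "EOF"):
            self.pos += 1
        if self._actual()[0] == ";":
            self.pos += 1

    def parsear(self):
        sentencias = []
        while self._actual()[0] != "EOF":
            try:
                sentencias.append(self._sentencia())
            except ErrorSintactico as error:
                self.errores.append(str(error))  # record and keep going
                self._sincronizar()
        return sentencias

    def _sentencia(self):
        if self._actual()[0] == "PRINT":
            self._consumir("PRINT")
            nombre = self._consumir("ID")[1]
            self._consumir(";")
            return ("print", nombre)
        nombre = self._consumir("ID")[1]
        self._consumir("=")
        valor = self._consumir("NUMERO")[1]
        self._consumir(";")
        return ("asignar", nombre, int(valor))

# x = 1;  y = = 2;  print x;  z 3;
tokens = [("ID", "x"), ("=", "="), ("NUMERO", "1"), (";", ";"),
          ("ID", "y"), ("=", "="), ("=", "="), ("NUMERO", "2"), (";", ";"),
          ("PRINT", "print"), ("ID", "x"), (";", ";"),
          ("ID", "z"), ("NUMERO", "3"), (";", ";")]
parser = Parser(tokens)
sentencias = parser.parsear()
```

Both malformed statements are reported in one pass, and the valid `print x;` between them is still parsed. Each recovery either consumes at least one token or stops at EOF, so the loop always terminates.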
Immutable AST
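AST nodes are declared with @dataclass, which keeps their definitions concise. A plain @dataclass still allows attribute assignment, so the nodes are only truly immutable when declared with @dataclass(frozen=True).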
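The difference is easy to see with a frozen node. This minimal sketch uses frozen=True; whether compfinal.py passes this flag is not stated here. The sketch also shows the usual way to "modify" an immutable tree, which is dataclasses.replace: it builds a new node and shares the unchanged children. Because eq and frozen are both true, the nodes are also hashable.

```python
import dataclasses
from dataclasses import dataclass

@dataclass(frozen=True)
class NumeroLiteral:
    valor: int

@dataclass(frozen=True)
class ExpresionBinaria:
    izquierda: object
    operador: str
    derecha: object

nodo = ExpresionBinaria(NumeroLiteral(2), "+", NumeroLiteral(3))

try:
    nodo.operador = "*"  # frozen dataclasses reject attribute assignment
except dataclasses.FrozenInstanceError:
    print("el nodo es inmutable")

# A pass that needs a changed node builds a new one instead of mutating.
nuevo = dataclasses.replace(nodo, operador="*")
print(nodo.operador, nuevo.operador)  # + *
```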