Skip to main content

Overview

The IR Generator is the fourth phase of compilation. It transforms the Abstract Syntax Tree (AST) into Three Address Code (TAC), a low-level, platform-independent representation that bridges high-level language and machine code.
Three Address Code limits each instruction to:
  • At most one operator
  • At most three addresses (operands)
Example:
// High-level: x = a + b * c
// TAC:
t0 = b * c
t1 = a + t0
x = t1

Why Intermediate Representation?

Platform Independence

TAC is not tied to any CPU architecture.Same IR can target:
  • x86 assembly
  • ARM assembly
  • LLVM IR
  • JVM bytecode

Optimization

Simpler than AST, easier to optimize:
  • Constant folding
  • Dead code elimination
  • Common subexpression elimination
  • Register allocation

Explicit Order

Operations in sequential order - no ambiguity about evaluation.AST: Tree structure (implicit ordering) TAC: Linear instructions (explicit ordering)

Simple Instructions

Each instruction has one operation - easy to translate to assembly.
t0 = a + b  → ADD instruction
t1 = t0 * c → MUL instruction

TAC Instruction Format

All instructions follow these patterns:
result = operand1 operator operand2
Examples:
t0 = a + b
t1 = x * 2
t2 = t0 - t1

Temporary Variables

Complex expressions require intermediate storage:
class GeneradorIR:
    def __init__(self):
        self.codigo = []         # List of TAC instructions
        self.temp_counter = 0    # Temporary variable counter
    
    def nueva_temp(self):
        temp = f"t{self.temp_counter}"
        self.temp_counter += 1
        return temp
Naming:
  • t0, t1, t2, … (sequential integers)
  • Unique within a program
  • Never reused (simple approach - optimization could reuse)

Generation Algorithm

The generator recursively traverses the AST:
1

Initialize

  • Create empty instruction list
  • Reset temporary counter to 0
2

Process Statements

For each statement in the program:
  • Generate IR for the statement
  • Append instructions to list
3

Emit Instructions

  • Return complete list of TAC instructions
  • Print to console for visibility

Statement Generation

def generar_sentencia(self, sentencia):
    if isinstance(sentencia, DeclaracionVariable):
        # Generate code for right-hand expression
        resultado = self.generar_expr(sentencia.expresion)
        
        # Emit assignment
        self.codigo.append(f"{sentencia.nombre.lexema} = {resultado}")
Example:
let x = 5 + 3;
Generated IR:
t0 = 5 + 3
x = t0

Expression Generation

def generar_expr(self, expr):
    if isinstance(expr, NumeroLiteral):
        return expr.valor  # Just return the number
Example:
42returns: 42

Generation Examples

Simple Expression

let x = 5 + 3;

Nested Expression

let y = a + b * c;

Complete Program

let a = 5;
let b = 10;
let c = a + b * 2;
print c;

Optimization Opportunities

The current implementation does not optimize. Here are potential optimizations:
Before:
t0 = 5 + 3
x = t0
After:
x = 8
Evaluate constant expressions at compile-time.
Before:
t0 = a
t1 = t0 + b
After:
t1 = a + b
Replace copy with original variable.
Before:
t0 = a + b
t1 = c * d  // t1 never used
x = t0
After:
t0 = a + b
x = t0
Remove unused computations.
Before:
t0 = a + b
x = t0
t1 = c + d  // Could reuse t0
y = t1
After:
t0 = a + b
x = t0
t0 = c + d  // Reuse t0 (x already assigned)
y = t0
Reduce number of temporaries.

Comparison: AST vs. IR

For expression a + b * c:
ExpresionBinaria(+)
├── izquierda: Identificador('a')
└── derecha:
    └── ExpresionBinaria(*)
        ├── izquierda: Identificador('b')
        └── derecha: Identificador('c')
Characteristics:
  • Tree structure (hierarchical)
  • Implicit evaluation order (post-order traversal)
  • High-level (close to source code)
  • Good for semantic analysis

Use Cases for IR

Code Generation

Each TAC instruction maps directly to assembly:
t0 = b * c  →  MOV AX, b
               IMUL c
               MOV t0, AX

Optimization

Simple structure makes optimization easier:
  • Analyze dependencies
  • Detect patterns
  • Apply transformations

Interpretation

Can execute IR directly:
for instruction in ir_code:
    parse and execute instruction
(Though this compiler interprets AST, not IR)

Multi-Target

Generate different backends from same IR:
  • x86 assembly
  • ARM assembly
  • LLVM IR
  • C code

Implementation Details

class GeneradorIR:
    def __init__(self):
        self.codigo = []         # Instruction list
        self.temp_counter = 0    # Temporary counter
    
    def nueva_temp(self):
        """Generate unique temporary name"""
        temp = f"t{self.temp_counter}"
        self.temp_counter += 1
        return temp
    
    def generar(self, programa):
        """Entry point - process program"""
        for sentencia in programa.sentencias:
            self.generar_sentencia(sentencia)
        return self.codigo
    
    def generar_sentencia(self, sentencia):
        """Process one statement"""
        # Dispatch by type
    
    def generar_expr(self, expr):
        """Process one expression"""
        # Dispatch by type, return result

Performance

Time Complexity

O(n) where n = AST node countEach node visited once in DFS traversal.

Space Complexity

O(n) for instruction listRoughly one instruction per operation in source code.

Source Code Reference

Implementation

File: compfinal.pyLines: 1090-1149Key Class:
  • GeneradorIR - IR generator
Main Methods:
  • generar(programa) - Entry point, returns instruction list
  • generar_sentencia(sentencia) - Process statement
  • generar_expr(expr) - Process expression, return result operand
  • nueva_temp() - Generate unique temporary name
Data Structures:
  • codigo: List[str] - TAC instruction list
  • temp_counter: int - Temporary variable counter

Next Steps

Interpreter

See how the AST is directly executed

Code Generation

See how IR/AST is converted to x86 assembly

API Reference

Detailed API documentation for the GeneradorIR class

Build docs developers (and LLMs) love