
Grammar Overview

The Mini-Compilador Educativo implements a simple imperative language with variable declarations, arithmetic expressions, and print statements. The grammar is designed for educational clarity while demonstrating core compilation concepts.

EBNF Specification

The following grammar is extracted from the source code (compfinal.py and readme.md) and defines the complete syntax accepted by the compiler.
programa    ::= sentencia*
sentencia   ::= declaracion | print_stmt
declaracion ::= "let" IDENTIFICADOR "=" expresion ";"
print_stmt  ::= "print" expresion ";"
expresion   ::= suma_resta
suma_resta  ::= mult_div ( ("+" | "-") mult_div )*
mult_div    ::= primario ( ("*" | "/") primario )*
primario    ::= NUMERO | IDENTIFICADOR | "(" expresion ")"

Grammar Notation

  • ::= - “is defined as”
  • | - alternation (or)
  • * - zero or more repetitions
  • + - one or more repetitions (not used in this grammar)
  • ? - optional (zero or one; not used in this grammar)
  • () - grouping
  • "..." - terminal symbol (literal keyword or operator)
  • UPPERCASE - token type (terminal from lexer)

Token Types

The lexer recognizes the following token categories:

Keywords (Reserved Words)

let

Declares a new variable
Type: TipoToken.LET
Usage: let x = 10;

print

Outputs expression value
Type: TipoToken.PRINT
Usage: print x + 5;

leo

Reserved word (no operation)
Type: TipoToken.LEO
Note: Recognized but ignored

diego

Reserved word (no operation)
Type: TipoToken.DIEGO
Note: Recognized but ignored

Literals

Token Type: TipoToken.NUMERO
Pattern: [0-9]+
Value Type: Integer
# Valid numbers
0
42
1234567890
The language does NOT support:
  • Negative literals (use subtraction: 0 - 5)
  • Floating-point numbers
  • Hexadecimal/octal/binary notation

Operators

+ (Addition)

Type: TipoToken.SUMA
Precedence: Lowest (rank 3 in the Operator Precedence Table)
Associativity: Left-to-right

- (Subtraction)

Type: TipoToken.RESTA
Precedence: Lowest (rank 3 in the Operator Precedence Table)
Associativity: Left-to-right

* (Multiplication)

Type: TipoToken.MULTIPLICACION
Precedence: Higher (rank 2 in the Operator Precedence Table)
Associativity: Left-to-right

/ (Division)

Type: TipoToken.DIVISION
Precedence: Higher (rank 2 in the Operator Precedence Table)
Associativity: Left-to-right
Note: Integer division (floor)
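The note above says division is integer division with floor rounding. Assuming this matches Python's `//` operator (an assumption; this page does not show the implementation), the difference from truncating division only appears with negative operands. The helper names below are hypothetical.

```python
# Floor division rounds toward negative infinity; truncation rounds toward zero.
def floor_div(a, b):
    return a // b

def trunc_div(a, b):
    return int(a / b)  # shown only for contrast, not the documented behaviour

print(floor_div(7, 2))      # 3
print(floor_div(0 - 7, 2))  # -4
print(trunc_div(0 - 7, 2))  # -3
```

The negative operand is written as `0 - 7` to mirror how the language itself expresses negative values.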

= (Assignment)

Type: TipoToken.IGUAL
Context: Only in let statements
Note: Not used for comparison

Delimiters

| Symbol | Token Type | Purpose | Example |
|--------|------------|---------|---------|
| ( | PAREN_IZQ | Group expressions | (5 + 3) |
| ) | PAREN_DER | Close grouping | (5 + 3) * 2 |
| ; | PUNTO_COMA | Terminate statements | let x = 5; |

Special Tokens

  • FIN_ARCHIVO - Marks end of input
  • ERROR - Invalid character encountered
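
The token categories above can be illustrated with a small regex-based tokenizer. This is a sketch, not the lexer in compfinal.py: the `tokenize` function, the `(type, text)` tuple layout and the identifier pattern are assumptions, and comments are left out for brevity. Only the token type names come from this page.

```python
import re

# Token names mirror the TipoToken members listed above; the identifier
# pattern and the overall structure are assumptions made for this sketch.
KEYWORDS = {"let": "LET", "print": "PRINT", "leo": "LEO", "diego": "DIEGO"}
TOKEN_SPEC = [
    ("NUMERO", r"[0-9]+"),
    ("IDENTIFICADOR", r"[A-Za-z_][A-Za-z0-9_]*"),  # assumed pattern
    ("SUMA", r"\+"),
    ("RESTA", r"-"),
    ("MULTIPLICACION", r"\*"),
    ("DIVISION", r"/"),
    ("IGUAL", r"="),
    ("PAREN_IZQ", r"\("),
    ("PAREN_DER", r"\)"),
    ("PUNTO_COMA", r";"),
    ("ESPACIO", r"\s+"),
    ("ERROR", r"."),  # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token_type, text) pairs ending with FIN_ARCHIVO."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "ESPACIO":
            continue  # whitespace produces no token
        if kind == "IDENTIFICADOR":
            kind = KEYWORDS.get(text, kind)  # reserved words get their own type
        tokens.append((kind, text))
    tokens.append(("FIN_ARCHIVO", ""))
    return tokens

for token in tokenize("let x = 10;"):
    print(token)
```

Because the identifier pattern matches the longest run of letters and digits first, an input such as `letx` is classified as a single IDENTIFICADOR rather than the keyword `let` followed by `x`.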

Grammar Rules Explained

Program Structure

programa ::= sentencia*
A program consists of zero or more statements. There is no main function or program wrapper.
let a = 5;
let b = 10;
print a + b;

Statements

sentencia ::= declaracion | print_stmt
Two types of statements are supported:
declaracion ::= "let" IDENTIFICADOR "=" expresion ";"
Declares a new variable and assigns it a value.
Examples:
let x = 10;
let sum = x + 5;
let result = (a + b) * 2;
Semantics:
  • Creates new variable in current scope
  • Evaluates expression on right-hand side
  • Assigns result to variable
  • Variables can be redeclared (warning issued)
print_stmt ::= "print" expresion ";"
Evaluates an expression and outputs its value.
Examples:
print x;
print x + 5;

Expressions

expresion   ::= suma_resta
suma_resta  ::= mult_div ( ("+" | "-") mult_div )*
mult_div    ::= primario ( ("*" | "/") primario )*
primario    ::= NUMERO | IDENTIFICADOR | "(" expresion ")"
The expression grammar implements operator precedence through grammar nesting:
1. Primary Expressions (Highest Precedence)

primario ::= NUMERO | IDENTIFICADOR | "(" expresion ")"
Base values:
  • Numeric literals: 42
  • Variable references: x
  • Parenthesized expressions: (5 + 3)
2. Multiplicative Operators

mult_div ::= primario ( ("*" | "/") primario )*
Multiplication and division bind tighter than addition/subtraction.
Example: 2 + 3 * 4 parses as 2 + (3 * 4) = 14
3. Additive Operators (Lowest Precedence)

suma_resta ::= mult_div ( ("+" | "-") mult_div )*
Addition and subtraction are evaluated last.
Example: 10 - 2 * 3 parses as 10 - (2 * 3) = 4
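
The three levels above map naturally onto one function per rule in a recursive-descent parser. The sketch below evaluates an expression directly instead of building a tree. Its regex tokenizer, the `evaluate` function and the `env` dictionary are illustrative assumptions, not the code in compfinal.py, and division is assumed to behave like Python's `//`.

```python
import re

def evaluate(source, env=None):
    """Evaluate one expression using one function per grammar rule."""
    env = env or {}
    tokens = re.findall(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def primario():
        # primario ::= NUMERO | IDENTIFICADOR | "(" expresion ")"
        token = advance()
        if token == "(":
            value = suma_resta()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token.isdigit():
            return int(token)
        if token in env:  # IDENTIFICADOR with a known value
            return env[token]
        raise SyntaxError(f"unexpected token {token!r}")

    def mult_div():
        # mult_div ::= primario ( ("*" | "/") primario )*
        value = primario()
        while peek() in ("*", "/"):
            op = advance()
            right = primario()
            value = value * right if op == "*" else value // right
        return value

    def suma_resta():
        # suma_resta ::= mult_div ( ("+" | "-") mult_div )*
        value = mult_div()
        while peek() in ("+", "-"):
            op = advance()
            right = mult_div()
            value = value + right if op == "+" else value - right
        return value

    return suma_resta()

print(evaluate("2 + 3 * 4"))         # 14
print(evaluate("10 - 2 * 3"))        # 4
print(evaluate("x + 5", {"x": 10}))  # 15
```

Because `suma_resta` calls `mult_div` for each operand, every multiplication or division is fully reduced before an addition or subtraction sees it. That is how the nesting encodes precedence.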

Operator Precedence Table

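| Precedence | Operators | Associativity | Example | Result |
|------------|-----------|---------------|---------|--------|
| 1 (highest) | () | - | (5 + 3) * 2 | 16 |
| 2 | * / | Left-to-right | 10 / 2 / 5 | 1 |
| 3 (lowest) | + - | Left-to-right | 10 - 3 + 2 | 9 |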
All binary operators are left-associative, meaning a op b op c is evaluated as (a op b) op c.
Example: 10 - 3 - 2 = (10 - 3) - 2 = 5
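
Left associativity corresponds to folding the operands from the left. As a quick check of the figures above, the following Python snippet uses `functools.reduce`, with `operator.floordiv` standing in for the language's integer division (an assumption about its rounding):

```python
from functools import reduce
import operator

# reduce applies the operator left to right: ((a op b) op c)
left_sub = reduce(operator.sub, [10, 3, 2])       # (10 - 3) - 2
left_div = reduce(operator.floordiv, [10, 2, 5])  # (10 // 2) // 5

# A right-associative reading of the subtraction gives a different answer.
right_sub = 10 - (3 - 2)

print(left_sub)   # 5
print(left_div)   # 1
print(right_sub)  # 9
```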

Syntax Examples

Valid Programs

let a = 5;
let b = 10;
let c = a + b;
print c;  // Outputs: 15

Common Syntax Errors

let x = 5  // ❌ Error: expected ';'
Fix:
let x = 5;  // ✓

x = 10;  // ❌ Error: expected 'let' or 'print'
Fix:
let x = 10;  // ✓

let x 10;  // ❌ Error: expected '=' after variable name
Fix:
let x = 10;  // ✓

let x = (5 + 3;  // ❌ Error: expected ')'
Fix:
let x = (5 + 3);  // ✓

let let = 5;  // ❌ Error: 'let' is a reserved word
let print = 10;  // ❌ Error: 'print' is a reserved word
Fix:
let myLet = 5;  // ✓
let printValue = 10;  // ✓

Comments

// This is a single-line comment
let x = 5;  // Comments can appear after code
Comment Syntax: // begins a comment that extends to the end of the line.
Implementation: Comments are handled in the lexer. When // is detected, the scanner advances to the next newline without generating tokens.
Source: compfinal.py:319-327
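
The scanning behaviour described above can be sketched as follows. This is not the code at compfinal.py:319-327, which this page does not reproduce. `strip_comments` is a hypothetical helper that applies the same rule: skip from `//` to the next newline without emitting anything.

```python
def strip_comments(source):
    """Return source with every '//' comment removed, keeping newlines."""
    out = []
    i = 0
    while i < len(source):
        if source.startswith("//", i):
            # Advance to the newline (or end of input) without emitting anything.
            while i < len(source) and source[i] != "\n":
                i += 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)

print(repr(strip_comments("let x = 5;  // note\nprint x;")))
# 'let x = 5;  \nprint x;'
```

The language has no string literals, so a `//` sequence can only ever start a comment and no quoting state needs to be tracked.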

Semantic Constraints

Beyond syntax, the semantic analyzer enforces:

Variable Declaration

Variables must be declared with let before use.
print x;  // ❌ Error: variable 'x' not declared
let x = 5;
print x;  // ✓
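
A minimal sketch of this check, assuming the analyzer keeps a dictionary-like symbol table. The `SymbolTable` and `SemanticError` names and the message strings are hypothetical, since this page does not show the analyzer's classes.

```python
class SemanticError(Exception):
    """Raised when a program violates a semantic rule."""

class SymbolTable:
    def __init__(self):
        self.values = {}
        self.warnings = []

    def declare(self, name, value):
        if name in self.values:
            # Redeclaration is allowed, but reported as a warning.
            self.warnings.append(f"variable '{name}' redeclared")
        self.values[name] = value

    def lookup(self, name):
        if name not in self.values:
            raise SemanticError(f"variable '{name}' not declared")
        return self.values[name]

table = SymbolTable()
try:
    table.lookup("x")      # print x;  before any declaration
except SemanticError as err:
    print(err)             # variable 'x' not declared
table.declare("x", 5)      # let x = 5;
print(table.lookup("x"))   # 5
```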

Division by Zero

Literal division by zero is caught at compile time.
let x = 10 / 0;  // ❌ Error: division by zero
Only literal zeros are detected. Runtime division by zero (e.g., 10 / x where x is 0) is not checked.
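
One way to catch only literal zeros is to inspect the divisor node before any evaluation happens. The tuple-based tree below is a hypothetical representation chosen for this sketch, because this page does not describe the compiler's node classes.

```python
def has_literal_zero_division(node):
    """Return True if the tree contains a '/' whose right operand is the literal 0.

    Nodes are ("num", int), ("var", str) or ("bin", op, left, right).
    """
    if node[0] != "bin":
        return False
    _, op, left, right = node
    if op == "/" and right == ("num", 0):
        return True
    return has_literal_zero_division(left) or has_literal_zero_division(right)

literal = ("bin", "/", ("num", 10), ("num", 0))     # 10 / 0
variable = ("bin", "/", ("num", 10), ("var", "x"))  # 10 / x
print(has_literal_zero_division(literal))   # True
print(has_literal_zero_division(variable))  # False
```

The check looks only at the shape of the tree, so `10 / x` passes even when `x` holds 0, which matches the limitation stated above.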

Grammar Design Rationale

The grammar is intentionally minimal to focus on core compilation concepts:
  • No control flow (if/while/for)
  • No functions or procedures
  • No data types beyond integers
  • No arrays or data structures
This allows students to understand the full pipeline without getting lost in language complexity.

Parse Tree Example

For the input: let x = 5 + 3 * 2;
sentencia
└── declaracion
    ├── "let"
    ├── IDENTIFICADOR("x")
    ├── "="
    ├── expresion
    │   └── suma_resta
    │       ├── mult_div
    │       │   └── primario
    │       │       └── NUMERO(5)
    │       ├── "+"
    │       └── mult_div
    │           ├── primario
    │           │   └── NUMERO(3)
    │           ├── "*"
    │           └── primario
    │               └── NUMERO(2)
    └── ";"
This parse tree correctly represents 5 + (3 * 2) due to precedence rules.
The grammar successfully handles operator precedence and associativity through its structure, eliminating the need for a separate precedence-climbing algorithm.
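
The same nesting can be reproduced by a parser that returns a tree instead of a value. The sketch below takes a pre-split token list and builds nested tuples. The `parse_expression` function, the shared `binary` helper and the tuple shape are assumptions made for illustration, and error handling is omitted.

```python
def parse_expression(tokens):
    """Build a nested-tuple tree from a token list such as ["5", "+", "3", "*", "2"]."""
    pos = 0

    def primario():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = suma_resta()
            pos += 1  # skip the closing ")"
            return node
        return token  # NUMERO or IDENTIFICADOR

    def binary(operand, operators):
        # Shared loop for both levels: operand ( operator operand )*
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos] in operators:
            op = tokens[pos]
            pos += 1
            node = (op, node, operand())  # left operand nests: left-associative
        return node

    def mult_div():
        return binary(primario, ("*", "/"))

    def suma_resta():
        return binary(mult_div, ("+", "-"))

    return suma_resta()

print(parse_expression(["5", "+", "3", "*", "2"]))
# ('+', '5', ('*', '3', '2'))
```

The multiplication ends up as the right child of the addition, mirroring the parse tree shown above.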
