
Overview

The Expresiones language grammar is defined in Expresiones.g using ANTLR4 syntax. This grammar specifies both the syntactic structure (parser rules) and lexical tokens (lexer rules) of the language.

Grammar File Structure

The grammar file is organized into two main sections:
  1. Syntactic Rules - Define the structure of valid programs
  2. Lexical Rules - Define tokens (keywords, operators, identifiers)

Root Rule

Every Expresiones program starts with the root rule:
root : PROGRAMA LLAVE_IZQ instrucciones+ LLAVE_DER EOF # Prog ;
This requires:
  • The keyword program
  • Opening brace {
  • One or more instructions
  • Closing brace }
  • End of file
The # Prog label creates a specific visitor method visitProg() that you can override in your visitor implementation.
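As a minimal sketch of such an override, the mixin below visits each instruction in order. The class name ProgVisitorMixin is hypothetical, and the code assumes the generated context exposes ctx.instrucciones() as a list (ANTLR generates that accessor for a repeated sub-rule) and that the final visitor class inherits visit() from the generated ExpresionesVisitor.

```python
class ProgVisitorMixin:
    """Hypothetical mixin; combine it with the generated ExpresionesVisitor."""

    def visitProg(self, ctx):
        # `instrucciones+` in the root rule means ctx.instrucciones()
        # returns one child context per instruction, in source order.
        results = []
        for instr in ctx.instrucciones():
            # self.visit() dispatches to visitInstrDecl, visitInstrAsig, ...
            results.append(self.visit(instr))
        return results
```

In a real project you would combine it with the generated base class, for example by declaring a class that inherits from both ProgVisitorMixin and ExpresionesVisitor.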

Syntactic Rules

Instructions

The instrucciones rule defines three types of statements:
instrucciones
    : declaracion PUNTO_COMA                 #InstrDecl
    | asignacion PUNTO_COMA                  #InstrAsig
    | SI PAR_IZQ condicion PAR_DER bloque (SINO bloque)? #InstrIf
    ;
Each alternative has a label (#InstrDecl, #InstrAsig, #InstrIf) that generates separate visitor methods.

Declarations

declaracion : TIPO ID (ASIGNACION expr)? ;
Supports:
  • Type specification (int, float, bool)
  • Variable name
  • Optional initialization with an expression
Example:
int x;
float y = 3.14;

Assignments

asignacion : ID ASIGNACION expr ;
Simple assignment of an expression to an existing variable.

Expressions

Expressions support arithmetic operations with proper precedence:
expr: expr (MULT | DIV) expr                #Aritmetica
    | expr (SUMA | RESTA) expr              #Aritmetica
    | NUMERO                                #Numero
    | ID                                    #Variable
    | PAR_IZQ expr PAR_DER                  #ParentesisExpr
    ;
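In a left-recursive rule such as expr, ANTLR4 assigns operator precedence by the order of the alternatives: earlier alternatives bind tighter. Because the MULT | DIV alternative is listed before SUMA | RESTA, multiplication and division bind tighter than addition and subtraction. Binary operators are left-associative by default.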
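The effect of this ordering can be sketched with a small precedence-climbing evaluator. This is a stand-alone illustration in plain Python, not the code ANTLR generates. The names PRECEDENCE, tokenize, parse, and evaluate are hypothetical, and variables (ID) are left out to keep it short.

```python
import operator
import re

# Higher number = binds tighter, mirroring the alternative order in `expr`
# (MULT/DIV listed before SUMA/RESTA).
PRECEDENCE = {"*": 2, "/": 2, "+": 1, "-": 1}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if token in PRECEDENCE or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    return float(token)


def parse(tokens, min_prec=1):
    """Consume tokens left to right and return a nested (op, left, right) tree."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Requiring a strictly higher precedence on the right operand
        # makes operators of equal precedence left-associative.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


def evaluate(tree):
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))
```

For example, parse(tokenize("2 + 3 * 4")) groups the multiplication first and yields ("+", 2.0, ("*", 3.0, 4.0)), while "8 - 3 - 2" groups to the left as (8 - 3) - 2.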

Conditions

Boolean expressions for control flow:
condicion
    : condicion O_LOGICO condicion          #Logica
    | condicion Y_LOGICO condicion          #Logica
    | NO_LOGICO condicion                   #NotLogica
    | expr op=(MAYOR | MENOR | IGUAL | MAYOR_IGUAL | MENOR_IGUAL | DIFERENTE) expr #Relacional
    | PAR_IZQ condicion PAR_DER             #ParentesisCond
    ;
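Supports:
  • Logical operators: && (AND), || (OR), ! (NOT)
  • Comparison operators: >, <, ==, !=, <>, >=, <=
  • Parenthesized conditions
Precedence follows the order of the alternatives here as well. Because O_LOGICO is listed before Y_LOGICO, || binds tighter than && in this grammar, which is the reverse of most languages. NO_LOGICO comes after both, so !a && b parses as !(a && b). Reorder the alternatives (NO_LOGICO, then Y_LOGICO, then O_LOGICO) if you want the conventional precedence.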

Lexical Rules

Keywords

PROGRAMA : 'program' ;
SI       : 'if' ;
SINO     : 'else' ;
TIPO     : 'int' | 'float' | 'bool' ;

Operators

Arithmetic:
SUMA  : '+' ;
RESTA : '-' ;
MULT  : '*' ;
DIV   : '/' ;
Comparison:
MAYOR       : '>' ;
MENOR       : '<' ;
IGUAL       : '==' ;
DIFERENTE   : '!=' | '<>' ;
MAYOR_IGUAL : '>=' ;
MENOR_IGUAL : '<=' ;
Logical:
Y_LOGICO  : '&&' ;
O_LOGICO  : '||' ;
NO_LOGICO : '!' ;

Identifiers and Literals

ID     : [a-zA-Z][a-zA-Z0-9]* ;
NUMERO : [0-9]+ ('.' [0-9]+)? ;
  • ID: Must start with a letter, followed by letters or digits
  • NUMERO: Integers or decimals (3.14, 42)

Whitespace and Comments

WS     : [ \t\r\n]+ -> skip ;
COMENTARIO : '//' ~[\n\r]* -> skip ;
Spaces, tabs, newlines, and single-line comments are ignored.
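ANTLR lexers pick the rule that matches the longest text at the current position and, when two rules match the same length, the rule defined first. The sketch below imitates that behaviour in plain Python for the lexical rules listed above. It is an illustration only, not the generated ExpresionesLexer; RULES, SKIPPED, and tokenize are hypothetical names, and punctuation tokens that this section does not define (such as PUNTO_COMA) are left out.

```python
import re

# (token name, regex) pairs in grammar order, mirroring the rules above.
RULES = [
    ("PROGRAMA", r"program"),
    ("SI", r"if"),
    ("SINO", r"else"),
    ("TIPO", r"int|float|bool"),
    ("SUMA", r"\+"), ("RESTA", r"-"), ("MULT", r"\*"), ("DIV", r"/"),
    ("MAYOR", r">"), ("MENOR", r"<"), ("IGUAL", r"=="),
    ("DIFERENTE", r"!=|<>"), ("MAYOR_IGUAL", r">="), ("MENOR_IGUAL", r"<="),
    ("Y_LOGICO", r"&&"), ("O_LOGICO", r"\|\|"), ("NO_LOGICO", r"!"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUMERO", r"[0-9]+(?:\.[0-9]+)?"),
    ("WS", r"[ \t\r\n]+"),
    ("COMENTARIO", r"//[^\n\r]*"),
]
SKIPPED = {"WS", "COMENTARIO"}  # rules marked `-> skip` in the grammar
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Strict '>' keeps the earliest rule when two matches tie in length.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise SyntaxError(f"no rule matches {text[pos]!r} at position {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens
```

With this sketch, "if" becomes SI because SI is defined before ID and both match two characters, while "iffy" becomes ID because the longer match wins. Likewise ">=" is a single MAYOR_IGUAL token rather than MAYOR followed by something else.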

Modifying the Grammar

Step 1: Add Lexical Tokens

If adding new keywords or operators, define them in the lexical section. Place keywords above the ID rule; otherwise ID matches the same text first and the keyword token is never produced:
WHILE : 'while' ;
DO    : 'do' ;

Step 2: Update Syntactic Rules

Add new rule alternatives or create new rules:
instrucciones
    : declaracion PUNTO_COMA                 #InstrDecl
    | asignacion PUNTO_COMA                  #InstrAsig
    | SI PAR_IZQ condicion PAR_DER bloque (SINO bloque)? #InstrIf
    | WHILE PAR_IZQ condicion PAR_DER bloque #InstrWhile
    ;

Step 3: Regenerate Parser

After modifying the grammar, regenerate the parser and lexer:
antlr4 -Dlanguage=Python3 -visitor Expresiones.g
This creates, among other files:
  • ExpresionesLexer.py
  • ExpresionesParser.py
  • ExpresionesVisitor.py

Step 4: Update Visitor

Implement visitor methods for new labeled rules (see Extending the Compiler).
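For the #InstrWhile alternative added in Step 2, a visitor method could look like the sketch below. It assumes an interpreter-style visitor in which visiting a condicion node returns a Python truth value and visiting a bloque node executes its statements; a visitor that emits code instead would need a different body. WhileVisitorMixin is a hypothetical name, and ctx.condicion() and ctx.bloque() are the accessors ANTLR generates for the sub-rules of that alternative.

```python
class WhileVisitorMixin:
    """Hypothetical mixin; combine it with the generated ExpresionesVisitor."""

    def visitInstrWhile(self, ctx):
        # Re-evaluate the condition subtree before every pass over the body,
        # so assignments inside the block can end the loop.
        while self.visit(ctx.condicion()):
            self.visit(ctx.bloque())
        return None
```

Visiting the same subtree repeatedly is safe because the parse tree is not modified by a visit; only the visitor's own state (for example a variable table) changes between iterations.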

Common Grammar Patterns

Adding a New Operator

1. Define the token:
  MODULO : '%' ;

2. Add to expression rule:
  expr: expr (MULT | DIV | MODULO) expr    #Aritmetica
      | expr (SUMA | RESTA) expr          #Aritmetica
      ...

3. Handle in visitor:
  if op == ExpresionesParser.MODULO: return izq % der
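A fuller version of the visitor step might look like the sketch below. Step 3 compares token types; this sketch dispatches on the operator's text instead so that it stays self-contained without the generated ExpresionesParser constants. It assumes an interpreter-style visitor whose visit() returns numeric values. BINARY_OPS and AritmeticaVisitorMixin are hypothetical names; ctx.expr(i) and ctx.getChild(i) are the accessors available on the generated context.

```python
import operator

# Operator text -> implementation. '%' corresponds to the new MODULO token.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
}


class AritmeticaVisitorMixin:
    """Hypothetical mixin; combine it with the generated ExpresionesVisitor."""

    def visitAritmetica(self, ctx):
        izq = self.visit(ctx.expr(0))
        der = self.visit(ctx.expr(1))
        # Child 1 is the operator token sitting between the two operands.
        simbolo = ctx.getChild(1).getText()
        return BINARY_OPS[simbolo](izq, der)
```

Keeping the operators in a table means a later operator only needs a new token, a new entry in the expr alternative, and one more dictionary entry.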
      

Adding a New Statement Type

1. Define required tokens:
  PRINT : 'print' ;

2. Create the rule:
  instrucciones
      : ...
      | PRINT PAR_IZQ expr PAR_DER PUNTO_COMA #InstrPrint
      ;

3. Implement visitInstrPrint() in your visitor

Adding a New Data Type

TIPO : 'int' | 'float' | 'bool' | 'string' ;

You’ll also need to add string literals to the lexical rules and update your visitor’s type-checking logic.

Best Practices

1. Use Labels: Add a #Label to every alternative of a rule so ANTLR generates a dedicated visitor method for each one. ANTLR requires that a rule's alternatives are either all labeled or all unlabeled.
2. Order Matters: When two lexer rules match the same text, the rule defined first wins, so put specific tokens such as keywords before general ones such as ID.
3. Left Recursion: ANTLR4 handles direct left recursion automatically; use it for left-associative binary operators.
4. Test Incrementally: After each grammar change, test with simple programs before adding complexity.
Always regenerate the parser after modifying the grammar. The old parser files won’t reflect your changes.

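Grammar Debugging Tips

• Use grun: ANTLR's test rig (commonly aliased as grun) prints or displays parse trees. It runs against Java-target parsers, so generate a Java parser from the same grammar when you want to use it.
• Check Ambiguity: ANTLR reports the ambiguities it encounters while parsing, for example when grun is run with the -diagnostics option.
• Validate Tokens: Ensure lexer tokens don’t overlap unexpectedly; grun's -tokens option prints the token stream.
• Test Edge Cases: Empty programs, nested structures, operator precedence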
