Channel: inovex GmbH

Building a simple calculator with ANTLR in Python

Domain-specific languages (DSLs) are specialized computer languages designed to address specific problems within a particular domain. By providing a syntax and semantics that closely align with the needs of users in that domain, DSLs simplify complex tasks and improve code readability. They allow for more efficient communication between developers and domain experts, ultimately leading to better software solutions. To facilitate the creation of DSLs, tools like ANTLR (ANother Tool for Language Recognition) play a crucial role.

ANTLR is a powerful parser generator (i.e., a tool that helps you create parsers) for reading, processing, and executing structured text or binary files. It allows developers to define the grammar of a language and automatically generates code for parsing that language. ANTLR supports a variety of target languages, including Java, JavaScript, C#, and Python, so developers can generate parsers in their language of choice.

In this blog post, we will demonstrate how to build a simple calculator using ANTLR in the Python programming language, focusing on the listener and visitor design patterns.

Motivation

Using ANTLR to build a parser offers several significant advantages over writing a parser from scratch:

  • Ease of Use: ANTLR provides a high-level way to define grammars, allowing developers to focus on the language’s structure rather than the complexities of parsing algorithms.
  • Code Generation: ANTLR automatically generates the lexer and parser code from the specified grammar, saving time and reducing the potential for errors that can occur in manual implementations.
  • Robustness: ANTLR 4 handles complex grammars, including directly left-recursive rules such as the expression rule used below, and ships with built-in syntax error reporting and recovery, so a generated parser behaves predictably on malformed input.
  • Multi-Language Support: With support for various programming languages, ANTLR allows you to create parsers in the language that best suits your project needs.
  • Community and Documentation: ANTLR has a strong community and extensive documentation, making it easier to find resources and examples that can help you along the way.

Example demo: Simple calculator using listener and visitor

In the following, we will demonstrate the steps necessary to build a simple calculator using ANTLR in Python. You can find the demo code in the following Git Repository.

Along the way, we will explain the key ANTLR concepts of grammar, lexer, and parser, which define the language structure, break the input text into tokens, and organize the tokens into a parse tree, respectively. The calculator is implemented with both the listener and the visitor pattern, two design approaches for processing the parse tree generated from a grammar. A later section explains the concepts behind listeners and visitors and the steps needed to implement them.

ANTLR and Python Setup

After cloning the project code from the repository, you can follow the instructions in the project's README file to set up ANTLR and the Python environment. The only requirements are that Python 3 and Java are installed on your machine.

Grammar

A grammar defines the rules and structure of a language. In ANTLR, you specify the syntax of the language you want to parse using a formal grammar notation. This notation shares foundational concepts with the Backus-Naur Form (BNF), a notation for context-free grammars that is commonly used to describe the syntax of languages such as DSLs. ANTLR's notation is more expressive and user-friendly, however, and cleanly separates lexer rules, which define tokens (the smallest units of meaning), from parser rules, which combine tokens into larger structures. BNF, in contrast, typically does not distinguish between lexical and syntactic elements.

The rules are defined in a .g4 file. The file starts with the name of the grammar, followed by the parser rules, whose names begin with a lowercase letter, and finally the lexer rules (tokens), whose names begin with an uppercase letter.

Below you can see the calculator grammar derived and simplified from the following grammar.

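The grammar consists of a single parser rule called expression. Each alternative carries a label (the name after #), which ANTLR uses to name the generated context classes and the listener and visitor methods. Because ANTLR resolves ambiguity in a left-recursive rule in favor of the alternative listed first, the order of the alternatives also sets the operator precedence: with this grammar, multiplication binds tighter than division, and addition binds tighter than subtraction, so 8/2*3 is grouped as 8/(2*3). The alternatives describe how expressions can be formed: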

  1. An expression can be a single NUMBER.
  2. An expression can be wrapped in parentheses, allowing for the grouping of expressions.
  3. An expression can be formed by multiplying two sub-expressions.
  4. An expression can be formed by dividing one sub-expression by another.
  5. An expression can be formed by adding two sub-expressions.
  6. An expression can be formed by subtracting one sub-expression from another.

The grammar also defines several tokens that are used in the expressions:

  • Operators:
    • PLUS: Represents the addition operator (+).
    • MINUS: Represents the subtraction operator (-).
    • TIMES: Represents the multiplication operator (*).
    • DIV: Represents the division operator (/).
  • Number:
    • NUMBER: Represents a sequence of one or more digits.
  • Whitespace:
    • WS: Matches whitespace characters (spaces, newlines, tabs); the lexer discards these tokens, so they never reach the parser.

grammar Calculator;

expression 
	: NUMBER						# Number
	| '(' expression ')'			# Parentheses
	| expression TIMES expression	# Multiplication
	| expression DIV expression		# Division
	| expression PLUS expression	# Addition
	| expression MINUS expression	# Subtraction
;

PLUS : '+';
MINUS: '-';
TIMES: '*';
DIV  : '/';
NUMBER : [0-9]+;
WS : [ \r\n\t]+ -> skip;
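In a left-recursive rule like this one, ANTLR 4 gives alternatives that are listed earlier a higher precedence. The following sketch is a simplified, hand-written model of that behavior (a small precedence-climbing parser, not the code ANTLR generates). The names PRECEDENCE and group are hypothetical and exist only for this illustration. The function returns a fully parenthesized string so that the grouping becomes visible.

```python
import re

# Precedence mirrors the order of the alternatives in the grammar:
# TIMES is listed first (binds tightest), MINUS last (binds loosest).
PRECEDENCE = {"*": 4, "/": 3, "+": 2, "-": 1}


def group(text):
    """Return `text` fully parenthesized according to PRECEDENCE."""
    tokens = re.findall(r"[0-9]+|[-+*/()]", text)
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = expression(0)
            pos += 1  # consume the closing ')'
            return inner
        return token  # a NUMBER

    def expression(min_prec):
        nonlocal pos
        left = primary()
        # Keep extending `left` while the next operator binds tightly enough.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], -1) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Using prec + 1 makes every operator left-associative.
            right = expression(PRECEDENCE[op] + 1)
            left = f"({left}{op}{right})"
        return left

    return expression(0)


print(group("3+5*(2-8)"))
print(group("8/2*3"))
```

For 8/2*3 the sketch produces (8/(2*3)) rather than the conventional ((8/2)*3). A common way to give two operators the same precedence in ANTLR is to place them in one alternative, for example expression op=(TIMES | DIV) expression, at the cost of different generated method names than the ones used in the rest of this post.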

Lexer and parser

From the grammar file, ANTLR can generate the lexer and parser classes with the command below. The command also generates the listener and visitor classes, which we will use later to evaluate expressions.

As you can see in the command, -Dlanguage=Python3 selects Python 3 as the target language for the generated classes. The -visitor option is also required, because ANTLR generates only the listener by default. The last argument is the grammar file to process.

All generated classes are prefixed with the name of the grammar (Calculator*).

antlr4 -Dlanguage=Python3 -visitor antlr/Calculator.g4

The lexer, or lexical analyzer, is responsible for breaking the input text into a stream of tokens based on the grammar rules. It identifies keywords, operators, identifiers, and other relevant symbols, allowing the parser to understand the input at a higher level.
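To make the lexer's job more concrete, here is a rough regex-based imitation of the token rules from the calculator grammar. It is only a sketch: the lexer generated by ANTLR works differently internally, and the names TOKEN_SPEC, tokenize, LPAREN, and RPAREN are assumptions made for this example (in the grammar, the parentheses are implicit literal tokens without a name).

```python
import re

# One (token type, regex) pair per lexer rule in the grammar.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIV", r"/"),
    ("LPAREN", r"\("),   # hypothetical name for the literal '('
    ("RPAREN", r"\)"),   # hypothetical name for the literal ')'
    ("WS", r"[ \r\n\t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split `text` into (token type, token text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "WS":  # mirrors the '-> skip' action of the WS rule
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("3 + 5*(2-8)"):
    print(token)
```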

The parser takes the stream of tokens produced by the lexer and organizes them into a parse tree according to the grammar rules. This structure represents the hierarchical relationships between the tokens and is essential for evaluating expressions or executing commands.

You can see in the image below an example of how the lexer breaks the input text into tokens and how the parser organizes them into a parse tree.

Parse Tree

In ANTLR, you can also visualize the parse tree for an input text.

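To visualize the parse tree, we use antlr4-tools, a Python package that can be installed with pip. After installation, the command below displays the parse tree in a window. antlr4-parse reads its input from standard input, so the expression (here 3+5*(2-8)) is typed after the command and terminated with an end-of-file (Ctrl+D on Linux and macOS, Ctrl+Z followed by Enter on Windows).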

antlr4-parse antlr/Calculator.g4 expression -gui
3+5*(2-8)

Listener and visitor

In ANTLR, there are two primary methods for handling the results of parsing: listeners and visitors. Both approaches facilitate the traversal of the parse tree generated from the input, but they serve different purposes and offer distinct functionalities.

Why use listeners and visitors?

As shown previously, when you successfully parse an input, the result is a parse tree that starts with a root node and branches out to multiple child nodes. Each node in the tree has a single parent, except for the root node, which has none. Traversing this tree can be accomplished using a depth-first search approach:

  1. Start with the root node as your current node.
  2. If the node has no children, backtrack to the parent node.
  3. If there are unvisited child nodes, select the first one and repeat the process.
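The traversal steps above can be sketched in a few lines of Python. The example does not use the ANTLR runtime; Node and walk are hypothetical names, and the tree is built by hand for the input 2-8. The walk records when each node is entered and when it is exited, which is the same pair of events that ANTLR listeners receive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def walk(node, events):
    """Depth-first walk that records when each node is entered and exited."""
    events.append(f"enter {node.label}")
    for child in node.children:  # children are processed left to right
        walk(child, events)
    # A node is exited only after all of its children have been processed.
    events.append(f"exit {node.label}")
    return events


# Hand-built tree for the input 2-8.
tree = Node("Subtraction", [Node("Number 2"), Node("Number 8")])
for event in walk(tree, []):
    print(event)
```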

While this traversal method is straightforward, ANTLR simplifies the process by automatically generating listeners and visitors for you.

Similarities between listeners and visitors:

  • Both utilize a depth-first traversal algorithm.
  • They provide a standardized way to process a parse tree, eliminating the need to devise your own traversal strategy.
  • By offering a consistent approach, listeners and visitors enhance the reusability of your grammar across different programs and languages. You can apply the same grammar to various listeners and visitors to address different problems, even in different programming languages.
  • This standardization saves time and ensures that anyone working with the parse tree can easily understand and utilize the provided methods.

Differences between listeners and visitors:

While both listeners and visitors serve the purpose of traversing the parse tree, they differ in functionality:

Listeners:

  • Are driven by a tree walker, which automatically calls the enter and exit methods as it reaches and leaves each node in the tree.
  • Do not allow for control over the traversal process; the walker visits every node in a predefined depth-first order.
  • Cannot return values from their methods, so intermediate results have to be kept in state owned by the listener (for example, a stack).
  • Recommended when you want to react to every node in the fixed traversal order and do not need to skip subtrees or pass values directly between nodes.

Visitors:

  • Provide more control over the traversal process, allowing you to dictate the order and manner in which nodes are visited.
  • Enable the return of values from each visitor function, which can be useful for extracting information or performing calculations based on multiple nodes.
  • Best suited for situations where you need to return results from each node or require more complex interactions with the parse tree.
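The following sketch illustrates the two visitor properties listed above, control over the traversal and return values, without using the ANTLR runtime. The classes Num, Mul, and ShortCircuitVisitor are hypothetical and written by hand for this example. When the left operand of a multiplication evaluates to 0, the visitor returns immediately and never descends into the right subtree, which a listener driven by a walker could not do.

```python
class Num:
    def __init__(self, value):
        self.value = value


class Mul:
    def __init__(self, left, right):
        self.left = left
        self.right = right


class ShortCircuitVisitor:
    """Hand-written visitor: each visit method returns a value to its caller."""

    def __init__(self):
        self.visited = []  # labels of the nodes that were actually visited

    def visit(self, node):
        # Dispatch on the node's class name, e.g. Mul -> visit_Mul.
        return getattr(self, f"visit_{type(node).__name__}")(node)

    def visit_Num(self, node):
        self.visited.append(f"Num {node.value}")
        return node.value

    def visit_Mul(self, node):
        self.visited.append("Mul")
        left = self.visit(node.left)
        if left == 0:
            return 0  # the visitor decides not to descend into the right child
        return left * self.visit(node.right)


visitor = ShortCircuitVisitor()
result = visitor.visit(Mul(Num(0), Mul(Num(4), Num(5))))
print(result, visitor.visited)
```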

By understanding the roles of listeners and visitors, you can choose the appropriate method based on your specific needs when working with ANTLR-generated parse trees.

In the following sections, we will compare listeners and visitors by implementing a calculator using both approaches.

Expression evaluation using listener

As shown in the code below, the listener class generated by ANTLR contains a pair of enter/exit methods for each labeled alternative of the parser rule. The walker calls the enter method when it reaches a node and the exit method once all of the node's children have been processed. Both methods receive the same Context object as their argument. For example, enterMultiplication and exitMultiplication both accept a MultiplicationContext object.

# Generated from antlr/Calculator.g4 by ANTLR 4.13.1
from antlr4 import *
if "." in __name__:
    from .CalculatorParser import CalculatorParser
else:
    from CalculatorParser import CalculatorParser

# This class defines a complete listener for a parse tree produced by CalculatorParser.
class CalculatorListener(ParseTreeListener):

    # Enter a parse tree produced by CalculatorParser#Multiplication.
    def enterMultiplication(self, ctx:CalculatorParser.MultiplicationContext):
        pass

    # Exit a parse tree produced by CalculatorParser#Multiplication.
    def exitMultiplication(self, ctx:CalculatorParser.MultiplicationContext):
        pass

    # Enter a parse tree produced by CalculatorParser#Addition.
    def enterAddition(self, ctx:CalculatorParser.AdditionContext):
        pass

    # Exit a parse tree produced by CalculatorParser#Addition.
    def exitAddition(self, ctx:CalculatorParser.AdditionContext):
        pass

    # Enter a parse tree produced by CalculatorParser#Subtraction.
    def enterSubtraction(self, ctx:CalculatorParser.SubtractionContext):
        pass

    # Exit a parse tree produced by CalculatorParser#Subtraction.
    def exitSubtraction(self, ctx:CalculatorParser.SubtractionContext):
        pass

    # Enter a parse tree produced by CalculatorParser#Number.
    def enterNumber(self, ctx:CalculatorParser.NumberContext):
        pass

    # Exit a parse tree produced by CalculatorParser#Number.
    def exitNumber(self, ctx:CalculatorParser.NumberContext):
        pass

    # Enter a parse tree produced by CalculatorParser#Division.
    def enterDivision(self, ctx:CalculatorParser.DivisionContext):
        pass

    # Exit a parse tree produced by CalculatorParser#Division.
    def exitDivision(self, ctx:CalculatorParser.DivisionContext):
        pass

    # Enter a parse tree produced by CalculatorParser#Parentheses.
    def enterParentheses(self, ctx:CalculatorParser.ParenthesesContext):
        pass

    # Exit a parse tree produced by CalculatorParser#Parentheses.
    def exitParentheses(self, ctx:CalculatorParser.ParenthesesContext):
        pass

del CalculatorParser

To evaluate an expression with a listener, we first create a class that extends the generated listener class.

As shown in the code below, we name this class EvaluationListener. It keeps a stack (a Python list) that holds the intermediate result of each step.

We then override the exit methods of the generated listener class as follows:

  • When a number node is exited, the text of the number context is converted to an integer and pushed onto the stack.
  • When a multiplication/division/addition/subtraction node is exited, the right and then the left operand are popped from the stack (the right operand was pushed last). The two operands are multiplied/divided/added/subtracted, and the result is pushed back onto the stack.

The EvaluationListener class also implements its own method (getResult) to extract the final evaluation result from the stack.
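To see how the stack evolves, the sketch below replays the exit events for the expression 3+5*(2-8) by hand, without the ANTLR runtime. The event list is written out manually in the order a depth-first walk exits the nodes; EXIT_EVENTS, BINARY_OPS, and replay are hypothetical names used only for this illustration.

```python
import operator

# Exit events in the order a depth-first walk produces them for 3+5*(2-8).
EXIT_EVENTS = [
    ("Number", 3),
    ("Number", 5),
    ("Number", 2),
    ("Number", 8),
    ("Subtraction", None),
    ("Parentheses", None),
    ("Multiplication", None),
    ("Addition", None),
]

BINARY_OPS = {
    "Multiplication": operator.mul,
    "Division": operator.truediv,
    "Addition": operator.add,
    "Subtraction": operator.sub,
}


def replay(events):
    """Apply the events to a stack and return a snapshot after each one."""
    stack, snapshots = [], []
    for kind, value in events:
        if kind == "Number":
            stack.append(value)
        elif kind in BINARY_OPS:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(BINARY_OPS[kind](left, right))
        # "Parentheses" leaves the stack untouched.
        snapshots.append(list(stack))
    return snapshots


for (kind, _), snapshot in zip(EXIT_EVENTS, replay(EXIT_EVENTS)):
    print(f"exit {kind:<15} {snapshot}")
```

After the last event, the stack contains the single value -27, which is what getResult returns.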

from antlr.CalculatorLexer import CalculatorLexer
from antlr.CalculatorParser import CalculatorParser
from antlr.CalculatorListener import CalculatorListener
from antlr4 import *

class EvaluationListener(CalculatorListener):
    def __init__(self):
        self.stack = []

    def exitNumber(self, ctx: CalculatorParser.NumberContext):
        # Push the number onto the stack
        number = int(ctx.getText())
        self.stack.append(number)

    def exitParentheses(self, ctx: CalculatorParser.ParenthesesContext):
        # The value of the inner expression is already on top of the stack,
        # so no action is needed; the parent operation will pop it.
        pass

    def exitMultiplication(self, ctx: CalculatorParser.MultiplicationContext):
        right = self.stack.pop()
        left = self.stack.pop()
        result = left * right
        self.stack.append(result)

    def exitDivision(self, ctx: CalculatorParser.DivisionContext):
        right = self.stack.pop()
        left = self.stack.pop()
        result = left / right
        self.stack.append(result)

    def exitAddition(self, ctx: CalculatorParser.AdditionContext):
        right = self.stack.pop()
        left = self.stack.pop()
        result = left + right
        self.stack.append(result)

    def exitSubtraction(self, ctx: CalculatorParser.SubtractionContext):
        right = self.stack.pop()
        left = self.stack.pop()
        result = left - right
        self.stack.append(result)

    def getResult(self):
        # The final result will be the last item on the stack
        return self.stack.pop() if self.stack else None

To use the implemented EvaluationListener, the following script can be executed.

The script expects an expression as its argument. From this expression, it first generates a parse tree: the expression is wrapped in an input stream, a CalculatorLexer is created from the input stream, a CommonTokenStream is created from the lexer, a CalculatorParser is created from the token stream, and the parse tree is obtained by calling the parser's expression method.

Next, the script creates an instance of the implemented listener and a walker object from the ParseTreeWalker class of the antlr4 library.

The walker's walk method traverses the parse tree and calls the listener's methods along the way. Finally, the script fetches the evaluation result from the listener and prints it for the user.

import sys
from antlr4 import *
from antlr.CalculatorLexer import CalculatorLexer
from antlr.CalculatorParser import CalculatorParser
from custom.EvaluationListener import EvaluationListener

def main(argv):
    input_expr = "".join(argv)
    input_stream = InputStream(input_expr)
    
    lexer = CalculatorLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = CalculatorParser(stream)
    
    # Start parsing at the 'expression' rule
    tree = parser.expression()
    
    # Create a listener and walk the tree
    listener = EvaluationListener()
    walker = ParseTreeWalker()
    walker.walk(listener, tree)
    
    # Get the result from the listener
    result = listener.getResult()
    
    print(f"The result of '{input_expr}' is: {result}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python calculate_with_listener.py <expression>")
        sys.exit(1)
    main(sys.argv[1:])

Below, you can see an example for the execution of the script with the expression: 3+5*(2-8).

The script evaluates the expression with the implemented listener and prints the correct calculation result.

$ python3 calculate_with_listener.py "3+5*(2-8)"
The result of '3+5*(2-8)' is: -27

Expression evaluation using visitor

In the code below, you can see the default visitor class that ANTLR generates from the calculator grammar. In contrast to the listener class, the visitor class contains only one visit method for each labeled alternative. Each method receives the same Context object as the corresponding listener methods, and its default implementation simply visits the node's children.

# Generated from antlr/Calculator.g4 by ANTLR 4.13.1
from antlr4 import *
if "." in __name__:
    from .CalculatorParser import CalculatorParser
else:
    from CalculatorParser import CalculatorParser

# This class defines a complete generic visitor for a parse tree produced by CalculatorParser.

class CalculatorVisitor(ParseTreeVisitor):

    # Visit a parse tree produced by CalculatorParser#Multiplication.
    def visitMultiplication(self, ctx:CalculatorParser.MultiplicationContext):
        return self.visitChildren(ctx)

    # Visit a parse tree produced by CalculatorParser#Addition.
    def visitAddition(self, ctx:CalculatorParser.AdditionContext):
        return self.visitChildren(ctx)

    # Visit a parse tree produced by CalculatorParser#Subtraction.
    def visitSubtraction(self, ctx:CalculatorParser.SubtractionContext):
        return self.visitChildren(ctx)

    # Visit a parse tree produced by CalculatorParser#Number.
    def visitNumber(self, ctx:CalculatorParser.NumberContext):
        return self.visitChildren(ctx)

    # Visit a parse tree produced by CalculatorParser#Division.
    def visitDivision(self, ctx:CalculatorParser.DivisionContext):
        return self.visitChildren(ctx)

    # Visit a parse tree produced by CalculatorParser#Parentheses.
    def visitParentheses(self, ctx:CalculatorParser.ParenthesesContext):
        return self.visitChildren(ctx)

del CalculatorParser

To implement the calculation with a visitor, we first create a class that extends the generated visitor class, just as we did with the listener.

We then override the methods as in the listener approach. The difference is that neither a stack nor an additional method for returning the final result is needed. As mentioned in the previous section, each visit method of a visitor can return a value.

In the extended visitor class, we override the visit methods as follows:

  • If the number node is visited, convert the text of the number context to an integer and return it.
  • If the parentheses node is visited, return the value of the expression inside the parentheses. Calling self.visit (a method inherited from ParseTreeVisitor, the parent class of CalculatorVisitor) on the inner expression evaluates it recursively.
  • For multiplication/division/addition/subtraction, first evaluate the left and right expressions by calling self.visit on each of them. The two results are then multiplied/divided/added/subtracted and returned as the result of the visit method.

from antlr.CalculatorLexer import CalculatorLexer
from antlr.CalculatorParser import CalculatorParser
from antlr.CalculatorVisitor import CalculatorVisitor
from antlr4 import *

class EvaluationVisitor(CalculatorVisitor):
    def visitNumber(self, ctx: CalculatorParser.NumberContext):
        # Convert the NUMBER token to an integer
        return int(ctx.getText())

    def visitParentheses(self, ctx: CalculatorParser.ParenthesesContext):
        # Evaluate the expression inside the parentheses
        return self.visit(ctx.expression())

    def visitMultiplication(self, ctx: CalculatorParser.MultiplicationContext):
        # Evaluate left and right expressions and multiply them
        left = self.visit(ctx.expression(0))
        right = self.visit(ctx.expression(1))
        return left * right

    def visitDivision(self, ctx: CalculatorParser.DivisionContext):
        # Evaluate left and right expressions and divide them
        left = self.visit(ctx.expression(0))
        right = self.visit(ctx.expression(1))
        return left / right

    def visitAddition(self, ctx: CalculatorParser.AdditionContext):
        # Evaluate left and right expressions and add them
        left = self.visit(ctx.expression(0))
        right = self.visit(ctx.expression(1))
        return left + right

    def visitSubtraction(self, ctx: CalculatorParser.SubtractionContext):
        # Evaluate left and right expressions and subtract them
        left = self.visit(ctx.expression(0))
        right = self.visit(ctx.expression(1))
        return left - right

To run the calculation with the implemented EvaluationVisitor class, the following script can be used.

As with the listener script, an expression needs to be passed as an argument, and the parse tree is generated in the same way. The visitor approach does not need a walker object, however. Instead, the visit method of the EvaluationVisitor object is called directly on the tree and returns the final result.

import sys
from antlr4 import *
from antlr.CalculatorLexer import CalculatorLexer
from antlr.CalculatorParser import CalculatorParser
from custom.EvaluationVisitor import EvaluationVisitor

def main(argv):
    input_expr = "".join(argv)
    input_stream = InputStream(input_expr)
    
    lexer = CalculatorLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = CalculatorParser(stream)
    
    # Start parsing at the 'expression' rule
    tree = parser.expression()
    
    # Create a visitor and evaluate the expression
    evaluator = EvaluationVisitor()
    result = evaluator.visit(tree)
    
    print(f"The result of '{input_expr}' is: {result}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python calculate_with_visitor.py <expression>")
        sys.exit(1)
    main(sys.argv[1:])

Below, you can see an example for the execution of the script with the expression: 3+5*(2-8).

The script evaluates the expression using the implemented visitor and prints the correct calculation result.

$ python3 calculate_with_visitor.py "3+5*(2-8)"
The result of '3+5*(2-8)' is: -27

Conclusion

In this blog post, we demonstrated the most important concepts behind ANTLR by implementing a simple calculator in Python. ANTLR lets developers specify the syntax of the language they want to parse in a formal grammar notation, and the listener and visitor classes it generates can be used to evaluate the parsed input. Listeners and visitors both provide a standard way to traverse the parse tree; which one to choose depends on whether you need return values and control over the traversal (visitor) or a fixed walk over every node is sufficient (listener).

For simplicity, we considered the calculator example in this blog post, but with ANTLR you can also easily implement other use cases, like compiler creation, custom domain-specific languages, or data transformation.
