Enhancing our Arithmetic Parser

It does not take too much imagination to think of some enhancements to the arithmetic parser we started developing in the last chapter: subtraction, multiplication, and division, for example! Also, once we have additive and multiplicative operations, we really ought to introduce parentheses for grouping, so that we can write, for example, (3+6)*7 to have the addition performed before the multiplication.

Okay, let's see how this can be implemented in code. Bring up the Arithmetic.freecc file in your editor and make it look like the following:

TOKEN : {
   <PLUS : "+">
   |
   <MINUS : "-">
   |
   <TIMES : "*">
   |
   <DIVIDE : "/">
   |
   <OPEN_PAREN : "(">
   |
   <CLOSE_PAREN : ")">
   |
   <NUMBER :  (["0"-"9"])+ ("."(["0"-"9"])+)?>
}

SKIP : {
  " " | "\t" | "\n" | "\r"
}

void AdditiveExpression() : 
{}
{
    MultiplicativeExpression()
    (
      (<PLUS>|<MINUS>)
      MultiplicativeExpression()
    )*
}

void MultiplicativeExpression() :
{}
{
    (<NUMBER> | ParentheticalExpression())
    (
       (<TIMES>|<DIVIDE>)
       (<NUMBER> | ParentheticalExpression())
    )*
}

void ParentheticalExpression() :
{}
{
    <OPEN_PAREN>
    AdditiveExpression()
    <CLOSE_PAREN>
}  

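Now, let's take stock of things. We have introduced five new tokens: MINUS, TIMES, DIVIDE, OPEN_PAREN, and CLOSE_PAREN. We have also introduced two new syntactic productions: MultiplicativeExpression and ParentheticalExpression. The AdditiveExpression production is still there, but it has changed somewhat. For one thing, there is now a choice construct, (<PLUS>|<MINUS>), where there was only <PLUS> before. Also, where the production used to refer to <NUMBER> tokens directly, it now refers to the new MultiplicativeExpression() production. This layering is what gives multiplication and division higher precedence than addition and subtraction: an AdditiveExpression is built out of complete MultiplicativeExpressions, so in 3+6*7 the 6*7 is grouped together before the addition is applied. A ParentheticalExpression, in turn, counts as a single operand inside a MultiplicativeExpression, and since its contents are a full AdditiveExpression, parentheses let us override the default grouping.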
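To see why this layering produces the right precedence, it may help to look at a hand-written recursive-descent evaluator with the same shape: one method per production. This is only a minimal sketch for illustration, not the code FreeCC generates. The class name RecursiveDescentSketch and its methods are hypothetical, the tokenizer is a simple regular expression standing in for the TOKEN and SKIP blocks, and it computes values directly instead of building an AST.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written evaluator; NOT the parser FreeCC generates.
// Each method mirrors one production of Arithmetic.freecc.
public class RecursiveDescentSketch {
    // NUMBER (digits, optionally '.' and more digits) or a one-character operator/paren.
    private static final Pattern TOKEN = Pattern.compile("[0-9]+(\\.[0-9]+)?|[-+*/()]");
    private static final String SKIPPED = " \t\n\r"; // same characters as the SKIP block

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private RecursiveDescentSketch(String input) {
        Matcher m = TOKEN.matcher(input);
        int i = 0;
        while (true) {
            while (i < input.length() && SKIPPED.indexOf(input.charAt(i)) >= 0) i++;
            if (i == input.length()) break;
            m.region(i, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at offset " + i);
            }
            tokens.add(m.group());
            i = m.end();
        }
    }

    // AdditiveExpression: MultiplicativeExpression ((PLUS|MINUS) MultiplicativeExpression)*
    private double additive() {
        double result = multiplicative();
        while (peek("+") || peek("-")) {
            String op = tokens.get(pos++);
            double rhs = multiplicative();
            result = op.equals("+") ? result + rhs : result - rhs;
        }
        return result;
    }

    // MultiplicativeExpression: operand ((TIMES|DIVIDE) operand)*
    private double multiplicative() {
        double result = operand();
        while (peek("*") || peek("/")) {
            String op = tokens.get(pos++);
            double rhs = operand();
            result = op.equals("*") ? result * rhs : result / rhs;
        }
        return result;
    }

    // (<NUMBER> | ParentheticalExpression())
    private double operand() {
        if (peek("(")) {
            pos++;                      // OPEN_PAREN
            double inner = additive();  // AdditiveExpression
            expect(")");                // CLOSE_PAREN
            return inner;
        }
        if (pos < tokens.size() && Character.isDigit(tokens.get(pos).charAt(0))) {
            return Double.parseDouble(tokens.get(pos++));
        }
        throw new IllegalArgumentException("Expected a number or '(' at token " + pos);
    }

    private boolean peek(String s) {
        return pos < tokens.size() && tokens.get(pos).equals(s);
    }

    private void expect(String s) {
        if (!peek(s)) throw new IllegalArgumentException("Expected '" + s + "' at token " + pos);
        pos++;
    }

    // Parses a complete expression; leftover tokens are treated as an error.
    public static double evaluate(String input) {
        RecursiveDescentSketch p = new RecursiveDescentSketch(input);
        double value = p.additive();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Trailing input at token " + p.pos);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3+6*7"));        // 45.0
        System.out.println(evaluate("(3+6)*7"));      // 63.0
        System.out.println(evaluate("(2*3 + 4)/5"));  // 2.0
    }
}
```

Because additive() can only reach a number by going through multiplicative(), the 6*7 in 3+6*7 is always finished first; the parentheses in (3+6)*7 send control back up to additive() and so reverse that grouping.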

We will also need to enhance our test harness to handle the more complex AST that this grammar can generate. So, bring up your ArithmeticTest.java and make it look like the following:

import java.io.*;
import java.util.*;

public class ArithmeticTest {
    static public void main(String[] args) throws ParseException {
       ArithmeticParser parser = new ArithmeticParser(new InputStreamReader(System.in));
       parser.AdditiveExpression();
       Node root = parser.rootNode();
       System.out.println("Dumping the AST...");
       Nodes.dump(root, "  ");
       System.out.println("The result is: " + evaluate(root));
    }
    
    static double evaluate(Node node) {
        if (node instanceof NUMBER) {
            return Double.parseDouble(node.toString());
        }
        if (node instanceof ParentheticalExpression) {
            return evaluate(node.getChild(1));
        }
        Iterator<Node> iterator = Nodes.iterator(node);
        double result = evaluate(iterator.next());
        while (iterator.hasNext()) {
            Node operator = iterator.next();
            double nextValue = evaluate(iterator.next());
            if (operator instanceof PLUS) {
                result += nextValue;
            }
            else if (operator instanceof MINUS) {
                result -= nextValue;
            }
            else if (operator instanceof TIMES) {
                result *= nextValue;
            }
            else if (operator instanceof DIVIDE) {
                result /= nextValue;
            }
        }
        return result;
    }
}  

So, what is new here? Well, our main method did not actually change at all. All the changes are in the evaluate method, which now has to handle more constructs. Nonetheless, it starts out the same as our older version did: if the node passed in is a NUMBER token, it simply returns the corresponding double value, using the standard Java method Double.parseDouble to do the conversion. The new ParentheticalExpression production is handled in a typically recursive way: evaluate picks out the expression inside the parentheses and calls itself on it. Note that this is node.getChild(1), since node.getChild(0) gives us the OPEN_PAREN token and node.getChild(2) the CLOSE_PAREN token.
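The NUMBER case relies on the lexer and Double.parseDouble agreeing on what a number looks like. The short sketch below transcribes the grammar's NUMBER pattern into java.util.regex syntax (our own hypothetical transcription, not something FreeCC produces) and checks a few strings against it. The NUMBER token turns out to be stricter than Double.parseDouble, which also accepts forms such as "3.", ".5" and "1e3", so any text the lexer hands us as a NUMBER is safe to parse.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
public class NumberTokenSketch {
    // Java-regex transcription of the grammar's NUMBER token:
    //   (["0"-"9"])+ ("."(["0"-"9"])+)?
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    // True if the whole string would be lexed as a single NUMBER token.
    public static boolean isNumberToken(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "3.14", "007", "3.", ".5", "1e3"}) {
            if (isNumberToken(s)) {
                System.out.println(s + " -> " + Double.parseDouble(s));
            } else {
                System.out.println(s + " is not a NUMBER token");
            }
        }
    }
}
```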

It turns out that AdditiveExpression and MultiplicativeExpression, the two remaining node types, do not really have to be handled separately. In either case, we simply iterate over the child nodes, applying the operation corresponding to each intervening operator token (addition, subtraction, multiplication or division). Note that in the line that computes nextValue, there is no need to call iterator.hasNext() to check whether there is another Node. Given the grammar that the input necessarily follows (if it didn't, an exception would have been thrown earlier), every operator token must be followed by an operand, so you should be able to convince yourself quite easily that there must be at least one more Node at this point.
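Iterating from left to right also gives the conventional left-to-right grouping for subtraction and division, which matters because those operators are not associative. Applying the operators strictly in order is safe here because, thanks to the layering of the grammar, a single AdditiveExpression node only ever contains PLUS and MINUS operators, and a single MultiplicativeExpression node only TIMES and DIVIDE. The sketch below isolates that loop over plain arrays; LeftFoldSketch and foldLeft are hypothetical names, and the length check stands in for the grammar's guarantee that every operator is followed by an operand.

```java
// Hypothetical sketch of the left-to-right loop in evaluate(), over plain arrays.
public class LeftFoldSketch {
    // Operands and operators alternate, so there is always one more operand than operator.
    public static double foldLeft(double[] operands, char[] operators) {
        if (operands.length != operators.length + 1) {
            throw new IllegalArgumentException("Need exactly one more operand than operators");
        }
        double result = operands[0];
        for (int i = 0; i < operators.length; i++) {
            double next = operands[i + 1];
            switch (operators[i]) {
                case '+': result += next; break;
                case '-': result -= next; break;
                case '*': result *= next; break;
                case '/': result /= next; break;
                default: throw new IllegalArgumentException("Unknown operator " + operators[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 8 - 2 - 3 is evaluated as (8 - 2) - 3, not 8 - (2 - 3)
        System.out.println(foldLeft(new double[] {8, 2, 3}, new char[] {'-', '-'})); // 3.0
        // 8 / 4 / 2 is evaluated as (8 / 4) / 2, not 8 / (4 / 2)
        System.out.println(foldLeft(new double[] {8, 4, 2}, new char[] {'/', '/'})); // 1.0
    }
}
```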

Okay, let's take it for a spin. We build the example the same way as last time. Generate the Java source with:

freecc Arithmetic.freecc

and compile it all using:

javac *.java

Now, let's try it out on some arithmetic expressions. Try, say:

echo 2*3 + 4/5 | java ArithmeticTest

and now, trying out parentheses:

echo (2*3 + 4)/5 | java ArithmeticTest

Note that in a Unix-derived shell (whether on an actual *nix system or in a Unix-style shell such as Cygwin bash on Windows), the arithmetic expression above must be quoted. The reason is, ironically enough, a parsing issue: the shell treats parentheses as part of its own syntax. So write:

echo "(2*3+4)/5" | java ArithmeticTest

Come to think of it, any input piped in this way should be quoted in a Unix-derived shell. Not only are unquoted parentheses a problem, but an unquoted * is also subject to filename expansion: a standalone * is replaced with a list of all the files in the current directory. (For example, 2*3 usually works unquoted, because the shell leaves a pattern unchanged when no file name matches it, but 2 * 3 does not.)


Page generated: 2009-01-13 05:28:31 GMT -- Status: very early draft, all feedback welcome