Expression evaluation in Java (2)

Previously we considered our options when evaluating expressions in Java language. Let’s recap them quickly:

  • Evaluate string expression in one go. Find a library or use built-in ScriptEngine (my favourite choice for most cases). Various universal expression language libraries may be more useful if you use beans and/or call Java methods in the expressions.
  • Parse the string into some kind of AST tree (or ParseTree in ANTLR4).
  • Use the tree as you wish. Interpret the expression, generate different representations of the same expression, etc.

In the first post we covered the first option. Now we’re going to check the latter.

Parser with interpreter

As I mentioned there are many parsers out there, libraries, blog posts, etc. I’ll use ANTLR4 to implement my grammar and expression evaluator. ANTLR seems to be the state of the art technology and ANTLR v4 brings grammar definition and its separation from actual applications even further. There is also a great book – The Definitive ANTLR 4 Reference (written by Terence Parr, ANTLR author) – which I highly recommend if you have even semi-serious ANTLR application on your mind.

Why do we need grammar and some custom code instead of simple expression evaluator? Because we want not only to evaluate the expression, but also transform it and build other representations from it. Our expressions are kinda SQL WHERE clauses (after WHERE) and we want to generate Querydsl Expressions. With expression evaluation we can check whether some object meets some criteria, with Querydsl query we can find all such objects in a database table – and both is important for us. But for now we will focus just on evaluation.

Couple of other disclaimers:

  • I’ll not explain ANTLR grammar syntax beyond the most basic points or dig deep into details of ANTLR that are covered in many other tutorials – or in the aforementioned book.
  • I’ll nor cover any build of the project, but of course there are plugins how to generate sources from ANTLR grammar. I’ll just mention command-line and IDEA way to do it.

Our first grammar will be very simple, we will support couple of operations, integer literals and parenthesis, but no variables – just to warm up. The whole example is on github.

Let there be a grammar!

ANTLR grammar file specifies grammatical rules and lexical rules. Lexical rules recognize tokens, grammatical then build upon these tokens. Our simple grammar file looks like this:

/*
Use IDEA with ANTLR v4 plugin to generate classes:
- first set-up (right-click on the file, Configure ANTLR...)
- setup output directory to match this modules src/main/java
- set package for generated classes: expr1.grammar
- check "generate parse tree visitor", uncheck listener above (not necessary)

Command line when using downloaded "Complete ANTLR 4.5.1 Java binaries jar" (version may vary):
- know where the JAR is downloaded
- run the command in the expr1/grammar package directory (where Expr.g4 is)
- java -jar ~/Downloads/antlr-4.5.1-complete.jar -o . -package expr1.grammar -visitor -no-listener Expr.g4
- -o . means this directory (can be omitted), in IDEA it somehow adds package to the output directory
  but this command does not do that, it merely sets package in sources

*.tokens files are NOT necessary for runtime, they are not added to version control
*/
grammar Expr;

expr: expr op=(OP_MUL | OP_DIV) expr # arithmetic
    | expr op=(OP_ADD | OP_SUB) expr # arithmetic
    | INT # int
    | '(' expr ')' # parens
    ;

OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';

INT: [0-9]+;
WS: [ \t\r\n]+ -> skip;

Lexical rules start with uppercase, grammatical ones in lowercase. Later we will be able to hook onto the grammatical rules, or even to their alternatives as in this case. Alternatives are marked by # followed by alternative name and we will be able to write code for them separately. There is much more about this in the book I mentioned.

ANTLR4 grammars are super intuitive because just mentioning OP_MUL/DIV before OP_ADD/SUB we defined the operator precedence. Still we want them all process in the same method, hence the same alternative name. For now just remember to recall these ideas when you see our Visitor implementation. 🙂

Generating helper classes

We need to ask ANTLR to generate some support classes for us, so we can build our application with them. Comment at the beginning is for me and my colleagues so we know how to do that. Of course, this can be part of the build and classes would go somewhere to target/generated-sources. Please, generate these classes, we will need them in the next steps.

ANTLR allows us to separate grammar from the concrete application nicely. When we look at it we see inherent limitations (like no way to enter floating-point numbers), but it doesn’t say whether number will be Integer or BigInteger. It also doesn’t say whether * will really be multiplication, although we would be shooting ourselves in the foot repurposing this operator. But it can be done. Grammar (how it looks) and application (what it does) are cleanly separated.

Parse tree

ANTLR offers two way how to work with parsed “code” based on our grammar. You can implement ParseTreeListener that is event-based or ParseTreeVisitor where the flow control during processing is more explicit. If you know SAX and StAX (XML push parser vs pull parser) it is something similar. In any case we need ParseTree first, so let’s get it:

public static ParseTree createParseTree(String expression) {
    ANTLRInputStream input = new ANTLRInputStream(expression);
    ExprLexer lexer = new ExprLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    ExprParser parser = new ExprParser(tokens);
    return parser.expr();
}
...
ParseTree parseTree = createParseTree("5+3");

On the second line we use our lexer (tokenizer if you will) that ANLTR generated for us from the grammar. We feed it with CharStream – implemented by ANTLRInputStream. Next two lines we do the same thing with tokenized input utilizing generated class ExprParser. We call expr() method on it – this matches the first rule from our grammar. This returns generated implementation of a ParseTree, but we will stick with the most abstract interface as we don’t need anything specific.

Visitor calculating the expression

On the next line we parse some expression and now… we need to implement the application logic. In our case we will use Visitor. ANTLR generated ExprVisitor interface for us, but even better, it also provides skeleton implementation ExprBaseVisitor that we can extend without the need to implement everything. Let’s extend it and call it ExpressionCalculatorVisitor.

package expr1;

import expr1.grammar.ExprBaseVisitor;
import expr1.grammar.ExprParser;

public class ExpressionCalculatorVisitor extends ExprBaseVisitor<Integer> {

    @Override
    public Integer visitInt(ExprParser.IntContext ctx) {
        return Integer.valueOf(ctx.INT().getText());
    }

    @Override
    public Integer visitParens(ExprParser.ParensContext ctx) {
        return visit(ctx.expr());
    }

    @Override
    public Integer visitArithmetic(ExprParser.ArithmeticContext ctx) {
        int left = visit(ctx.expr(0));
        int right = visit(ctx.expr(1));

        switch (ctx.op.getType()) {
            case ExprParser.OP_ADD:
                return left + right;
            case ExprParser.OP_SUB:
                return left - right;
            case ExprParser.OP_MUL:
                return left * right;
            case ExprParser.OP_DIV:
                return left / right;
            default:
                throw new IllegalStateException("Unknown operator " + ctx.op);
        }
    }
}

Our visitor returns Integer, because it can work with a single type anyway. Anything more complex would probably fall back to Object, or used some value wrapper. This example should return 8 for our 5+3 expression. But you never know until you test it, right?

Writing test

Testing expression calculator was super easy and lightweight compared to other tests in our project, many of them starting the whole Spring context. I’ll use TestNG, but JUnit will work in the same way. Here is our Expr1Test:

package expr1;

import static org.testng.Assert.assertEquals;

import org.antlr.v4.runtime.tree.ParseTree;
import org.testng.annotations.Test;

public class Expr1Test {

	@Test
	public void integerLiteralStaysInteger() {
		assertEquals(expr("5"), 5);
	}

	@Test
	public void plusAddsNumbers() {
		assertEquals(expr("5+4"), 9);
	}

	@Test
	public void minusCanProduceNegativeNumber() {
		assertEquals(expr("5-8"), -3);
	}

	@Test
	public void whitespacesAreIgnored() {
		assertEquals(expr("5\n- 8\t + 3"), 0);
	}

	@Test
	public void multiplyPrecedesPlus() {
		assertEquals(expr("2+3*3"), 11);
		assertEquals(expr("2*3+3"), 9);
	}

	@Test
	public void parenthesisChangePrecedence() {
		assertEquals(expr("(2+3)*3"), 15);
		assertEquals(expr("2*(3+3)"), 12);
	}

	@Test(expectedExceptions = ExpressionException.class,
		expectedExceptionsMessageRegExp = "no viable alternative at input '-'")
	public void cannotEnterNegativeNumber() {
		expr("-1");
	}

	private Object expr(String expression) {
		ParseTree parseTree = ExpressionUtils.createParseTree(expression);
		return new ExpressionCalculatorVisitor()
			.visit(parseTree);
	}
}

I used shortcut method called expr to parse and evaluate my expression – and just compared the results. But with the code so far there is one test failing – cannotEnterNegativeNumber.

Custom error handling

Brush aside the silliness of the idea that we can’t enter negative number (especially when it can be valid result :-)), the problem here is that by default parser reports the error but does not throw any exception. Let’s add this feature now – here is complete listing of ExpressionUtils:

package expr1;

import expr1.grammar.ExprLexer;
import expr1.grammar.ExprParser;
import org.antlr.v4.runtime.ANTLRErrorListener;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.tree.ParseTree;

public class ExpressionUtils {

	private static final ANTLRErrorListener ERROR_LISTENER = new ExceptionThrowingErrorListener();

	public static ParseTree createParseTree(String expression) {
		ANTLRInputStream input = new ANTLRInputStream(expression);
		ExprLexer lexer = new ExprLexer(input);
		CommonTokenStream tokens = new CommonTokenStream(lexer);
		ExprParser parser = new ExprParser(tokens);
		parser.removeErrorListeners();
		parser.addErrorListener(ERROR_LISTENER);
		return parser.expr();
	}

	public static class ExceptionThrowingErrorListener extends BaseErrorListener {
		@Override
		public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
			int line, int charPositionInLine, String msg, RecognitionException e)
		{
			throw new ExpressionException(msg);
		}
	}
}

What we did here is that we removed the original listener (that would still print the error on stderr otherwise) and added listener that throws exception with the same message instead. Exception implementation is on github and isn’t worth the listing.

Conclusion

Of course, we merely scratched the surface of ANTLR and custom language implementations, but we managed to do so much with so little code! You saw rules and their alternatives in action and how they manifested in the generated visitor class.

Now you probably can imagine how to extend the idea – and we will do exactly this in my next post. We will add variables, because without them expression evaluation isn’t that useful for most customization purposes. We will also add more types and logical operations. So it will be fun, I promise.

Advertisements

About virgo47
Java Developer by profession in the first place. Gamer and amateur musician. And father too. Naive believer in brighter future. Step by step.

2 Responses to Expression evaluation in Java (2)

  1. Pingback: Expression evaluation in Java (3) | Virgo's Naive Stories

  2. Pingback: Expression evaluation in Java (4) | Virgo's Naive Stories

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s