Expression evaluation in Java (4)

Previously we looked at various options how to evaluate expressions, then we implemented our own evaluator in ANTLR v4 and then we complicated it a bit with more types and more operations. But it still doesn’t make sense without variables. What good would any expression based rule engine be if we can’t change input parameters? 🙂

But before that…

All the code related to this blog post is here on GitHub, package name is called expr3 this time as it is our third iteration of the ANTLR solution (even though we are in the 4th post). Tests are here. We will focus on variables, but there are some changes made to expr3 beyond variables themselves.

  • Grammar supports NULL literal and identifiers (for variables) – but we will get to this.
  • ExpressionCalculatorVisitor now accepts even more types and method convertToSupportedType is taking care of this. This is important to support reasonable palette of object types for variables (and will work the same for return values from functions later).
  • Numbers can be represented as Integer or BigDecimal. If the number fits into Integer range (and is not decimal number, of course) it will be represented with Integer, otherwise BigDecimal is used. This does complicate arithmetic and relational (comparisons) operations, we need some promotions here and there, etc. As of now it is coded in a bit crude way – had the implicit conversion rules been more complicated, some more sophisticated solution would be better.
  • Java Date Time API types can be used as variable values, but these will be converted to the ISO extended strings (which still allows comparison!). LocalDate, LocalDateTime and Instant are supported in this demo.

As I said, I’ll not focus on these changes, they are in the code and while they affect how we treat variables, they are not inherently related to introducing them. I’ll also not talk about related tests (like literal resolution into Integer vs BigDecimal) – again, it is in the repo.

Identifiers and null

When we’re working with variables, we need to write them somehow into the expression – and that’s where identifiers come in. As you’d expect, identifiers represent the variables on their respective place in the expression (or rather their value), so they are one kind of elemental expressions, just like various literals. Second thing we may need is NULL value. This is not strictly necessary in all contexts, but null is so common in our Java world that I decided to support it too. Our expr node in its fullness looks like this:

expr: STRING_LITERAL # stringLiteral
    | BOOLEAN_LITERAL # booleanLiteral
    | NUMERIC_LITERAL # numericLiteral
    | NULL_LITERAL # nullLiteral
    | op=('-' | '+') expr # unarySign
    | expr op=(OP_MUL | OP_DIV | OP_MOD) expr # arithmeticOp
    | expr op=(OP_ADD | OP_SUB) expr # arithmeticOp
    | expr op=(OP_LT | OP_GT | OP_EQ | OP_NE | OP_LE | OP_GE) expr # comparisonOp
    | OP_NOT expr # logicNot
    | expr op=(OP_AND | OP_OR) expr # logicOp
    | ID # variable
    | '(' expr ')' # parens
    ;

Null literal is predictably primitive:

NULL_LITERAL : N U L L;

Identifiers are not very complicated either, and I guess they are pretty much similar to Java syntax:

ID: [a-zA-Z$_][a-zA-Z0-9$_.]*;

Various tests for null in the expression (without variables first) may look like this:

    @Test
    public void nullComparison() {
        assertEquals(expr("null == null"), true);
        assertEquals(expr("null != null"), false);
        assertEquals(expr("5 != null"), true);
        assertEquals(expr("5 == null"), false);
        assertEquals(expr("null != 5"), true);
        assertEquals(expr("null == 5"), false);
        assertEquals(expr("null > null"), false);
        assertEquals(expr("null < null"), false);
        assertEquals(expr("null <= null"), false); assertEquals(expr("null >= null"), false);
    }

We don’t need much for this – resolving the NULL literal is particularly simple:

@Override
public Object visitNullLiteral(NullLiteralContext ctx) {
    return null;
}

We also modified visitComparisonOp – now it starts like this:

@Override
public Boolean visitComparisonOp(ExprParser.ComparisonOpContext ctx) {
    Comparable left = (Comparable) visit(ctx.expr(0));
    Comparable right = (Comparable) visit(ctx.expr(1));
    int operator = ctx.op.getType();
    if (left == null || right == null) {
        return left == null && right == null && operator == OP_EQ
            || (left != null || right != null) && operator == OP_NE;
    }
...

The rest is dealing with non-null values, etc. We may also let this method return null when null is involved anywhere except for EQ/NE, now it returns false. Depends on what logic we want.

Variable resolver

Variable resolving inside the calculator class is also quite simple. We need something that resolves them – that’s that variableResolver field initialized in the constructor and used in visitVariable:

private final ExpressionVariableResolver variableResolver;

public ExpressionCalculatorVisitor(ExpressionVariableResolver variableResolver) {
    if (variableResolver == null) {
        throw new IllegalArgumentException("Variable resolver must be provided");
    }
    this.variableResolver = variableResolver;
}

@Override
public Object visitVariable(VariableContext ctx) {
    Object value = variableResolver.resolve(ctx.ID().getText());
    return convertToSupportedType(value);
}

Anything this resolver returns is converted to supported types as mentioned in the introduction. ExpressionVariableResolver is again very simple:

public interface ExpressionVariableResolver {
    Object resolve(String variableName);
}

And how we can implement this? In Java 8 you must just love it – here is piece of test:

private ExpressionVariableResolver variableResolver;

@BeforeMethod
public void init() {
    variableResolver = var -> null;
}

@Test
public void primitiveVariableResolverReturnsTheSameValueForAnyVarName() {
    variableResolver = var -> 5;
    assertEquals(expr("var"), 5);
    assertEquals(expr("anyvarworksnow"), 5);
}

I use field that is set in init method to default “implementation” that always return null. In another test method I change it and it always return 5, regardless of actual variable name (parameter var) as this test clearly demonstrates. Next test is more useful, because this resolver returns value only for specific value:

@Test
public void variableResolverReturnsValueForOneVarName() {
    variableResolver = var -> var.equals("var") ? 5 : null;
    assertEquals(expr("var"), 5);
    assertEquals(expr("var != null"), true);
    assertEquals(expr("var == null"), false);

    assertEquals(expr("anyvarworksnow"), null);
    assertEquals(expr("anyvarworksnow == null"), true);
}

Now the actual name of the variable (identifier) must be “var”, otherwise it returns null again. You might have heard that lambdas may work as super-short test implementations – and yes, they can.

You may wonder why I have the field instead of using it only in the test method itself. This would be better contained and preventing any accidental leaks (although that @BeforeMethod covered me on this). Trouble is that variableResolver is used deeper in expr(…) method and I didn’t want to add it as parameter everywhere, hence the field:

private Object expr(String expression) {
    ParseTree parseTree = ExpressionUtils.createParseTree(expression);
    return new ExpressionCalculatorVisitor(variableResolver)
        .visit(parseTree);
}

Any real-life implementation?

Variable resolvers in the test were obviously very primitive, so let’s try something more realistic. First try is also very simple, but indeed realistic. Remember Bindings in Java’s ScriptEngine? It actually extends Map – so how about resolver that wraps existing Map<String, Object> (mapping variable name to it’s value)? Ok, it – again – may be too primitive:

ExpressionVariableResolver resolver = var -> map.get(var);

Bah, we need some bigger challenge! Let’s say I have a Java Bean or any POJO and I want to explicitly specify my variable names and how they should be resolved with that object. This may be call of method, like getter, so right now we don’t have the values readily available in a collection (or a map).

Important thing to realize here is that resolver will be different from an object to object, because for different objects it needs to provide different values. However, the way how it obtains the values will be the same. And we will wrap this “way” into VariableMapper that knows how to get values from an object of specific type (using generics) – and it will also help us to resolve the value for specific instance. Tests show how I intend to use it:

private VariableMapper<SomeBean> variableMapper;
private ParseTree myNameExpression;
private ParseTree myCountExpression;

@BeforeClass
public void init() {
    variableMapper = new VariableMapper<SomeBean>()
        .set("myName", o -> o.name)
        .set("myCount", SomeBean::getCount);
    myNameExpression = ExpressionUtils.createParseTree("myName <= 'Virgo'"); myCountExpression = ExpressionUtils.createParseTree("myCount * 3"); } @Test public void myNameExpressionTest() { SomeBean bean = new SomeBean(); ExpressionCalculatorVisitor visitor = new ExpressionCalculatorVisitor( var -> variableMapper.resolveVariable(var, bean));

    assertEquals(visitor.visit(myNameExpression), false); // null comparison is false
    bean.name = "Virgo";
    assertEquals(visitor.visit(myNameExpression), true);
    bean.name = "ABBA";
    assertEquals(visitor.visit(myNameExpression), true);
    bean.name = "Virgo47";
    assertEquals(visitor.visit(myNameExpression), false);
}

@Test
public void myCountExpressionTest() {
    SomeBean bean = new SomeBean();
    ExpressionCalculatorVisitor visitor = new ExpressionCalculatorVisitor(
        var -> variableMapper.resolveVariable(var, bean));

// assertEquals(visitor.visit(myCountExpression), false); // NPE!
    bean.setCount(3f);
    assertEquals(visitor.visit(myCountExpression), new BigDecimal("9"));
    bean.setCount(-1.1f);
    assertEquals(visitor.visit(myCountExpression), new BigDecimal("-3.3"));
}

public static class SomeBean {
    public String name;
    private Float count;

    public Float getCount() {
        return count;
    }

    public void setCount(Float count) {
        this.count = count;
    }
}

VariableMapper can live longer, you set it up and then reuse it. Its configuration is its state (methods set), concrete object is merely input parameter. Variable resolver itself works like a closure around the concrete instance. Keep in mind that instantiating calculator visitor is cheap and visiting itself is something you have to do anyway. Creating parsing tree is expensive, but we don’t repeat this between tests – and that’s probably how you want to use it in your application too. Cache the parse trees, create visitors – even with state specific for a single calculation – and then throw them away. This is also safest from threading perspective. You don’t want to use calculator that “closes over” another target object in another thread in the middle of your visiting business. 🙂

Ok, how does that VariableMapper look like?

public class VariableMapper<T> {
    private Map<String, Function<T, Object>> variableValueFunctions = new HashMap<>();

    public VariableMapper<T> set(String variableName, Function<T, Object> valueFunction) {
        variableValueFunctions.put(variableName, valueFunction);
        return this;
    }

    public Object resolveVariable(String variableName, T object) {
        Function<T, Object> valueFunction = variableValueFunctions.get(variableName);
        if (valueFunction == null) {
            throw new ExpressionException("Unknown variable " + variableName);
        }
        return valueFunction.apply(object);
    }
}

As said, it keeps the configuration, but not the state of the object used in concrete calculation – that’s what the variable resolver does (and again, using lambda, one simply can’t resist in this case). Sure, you can combine VariableResolver with the mapping configuration too, but that will either 1) work in a single-threaded environment only, 2) or you have to reconfigure that mapping for each resolver in each thread. It simply doesn’t make sense. Mapper (long-lived) keeps the “way” how to get stuff from an object of some type in a particular computation context while variable resolver (short-lived) merely closes over the concrete instance.

Of course, our mapper can stand some improvements, it would be good if one could “seal” the configuration and no more “set” calls are allowed after that (probably throwing IllegalStateException).

Conclusion

So here we are, supporting even more types (Integer/BigDecimal), but – most importantly – variables! As you can see, now every computation can bring different result. That’s why it’s advisable to rethink how you want to instantiate your visitors, especially in case of multi-threaded environment.

Our ExpressionVariableResolver interface is very simple, it supports only variable name – so if you want to resolve from something stateful (and probably mutable) it’s important to wrap around it somehow. Variable resolver doesn’t know how to get stuff from an object, because there is no such input parameter. That’s why we introduced VariableMapper that supports getting values from an object of some type (generic). And we “implement” variable resolver as lambda to close over the configured variable mapper and an object that is then fed to its resolveVariable method. This method, in contrast to variable resolver’s resolve, takes in the object as a parameter.

It doesn’t have to be an object – you may implement other ways to get variable values in different contexts, you just have to wrap around that context (in our case object) somehow. I dare to say that Java 8 functional programming capabilities make it so much easier…

Still, the main hero here is ANTLR v4, of course. Now our expression evaluator truly makes sense. I’m not promising any continuation of this series, but maybe I’ll talk about functions too. Although I guess you can easily implement them yourselves by now.

Expression evaluation in Java (3)

In the first part of the series we evaluated expressions to get the result, in the second part we parsed the expression with our own grammar. Today we will expand on the topic, we will introduce more types and more operations. Variables are left for the next installment as this part will be long enough without them anyway.

As always – complete example is available on Github and its structure is pretty much similar to the first simple version of our ANTLR based expression evaluator. And what will we do today?

  • We will expand our numbers from integers to floating point (implemented with BigDecimals).
  • We will add strings and booleans types too, along with relational operators (comparing).
  • Relation operators can be written as >= but also as GE which is extremely handy when expressions are part of XML configuration. In addition to this, keywords will be case insensitive.

We will start introducing the grammar changes from the end of the file to the beginning – you can download our new grammar in its wholeness.

Case-insensitive keywords

It would be cool if there was some easy way how to say to ANTLR that some particular keyword (string literal in the grammar) is case insensitive – as there are many languages that use this way. But there no magical switch, so we have to help ourselves with tiny parts of token rules called fragments. Following part will go to the very end of the grammar file.

fragment DIGIT : [0-9];

fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

Now when we want case-insensitive keyword we will write O R instead of ‘or’ like this:

OP_OR: O R | '||';

But we will get to operators later, let’s do the literals and company first.

Literals and whitespaces

We will support boolean literals as a word and a letter (blame my recent experience with R language). Number literals must cover floating point numbers too. And finally there are strings here as well. White spaces are without any change at all:

BOOLEAN_LITERAL: T R U E | T
    | F A L S E | F
    ;

NUMERIC_LITERAL : DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
    | '.' DIGIT+ ( E [-+]? DIGIT+ )?
    ;

STRING_LITERAL : '\'' ( ~'\'' | '\'\'' )* '\'';

WS: [ \t\r\n]+ -> skip;

Operators

Nothing special here, we just add more of them.

OP_LT: L T | '<'; OP_GT: G T | '>';
OP_LE: L E | '<='; OP_GE: G E | '>=';
OP_EQ: E Q | '=' '='?;
OP_NE: N E | N E Q | '!=' | '<>';
OP_AND: A N D | '&&';
OP_OR: O R | '||';
OP_NOT: N O T | '!';
OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';
OP_MOD: '%';

Equality can be written both as in SQL or as in Java, because there is no assignment statement in our grammar. All relational and logical operators have both the symbols and keywords. If you miss XOR you can add it yourself, of course.

And the expression rule…

Finally we got to the expression rule, which got a bit richer:

result: expr;

expr: STRING_LITERAL # stringLiteral
    | BOOLEAN_LITERAL # booleanLiteral
    | NUMERIC_LITERAL # numericLiteral
    | op=('-' | '+') expr # unarySign
    | expr op=(OP_MUL | OP_DIV | OP_MOD) expr # arithmeticOp
    | expr op=(OP_ADD | OP_SUB) expr # arithmeticOp
    | expr op=(OP_LT | OP_GT | OP_EQ | OP_NE | OP_LE | OP_GE) expr # comparisonOp
    | OP_NOT expr # logicNot
    | expr op=(OP_AND | OP_OR) expr # logicOp
    | '(' expr ')' # parens
    ;

You may notice one additional rule – result. We will use this for a single special reason covered later. Now is the time to generate the classes from the grammar and then to reimplement the visitor class.

BTW: Our new version of the calculator visitor will not extend ExprBaseVisitor<Integer> with parameter (Integer) anymore as the result may be of any supported type – String, BigDecimal or Boolean. And we don’t know what will be the result. So we just don’t state the parameterized type at all.

Implementing literals

Before we implement, we should have some expectations – and that means tests. My test class is not perfect one case per test, but it does the job.

    @Test
    public void booleanLiterals() {
        assertEquals(expr("t"), true);
        assertEquals(expr("True"), true);
        assertEquals(expr("f"), false);
        assertEquals(expr("faLSE"), false);
    }

    @Test
    public void stringLiteral() {
        assertEquals(expr("''"), "");
        assertEquals(expr("''''"), "'");
        assertEquals(expr("'something'"), "something");
    }

    @Test
    public void numberLiterals() {
        assertEquals(expr("5"), new BigDecimal("5"));
        assertEquals(expr("10.35"), new BigDecimal("10.35"));
    }

Method expr in test is still implemented like before. Let’s focus on the visitor implementation now. The parts we need for this test to work are here:

    @Override
    public String visitStringLiteral(StringLiteralContext ctx) {
        String text = ctx.STRING_LITERAL().getText();
        text = text.substring(1, text.length() - 1)
            .replaceAll("''", "'");
        return text;
    }

    @Override
    public Boolean visitBooleanLiteral(BooleanLiteralContext ctx) {
        return ctx.BOOLEAN_LITERAL().getText().toLowerCase().charAt(0) == 't';
    }

    @Override
    public BigDecimal visitNumericLiteral(NumericLiteralContext ctx) {
        String text = ctx.NUMERIC_LITERAL().getText();
        return stringToNumber(text);
    }

    private BigDecimal stringToNumber(String text) {
        BigDecimal bigDecimal = new BigDecimal(text);

        return bigDecimal.scale() < 0
            ? bigDecimal.setScale(0, roundingMode)
            : bigDecimal;
    }

Arithmetic operations

Now we will implement arithmetics with BigDecimals, including unary signs and parentheses.

    @Test
    public void arithmeticTest() {
        assertEquals(expr("5+5.1"), new BigDecimal("10.1"));
        assertEquals(expr("5-5.1"), new BigDecimal("-0.1"));
        assertEquals(expr("0.3*0.1"), new BigDecimal("0.03"));
        assertEquals(expr("0.33/0.1"), new BigDecimal("3.3"));
        assertEquals(expr("1/3"), new BigDecimal("0.333333333333333"));
        assertEquals(expr("10%3"), new BigDecimal("1"));
    }

    @Test
    public void unarySignTest() {
        assertEquals(expr("-5"), new BigDecimal("-5"));
        assertEquals(expr("-+-5"), new BigDecimal("5"));
        assertEquals(expr("-(3+5)"), new BigDecimal("-8"));
    }

The implementation in the Visitor respects our defined maximal scale, which is very important for cases like 1/3. Without scale and rounding set this would throw an error.

    public static final int DEFAULT_MAX_SCALE = 15;
    private int maxScale = DEFAULT_MAX_SCALE;
    private int roundingMode = BigDecimal.ROUND_HALF_UP;

    @Override
    public BigDecimal visitArithmeticOp(ExprParser.ArithmeticOpContext ctx) {
        BigDecimal left = (BigDecimal) visit(ctx.expr(0));
        BigDecimal right = (BigDecimal) visit(ctx.expr(1));
        return bigDecimalArithmetic(ctx, left, right);
    }

    private BigDecimal bigDecimalArithmetic(ExprParser.ArithmeticOpContext ctx, BigDecimal left, BigDecimal right) {
        switch (ctx.op.getType()) {
            case OP_ADD:
                return left.add(right);
            case OP_SUB:
                return left.subtract(right);
            case OP_MUL:
                return left.multiply(right);
            case OP_DIV:
                return left.divide(right, maxScale, roundingMode).stripTrailingZeros();
            case OP_MOD:
                return left.remainder(right);
            default:
                throw new IllegalStateException("Unknown operator " + ctx.op);
        }
    }

    @Override
    public BigDecimal visitUnarySign(UnarySignContext ctx) {
        BigDecimal result = (BigDecimal) visit(ctx.expr());
        boolean unaryMinus = ctx.op.getText().equals("-");
        return unaryMinus
            ? result.negate()
            : result;
    }

    @Override
    public Object visitParens(ParensContext ctx) {
        return visit(ctx.expr());
    }

We may appreciate scale 15 for interim results, but maybe we want lower scale for final result. Let’s do that now:

Result rule with different scale

Let’s say we require just scale of 6 for final results. We will implement result rule for that:

    public static final int DEFAULT_MAX_RESULT_SCALE = 6;
    private int maxResultScale = DEFAULT_MAX_RESULT_SCALE;

    /** Maximum BigDecimal scale used during computations. */
    public ExpressionCalculatorVisitor maxScale(int maxScale) {
        this.maxScale = maxScale;
        return this;
    }

    /** Maximum BigDecimal scale for result. */
    public ExpressionCalculatorVisitor maxResultScale(int maxResultScale) {
        this.maxResultScale = maxResultScale;
        return this;
    }

    @Override
    public Object visitResult(ExprParser.ResultContext ctx) {
        Object result = visit(ctx.expr());
        if (result instanceof BigDecimal) {
            BigDecimal bdResult = (BigDecimal) result;
            if (bdResult.scale() > maxResultScale) {
                result = bdResult.setScale(maxResultScale, roundingMode);
            }
        }
        return result;
    }

We also introduced two methods setting the maximal interim and result scale. Both of them return this, so they can be used right after constructor of the ExpressionCalculatorVisitor. One last change is required in ExpressionUtils – instead of calling expr rule on parser we need to use result rule. Otherwise the util class looks exactly like before.

Of course we have to fix the test and for 1/3 expect just 0.333333 instead of 15 digits like before.

Logical operations

After previous problems this is a piece of cake. First the test:

    @Test
    public void logicalOperatorTest() {
        assertEquals(expr("F && F"), false);
        assertEquals(expr("F && T"), false);
        assertEquals(expr("T and F"), false);
        assertEquals(expr("T AND T"), true);
        assertEquals(expr("F || F"), false);
        assertEquals(expr("F || T"), true);
        assertEquals(expr("T or F"), true);
        assertEquals(expr("T OR T"), true);
        assertEquals(expr("!T"), false);
        assertEquals(expr("not T"), false);
        assertEquals(expr("!f"), true);
    }

And the visitor implementation:

    @Override
    public Boolean visitLogicOp(ExprParser.LogicOpContext ctx) {
        boolean left = (boolean) visit(ctx.expr(0));

        switch (ctx.op.getType()) {
            case OP_AND:
                return left && booleanRightSide(ctx);
            case OP_OR:
                return left || booleanRightSide(ctx);
            default:
                throw new IllegalStateException("Unknown operator " + ctx.op);
        }
    }

    private boolean booleanRightSide(ExprParser.LogicOpContext ctx) {
        return (boolean) visit(ctx.expr(1));
    }

    @Override
    public Boolean visitLogicNot(ExprParser.LogicNotContext ctx) {
        return !(Boolean) visit(ctx.expr());
    }

Relational operators

Let’s do some comparisons and (non) equalities. Test first:

    @Test
    public void relationalOperatorTest() {
        assertEquals(expr("1 > 0.5"), true);
        assertEquals(expr("1 > 1"), false);
        assertEquals(expr("1 >= 0.5"), true);
        assertEquals(expr("1 >= 1"), true);
        assertEquals(expr("5 == 5"), true);
        assertEquals(expr("5 != 5"), false);
        assertEquals(expr("'a' > 'b'"), false);
        assertEquals(expr("'a' >= 'b'"), false);
        assertEquals(expr("'a' < 'b'"), true);
        assertEquals(expr("'a' <= 'b'"), true);
        assertEquals(expr("true == true"), true);
        assertEquals(expr("true == f"), false);
        assertEquals(expr("true eq t"), true);
    }

Again, I realize that these should be many separate tests and the coverage of cases is also not that great, but let’s move on to the implementation:

    @Override
    public Boolean visitComparisonOp(ExprParser.ComparisonOpContext ctx) {
        Comparable left = (Comparable) visit(ctx.expr(0));
        Comparable right = (Comparable) visit(ctx.expr(1));
        int operator = ctx.op.getType();
        if (left == null || right == null) {
            return left == null && right == null && operator == OP_EQ
                || (left != null || right != null) && operator == OP_NE;
        }

        //noinspection unchecked
        int comp = left.compareTo(right);
        switch (operator) {
            case OP_EQ:
                return comp == 0;
            case OP_NE:
                return comp != 0;
            case OP_GT:
                return comp > 0;
            case OP_LT:
                return comp < 0; case OP_GE: return comp >= 0;
            case OP_LE:
                return comp <= 0;
            default:
                throw new IllegalStateException("Unknown operator " + ctx.op);
        }
    }

Nothing special here – we utilize the fact that Strings, Booleans and BigDecimals are all comparable.

Conclusion

What did we see and what more can we do? For the sheer scope of the changes we postponed the variables for the next part of the series. Of course, variables (and functions!) are very important for expression evaluation, otherwise we’re talking about constant result for each expression anyway, right?

We have created flexible grammar that is kinda mix between SQL and Java. Of course, you may not like too many options for all the operators, but in our context it doesn’t pose any problem. It might have – in the context of general programming language (e.g. = vs == operators).

Another thing is that numbers are BigDecimals – always. This is kinda Groovy-like, where 3 / 2 is not 1 as Java programmer would expect and this division is also much slower than integer division. We may introduce integers and support smart division based on the context (like Java does after all, except that FP is IEEE 754 and not BigDecimal-based). But not now.

Now you can download the complete example – and if you don’t know how to download part of the repo, you may try this command:

svn export https://github.com/virgo47/litterbin/trunk/demos/antlr-expr/src/main/java/expr2/

Tests are in src/test of course. See you next with variables and maybe even functions.

Expression evaluation in Java (2)

Previously we considered our options when evaluating expressions in Java language. Let’s recap them quickly:

  • Evaluate string expression in one go. Find a library or use built-in ScriptEngine (my favourite choice for most cases). Various universal expression language libraries may be more useful if you use beans and/or call Java methods in the expressions.
  • Parse the string into some kind of AST tree (or ParseTree in ANTLR4).
  • Use the tree as you wish. Interpret the expression, generate different representations of the same expression, etc.

In the first post we covered the first option. Now we’re going to check the latter.

Parser with interpreter

As I mentioned there are many parsers out there, libraries, blog posts, etc. I’ll use ANTLR4 to implement my grammar and expression evaluator. ANTLR seems to be the state of the art technology and ANTLR v4 brings grammar definition and its separation from actual applications even further. There is also a great book – The Definitive ANTLR 4 Reference (written by Terence Parr, ANTLR author) – which I highly recommend if you have even semi-serious ANTLR application on your mind.

Why do we need grammar and some custom code instead of simple expression evaluator? Because we want not only to evaluate the expression, but also transform it and build other representations from it. Our expressions are kinda SQL WHERE clauses (after WHERE) and we want to generate Querydsl Expressions. With expression evaluation we can check whether some object meets some criteria, with Querydsl query we can find all such objects in a database table – and both is important for us. But for now we will focus just on evaluation.

Couple of other disclaimers:

  • I’ll not explain ANTLR grammar syntax beyond the most basic points or dig deep into details of ANTLR that are covered in many other tutorials – or in the aforementioned book.
  • I’ll nor cover any build of the project, but of course there are plugins how to generate sources from ANTLR grammar. I’ll just mention command-line and IDEA way to do it.

Our first grammar will be very simple, we will support couple of operations, integer literals and parenthesis, but no variables – just to warm up. The whole example is on github.

Let there be a grammar!

ANTLR grammar file specifies grammatical rules and lexical rules. Lexical rules recognize tokens, grammatical then build upon these tokens. Our simple grammar file looks like this:

/*
Use IDEA with ANTLR v4 plugin to generate classes:
- first set-up (right-click on the file, Configure ANTLR...)
- setup output directory to match this modules src/main/java
- set package for generated classes: expr1.grammar
- check "generate parse tree visitor", uncheck listener above (not necessary)

Command line when using downloaded "Complete ANTLR 4.5.1 Java binaries jar" (version may vary):
- know where the JAR is downloaded
- run the command in the expr1/grammar package directory (where Expr.g4 is)
- java -jar ~/Downloads/antlr-4.5.1-complete.jar -o . -package expr1.grammar -visitor -no-listener Expr.g4
- -o . means this directory (can be omitted), in IDEA it somehow adds package to the output directory
  but this command does not do that, it merely sets package in sources

*.tokens files are NOT necessary for runtime, they are not added to version control
*/
grammar Expr;

expr: expr op=(OP_MUL | OP_DIV) expr # arithmetic
    | expr op=(OP_ADD | OP_SUB) expr # arithmetic
    | INT # int
    | '(' expr ')' # parens
    ;

OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';

INT: [0-9]+;
WS: [ \t\r\n]+ -> skip;

Lexical rules start with uppercase, grammatical ones in lowercase. Later we will be able to hook onto the grammatical rules, or even to their alternatives as in this case. Alternatives are marked by # followed by alternative name and we will be able to write code for them separately. There is much more about this in the book I mentioned.

ANTLR4 grammars are super intuitive because just mentioning OP_MUL/DIV before OP_ADD/SUB we defined the operator precedence. Still we want them all process in the same method, hence the same alternative name. For now just remember to recall these ideas when you see our Visitor implementation. 🙂

Generating helper classes

We need to ask ANTLR to generate some support classes for us, so we can build our application with them. Comment at the beginning is for me and my colleagues so we know how to do that. Of course, this can be part of the build and classes would go somewhere to target/generated-sources. Please, generate these classes, we will need them in the next steps.

ANTLR allows us to separate grammar from the concrete application nicely. When we look at it we see inherent limitations (like no way to enter floating-point numbers), but it doesn’t say whether number will be Integer or BigInteger. It also doesn’t say whether * will really be multiplication, although we would be shooting ourselves in the foot repurposing this operator. But it can be done. Grammar (how it looks) and application (what it does) are cleanly separated.

Parse tree

ANTLR offers two way how to work with parsed “code” based on our grammar. You can implement ParseTreeListener that is event-based or ParseTreeVisitor where the flow control during processing is more explicit. If you know SAX and StAX (XML push parser vs pull parser) it is something similar. In any case we need ParseTree first, so let’s get it:

public static ParseTree createParseTree(String expression) {
    ANTLRInputStream input = new ANTLRInputStream(expression);
    ExprLexer lexer = new ExprLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    ExprParser parser = new ExprParser(tokens);
    return parser.expr();
}
...
ParseTree parseTree = createParseTree("5+3");

On the second line we use our lexer (tokenizer if you will) that ANLTR generated for us from the grammar. We feed it with CharStream – implemented by ANTLRInputStream. Next two lines we do the same thing with tokenized input utilizing generated class ExprParser. We call expr() method on it – this matches the first rule from our grammar. This returns generated implementation of a ParseTree, but we will stick with the most abstract interface as we don’t need anything specific.

Visitor calculating the expression

On the next line we parse some expression and now… we need to implement the application logic. In our case we will use Visitor. ANTLR generated ExprVisitor interface for us, but even better, it also provides skeleton implementation ExprBaseVisitor that we can extend without the need to implement everything. Let’s extend it and call it ExpressionCalculatorVisitor.

package expr1;

import expr1.grammar.ExprBaseVisitor;
import expr1.grammar.ExprParser;

public class ExpressionCalculatorVisitor extends ExprBaseVisitor<Integer> {

    @Override
    public Integer visitInt(ExprParser.IntContext ctx) {
        return Integer.valueOf(ctx.INT().getText());
    }

    @Override
    public Integer visitParens(ExprParser.ParensContext ctx) {
        return visit(ctx.expr());
    }

    @Override
    public Integer visitArithmetic(ExprParser.ArithmeticContext ctx) {
        int left = visit(ctx.expr(0));
        int right = visit(ctx.expr(1));

        switch (ctx.op.getType()) {
            case ExprParser.OP_ADD:
                return left + right;
            case ExprParser.OP_SUB:
                return left - right;
            case ExprParser.OP_MUL:
                return left * right;
            case ExprParser.OP_DIV:
                return left / right;
            default:
                throw new IllegalStateException("Unknown operator " + ctx.op);
        }
    }
}

Our visitor returns Integer, because it can work with a single type anyway. Anything more complex would probably fall back to Object, or used some value wrapper. This example should return 8 for our 5+3 expression. But you never know until you test it, right?

Writing test

Testing expression calculator was super easy and lightweight compared to other tests in our project, many of them starting the whole Spring context. I’ll use TestNG, but JUnit will work in the same way. Here is our Expr1Test:

package expr1;

import static org.testng.Assert.assertEquals;

import org.antlr.v4.runtime.tree.ParseTree;
import org.testng.annotations.Test;

public class Expr1Test {

	@Test
	public void integerLiteralStaysInteger() {
		assertEquals(expr("5"), 5);
	}

	@Test
	public void plusAddsNumbers() {
		assertEquals(expr("5+4"), 9);
	}

	@Test
	public void minusCanProduceNegativeNumber() {
		assertEquals(expr("5-8"), -3);
	}

	@Test
	public void whitespacesAreIgnored() {
		assertEquals(expr("5\n- 8\t + 3"), 0);
	}

	@Test
	public void multiplyPrecedesPlus() {
		assertEquals(expr("2+3*3"), 11);
		assertEquals(expr("2*3+3"), 9);
	}

	@Test
	public void parenthesisChangePrecedence() {
		assertEquals(expr("(2+3)*3"), 15);
		assertEquals(expr("2*(3+3)"), 12);
	}

	@Test(expectedExceptions = ExpressionException.class,
		expectedExceptionsMessageRegExp = "no viable alternative at input '-'")
	public void cannotEnterNegativeNumber() {
		expr("-1");
	}

	private Object expr(String expression) {
		ParseTree parseTree = ExpressionUtils.createParseTree(expression);
		return new ExpressionCalculatorVisitor()
			.visit(parseTree);
	}
}

I used shortcut method called expr to parse and evaluate my expression – and just compared the results. But with the code so far there is one test failing – cannotEnterNegativeNumber.

Custom error handling

Brush aside the silliness of the idea that we can’t enter negative number (especially when it can be valid result :-)), the problem here is that by default parser reports the error but does not throw any exception. Let’s add this feature now – here is complete listing of ExpressionUtils:

package expr1;

import expr1.grammar.ExprLexer;
import expr1.grammar.ExprParser;
import org.antlr.v4.runtime.ANTLRErrorListener;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.tree.ParseTree;

public class ExpressionUtils {

	private static final ANTLRErrorListener ERROR_LISTENER = new ExceptionThrowingErrorListener();

	public static ParseTree createParseTree(String expression) {
		ANTLRInputStream input = new ANTLRInputStream(expression);
		ExprLexer lexer = new ExprLexer(input);
		CommonTokenStream tokens = new CommonTokenStream(lexer);
		ExprParser parser = new ExprParser(tokens);
		parser.removeErrorListeners();
		parser.addErrorListener(ERROR_LISTENER);
		return parser.expr();
	}

	public static class ExceptionThrowingErrorListener extends BaseErrorListener {
		@Override
		public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
			int line, int charPositionInLine, String msg, RecognitionException e)
		{
			throw new ExpressionException(msg);
		}
	}
}

What we did here is that we removed the original listener (that would still print the error on stderr otherwise) and added listener that throws exception with the same message instead. Exception implementation is on github and isn’t worth the listing.

Conclusion

Of course, we merely scratched the surface of ANTLR and custom language implementations, but we managed to do so much with so little code! You saw rules and their alternatives in action and how they manifested in the generated visitor class.

Now you probably can imagine how to extend the idea – and we will do exactly this in my next post. We will add variables, because without them expression evaluation isn’t that useful for most customization purposes. We will also add more types and logical operations. So it will be fun, I promise.

Expression evaluation in Java (1)

There are times when you need to evaluate some expression in your program. Sometimes you can create something that is not text/string based, that is you construct the expression in a different way – typically so it is easier to validate, evaluate, etc. Sometimes, nothing beats the textual representation. It’s easy and quick to write, and I prefer it any time to various click-based solutions (typically based on some tree representation).

But the expression is a tree after all and we need to parse it somehow. Or do we? In my first sentence I said I needed to evaluate the expression. Can we forgo the parsing altogether?

This will be just the first of (probably) two posts and after some general thoughts I will focus on direct string expression evaluation. Parsing and what follows will follow in another post.

Parsing the grammar or what?

If you know tools like ANTLR then you may be asking what am I going to parse here? Is it a grammar written in some meta-language? No – I’m not interested in grammar itself now at all. Grammar is given and I want to parse the expression written in that grammar. If we used ANTLR, we’d be working with classes generated from the grammar.

Also, just to be sure, anytime I say parse/parsing mean it in a broader sense – both lexer and parser process together (except for cases where I explicitly mention tokenization).

Evaluate or parse as well?

Unless you internally represent the expression as a tree already, you will need to parse it from the string/stream – or something has to do it for you. Next question is whether you want to work with the expression tree and interpret it yourself, or you are really interested only in the result of the expression evaluation.

Evaluation in any case means that I can feed the same expression (whatever the representation is) with various inputs – which is valuable when when the expression contains variables. Without variables it doesn’t make any sense anyway. For instance in case of Java’s ScriptEngine, this is what Bindings do.

So what are our options?

  • Expression evaluation from string to result in one go. I’m not interested in a tree, any internals, I just want results. String input may be compiled for efficiency if it is to be evaluated many times.
  • I have tree representation of the particular expression based on well-know types (custom or from some API, doesn’t matter) and I interpret it. This implies that the tree construction is supported by UI, tree is stored in a meaningful way, etc.
    I don’t see any easy way to compile it here, unless we produce string representation first and then follow the previous bullet. I’d go this path only if compilation provided big performance win (after thorough benchmarks).
  • I have string input, but I want to parse it first and then follow previous bullet. There are many parsers out there, many blog posts about the topic, etc. However, I’d just take ANTLR and use that. You generate the classes from a grammar and you’re done, then it’s just pure Java with a bit of antlr-runtime. It’s the state of the art technology, reinventing wheel here is pointless unless you have some extraordinary requirements.

Obviously – the first option hides both parsing and interpreting (or compiling) the parsed result. What it doesn’t provide is any introspection or any way to transform one representation of the expression into the other (except for the most simple string replacements).

Another thing to consider is how you represent the expression?

  • String in a specific grammar – this is most straightforward. When I write “1+2” you can imagine the semantics of such a grammar right away. You can also guess the type of returned value (some number), just as you can for an expression like “a>30” (boolean). It’s easy to store in a DB, file, anywhere. But it’s furthest from anything computer can directly work with, although many frameworks offer some shortcut such as an eval(…) method.
  • Tree representing the expression – but hey, what does it look like? In JVM memory it’s obvious – there are objects of various AST nodes (or ParseTree in ANTLR4). But how do you persist this? Serialize? Good, you may not need to read it and although Java serialization is not very good it may suffice for this. Or will it be tree like structure in a database? Then you need to process it somehow to form the tree in memory, but it may be much easier than parsing raw string indeed. Or you may store it as a string after all – like XML, JSON, or anything else that supports tree serialization in any form. Then you are parsing, but not with your/custom parser, you’re using well-known tools for ubiquitous formats and just transforming the input into a tree. You’re not using any custom grammar – maybe just custom XML/JSON schema or serialization mechanism. In any case, the core representation is the tree – and the tree only.

Just evaluate my string, please!

Here you want your expression to be just “executed”. If you want to do it many times for the same expression then compilation (if provided) typically makes it faster. Let’s see how you can do this easily with Java’s ScriptEngine aka JSR 223 (use your favourite IDE(A) to help you with imports ;-)):

String expr = "1+1";
ScriptEngine scriptEngine = new ScriptEngineManager().getEngineByName("ecmascript");
CompiledScript compiledScript = ((Compilable) scriptEngine).compile(expr);
Object result = compiledScript.eval();
System.out.println("result = " + result + " (" + result.getClass().getName() + ")");

Easy, isn’t it? And it is compiled! We used built-in JavaScript that is available in JDK since 1.6 (Rhino for 1.6/7 and then famous Nashorn since Java 8).

But hey, something is missing, we don’t want the result 2 every time. Let’s try different expression with some variables:

String expr = "a &lt; b";
ScriptEngine scriptEngine = new ScriptEngineManager().getEngineByName("ecmascript");
CompiledScript compiledScript = ((Compilable) scriptEngine).compile(expr);
Bindings bindings = scriptEngine.createBindings();
bindings.put("a", 30);
bindings.put("b", 25);
Object result = compiledScript.eval(bindings);
System.out.println("result = " + result + " (" + result.getClass().getName() + ")");

Here we have expression with variables and we create Bindings (implements Map) and provide values for a and b. The Bindings are used as a parameter for eval – and the result is naturally Boolean.

Other options for evaluation?

When I needed expression evaluation for configuring Java Simon callback mechanism it was in times of Java 5 – and no scripting engine. First I use JEval (it was on dev.java.net, I’m not sure whether it’s the same project) but when I moved Java Simon to Maven build JEval was lagging behind and I was looking for another solution.

So I moved to MVEL which was way too powerful for my needs – and also the JAR was around 1MB. But MVEL (MVFLEX Expression Language) is more than just some expression evaluator – it really is expression language, like you know it from JSP or Spring (SpEL). It traverses beans, can call methods, etc. If this is what you need then look for this kind of beast (EL).

Consider also the extensibility of the solution. Will you need to add custom functions? These are all things that are already covered by existing libraries. All you have to do is choose. I, personally, didn’t needed this kind of power and was happy with script engine. It is also more powerful than I need, but at least it is built-in already. (Although technically it’s just an API and your non-Oracle JDK may not provide any concrete engine.)

Expression string preprocessing

Whatever evaluator you may use, it may happen that the supported grammar is not exactly what you want. Sometimes it’s about details. Consider writing expressions into XML configuration – all those <, > and && are not very XML friendly. You may use CDATA section (better) or character entity (ugly).

Or you may introduce some simple replacement preprocessing and support LT, GT and AND (and others) instead!

// code in main(...)
String expr = "a &gt; 10 AND NOT (b == 20)";
expr = preprocessExpression(expr);
// ...the rest of the code follows as before

private static String preprocessExpression(String expr) {
    return expr
        .replaceAll(" AND ", " &amp;&amp; ")
        .replaceAll(" OR ", " || ")
        .replaceAll(" NOT ", " !")
        ;
}

If you expect a lot of replacing for a lot of expressions, you may use precompiled Patterns, but if you preprocess each expression just once and then store it as a CompiledScript the gain is probably not that big.

I used this strategy in Java Simon where I compare ns measurements and in order to do so I even provide shortcuts for seconds, millis, micros (us) – check the code here. Very simple replacements but they can help a lot. User might not guess that there is JavaScript evaluation behind.

How to validate used variables?

Imagine you want to allow only some variables in your expressions – which is reasonable. Using ScriptEngine, you know what you put into the Bindings, anything other variable would fail anyway. How to check the expression in this case? Compilation is not enough, of course.

The easiest way to check the expression is to prepare bindings with any values (just the type must be right, of course) – and then let the expression to evaluate with these bindings. This checks that everything clicks. There is possible problem though, in case of expressions where not all parts must be evaluated (like a || b when a is true) the question is whether the names in the not-evaluated part are checked or not. In case of lazy evaluation some things may slip – this definitely happens with Nashorn engine for instance.

What you can do is somehow check all identifier-like “words” in your string – you can use regex to extract them one by one. But it is cumbersome of course. This is much more natural in case of parsed expressions stored in syntax tree. But that one is for another post.

Conclusion

If we don’t need to work with the parsed syntax tree, these mentioned options are preferable to using some parser and then interpreting the expression represented as a tree. Especially when the compilation is available the result may be quite fast – of course depending on the cost of entering the runtime of the expression evaluator. It’s also mentally easy to understand – you want to evaluate the expression, and that’s what you see in the code.

In the next post we will look at the cases when you want a little bit more than just the evaluation.

Testing booleans readably in Java

This surely is not the first or last musing on how to write ifs with boolean variables in Java. I normally just write it as IDEA suggests with its “simplify” correction (inspection is called “Pointless boolean expression”). However I never criticized anyone yet for their style of comparing the boolean variable to the literal true or false. I do some code review that is a part of my role in the team and while I communicate most of my observations this I just quick fix with IDEA and go on. “Whatever your taste is, my word is the final word anyway.”

But this Friday one of the developers came back to me and said “hey, I do it so it is more readable”. And readable is the word I hear to pretty strong. I have no problem to read both these styles and I prefer the minimal one. While I’m clearly decided about comparing to true (utter waste and nicely exaggerated here), comparing to false is indeed a bit different. There you can argue that that exclamation is not always so visible.

It’s not always as easy as “short is better” – often on the contrary. Prominent example being the assignment in the middle of the expression – in a ternary expression for instance. I’d go for ifs and an assignment per line all the time. But while here we are mostly aligned when it comes to (checked == false) versus (!checked) we developers are not so unanimous.

The aforementioned link to stackoverflow inspired me to check how various cases are compiled. I made a class:

public class BooleanEqualityTest {
   boolean bb;

   private void a() {
       boolean b = bb;
   }
   private void b() {
       boolean b = bb == true;
   }
   private void c() {
       boolean b = (bb == true) == true;
   }
   private void x() {
       boolean b = !bb;
   }
   private void y() {
       boolean b = bb == false;
   }
   private void z() {
       boolean b = bb == false == false;
   }
}

After compile I went to the output directory and ran the command:

javap -private -c BooleanEqualityTest

And the result was:

Compiled from "BooleanEqualityTest.java"
public class BooleanEqualityTest extends java.lang.Object{
boolean bb;

public BooleanEqualityTest();
 Code:
  0:    aload_0
  1:    invokespecial    #1; //Method java/lang/Object."<init>":()V
  4:    return

private void a();
 Code:
  0:    aload_0
  1:    getfield    #2; //Field bb:Z
  4:    istore_1
  5:    return

private void b();
 Code:
  0:    aload_0
  1:    getfield    #2; //Field bb:Z
  4:    iconst_1
  5:    if_icmpne    12
  8:    iconst_1
  9:    goto    13
  12:    iconst_0
  13:    istore_1
  14:    return

private void c();
 Code:
  0:    aload_0
  1:    getfield    #2; //Field bb:Z
  4:    iconst_1
  5:    if_icmpne    12
  8:    iconst_1
  9:    goto    13
  12:    iconst_0
  13:    iconst_1
  14:    if_icmpne    21
  17:    iconst_1
  18:    goto    22
  21:    iconst_0
  22:    istore_1
  23:    return

private void x();
 Code:
  0:    aload_0
  1:    getfield    #2; //Field bb:Z
  4:    ifne    11
  7:    iconst_1
  8:    goto    12
  11:    iconst_0
  12:    istore_1
  13:    return

private void y();
 Code:
  0:    aload_0
  1:    getfield    #2; //Field bb:Z
  4:    ifne    11
  7:    iconst_1
  8:    goto    12
  11:    iconst_0
  12:    istore_1
  13:    return

private void z();
 Code:
  0:    aload_0
  1:    getfield    #2; //Field bb:Z
  4:    ifne    11
  7:    iconst_1
  8:    goto    12
  11:    iconst_0
  12:    ifne    19
  15:    iconst_1
  16:    goto    20
  19:    iconst_0
  20:    istore_1
  21:    return

}

What does it mean? Every == true is adding something to the code, but the first == false produces the same bytecode as the not operator itself (!). Every other == false is then adding to the code just like any other comparison. While HotSpot may do wonders with the code later I still consider it a waste and completely unnecessary – except for the single == false – and that was the case where I hesitated just a bit.

Now I will probably stop correcting this case, though I will keep writing !checked instead of checked == false. The only thing I don’t like about it all right now is that the inspection in IDEA cannot be set to ignore just this case. I can still set it to weak-warning, but that’s just not the same, you know. 🙂

What is your opinion on this matter? Do you have similar cases where you’re just not convinced which way is the cleaner one?

Update 31.10.2011: I decided after all. I’ll go for !(…) like this:

if (!(checked)) ...

This is quite clear, ! is nicely visible and redundant parentheses are not warning. After all, parentheses are traditionally used to make expressions readable. 🙂