Expression evaluation in Java (1)

There are times when you need to evaluate some expression in your program. Sometimes you can create something that is not text/string based, that is you construct the expression in a different way – typically so it is easier to validate, evaluate, etc. Sometimes, nothing beats the textual representation. It’s easy and quick to write, and I prefer it any time to various click-based solutions (typically based on some tree representation).

But the expression is a tree after all and we need to parse it somehow. Or do we? In my first sentence I said I needed to evaluate the expression. Can we forgo the parsing altogether?

This will be just the first of (probably) two posts and after some general thoughts I will focus on direct string expression evaluation. Parsing and what follows will follow in another post.

Parsing the grammar or what?

If you know tools like ANTLR then you may be asking what am I going to parse here? Is it a grammar written in some meta-language? No – I’m not interested in grammar itself now at all. Grammar is given and I want to parse the expression written in that grammar. If we used ANTLR, we’d be working with classes generated from the grammar.

Also, just to be sure, anytime I say parse/parsing mean it in a broader sense – both lexer and parser process together (except for cases where I explicitly mention tokenization).

Evaluate or parse as well?

Unless you internally represent the expression as a tree already, you will need to parse it from the string/stream – or something has to do it for you. Next question is whether you want to work with the expression tree and interpret it yourself, or you are really interested only in the result of the expression evaluation.

Evaluation in any case means that I can feed the same expression (whatever the representation is) with various inputs – which is valuable when when the expression contains variables. Without variables it doesn’t make any sense anyway. For instance in case of Java’s ScriptEngine, this is what Bindings do.

So what are our options?

  • Expression evaluation from string to result in one go. I’m not interested in a tree, any internals, I just want results. String input may be compiled for efficiency if it is to be evaluated many times.
  • I have tree representation of the particular expression based on well-know types (custom or from some API, doesn’t matter) and I interpret it. This implies that the tree construction is supported by UI, tree is stored in a meaningful way, etc.
    I don’t see any easy way to compile it here, unless we produce string representation first and then follow the previous bullet. I’d go this path only if compilation provided big performance win (after thorough benchmarks).
  • I have string input, but I want to parse it first and then follow previous bullet. There are many parsers out there, many blog posts about the topic, etc. However, I’d just take ANTLR and use that. You generate the classes from a grammar and you’re done, then it’s just pure Java with a bit of antlr-runtime. It’s the state of the art technology, reinventing wheel here is pointless unless you have some extraordinary requirements.

Obviously – the first option hides both parsing and interpreting (or compiling) the parsed result. What it doesn’t provide is any introspection or any way to transform one representation of the expression into the other (except for the most simple string replacements).

Another thing to consider is how you represent the expression?

  • String in a specific grammar – this is most straightforward. When I write “1+2” you can imagine the semantics of such a grammar right away. You can also guess the type of returned value (some number), just as you can for an expression like “a>30” (boolean). It’s easy to store in a DB, file, anywhere. But it’s furthest from anything computer can directly work with, although many frameworks offer some shortcut such as an eval(…) method.
  • Tree representing the expression – but hey, what does it look like? In JVM memory it’s obvious – there are objects of various AST nodes (or ParseTree in ANTLR4). But how do you persist this? Serialize? Good, you may not need to read it and although Java serialization is not very good it may suffice for this. Or will it be tree like structure in a database? Then you need to process it somehow to form the tree in memory, but it may be much easier than parsing raw string indeed. Or you may store it as a string after all – like XML, JSON, or anything else that supports tree serialization in any form. Then you are parsing, but not with your/custom parser, you’re using well-known tools for ubiquitous formats and just transforming the input into a tree. You’re not using any custom grammar – maybe just custom XML/JSON schema or serialization mechanism. In any case, the core representation is the tree – and the tree only.

Just evaluate my string, please!

Here you want your expression to be just “executed”. If you want to do it many times for the same expression then compilation (if provided) typically makes it faster. Let’s see how you can do this easily with Java’s ScriptEngine aka JSR 223 (use your favourite IDE(A) to help you with imports ;-)):

String expr = "1+1";
ScriptEngine scriptEngine = new ScriptEngineManager().getEngineByName("ecmascript");
CompiledScript compiledScript = ((Compilable) scriptEngine).compile(expr);
Object result = compiledScript.eval();
System.out.println("result = " + result + " (" + result.getClass().getName() + ")");

Easy, isn’t it? And it is compiled! We used built-in JavaScript that is available in JDK since 1.6 (Rhino for 1.6/7 and then famous Nashorn since Java 8).

But hey, something is missing, we don’t want the result 2 every time. Let’s try different expression with some variables:

String expr = "a < b";
ScriptEngine scriptEngine = new ScriptEngineManager().getEngineByName("ecmascript");
CompiledScript compiledScript = ((Compilable) scriptEngine).compile(expr);
Bindings bindings = scriptEngine.createBindings();
bindings.put("a", 30);
bindings.put("b", 25);
Object result = compiledScript.eval(bindings);
System.out.println("result = " + result + " (" + result.getClass().getName() + ")");

Here we have expression with variables and we create Bindings (implements Map) and provide values for a and b. The Bindings are used as a parameter for eval – and the result is naturally Boolean.

Other options for evaluation?

When I needed expression evaluation for configuring Java Simon callback mechanism it was in times of Java 5 – and no scripting engine. First I use JEval (it was on dev.java.net, I’m not sure whether it’s the same project) but when I moved Java Simon to Maven build JEval was lagging behind and I was looking for another solution.

So I moved to MVEL which was way too powerful for my needs – and also the JAR was around 1MB. But MVEL (MVFLEX Expression Language) is more than just some expression evaluator – it really is expression language, like you know it from JSP or Spring (SpEL). It traverses beans, can call methods, etc. If this is what you need then look for this kind of beast (EL).

Consider also the extensibility of the solution. Will you need to add custom functions? These are all things that are already covered by existing libraries. All you have to do is choose. I, personally, didn’t needed this kind of power and was happy with script engine. It is also more powerful than I need, but at least it is built-in already. (Although technically it’s just an API and your non-Oracle JDK may not provide any concrete engine.)

Expression string preprocessing

Whatever evaluator you may use, it may happen that the supported grammar is not exactly what you want. Sometimes it’s about details. Consider writing expressions into XML configuration – all those <, > and && are not very XML friendly. You may use CDATA section (better) or character entity (ugly).

Or you may introduce some simple replacement preprocessing and support LT, GT and AND (and others) instead!

// code in main(...)
String expr = "a &gt; 10 AND NOT (b == 20)";
expr = preprocessExpression(expr);
// ...the rest of the code follows as before

private static String preprocessExpression(String expr) {
    return expr
        .replaceAll(" AND ", " &amp;&amp; ")
        .replaceAll(" OR ", " || ")
        .replaceAll(" NOT ", " !")
        ;
}

If you expect a lot of replacing for a lot of expressions, you may use precompiled Patterns, but if you preprocess each expression just once and then store it as a CompiledScript the gain is probably not that big.

I used this strategy in Java Simon where I compare ns measurements and in order to do so I even provide shortcuts for seconds, millis, micros (us) – check the code here. Very simple replacements but they can help a lot. User might not guess that there is JavaScript evaluation behind.

How to validate used variables?

Imagine you want to allow only some variables in your expressions – which is reasonable. Using ScriptEngine, you know what you put into the Bindings, anything other variable would fail anyway. How to check the expression in this case? Compilation is not enough, of course.

The easiest way to check the expression is to prepare bindings with any values (just the type must be right, of course) – and then let the expression to evaluate with these bindings. This checks that everything clicks. There is possible problem though, in case of expressions where not all parts must be evaluated (like a || b when a is true) the question is whether the names in the not-evaluated part are checked or not. In case of lazy evaluation some things may slip – this definitely happens with Nashorn engine for instance.

What you can do is somehow check all identifier-like “words” in your string – you can use regex to extract them one by one. But it is cumbersome of course. This is much more natural in case of parsed expressions stored in syntax tree. But that one is for another post.

Conclusion

If we don’t need to work with the parsed syntax tree, these mentioned options are preferable to using some parser and then interpreting the expression represented as a tree. Especially when the compilation is available the result may be quite fast – of course depending on the cost of entering the runtime of the expression evaluator. It’s also mentally easy to understand – you want to evaluate the expression, and that’s what you see in the code.

In the next post we will look at the cases when you want a little bit more than just the evaluation.

3 thoughts on “Expression evaluation in Java (1)

Leave a comment