16 August 2008

JRuby Performance fails in Batch

You JRuby advocates will say it's obvious: I did not use Java 6, did not compile to byte code etc. You JRuby objectors will say it's obvious: JRuby is so sloooow. Probably the truth is somewhere in between and depends on the circumstances. So here is my little story about trying out JRuby in the enterprise...

We have an ETL loader that collects data into objects before inserting them into the warehouse. Depending on the state of each instance, some values are precalculated and stored to speed up later queries. (DWH data is usually not normalised.) Until now these calculations have been spread across the whole import and transform process, which was a real pain whenever something had to be changed. Recently we had to add some new calculation rules and decided to do it properly: 1) put all calculation rules in one place and 2) put them all in some kind of configuration file.

As the checks of the state vary considerably, we need a "powerful" configuration language. How about a dynamic language running on top of the JVM? (Which looks like the mainstream method to solve many problems nowadays ;-) In an XML file (boring mainstream again) we have a list of simple JRuby expressions.
<requestor active="true" type="SUPPRESSED">
<requestor active="true" type="SPECIAL_CASE">
<code>['GPDA', 'LIA'].include? $action.name</code>
<requestor active="true" type="OVERLOAD">
<!-- all others are normal -->
<requestor active="true" type="NORMAL">
A centralised component evaluates them in a Chain of Responsibility style.
BSFManager manager = pContext.getManager();
int line = 1;
manager.declareBean("action", pActionData, ActionData.class);
for (ActionRequestorEvaluation eval : getRequestors()) {
    boolean isOfType = ((Boolean) manager.eval("ruby",
            "action eval command " + eval.getType(), line++,
            1, eval.getCode())).booleanValue();
    if (isOfType) {
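To make the first-match-wins evaluation a bit more concrete, here is a minimal, self-contained sketch in plain Java. It is not the loader's code: the `Action` class and the `classify` method are hypothetical stand-ins, and lambdas take the place of the JRuby expressions from the XML file. The SUPPRESSED and OVERLOAD rules are left out because their expressions are not shown above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class RequestorChain {

    // Hypothetical stand-in for the ActionData bean; only the name is modelled.
    static final class Action {
        final String name;
        Action(String name) { this.name = name; }
    }

    // Rules in evaluation order (LinkedHashMap keeps insertion order).
    // In the loader each predicate would be a JRuby expression from the XML file.
    static final Map<String, Predicate<Action>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("SPECIAL_CASE", a -> Arrays.asList("GPDA", "LIA").contains(a.name));
        RULES.put("NORMAL", a -> true); // all others are normal
    }

    // Walks the chain and returns the type of the first rule that matches.
    static String classify(Action action) {
        for (Map.Entry<String, Predicate<Action>> rule : RULES.entrySet()) {
            if (rule.getValue().test(action)) {
                return rule.getKey();
            }
        }
        throw new IllegalStateException("no rule matched");
    }

    public static void main(String[] args) {
        System.out.println(classify(new Action("GPDA")));  // prints SPECIAL_CASE
        System.out.println(classify(new Action("OTHER"))); // prints NORMAL
    }
}
```

The order of the rules matters here: the catch-all NORMAL rule has to come last, just as it does in the XML list.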
After adding some unit tests we deployed the loader to the test system. Unfortunately this solution ran three times slower than before, taking 25 instead of 8 minutes for 100,000 entries, which adds approx. 10 milliseconds per entry. Considering that in most cases 10 expressions are evaluated, that's only about 1 millisecond per JRuby call, so it's not that bad. (The numbers were measured using Java 5 with BSF 2.4.0 to access JRuby 1.1.3.)
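For anyone who wants to check the arithmetic, here is a small sketch that derives the per-entry and per-call overhead from the run times quoted above. The class and method names are made up for illustration, and the figure of 10 expressions per entry is the rough "in most cases" estimate, not a measured value.

```java
import java.util.Locale;

public class OverheadEstimate {

    // Extra wall-clock time per entry in milliseconds, given both run times in minutes.
    static double extraMillisPerEntry(double minutesBefore, double minutesAfter, int entries) {
        return (minutesAfter - minutesBefore) * 60_000.0 / entries;
    }

    public static void main(String[] args) {
        double perEntry = extraMillisPerEntry(8, 25, 100_000);
        double perCall = perEntry / 10; // assuming about 10 expressions per entry
        // prints "10.2 ms per entry, 1.02 ms per call"
        System.out.printf(Locale.ROOT, "%.1f ms per entry, %.2f ms per call%n", perEntry, perCall);
    }
}
```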

The conclusion is that the straightforward use of JRuby in batch processing (i.e. when called millions of times) does not work. So we had to undo the changes and hard-code the rules :-( Unfortunately we only had time to try the "plain" approach. Obviously there is room for improvement, which I would like to hear about.