Sunday, July 15, 2007

The Ever-Evolving JVM

John Rose, lead of the "invokedynamic" effort (Java Specification Request 292), has posted some exciting articles about the future of the JVM and a number of changes that may land in the next Java version. Among these are, of course, the dynamic invocation efforts, but the entries also cover non-local returns from closures, tail calls, and tuple support. I'm excited to have John as a co-worker and to be helping out the invokedynamic effort in my own small way by working on JRuby's compiler. John has also provided guidance on how to make a dynamic language fast on current JVMs, which has informed much of my compiler's design.

Check out his articles:

Longjumps Considered Inexpensive
tail calls in the VM
tuples in the VM

It's going to be a fun year for language developers on the JVM!

More Compiler Strategy: Call Adapters and Stack-based Methods

Compilers are hard. But not so hard as people would have you believe.

I've committed an update that installs a CallAdapter for every compiled call site. CallAdapter is basically a small object that stores the following:

  • method name
  • method index
  • call type (normal, functional, variable)
It also provides overloaded call() implementations for 1, 2, 3, or n arguments, with or without a block. The basic goal of this class is to provide a call adapter (heh) that makes calling a Ruby method from compiled code as similar to, and as simple as, calling any Java method.
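
To make the shape of such an object concrete, here is a minimal sketch in Java. It is not JRuby's actual CallAdapter: the names CallAdapterSketch, CallType, Block, and Receiver are hypothetical stand-ins, and the dispatch itself is delegated to an assumed Receiver interface. It only illustrates the three stored fields and the arity-specific call() overloads funnelling into one general form.

```java
// All names here are hypothetical stand-ins, not JRuby's real classes.
enum CallType { NORMAL, FUNCTIONAL, VARIABLE }

// Stand-in for a Ruby block passed along with a call.
interface Block {
    Object yield(Object arg);
}

// Stand-in for whatever performs the actual dynamic dispatch on the receiver.
interface Receiver {
    Object dispatch(String name, int index, CallType type, Object[] args, Block block);
}

final class CallAdapterSketch {
    private static final Object[] NO_ARGS = new Object[0];

    private final String methodName;
    private final int methodIndex;
    private final CallType callType;

    CallAdapterSketch(String methodName, int methodIndex, CallType callType) {
        this.methodName = methodName;
        this.methodIndex = methodIndex;
        this.callType = callType;
    }

    // Fixed-arity overloads keep each compiled call site down to one invocation.
    Object call(Receiver self) {
        return call(self, NO_ARGS, null);
    }

    Object call(Receiver self, Object arg) {
        return call(self, new Object[] { arg }, null);
    }

    Object call(Receiver self, Object arg1, Object arg2) {
        return call(self, new Object[] { arg1, arg2 }, null);
    }

    Object call(Receiver self, Object arg, Block block) {
        return call(self, new Object[] { arg }, block);
    }

    // General form: any number of arguments, optional block (null when absent).
    Object call(Receiver self, Object[] args, Block block) {
        return self.dispatch(methodName, methodIndex, callType, args, block);
    }
}
```

In this sketch the compiled method body only has to push the receiver and arguments and invoke call(); the name, index, and call type live in the adapter, which would be created once during class init.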

The end result is that while compiled class init is a bit larger (needs to load adapters for all call sites), compiled method size has dropped substantially; in compiling bench_method_dispatch.rb, the two main tests went from 4000 and 3500 bytes of code down to 1500 and 1000 bytes (roughly). And simpler code means HotSpot has a better chance to optimize.

Here are the latest numbers for the bench_method_dispatch_only test, which just measures the time to call a Ruby-implemented method a bunch of times:
Test interpreted: 100k loops calling self's foo 100 times
2.383000 0.000000 2.383000 ( 2.383000)
2.691000 0.000000 2.691000 ( 2.691000)
1.775000 0.000000 1.775000 ( 1.775000)
1.812000 0.000000 1.812000 ( 1.812000)
1.789000 0.000000 1.789000 ( 1.789000)
1.776000 0.000000 1.776000 ( 1.777000)
1.809000 0.000000 1.809000 ( 1.809000)
1.779000 0.000000 1.779000 ( 1.781000)
1.784000 0.000000 1.784000 ( 1.784000)
1.830000 0.000000 1.830000 ( 1.830000)
And Ruby 1.8.6 for reference:
Test interpreted: 100k loops calling self's foo 100 times
2.160000 0.000000 2.160000 ( 2.188087)
2.220000 0.010000 2.230000 ( 2.237414)
2.230000 0.010000 2.240000 ( 2.248185)
2.180000 0.010000 2.190000 ( 2.218540)
2.240000 0.010000 2.250000 ( 2.259535)
2.220000 0.010000 2.230000 ( 2.241170)
2.150000 0.010000 2.160000 ( 2.178414)
2.240000 0.010000 2.250000 ( 2.259772)
2.260000 0.000000 2.260000 ( 2.285141)
2.230000 0.010000 2.240000 ( 2.252396)
Note that these are JIT numbers rather than fully precompiled numbers, so this is 100% real-world safe. Fully precompiled is just a bit faster, since there's no interpreted step or DefaultMethod wrapper to go through.

I have also made a lot of progress on adapting the compiler to create stack-based methods when possible. Basically, this involves inspecting the code for anything that would require access to local variables outside the body of the call: eval, closures, and so on. At the moment it works well and passes all tests, but I know that methods like gsub, which modify $~ or $_, are not working right. It's disabled for now, pending more work, but here are the method dispatch numbers with stack-based method compilation enabled:
Test interpreted: 100k loops calling self's foo 100 times
1.735000 0.000000 1.735000 ( 1.738000)
1.902000 0.000000 1.902000 ( 1.902000)
1.078000 0.000000 1.078000 ( 1.078000)
1.076000 0.000000 1.076000 ( 1.076000)
1.077000 0.000000 1.077000 ( 1.077000)
1.086000 0.000000 1.086000 ( 1.086000)
1.077000 0.000000 1.077000 ( 1.077000)
1.084000 0.000000 1.084000 ( 1.084000)
1.090000 0.000000 1.090000 ( 1.090000)
1.083000 0.000000 1.083000 ( 1.083000)
It seems very promising work. I hope I'll be able to turn it on soon.
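
The inspection step for stack-based methods can be sketched roughly as follows. This is an illustration only, assuming a toy AST: the Node and ScopeInspector classes and the node kind strings are hypothetical and do not match JRuby's real AST or compiler. The check walks a method body and reports whether anything in it (here, just a closure or a call to eval) would need the locals to outlive or escape the Java stack frame.

```java
// Hypothetical toy AST node; JRuby's real AST classes are different.
final class Node {
    final String kind;     // e.g. "method", "call", "block", "localasgn"
    final String name;     // method or variable name, may be empty
    final Node[] children;

    Node(String kind, String name, Node... children) {
        this.kind = kind;
        this.name = name;
        this.children = children;
    }
}

final class ScopeInspector {
    // Returns true if the method body must keep its locals in a heap-allocated scope.
    static boolean needsHeapScope(Node node) {
        if (node.kind.equals("block")) {
            return true; // a closure may read or write the enclosing locals
        }
        if (node.kind.equals("call") && node.name.equals("eval")) {
            return true; // eval can name any local at runtime
        }
        for (Node child : node.children) {
            if (needsHeapScope(child)) {
                return true;
            }
        }
        return false;
    }
}
```

A method for which needsHeapScope returns false is a candidate for keeping its locals purely on the Java stack. The sketch deliberately leaves out the harder cases mentioned above, such as gsub-style methods that touch $~ or $_.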

Oh, and for those who always need a fib fix, here's fib with both optimizations turned on:
~ $ jruby -J-server bench_fib_recursive.rb      
1.258000 0.000000 1.258000 ( 1.258000)
0.990000 0.000000 0.990000 ( 0.989000)
0.925000 0.000000 0.925000 ( 0.926000)
0.927000 0.000000 0.927000 ( 0.928000)
0.924000 0.000000 0.924000 ( 0.925000)
0.923000 0.000000 0.923000 ( 0.923000)
0.927000 0.000000 0.927000 ( 0.926000)
0.928000 0.000000 0.928000 ( 0.929000)
And MRI:
~ $ ruby bench_fib_recursive.rb
1.760000 0.010000 1.770000 ( 1.775660)
1.760000 0.010000 1.770000 ( 1.776360)
1.760000 0.000000 1.760000 ( 1.778413)
1.760000 0.010000 1.770000 ( 1.776767)
1.760000 0.010000 1.770000 ( 1.777361)
1.760000 0.000000 1.760000 ( 1.782798)
1.770000 0.010000 1.780000 ( 1.794562)
1.760000 0.010000 1.770000 ( 1.777396)
These numbers reflect a slight drop in performance, because the call adapter is currently just generic code, and generic code that calls lots of different methods causes HotSpot to stumble a bit. The next step for the compiler is to generate custom call adapters for each call site that handle arity correctly (avoiding IRubyObject[] all the time) and call directly to the most-likely target methods.
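
As a rough idea of what a call-site-specific adapter might look like, here is a hedged sketch of a one-argument call site that remembers the last target it resolved. Everything in it is an assumption for illustration: OneArgMethod, MethodLookup, and OneArgCallSite are hypothetical names, plain Object stands in for IRubyObject, and the receiver's Java class stands in for its Ruby class. It is not the planned JRuby design, only one common way to avoid both the argument array and a full lookup on every call.

```java
// Hypothetical one-argument method body; no Object[] is allocated to call it.
interface OneArgMethod {
    Object call(Object self, Object arg);
}

// Hypothetical slow path, standing in for a full dynamic method search.
interface MethodLookup {
    OneArgMethod find(Class<?> receiverType, String name);
}

final class OneArgCallSite {
    private final String name;
    private final MethodLookup lookup;
    private Class<?> cachedType;        // receiver type seen on the last slow-path call
    private OneArgMethod cachedTarget;  // method resolved for that type
    int slowPathCount;                  // exposed only so a demo can observe cache hits

    OneArgCallSite(String name, MethodLookup lookup) {
        this.name = name;
        this.lookup = lookup;
    }

    Object call(Object self, Object arg) {
        Class<?> type = self.getClass();
        if (type != cachedType) {
            // Cache miss: do the full lookup once and remember the result.
            cachedTarget = lookup.find(type, name);
            cachedType = type;
            slowPathCount++;
        }
        // Cache hit: call straight through to the most likely target.
        return cachedTarget.call(self, arg);
    }
}
```

Because each call site gets its own instance, a site that always sees the same receiver type keeps calling the same target, which is the kind of predictable code HotSpot tends to handle well. A real version would also need to invalidate the cache when methods are redefined, which this sketch does not attempt.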