Thursday, January 18, 2007

JRuby Compiler: In Trunk and Ready to Play

Times they are a-changing.

I posted previously on JRuby's compiler work. There have been various iterations of the compiler, many purely prototype and never intended to be completed, and a few genuine attempts at evolving toward full Ruby support. However, I believe that in recent weeks I've settled on a design that will carry us to the JRuby compiler endgame.

For the past year, we've emphasized correctness over performance nine times out of ten. When we did focus on performance, it was solely on improving JRuby's interpreter speed, in an attempt to match Ruby's performance in this area and because we knew that JRuby could never entirely escape interpretation. Ruby's just too dynamic for that. So while compatibility with Ruby 1.8.x continued to improve by leaps and bounds, our performance was rather poor in comparison.

This past fall, things started to change. Compatibility reached a point where we could finally be confident about our set of regression tests and our understanding of "how Ruby works" across all its weirdest features. As we understood better the design of the C implementation and the quirky intricacies of the language, we started to see a path to enlightenment. We started to realize how we could support Ruby as it exists today while simultaneously evolving JRuby into a more efficient and cleaner design. And so the performance numbers started to change.

From 0.9.0 to 0.9.1, we had a clean doubling of performance across the board. Our favorite benchmark--RDoc generation--was easily twice as fast, and other simpler benchmarks like fib had similar improvements. 0.9.2 was more of a rushed release for JavaPolis, but we had a good 1/4 to 1/3 speedup even then, since the ongoing refactoring removed another large chunk of overhead from JRuby's core runtime.

From 0.9.2 to current trunk, however, has been a different matter entirely.

The first major change is that we've started to seriously alter the way JRuby does dynamic method dispatching. I did some research, read a few papers, and mocked up and benchmarked a few options. What we've settled on for the moment is a combination of selector table indexing (STI) for the core classes (STI provides a large table mapping methods and classes to actual code) and various forms of inline caching for non-core classes (basically, for pure Ruby classes; though this is yet to be implemented in trunk). STI provides an extremely fast path for dispatch on those hardest-hit methods, since it reduces calling most core methods to two array indexes and a switch, a vast improvement over the hash lookup and multiple layers of abstraction and framing we had before.
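
As a rough illustration of the "two array indexes and a switch" idea, here is a minimal Ruby sketch. Every name in it (the id constants, TABLE, sti_dispatch) is made up for the example and none of it is JRuby's actual code. It also leans on Ruby's own dispatch inside each branch, so it shows the shape of the table, not the speedup.

```ruby
# Hypothetical sketch of selector table indexing; no name here is from JRuby.

# Small integer ids, assigned once to each core class and each method name.
INTEGER_CLASS, STRING_CLASS = 0, 1
PLUS, TO_S, LENGTH = 0, 1, 2
METHOD_NAMES = [:+, :to_s, :length].freeze

# Switch values naming the actual code to run; 0 means "not in the table".
INT_PLUS, INT_TO_S, STR_PLUS, STR_TO_S, STR_LENGTH = 1, 2, 3, 4, 5

# Rows are classes, columns are methods.
TABLE = [
  [INT_PLUS, INT_TO_S, 0],
  [STR_PLUS, STR_TO_S, STR_LENGTH]
].freeze

def sti_dispatch(class_id, method_id, receiver, *args)
  case TABLE[class_id][method_id]            # two array indexes...
  when INT_PLUS   then receiver + args[0]    # ...and a switch
  when INT_TO_S   then receiver.to_s
  when STR_PLUS   then receiver + args[0]
  when STR_TO_S   then receiver.to_s
  when STR_LENGTH then receiver.length
  else
    # Slow path: fall back to ordinary dynamic lookup by name.
    receiver.public_send(METHOD_NAMES[method_id], *args)
  end
end

puts sti_dispatch(INTEGER_CLASS, PLUS, 2, 3)
puts sti_dispatch(STRING_CLASS, LENGTH, "jruby")
```

The point of the layout is that the hot path never touches a hash: the receiver's class and the method name have already been reduced to small integers by the time the call happens.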

We are continuing to expand our use of STI as it is applicable, and I will soon start exploring options for interpreted-mode inline caching (polymorphic, likely, though I need to run a few trials to get numbers balanced right). So fast dynamic dispatching is well on its way, and will improve performance across the board.
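
For readers who haven't met inline caching, here is a small Ruby sketch of a polymorphic cache attached to a single call site. It is only an illustration under my own assumptions: the CallSite class, its four-entry limit, and the hit/miss counters are invented for the example, and it ignores singleton methods and cache invalidation when methods are redefined, both of which a real implementation has to handle.

```ruby
# Hypothetical polymorphic inline cache for one call site; not JRuby code.
class CallSite
  MAX_ENTRIES = 4 # assumed limit; the right number needs measuring

  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @cache = [] # pairs of [class, UnboundMethod] seen at this site
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    entry = @cache.find { |cached_class, _| cached_class.equal?(klass) }
    if entry
      @hits += 1
      method = entry[1]
    else
      @misses += 1
      method = klass.instance_method(@name) # the full, slow lookup
      @cache << [klass, method] if @cache.size < MAX_ENTRIES
    end
    method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:to_s)
[1, 2, "three", 4].each { |obj| puts site.call(obj) }
puts "hits=#{site.hits} misses=#{site.misses}"
```

Each call site remembers the few receiver classes it has actually seen, so repeat calls skip the method lookup entirely.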

Then there's the compiler work. You have no idea how much it's irritated me to hear people talk about JRuby the past year and say "yeah, but it doesn't compile to Java bytecode." This obviously amounts to pure FUD, but beyond that it totally ignores the complexity of the problem: not a single person on this earth has managed to compile Ruby to a general-purpose VM yet. So complaining about our missing compiler is a bit like complaining that we haven't moved mountains. Honestly people, what do you expect?

Of course, there's the flip side of this statement: compiling Ruby is a hard problem, and I like hard problems. For me it's doubly hard, since I've never written a compiler before. But hell, before JRuby I'd never even worked on an interpreter or language implementation, and that seems to have gone alright. So there it is...Mount Ruby, waiting to be climbed. And climb it I must!

The current compiler design lives in two halves: the AST-walking half and the code-generation half. I chose to split these two because it makes several things easier. For starters, it allows me to abstract all the bytecode generation logic behind a simple interface, an interface that presents coarse-grained operations like invokeDynamic() and retrieveLocalVariable(). The ultimate implementation of those operations can then be modified at will. It also allows us to evolve the AST independently of the compiler backend, even to the point of swapping in a completely different parser and in-memory code representation (like YARV bytecodes) without harming the evolving code generator backend. So this split helps future-proof the compiler work.
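
The split can be sketched in a few lines of Ruby. This is a toy under stated assumptions: the array-based node shapes, the ListingBackend class, and load_literal are invented for the example, and only the ideas behind invoke_dynamic and retrieve_local_variable come from the operations named above. The backend here records pseudo-instructions instead of emitting real bytecode.

```ruby
# Hypothetical sketch of the walker/backend split; not the JRuby classes.

# Backend: the only part that knows what "generated code" looks like.
class ListingBackend
  attr_reader :code

  def initialize
    @code = []
  end

  def load_literal(value)
    @code << "push #{value.inspect}"
  end

  def retrieve_local_variable(name)
    @code << "load_local #{name}"
  end

  def invoke_dynamic(name, argc)
    @code << "invoke_dynamic #{name}/#{argc}"
  end
end

class NotCompilable < StandardError; end

# Walker: knows the AST, knows nothing about bytecode.
class ASTWalker
  def initialize(backend)
    @backend = backend
  end

  def compile(node)
    type, *rest = node
    case type
    when :lit  then @backend.load_literal(rest[0])
    when :lvar then @backend.retrieve_local_variable(rest[0])
    when :call
      receiver, name, *args = rest
      compile(receiver)
      args.each { |arg| compile(arg) }
      @backend.invoke_dynamic(name, args.size)
    else
      raise NotCompilable, "no compiler for #{type} nodes yet"
    end
  end
end

backend = ListingBackend.new
ASTWalker.new(backend).compile([:call, [:lvar, :n], :-, [:lit, 2]]) # n - 2
puts backend.code
```

Swapping ListingBackend for one that emits real bytecode, or swapping the walker for one that reads a different code representation, would not require touching the other half.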

The current design also has another advantage: not all of Ruby has to compile for it to be useful. Currently, as the AST walker encounters nodes, if it finds a node it can't deal with, it simply raises an exception. Compilation terminates, and the compiler's client can deal with the result as it will. This leads to a really powerful feature of this design: we can install the compiler now as a JIT and as it evolves more and more code will automatically get optimized. So once we're confident that a given node type is 100% compiling correctly, that node will now be eligible for JIT compilation. As an example, here's the output from a gem installation with the current compiler enabled as a JIT (with my logging in place, naturally):

compiled: TarHeader.empty?
compiled: Entry.initialize
compiled: Entry.full_name
compiled: Entry.bytes_read
compiled: Entry.close
compiled: Entry.invalidate
Successfully installed rake, version 0.7.1
Installing ri documentation for rake-0.7.1...
compiled: LeveledNotifier.notify?
compiled: LeveledNotifier.<=>
compiled: RubyLex.getc
compiled: null.debug?
compiled: BufferedReader.ungetc
compiled: Token.set_text
compiled: RubyLex.line_no
compiled: RubyLex.char_no
compiled: BufferedReader.column
compiled: RubyToken.set_token_position
compiled: Token.initialize
compiled: RubyLex.get_read
compiled: RubyLex.getc_of_rests
compiled: BufferedReader.getc_already_read
compiled: BufferedReader.peek
compiled: RubyParser.peek_tk
compiled: TokenStream.add_token
compiled: TokenStream.pop_token
compiled: CodeObject.initialize
compiled: RubyParser.remove_token_listener
compiled: Context.ongoing_visibility=
compiled: PreProcess.initialize
compiled: AttrSpan.[]
compiled: null.wrap
compiled: JavaProxy.to_java_object
compiled: Lines.next
compiled: Line.isBlank?
compiled: Fragment.add_text
compiled: Fragment.initialize
compiled: ToFlow.convert_string
compiled: LineCollection.add
compiled: Entry_.path
compiled: Entry_.directory?
compiled: Entry_.dereference?
compiled: AttrSpan.initialize
compiled: Entry_.prefix
compiled: Entry_.rel
compiled: Entry_.remove
compiled: Lines.rewind
compiled: AnyMethod.<=>
compiled: Description.serialize
compiled: AttributeManager.change_attribute
compiled: AttributeManager.attribute
compiled: ToFlow.annotate
compiled: NamedThing.initialize
compiled: ClassModule.full_name
compiled: Lines.initialize
compiled: Lines.empty?
compiled: LineCollection.normalize
compiled: ToFlow.end_accepting
compiled: Verbatim.add_text
compiled: FalseClass.to_s
compiled: TopLevel.full_name
compiled: Attr.<=>
Installing RDoc documentation for rake-0.7.1...
compiled: Context.add_attribute
compiled: Context.add_require
compiled: Context.add_class
compiled: AbstructNotifier.notify?
compiled: Context.add_module
compiled: LineReader.read
compiled: null.instance
compiled: HtmlMethod.path
compiled: HtmlMethod.aref
compiled: ContextUser.initialize
compiled: HtmlClass.name
compiled: TokenStream.token_stream
compiled: LineReader.initialize
compiled: TemplatePage.write_html_on
compiled: Context.push
compiled: Context.pop
compiled: HtmlMethod.name
compiled: Context.find_local_symbol
compiled: SimpleMarkup.add_special
compiled: TopLevel.find_module_named
compiled: Context.find_enclosing_module_named
compiled: HtmlMethod.<=>
compiled: ToHtml.annotate
compiled: HtmlMethod.visibility
compiled: HtmlMethod.section
compiled: HtmlMethod.document_self
compiled: LineReader.dup
compiled: Lines.unget
compiled: ToHtml.accept_paragraph
compiled: ContextUser.document_self
compiled: ToHtml.accept_heading
compiled: Heading.head_level
compiled: ToHtml.accept_list_start
compiled: ToHtml.accept_list_end
compiled: ToHtml.accept_verbatim
compiled: SimpleMarkup.initialize
compiled: AttributeManager.initialize
compiled: ToHtml.initialize
compiled: ToHtml.end_accepting
compiled: HtmlMethod.singleton
compiled: Context.modules
compiled: Context.classes
compiled: ContextUser.build_include_list
compiled: HtmlMethod.description
compiled: HtmlMethod.parent_name
compiled: HtmlMethod.aliases
compiled: HtmlClass.parent_name
compiled: ContextUser.as_href
compiled: ContextUser.url
compiled: ContextUser.aref_to
compiled: HtmlFile.<=>
compiled: HtmlClass.<=>
You can see from the output that not only are RubyGems methods getting compiled, but so are stdlib methods and our own Java integration methods. And this is with the current compiler, which doesn't support compiling class defs, blocks, case statements, ... Hopefully you get the picture; this bit-by-bit implementation of the compiler allows us to slowly grow our ability to optimize Ruby into Java bytecodes.
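
The "compile what you can, interpret the rest" behavior can be sketched like this. It is a simplified illustration, not the real JIT: the node types, the SUPPORTED list, and the METHODS table are all assumptions made up for the example, and "compiling" here just means checking that every node in a method body is of a supported type.

```ruby
# Hypothetical sketch of a JIT that falls back to interpretation.
class NotCompilable < StandardError; end

# Node types the pretend compiler is trusted with (assumed for the example).
SUPPORTED = [:lit, :lvar, :call, :if].freeze

# Raise as soon as any node in the tree is of an unsupported type.
def check_compilable(node)
  return unless node.is_a?(Array)
  type, *children = node
  raise NotCompilable, type.to_s unless SUPPORTED.include?(type)
  children.each { |child| check_compilable(child) }
end

# Try each method body; anything that fails simply stays interpreted.
def jit(methods)
  methods.each_with_object({}) do |(name, body), modes|
    begin
      check_compilable(body)
      puts "compiled: #{name}"
      modes[name] = :compiled
    rescue NotCompilable
      modes[name] = :interpreted
    end
  end
end

METHODS = {
  "Object.fib_ruby" => [:if, [:call, [:lvar, :n], :<, [:lit, 2]],
                        [:lvar, :n],
                        [:call, [:lvar, :n], :-, [:lit, 1]]],
  "Object.with_block" => [:call, [:lvar, :list], :each, [:block, [:lit, 1]]]
}.freeze

p jit(METHODS)
```

Adding a node type to the supported list is all it takes for more methods to start showing up as compiled, which is the property that lets the compiler grow one node at a time.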

So then, how well does it perform? It performs just dandy, when we're able to compile. Witness the following results for a simple recursive fib algorithm running under Ruby 1.8.5 and JRuby trunk with the JIT enabled.

$ ruby test/bench/bench_fib_recursive.rb
12.760000 1.400000 14.160000 ( 14.718925)
12.660000 1.490000 14.150000 ( 14.648681)
$ JAVA_OPTS=-Djruby.jit.enabled=true jruby test/bench/bench_fib_recursive.rb
compiled: Object.fib_ruby
8.780000 0.000000 8.780000 ( 8.780000)
7.761000 0.000000 7.761000 ( 7.761000)
Yes, that's nearly double the performance of the C implementation of Ruby. And this is absolutely real.
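
The benchmark script itself isn't reproduced here, so the following is only a guess at its general shape, assuming it times the recursive fib with Ruby's standard Benchmark module (the output format above matches what Benchmark.measure prints). The argument of 20 is my own choice to keep the run short; the real script's argument is not shown.

```ruby
require 'benchmark'

# Same recursive definition as the fib_ruby method that gets JIT-compiled.
def fib_ruby(n)
  if n < 2
    n
  else
    fib_ruby(n - 2) + fib_ruby(n - 1)
  end
end

# Two timed runs, printed as user, system, total and (real) seconds.
# The argument is an assumption; a larger value is needed for useful timings.
2.times { puts Benchmark.measure { fib_ruby(20) } }
```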

Now JITing is great, and it's obviously carried Java a long ways. The HotSpot JIT is an unbelievable piece of work, and any app that runs a long time is guaranteed to perform better and better as deeper optimizations start to take hold. But we're talking about Ruby here, which starts up at C-program speeds, and runs as fast as it does immediately. So then JRuby needs a way to compete for immediate execution performance, and the most straightforward way to do that is with an ahead-of-time compiler. That compiler is now also available in JRuby trunk.

The name of the command is "jrubyc", and it does just what you'd expect: it outputs a Java class file for your Ruby code. However, the mapping from Ruby code to a class file is not as straightforward as you'd expect: a Ruby script may contain many classes or no classes at all, and those classes may be opened and re-opened by the same script or other scripts at runtime. So there's no way to map directly from a Ruby class to a Java class given the strict limitations of Java's class model. But there is a much smaller unit of code that does not change over time, aside from being mercilessly juggled around: methods.

Ruby, in the end, is a creative and sometimes complicated jumble of method "objects", floating from class to class, from module to module, from namespace to namespace. Methods can be renamed, redefined, added and removed, but never can they be directly modified. And so here is where we have our immutable item to compile.

JRuby's compiler takes a given Ruby script and generates the following Java methods out of it: one Java method for the top-level, straight-through execution of the script, including class bodies and "def"s and the like (called "__file__" in the eventual Java class...thanks Ola for the idea), and a Java method for every Ruby method body and closure contained therein, named in such a way as to avoid conflicts. So for the following piece of code:
require 'foo'

def bar
  baz { puts "hello" }
end

def baz
  yield
end
There would be four Java methods generated: one for the toplevel execution of the script, two for the bar and baz methods, and one for the closure contained within bar. The resulting class file would store these as static methods, so they are accessible from any class or object as necessary, and the toplevel run-through would bind the two Ruby methods to their appropriate names in Ruby-space.

Quite simple, really!
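
To make the mapping concrete, here is a toy Ruby sketch that walks a hand-built AST for the script above and lists the Java methods that would come out of it. Only the "__file__" name comes from the real compiler; the array-based AST and the "method__" and "closure__" naming scheme are invented for the example.

```ruby
# Hypothetical sketch; the AST shape and generated names are assumptions.
SCRIPT = [:script,
          [:call, :require, [:lit, "foo"]],
          [:def, :bar, [:call, :baz, [:block, [:call, :puts, [:lit, "hello"]]]]],
          [:def, :baz, [:yield]]].freeze

# Collect one name for the top level plus one per method body and closure.
def java_method_names(node, scope = "__file__", names = ["__file__"])
  return names unless node.is_a?(Array)
  type, *children = node
  case type
  when :def
    scope = "method__#{children.first}"
    names << scope
  when :block
    # Number closures so two blocks in one method never collide.
    index = names.count { |name| name.start_with?("closure__") }
    scope = "closure__#{index}__#{scope}"
    names << scope
  end
  children.each { |child| java_method_names(child, scope, names) }
  names
end

puts java_method_names(SCRIPT)
```

Running it lists four names: the top-level entry, one each for bar and baz, and one for the block inside bar.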

So then an example of the precious, precious JRuby compiler:
$ cat fib_recursive.rb
def fib_ruby(n)
  if n < 2
    n
  else
    fib_ruby(n - 2) + fib_ruby(n - 1)
  end
end

puts fib_ruby(34)
$ jrubyc fib_recursive.rb
$ ls fib_recursive.*
fib_recursive.class fib_recursive.rb
$ time java -cp lib/jruby.jar:lib/asm-2.2.2.jar:. fib_recursive
5702887

real 0m8.126s
user 0m7.632s
sys 0m0.208s
$ time ruby fib_recursive.rb
5702887

real 0m14.649s
user 0m12.945s
sys 0m1.480s
Again, about twice as fast as Ruby 1.8.5 for this particular benchmark.

Now I don't want you going off and saying JRuby has a perfect compiler that will double the performance of your Rails apps. That's not true yet. The current compiler covers only about 30% of the possible code constructs in Ruby, and the remaining 60% (Update: 70%...that's what I get for late-night blogging) contains some of the biggest challenges like closures and class definitions. It's sure to be buggy right now, and the JIT isn't even enabled by default, plus it has my nasty logging message burned into it, to discourage any production use.

But it is very real. JRuby has a partial but growing compiler for Ruby to Java bytecode now.

And oh my, look at the time. Tonight I have to finish my visa application for a trip to India, nail down schedules and descriptions for several upcoming talks, and prepare some slides and notes for presentations in the coming weeks. You will see more about the Java compilation and our developing YARV/Ruby 2.0 bytecode support over the next couple months...and you can expect JavaOne to be an interesting time for Ruby on the JVM this year ;)