Monday, March 10, 2008

Duby: A Type-Inferred Ruby-Like JVM Language

It's been one of those nights, my friends! An outstanding night!

All day Sunday I had the stupids. It was probably related to a few beers on Saturday night, or the couple glasses of red wine before that. Whatever it was, I didn't trust myself to work on any JRuby bugs for fear of introducing problems if my brain clicked off mid-stream. So I started playing around with my old bytecode generating DSL from a few months back. Then things got interesting.

We've long wanted to have a "Ruby-like" language, probably a subset of Ruby syntax, that we could compile to solid, fast, idiomatic JVM bytecode. Not a compiler for Ruby, with all the bells and whistles that make Ruby both difficult to support and inefficient to use for implementing itself. A real subset language that produces clean, tight JVM bytecode on par with what you'd get from compiled Java. But better, because it still looks and feels mostly like Ruby.

So I wrote one! And I used my bytecode library too!

Let's say we want to implement our good friend fib.

class Foo
def fib(n)
if (n < 2)
n
else
fib(n - 2) + fib(n - 1)
end
end
end
This is normal ruby code. Given a Fixnum input, it calculates the appropriate fibbonaci number and returns it. It's slow in Ruby for a few reasons:
  1. In JRuby, it uses a boxed integer value. Matz's Ruby and Rubinius use tagged integers to improve performance, and we rely on the JVM to optimize as much as it can (which turns out to be a *lot*). But it's still way slower than using primitives directly.
  2. The comparison operations, integer math operations, and fib operations are all dynamic invocations. So there's at least a bit of method lookup cost, and then a bunch of abstraction cost. You can reduce it, but you can't eliminate it.
  3. There are many Ruby features that influence performance negatively even when you're not using them. It's very difficult, for example, to optimally store local variables when the local scope can be captured at any time. So either you rely on tricks, or you store local variables on the heap and deal with them being slow.
When working with a statically-typed language, you can eliminate all of this. In Java, for example, you have both object types and primitive types; primitive operations are extremely fast and eventually JIT down to machine-code equivalents; and the feature set is suitably narrow to allow current JVMs to do amazing levels of optimization.

But of course Java has its problems too. For one, it does very little guessing or "inferring" of types for you, which means you generally have to declare them all over the place. On local variables, on parameters, on return types. C# 3.0 aims to correct this by adding type inference all over, but then there's still other syntactic hassle using any C-like language: curly braces, semicolons, and other gratuitous syntax that make up a lot of visual noise.

Wouldn't be nice if we could take the code above, add some type inference logic, and turn it into a fast, static-typed language?
class Foo
def fib(n)
{n => java.lang.Integer::TYPE, :return => java.lang.Integer::TYPE}
if (n < 2)
n
else
fib(n - 2) + fib(n - 1)
end
end
end
And there it is! This is the same code as before, but now it's been decorated with a little type declaration block (in the form of a Ruby hash/map) immediately preceding the body of the method. The type decl describes that the 'n' argument is to be mapped to a primitive int, and the method itself will return a primitive int (and yes, I know those could be inferred too...it shall be soon). The rest of the method just works like you'd expect, except that it's all primitive operations, chosen based on the inferred types. For the bold, here's the javap disassembly output from the compiler:
Compiled from "superfib.rb"
public class Foo extends java.lang.Object{
public int fib(int);
Code:
0: iload_1
1: ldc #10; //int 2
3: if_icmpge 10
6: iload_1
7: goto 27
10: aload_0
11: iload_1
12: ldc #10; //int 2
14: isub
15: invokevirtual #12; //Method fib:(I)I
18: aload_0
19: iload_1
20: ldc #13; //int 1
22: isub
23: invokevirtual #12; //Method fib:(I)I
26: iadd
27: ireturn

public Foo();
Code:
0: aload_0
1: invokespecial #15; //Method java/lang/Object."<init>":()V
4: return

}
A few items to point out:
  • A default constructor is generated, as you'd expect in Java. This will be expected to also recognize "def initialize" constructors. I haven't decided if I'll allow overloading or not.
  • Notice the type signature for fib and all the type signatures for calls it makes are correctly inferred to the correct types.
  • Notice all the comparison and arithmetic operations are compiled to the correct bytecodes (iadd, isub, if_icmpge and so on).
And the performance is what you'd hope for:
$ jruby -rjava -e "t = Time.now; 5.times {Java::Foo.new.fib(35)}; p Time.now - t"
0.681
$ jruby -rnormalfib -e "t = Time.now; 5.times {Foo.new.fib(35)}; p Time.now - t"
27.851
Here's another example, with some string operations thrown in:
class Foo
def bar
{:return => java.lang.String}

'here'
end
end

class Foo
# reopening classes works in the same file only (for now)
def baz(a)
{a => java.lang.String}

b = "foo"
a = a + bar + b
puts a
end
end
It works, of course:
$ jruby -rjava -e "Java::Foo.new.baz('Type inference is fun')"
Type inference is funherefoo
And once again, the disassembled output:
Compiled from "stringthing.rb"
public class Foo extends java.lang.Object{
public java.lang.String bar();
Code:
0: ldc #13; //String here
2: areturn

public void baz(java.lang.String);
Code:
0: ldc #15; //String foo
2: astore_2
3: aload_1
4: checkcast #17; //class java/lang/String
7: aload_0
8: invokevirtual #19; //Method bar:()Ljava/lang/String;
11: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
14: checkcast #17; //class java/lang/String
17: aload_2
18: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
21: astore_1
22: getstatic #29; //Field java/lang/System.out:Ljava/io/PrintStream;
25: aload_1
26: invokevirtual #34; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
29: return

public Foo();
Code:
0: aload_0
1: invokespecial #36; //Method java/lang/Object."<init>":()V
4: return

}
Notice here that the + operation was detected as acting on two strings, so it was compiled to call String#concat rather than try to do a numeric operation. These sorts of mappings are simple to add, and since there's type information everywhere it's also easy to come up with cool ways to map Ruby syntax to Java types.

My working name for this is going to be Duby, pronounced "Doobie", after Duke plus Ruby. Duby! It has a nice ring to it. It may be subject to change, but that's what we'll go with for the moment.

Currently, Duby supports only the features you see here. It's a very limited subset of Ruby right now, and the subset doesn't support all Java primitive types, for example, so there's a lot of blanks to be filled. It also doesn't support static (class) methods, doesn't wire up "initialize" methods, doesn't support packages (for namespacing) or imports (to shrink type names) or superclasses or interfaces or enums or generics or what have you. But it's already functional for simple code, as you see here, so I think it's a great start for 10 hours of work. The rest will come, as needed and as time is available.

What are my plans for it? Well many of you may know Rubinius, an effort to reimplement Ruby in Ruby, modeled after the design of many Smalltalk VMs. Well, in order to make JRuby more approachable to Ruby developers, Duby seems like the best way to go. They'll be able to write mostly idiomatic Ruby code and know it will both perform at Java speeds and provide compile-type checking that all is wired up correctly. So we can avoid the overhead of "turtles all the way down" by just teaching one turtle how to speak JVM bytecode and building on that.

I also hope this will lead to lots of new ideas about how to implement languages for the JVM. It's certainly attractive to language users to be able to contribute to language implementations using the language in question, and if we can come up with compilers and sub-languages like Duby it could make the JVM more approachable to a wide range of developers.

Oh, and for those of you curious: the Duby compiler is written mostly in Ruby too.