Friday, March 21, 2008

More Fun With Duby

It's been...oh my, almost two weeks since I broke the news about Duby. Since then I attended PyCon and we managed to get JRuby 1.1 RC3 out the door, which is looking like it will become JRuby 1.1 final. But I've still been working on Duby in my spare time. It's kinda nice to have a different "spare time" project than JRuby for a while.

After my previous post, I continued to carry the compiler forward. I managed to get it compiling the following syntactic structures:

  • until and while loops
  • static method definitions and calls (using "def self.foo" syntax)
  • array types (which combined with static calls allowed creating a "main" method that worked!)
  • instance variables (using @foo to mean the "foo" field on a Java type)
  • type resolution from any Java type
  • imports (with "import foo.bar.Baz" syntax)
  • packages (using "module Foo" syntax, though that will probably change)
  • type inference across methods on the same type, including smarts for instance vs static
All very exciting stuff, and starting to get very close to a language that would be usable for simple algorithms. Except there was just one problem: every new feature I added made the codebase more difficult to understand.

Reloaded

I had just started to explore the next steps when I realized the codebase was becoming too hard to follow. After chasing around a few inference bugs I decided it was time to take a step back. I also hoped all along to eliminate the hard requirement to declare a return type, since that at least should be apparent from inspecting the method body...but it wouldn't be possible with the original design.

The initial Duby compiler was entirely made up of decorations on the existing Ruby AST. A CallNode, for example, was decorated with methods called "compile", "declared_type", "signature" and so on that were called in various sequences to first infer types and then produce bytecode. There were a few problems with this approach that became apparent as work continued:
  1. The hypothetical Duby AST structure did not map 1:1 to the Ruby AST structure; it was richer, with type information, method signatures, and overloaded/hooked syntax. To support this in the existing AST, many layers of complexity had to be added to the compilation process: "is this call being used as a call or as a type declaration?"
  2. The decorator methods were inextricably entwined with Java/JVM-specific details, be they type names, bytecodes, or just obviously Java-centric syntax. This not only made it more difficult to evolve the language, but made it impossible for the language to live anywhere but on JVM and be compiled by anything but JRuby.
  3. My grand plans for the language were quickly exceeding what would be possible by simply decorating the AST.
The third bullet may spark some interest, so I'll explain very briefly (since it's still just forming in my head). As I thought more about how to map Ruby to the JVM, I realized that very few algorithms, very little syntax couldn't be made to map to *something* fast, static-typed, and conceptually the same. Mix-ins, for example, could easily be implemented like C# 3.0's extension methods. Blocks could be implemented behind the scenes as anonymous types of particular signatures. Even operator overloading could easily be mapped to appropriate methods on Numeric types, Collection types, and many others. The key to all this was having type inferencing and compiler layers that were not just flexible, but infinitely pluggable.

The Dream of Duby

The dream, as I see it, would be to use a Ruby-like syntax to wire up any arbitrary types using what looks generally like idiomatic Ruby code. For example, our friend fib(). The same fib() method I showed before, executing against primitive integer values, could execute against boxed Integer objects or BigInteger objects just by specifying the right plugin. So if within the context of a file, I declare that "fixnum" types are to be represented as BigInteger objects, and math operations against them are to call appropriate methods on BigInteger, so be it...that's what comes out the other side.

The term for this is, I believe, "specialization". In the case above, it's explicit specialization on a case-by-case basis. For many cases, this is all you need...you know the types you're dealing with ahead of time and can comfortably declare them or specify type mappings during compilation. But the more interesting side of this comes from general-purpose specialization in the form of parametric polymorphism...or in terms you might be more familiar with: generics.

I am certainly stepping beyond my current education and understanding here, but the pieces are starting to fit together for me. I and others like me have long been envious of languages like Haskell which infer types so incredibly well. And living between the Ruby and Java worlds, I've often felt like there had to be some middle ground that would satisfy both ends of the spectrum: a static, verified type system flexible enough to consume and specialize against any incoming types just by loading a plugin or recompiling, combined with the cleanest, most expressive (while still being one of the most reable) dynamic language syntaxes around. And so that's where I'd like Duby to fit.

The New Version

So where does it stand now?

Things have been moving fast. With JRuby 1.1 RC3 out of the way, I've taken some time to go back and rework the Duby pipeline.

Now, the only decoration on the Ruby AST is in the form of transformation logic to produce a richer Duby syntax tree. Literals are the same. The three call types have been unified into two: Call and FunctionalCall (against "self"). The various types of code bodies have been unified into a single Body construct. Method definition is represented through MethodDefinition and StaticMethodDefinition, both of which aggregate a "signature" element to represent declared or inferred signature information. The several looping constructs (excluding for loops, which are block-based iterations) have been unified into Loop. And so on. Not all the syntax supported by the Duby prototype has been represented in transformation, but it's not far off. And I'm taking a much more measured approach now.

The new AST structure affords a much more realistic typing and compilation pipeline. Each of the Duby AST nodes defines a single method "infer" which, when passed a Typer, will walk its children and infer as many types as it is able. Each child walks its own children, and so on, unifying and generalizing types as it goes (though the unification and generalization is stubbed out now to only allow exact matches...this will eventually be the responsibility of the back-end to handle). Simple code that calls fully-resolved target methods and has no unknown branches may completely resolve during this first pass.

In more complicated cases, each node that can't make an accurate determination about inferred type registers itself with the Typer in its "deferred" list. Once the initial inference cycle has run, all nodes in the AST will have either successfully inferred their result type or registered with the deferred list. At this point, you can either continue to pass ASTs through the Typer, or you can begin the resolution phase.

To start the resolution phase, the "resolve" method on the Typer is called, which attempts to iteratively resolve all deferred nodes in the AST. This resolution process in theory will loop until either the deferred list has been drained of all nodes (which will presumably then be 100% resolved), or until two successive resolution cycles fail to alter the list (perhaps because more information is needed or there are circular references for which the user must add hints). In general, this means that the deepest unresolved child nodes will fall out first. For example, if you have a method "foo" that calls a method "bar" before "bar" has been defined in the source.
def foo
bar
end
def bar
1
end
During the first inference cycle, bar will completely resolve but foo will be deferred. During the resolution phase, foo will now see that bar has been defined and resolved, and will itself resolve. Both return :fixnum (though the decision of what type ":fixnum" resolves to will be left to the compiler backend for a given system).

Back to fib()

Here's our friend fib(), which serves as a nice simple example:
def fib(n)
{n => :fixnum}

if n < 2
n
else
fib(n - 1) + fib(n - 2)
end
end
The fib() method is actually fairly interesting here because it recurses. If this were a simple recursion, it would be impossible to determine what actual type fib returns without an explicit declaration, since no other information is available, and this would produce an error of the following variety:
~/NetBeansProjects/jruby ➔ cat recurse.rb
def recurse
recurse
end
~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb recurse.rb
Could not infer typing for the following nodes:
FunctionalCall(recurse) (child of MethodDefinition(recurse))
MethodDefinition(recurse) (child of Script)
Script (child of )

AST:
Script
MethodDefinition(recurse)
{:return=>Type(notype)}
Arguments
FunctionalCall(recurse)
Here we see a simple "recurse" method that just calls itself, and as you'd expect type inference fails. Because the return value of "recurse" depends on knowing the return value of "recurse", resolution fails.

However in the case of fib(), we don't have a simple recursion, we have a conditional recursion. The default behavior for the Duby "Simple" typer is to assume that if one branch of an If can successfully resolve, that's the type to use (temporarily) as the value of the If (while still marking the if as unresolved, and unifying the two bodies later). And since the root of the fib() method only contains an If, it can successfully resolve. Let's try it:
~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb
Could not infer typing for the following nodes:
Call(<) (child of Condition)
Condition (child of If)
Call(-) (child of FunctionalCall(fib))
FunctionalCall(fib) (child of Call(+))
Call(-) (child of FunctionalCall(fib))
FunctionalCall(fib) (child of Call(+))
Call(+) (child of If)
...
Ouch, what happened here? Actually it's pretty easy to understand...the calls to "<", "-", and "+" were unknown to the Typer, and so they could not be resolved. As a result, the If Condition could not resolve, nor could the body of the Else statement. This is not necessarily a fatal state, merely an incomplete one. The "resolve" method on Typer can be called with "true" to force an error to be raised, or with no arguments to just "do the best it can do". In this case, using the simple call to the typer, it raises and displays the error, but there's no reason that more information couldn't be added to the system to allow a subsequent resolution to proceed.

Pluggable Inference

This is where having a pluggable engine starts to come in handy. Though the mechanism is currently pretty crude, there's already a basic ability to specify a plugin for method resolution. In order to enlist in method resolution, a plugin must define a "method_type" method that accepts a parent Typer, a target type, a method name, and a list of parameter types. If at any time method resolution fails in the Simple Typer, the plugin Typers will be called in turn. So in this case, I created a simple "Math" typer that's aware of a few simple operations with LHS and RHS of :fixnum. Let's try it:
~/NetBeansProjects/jruby ➔ jruby -rcompiler/duby/plugin/math lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb

AST:
Script
MethodDefinition(fib)
{:return=>Type(fixnum), :n=>Type(fixnum)}
Arguments
RequiredArgument(n)
Body
Noop
If
Condition
Call(<)
Local(name = n, scope = MethodDefinition(fib))
Fixnum(2)
Local(name = n, scope = MethodDefinition(fib))
Call(+)
FunctionalCall(fib)
Call(-)
Local(name = n, scope = MethodDefinition(fib))
Fixnum(1)
FunctionalCall(fib)
Call(-)
Local(name = n, scope = MethodDefinition(fib))
Fixnum(2)
Notice now that the return type of fib() has been correctly inferred to be :fixnum. Huzzah! We are successful!

Debugging

Along with work to make the system more pluggable and the code easier to follow, I've also been trying to provide useful debugging output. Man, I think making debugging output useful and readable is harder than writing a type inference engine in the first place. I must have spent a good hour just tweaking output so it didn't look totally heinous. And it's still not great, but it's definitely usable. Here, for your edification, is the debugging output from type inference on fib.rb:
* [Simple] Learned local type under MethodDefinition(fib) : n = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "<" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "<" Type(fixnum) on Type(fixnum) = Type(boolean)
* [AST] [Call] resolved!
* [AST] [Condition] resolved!
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Method type for "+" on not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "+" on not found
* [Simple] Deferring inference for Call(+)
* [Simple] Deferring inference for If
* [Simple] Learned method fib (Type(fixnum)) on Type(script) = Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "+" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "+" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] [Cycle 0]: Inferred type for Call(+): Type(fixnum)
* [AST] [If] resolved!
* [Simple] [Cycle 0]: Inferred type for If: Type(fixnum)
* [Simple] Inference cycle 0 resolved all types, exiting
What's Next

I should clarify a few things before getting back to work:

This codebase is mostly separate from, but heavily advised by the original Duby prototype. I learned a lot from that code, and it's still more functional from a compilation standpoint, but it's not really something I can evolve. The new codebase will probably be at the same level of functionality with a week or so.

Largely the more measured pace of this new codebase is because of two key goals.

I'd like to move the system ever toward full, general type inference when possible. That includes inferring method parameters as well as return types. That also means there will have to be a much more comprehensive type-inference engine than the current "simple" engine, but nothing about the current system will break forward compatibility with such work.

I'd also like to see the entire type inferencing pipeline and ideally most of the compilation pipeline entirely platform-independent. There has been a surprising amount of interest in Duby from folks desiring to target LLVM, C, x86 ASM, Parrot, and others. Sadly, without a realistic codebase to work from--one which isolates typing logic from the underlying platform--none of that work has moved forward (though it's almost frightening how quickly people pounced on the idea). So the new system is almost entirely independent of Java, the JVM, and JRuby. Ideally the only pieces you'd need to reimplement (or plugin) for a given platform would be the parser (producing a Duby AST through whatever mechanism you choose) and the type mapper/compiler for a given system. The parser would probably be the more difficult, but since the language syntax is "basically Ruby" and designed to support full AOT compilation you can use any Ruby parser to produce a Ruby AST and then transform it in the same way I transform JRuby's AST. The compiler will mostly be a matter of mapping Duby syntactic structures and inferred generic types to code sequences and native types on your platform of choice.

Over the weekend I'll probably try to absorb the available literature on type inference, to learn what I'm doing wrong and what I could be doing better...but I think my "common sense" approach seems to be working out pretty well so far. We shall see. Suggestions for papers, ideally papers designed for mortals, are welcome.

So there you have it...the Duby update several of you have been asking for. Satisfied for now? :)

(BTW, the code is all available in the JRuby repository, under lib/ruby/site_ruby/1.8/compiler/duby. The new stuff is largely under the ast/ subdir and in the "transform.rb" and "typer2.rb" files. The tests of interest are in test/compiler/duby/test_ast.rb and test_typer.rb)