Tuesday, June 27, 2006

Ruby.NET Compiler Passes test.rb

Queensland University of Technology's Ruby.NET project has achieved a major milestone: their Ruby to .NET compiler is able to run and pass all the test.rb samples in the C Ruby distribution. They've got plenty more work to do to be "Ruby compatible" in any sense of the words, but they're making very good progress.

I'm going to stay involved with this project to help in any way I can, be it discussions on implementation challenges or helping to debug and write tests. The .NET CLR is an impressive piece of technology, and I'd certainly like to see Ruby gain traction in the .NET world as well.

At any rate, kudos to the QUT folks for their hard work. I'm also glad to see they have a very liberal license and all source available online. Once I confirm I won't be tainted in any way I'll be reviewing their code and compiler...I'll report back on what I find.

It's amazing what financial backing from a major software company can do for an open-source project :)

Thursday, June 15, 2006

Mongrel in JRuby?

I'm going to try to post more frequent, less-verbose entries to break up the time between longer articles. This is the first.

For those of you unfamiliar with it, Mongrel is a (mostly) Ruby HTTP 1.1 server, designed to escape the perils of cgi/fcgi/scgi when running web frameworks like Rails or Camping. From what I hear it's very fast and very promising. Think of it as a Ruby equivalent to Java's servlet/web containers like Jetty and Tomcat.

Deploying Rails apps under JRuby currently has only two reasonable outlets:

- You can wrap all the Rails-dispatching logic into a servlet, deploying as you would a web application.
- You can (mostly) run the "server" script, which starts up Rails in WEBrick, Ruby's builtin lightweight server framework.

The first option would obviously provide the best integration with existing Java code and infrastructure. The second is good for development-time testing.

However, with JRuby rapidly becoming production-usable, there will be many folks who want a third option: deploying Rails--just Rails--in a production environment without a meaty servlet engine. For Ruby, that's where Mongrel comes in.

Mongrel is by the developer's own admission "mostly" Ruby code. The one big area where it is not Ruby is in its HTTP parser library (there's also a native ternary search tree, but that's no big deal to reimplement). Using Mongrel, require 'http11' pulls in a C-based HTTP 1.1 parser and related utilities. Ruby is notoriously slow for parsing work, so this native code is not unreasonable. However, it would be a barrier to running Mongrel within JRuby; we would need our own native implementation of the http11 library.

So, any takers? If I had a nickel for every JRuby sub-project I want to work on I'd have a fistful of nickels. This one will probably be pretty far back in the queue.

Known possibilities for lightweight HTTP 1.1 support (really all that's needed for Mongrel is an HTTP1.1 library, but that can probably be used alone):

- Jetty
- Simple

For those of you that say "Why not just wire Rails into [Tomcat|Jetty|Simple] and be done with it" I have this answer: Rubyists are not particularly fond of Java libraries. My aim of late is working toward supporting both Ruby/Java folks who will happily marry the two and pure Ruby folks that simply want another runtime to use. Mongrel is bound to become a very popular choice for web serving Ruby applications, and we would be remiss if we did not attempt to support it.

Wednesday, June 14, 2006

Unicode in Ruby, Unicode in JRuby?

The great unicode debate has started up again on the ruby-talk mailing list, and for once I'm not actively avoiding it. We have been asked many times by Java folks why JRuby doesn't support unicode, and there's really been no good answer other than "that's the way Ruby does it".

For the record, Ruby does support utf8, but not multibyte. Internally, it usually assumes strings are byte vectors, though there are libraries and tricks you can usually use to make things work. The array of libraries and tricks, however, is baffling to me.
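To make the "byte vector" point concrete, here is a minimal sketch. It is only an illustration, not something from the mailing list thread: the sample word is arbitrary, and it relies solely on String#unpack, which should behave the same on Ruby 1.8 and on later versions.

```ruby
# "résumé" written out as raw UTF-8 bytes so the source file encoding doesn't matter.
word = "r\xC3\xA9sum\xC3\xA9"

# Treating the string as a byte vector: every byte counts separately.
bytes = word.unpack("C*")

# Decoding the same bytes as UTF-8: each accented letter is a single code point.
codepoints = word.unpack("U*")

puts "bytes: #{bytes.length}, code points: #{codepoints.length}"
```

The two counts differ because each "é" occupies two bytes in UTF-8. Any API that assumes one byte per character will see the larger number.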

So here, now, I open up this question to the Java, JRuby, and Ruby communities in general as I did on the ruby-talk and rails-core mailing lists:

Every time these unicode discussions come up my head spins like a top. You should see it.

We JRubyists have headaches from the unicode question too. Since JRuby is currently 1.8-compatible, we do not have what most call *native* unicode support. This is primarily because we do not wish to create an incompatible version of Ruby or build in support for unicode now that would conflict with Ruby 2.0 in the future. It is, however, embarrassing to say that although we run on top of Java, which has arguably pretty good unicode support, we don't support unicode. Perhaps you can see our conundrum.

I am no unicode expert. I know that Java uses UTF16 strings internally, converted to/from the current platform's encoding of choice by default. It also supports converting those UTF16 strings into just about every encoding out there, just by telling it to do so. Java supports the Unicode specification version 3.0. So Unicode is not a problem for Java.

We would love to be able to support unicode in JRuby, but there's always that nagging question of what it should look like and what would mesh well with the Ruby community at large. With the underlying platform already rich with unicode support, it would not take much effort to modify JRuby. So then there's a simple question:

What form would you, the Ruby [or JRuby or Java] users, want unicode to take? Is there a specific library that you feel encompasses a reasonable implementation of unicode support, e.g. icu4r? Should the support be transparent, e.g. no longer treat or assume strings are byte vectors? JRuby, because we use Java's String, is already using UTF16 strings exclusively...however there's no way to get at them through core Ruby APIs. What would be the most comfortable way to support unicode now, considering where Ruby may go in the future?


So there it is, blogosphere. Unicode support is all but certain for JRuby; its usefulness as a JVM language depends on it. What should that support look like?

Saturday, June 10, 2006

Bringing RubyGems to JRuby OR The Zen of Slow-running Code

JRuby now supports RubyGems, and gems install correctly from local and remote. That's a huge achievement, especially considering the extra work that was required around YAML to get to this point. However, I'll start off with the caveats this time.

JRuby is very slow to install gems. You'll see what I mean in a moment, but it's so slow that something's obviously broken. That's perhaps the good news. Even on a high-end box, it's so intolerably slow that there's got to be a key fault keeping the speed down. We believe there are a couple reasons for it.

headius@opteron:~/rubygems-0.8.11$ time jruby gem install rails --include-dependencies
Attempting local installation of 'rails'
Local gem file not found: rails*.gem
Attempting remote installation of 'rails'
Updating Gem source index for: http://gems.rubyforge.org
Successfully installed rails-1.1.2
Successfully installed rake-0.7.1
Successfully installed activesupport-1.3.1
Successfully installed activerecord-1.14.2
Successfully installed actionpack-1.12.1
Successfully installed actionmailer-1.2.1
Successfully installed actionwebservice-1.1.2
Installing RDoc documentation for rake-0.7.1...
Installing RDoc documentation for activesupport-1.3.1...
Installing RDoc documentation for activerecord-1.14.2...
Installing RDoc documentation for actionpack-1.12.1...
Installing RDoc documentation for actionmailer-1.2.1...
Installing RDoc documentation for actionwebservice-1.1.2...

real 63m16.575s
user 55m5.939s
sys 0m25.547s


Ruby in Ruby

I'll tackle the simpler reason first: we still have a number of libraries implemented in Ruby code.

At various times, the following libraries have all been implemented in Ruby code within JRuby: zlib, yaml, stringio, strscan, socket...and others irrelevant to this discussion. This provided us a much faster way to implement those libraries, but owing to the sluggishness of JRuby's interpreter, this also meant these libraries were slower than we would like. This is actually no different from C Ruby; many of these intensive libraries are implemented in C code in Ruby, with no Ruby code to be seen.

Some, such as zlib, yaml, and stringio are on their way to becoming 100% Java implementations, but they're not all the way there yet. This is generally because Ruby code is so much simpler and shorter than Java code; completing the conversion to Java is painful in many ways.

Ola's work on the zlib and yaml libraries has been a tremendous help. He provided the first Ruby implementation of zlib, and has provided incremental improvements to it, generally by sliding it closer and closer to 100% Java. Ola also ported a fast YAML parser from Python, first to Ruby and now increasingly to JRuby, resulting in his RbYAML and JvYAML projects. Our old, original Ruby 1.6 yaml.rb parser was extremely slow. The new parsers have made YAML parsing speed many orders of magnitude faster. Tom and Ola have both worked to improve stringio. stringio provides an IO-like interface into a string, much like Java's StringBuffer/StringBuilder classes. In Ruby, understandably, this is implemented entirely in C. Our version, while slowly becoming 100% Java, is still quite a bit slower than it ought to be.

The continued porting of these hot-spot libraries from Ruby to Java will have perhaps the largest effect on gem install performance. However, there's another cause for alarm.

Fun with Threads

Threading in Ruby has a somewhat different feel from threading on most other languages and platforms. Ruby's threads are so easy to use and so lightweight that calling them "threads" is a bit misleading. They can be stopped, killed, and terminated in a variety of ways from outside themselves. They are trivial to launch: Thread.new { do something }. C Ruby also implements them as green threads, so no matter how many threads you spawn in Ruby, you're looking at a single processor thread to execute them. That means considerably less overhead, but practically no multi-core or multi-threading scalability at all. In short, Ruby's threads allow you to use and manipulate them in ways no other platform or language's threads allow while simultaneously giving you only a subset of typical threading benefits. For sake of brevity, I will refer to them as rthreads for the rest of this article.
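As a small illustration of how little ceremony is involved, here is a sketch that launches a thread and then terminates it from the outside. The loop body is just a stand-in for real work, and the variable names are mine.

```ruby
# Launch a background thread that would otherwise run forever.
worker = Thread.new do
  loop { sleep 0.01 }   # placeholder for real background work
end

started_alive = worker.alive?   # true: the thread is running or sleeping

# Any other thread can terminate it from the outside.
worker.kill
worker.join                     # wait until it has actually stopped

stopped_alive = worker.alive?
puts "alive after start: #{started_alive}, alive after kill: #{stopped_alive}"
```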

The increased flexibility of rthreads means that kicking off an asynchronous job is trivial. You can spin off many more threads than could be expected from native threading, using them for all manner of tasks where a parallel or asynchronous job is useful. The rthread is perhaps more friendly to users of the language than native threads: most of the typical benefits of threading are there without many of the gotchas. Because of this, I have always expected that Ruby code would use rthreading in ways that would horrify those of us with pure native threading. Therefore, I decided during my early redesign that supporting green threading--and even better, m:n threading--should be a priority. Our research into why gems are slow seems to have confirmed this is the right path.

RubyGems makes heavy use of the net/http package in Ruby. It provides a reasonably simple interface to connect, download, and manipulate http requests and responses. However, it shows its age in a few ways; other implementations of client-side http are around, and there are occasional calls to replace net/http as the standard.

net/http makes heavy use of net/protocol, a protocol-agnostic library for managing sockets and socket IO. It has various utilities for buffered IO and the like. It also makes use of Ruby's "timeout" library.

The timeout library allows you to specify that a given block of code should only execute for a given time. As you might guess, this requires the use of threading. However, you might be surprised how it works:


from lib/ruby/1.8/timeout.rb
  def timeout(sec, exception=Error)
    return yield if sec == nil or sec.zero?
    raise ThreadError, "timeout within critical session" if Thread.critical
    begin
      x = Thread.current
      y = Thread.start {
        sleep sec
        x.raise exception, "execution expired" if x.alive?
      }
      yield sec
      # return true
    ensure
      y.kill if y and y.alive?
    end
  end


It's fairly straightforward code. You provide a block and an optional timeout period. If you specify no timeout, just execute the block. If we're in a critical section (which prevents more than one thread from running), throw an error. Otherwise, start up a thread that sleeps for the timeout duration and execute the block. If the timeout thread wakes up before the block is complete, interrupt the working thread. Otherwise, kill the timeout thread and return.
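From the caller's side, using the library looks roughly like this. It's a minimal sketch: the durations are arbitrary, and sleep stands in for a slow operation such as a socket read.

```ruby
require 'timeout'

# The block is given 0.1 seconds, but tries to sleep for a full second.
slow_result =
  begin
    Timeout.timeout(0.1) { sleep 1; :finished }
  rescue Timeout::Error
    :timed_out
  end

# A block that finishes in time simply returns its value.
fast_result = Timeout.timeout(1) { :finished }

puts "slow: #{slow_result}, fast: #{fast_result}"
```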

With rthreads, this is a fairly trivial operation. It gives Ruby's thread scheduler one extra task...starting up a lightweight thread and immediately putting it to sleep. Now it can be argued that this is a waste of resources, creating a thread every time you want to timeout a task. I would agree, since a single thread-local "timeout worker" would suffice, and would not require launching many threads. However, this sort of pattern is not unexpected with such a simple and consumable threading API. Unfortunately, it's a larger problem under JRuby.
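Here is one way the "timeout worker" idea could look. This is a rough sketch under my own assumptions, not a proposed patch: the TimeoutWorker class, its Expired error, and the 10ms polling interval are all hypothetical, and it does nothing about the race where a deadline fires just as the block finishes. The point is only that one long-lived thread can watch every pending deadline.

```ruby
# Hypothetical helper: a single long-lived thread enforces all timeouts.
class TimeoutWorker
  class Expired < StandardError; end

  def initialize
    @lock = Mutex.new
    @pending = {}   # request token => [deadline, thread to interrupt]
    # The only extra thread, no matter how many timeout calls are made.
    @worker = Thread.new { loop { sleep 0.01; expire_overdue } }
  end

  # Run the block, interrupting the calling thread if it takes longer than sec.
  def timeout(sec)
    token = Object.new
    @lock.synchronize { @pending[token] = [Time.now + sec, Thread.current] }
    begin
      yield
    ensure
      # Finished (or interrupted): stop watching this request.
      @lock.synchronize { @pending.delete(token) }
    end
  end

  private

  # Raise Expired in every thread whose deadline has passed.
  def expire_overdue
    now = Time.now
    @lock.synchronize do
      @pending.delete_if do |_token, (deadline, thread)|
        if deadline <= now
          thread.raise(Expired, "execution expired")
          true
        end
      end
    end
  end
end

worker = TimeoutWorker.new
quick = worker.timeout(1) { :done }
slow = begin
  worker.timeout(0.05) { sleep 1; :done }
rescue TimeoutWorker::Expired
  :expired
end
puts "quick: #{quick}, slow: #{slow}"
```

Polling is the crudest possible scheduling. A real version would presumably sleep until the nearest deadline instead.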

JRuby is still 1:1 rthread:native thread, which means the timeout code above launches a native thread for every timeout call. Obviously this is less than ideal. It becomes even less ideal when you examine more closely how timeout is used in net/protocol:

from lib/ruby/1.8/net/protocol
    def read(len, dest = '', ignore_eof = false)
      LOG "reading #{len} bytes..."
      read_bytes = 0
      begin
        while read_bytes + @rbuf.size < len
          dest << (s = rbuf_consume(@rbuf.size))
          read_bytes += s.size
          rbuf_fill
        end
        dest << (s = rbuf_consume(len - read_bytes))
        read_bytes += s.size
      rescue EOFError
        raise unless ignore_eof
      end
      LOG "read #{read_bytes} bytes"
      dest
    end
    ...
    def rbuf_fill
      timeout(@read_timeout) {
        @rbuf << @io.sysread(1024)
      }
    end


For those of you not as familiar with Ruby code, let me translate. The read operation performs a buffered IO read, reading bytes into a buffer until the requested quantity can be returned. To do this, it calls rbuf_fill repeatedly to fill the buffer. rbuf_fill, in order to enforce a protocol timeout, uses the timeout method for each read of 1024 bytes from the stream.

Here's where my defense of Ruby ends. Let's dissect this a bit.

First off, 1024 is nowhere near large enough. If I want to do a buffered read of a larger file (like oh, say, a gem) I will end up reading it in 1024-byte chunks. For a large file, that's hundreds or potentially thousands of read calls. What exactly is the purpose of buffering at this point?

Second, because of the timeout, I am now spawning a thread--however green--for every 1024 bytes coming off the stream. Because of the inefficiency of net/protocol and timeout, we have a substantial waste of time and resources.

Now translate that to JRuby. Much of JRuby is still implemented in Ruby, which means that some calls which are native in Ruby are much slower in JRuby today. Socket IO is in that category, so doing a read every 1024 bytes greatly increases the overhead of installing a gem. Perhaps worse, JRuby implements rthreads with native threads, resulting in a native thread spinning up for every 1024 bytes read. For a 500k file, that means we're reading 500 times and launching 500 timeout threads in the process. Not exactly efficient.

We will likely try to submit a better timeout implementation, or a protocol implementation that reads in larger chunks (say 8k or 16k), but we have learned a valuable lesson here: rthreads allow for and sometimes make far easier threading scenarios we never would have attempted with native threads. For that reason, and because we'll certainly see this in other libraries and applications, we will continue down the m:n path.
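To put rough numbers on the chunk-size question, here is a small sketch that counts how many reads it takes to drain a 500k payload at different chunk sizes. A StringIO stands in for the socket, and fills_needed is a made-up helper for this illustration. With the code above, each read would also mean one timeout call, and under JRuby today one native thread.

```ruby
require 'stringio'

# Count how many reads of chunk_size bytes it takes to drain the stream.
def fills_needed(io, chunk_size)
  fills = 0
  fills += 1 while io.read(chunk_size)   # read returns nil at end of stream
  fills
end

payload = "x" * (500 * 1024)   # stand-in for a 500k gem file

counts = {}
[1024, 8 * 1024, 16 * 1024].each do |chunk|
  counts[chunk] = fills_needed(StringIO.new(payload), chunk)
  puts "#{chunk} bytes per read: #{counts[chunk]} reads"
end
```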

Coolness Still Abounds

As always, despite these obstacles and landmines, we have arrived at a huge milestone in JRuby's development. RubyGems and Ruby go hand-in-hand like Java and jarfiles. The ability to install gems is perhaps the first step toward a really usable general-purpose Ruby implementation. Look for a release of JRuby--with a full complement of Ruby libraries and RubyGems preinstalled--sometime in the next week or two.

Friday, June 09, 2006

Preparing for 0.9.0

Yes, I had intended to post at least once a week. Things have been moving along pretty fast with JRuby since JavaOne. Here's a quick update...I'll post a more detailed one soon.

  • RubyGems now installs perfectly, and works correctly. This is due in large part to Ola Bini's work on RbYAML and JvYAML, Ruby and Java YAML parsers which we are integrating into JRuby.
  • Gems install correctly from local files or from the network. They're a little slow, due to some bottlenecks we're ironing out, but they work. Ola's post shows it in action.
  • The "rails" script for generating a base application works. This is the first step in Rails development, and it ended up just kinda working without any additional effort.
  • Tom and I have been working on performance, putting in 5-10% speedup fixes here and there. There's a lot more to do, but we're making great progress.
  • We have gotten approval from matz to include the full complement of Ruby's .rb libraries in the next release of JRuby. This means all you will need for a working JRuby install is the release archive. We will also likely pre-install rubygems, so we're on par with a typical Ruby install in that regard.
  • Due to networking and IO fixes from Evan Buswell (especially his work to make 'select' function) we are very close to running WEBrick in JRuby. Along with this would come the "server" script in Rails, which gives you a simple deployment for Rails development and testing.
  • We will be making the ActiveRecord JDBC connector available to the public concurrent with this release. It is unknown yet whether it will be a gem, a Rails plugin, or whether we can get the Rails guys to include it in the main release.
  • A number of people are interested in beginning work to make Rake work well in JRuby for doing Java builds. I am one of those people, and time permitting I will start trying to make this happen.
  • I am also interested in implementing the win32ole library in JRuby. There are various plugins for Java that allow calling OLE components, and it should not be difficult to wrap one of those libraries. This would allow things like WATIR to work seamlessly in JRuby.

And here's a little funny that Tom just pointed out: Ed Burnette's JRuby Photo

Tom is on the left, looking into the future, and I'm on the right, staring down the camera.