Why Rubinius Matters to Ruby's Future

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Friday, December 28, 2007

I am a long-time fan of self-hosted languages. In an earlier post I listed the reasons I thought that a language should be mostly or entirely written in itself. Here’s another reason writing a language in itself is important: if a language’s core libraries and frameworks are written in that language, it is possible for every programmer to improve on them.

Ruby’s core libraries are written in C. Here’s the source for Ruby’s collect method:


/*
 *  call-seq:
 *     array.collect {|item| block }  -> anarray
 *     array.map     {|item| block }  -> anarray
 *
 *  Invokes block once for each element of self. Creates a
 *  new array containing the values returned by the block.
 *  See also Enumerable#collect.
 *
 *     a = [ "a", "b", "c", "d" ]
 *     a.collect {|x| x + "!" }   #=> ["a!", "b!", "c!", "d!"]
 *     a                          #=> ["a", "b", "c", "d"]
 */
static VALUE
rb_ary_collect(ary)
    VALUE ary;
{
    long i;
    VALUE collect;

    if (!rb_block_given_p()) {
        return rb_ary_new4(RARRAY(ary)->len, RARRAY(ary)->ptr);
    }

    collect = rb_ary_new2(RARRAY(ary)->len);
    for (i = 0; i < RARRAY(ary)->len; i++) {
        rb_ary_push(collect, rb_yield(RARRAY(ary)->ptr[i]));
    }
    return collect;
}

Perhaps you like working with Haskell-style fold and unfold rather than the Smalltalk-style collect, select, and detect. No problem, you can hack your own in Ruby, like this:


class Object
  def unfold options = {}, &incrementor
    return [] unless options[:while].nil? || options[:while].to_proc.call(self)
    transformed = options[:map] && options[:map].to_proc[self] || self
    return [transformed] if options[:to] && options[:to].to_proc.call(self)
    incrementor.call(self).unfold(options, &incrementor).unshift(transformed)
  end
end
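A short usage sketch may make the three options easier to read. The definition is repeated so the snippet runs on its own; the lambdas and sample values are illustrative assumptions, not part of the original post.

```ruby
# The unfold definition from above, repeated so this snippet is self-contained.
class Object
  def unfold options = {}, &incrementor
    return [] unless options[:while].nil? || options[:while].to_proc.call(self)
    transformed = options[:map] && options[:map].to_proc[self] || self
    return [transformed] if options[:to] && options[:to].to_proc.call(self)
    incrementor.call(self).unfold(options, &incrementor).unshift(transformed)
  end
end

# :to stops after the first value that satisfies the predicate (inclusive).
count_up = 1.unfold(:to => lambda { |n| n >= 5 }) { |n| n + 1 }
# count_up is [1, 2, 3, 4, 5]

# :while stops before the first value that fails the predicate (exclusive).
doubles = 1.unfold(:while => lambda { |n| n < 20 }) { |n| n * 2 }
# doubles is [1, 2, 4, 8, 16]

# :map transforms each value on its way into the result.
labels = 1.unfold(:to => lambda { |n| n == 3 },
                  :map => lambda { |n| "item #{n}" }) { |n| n + 1 }
# labels is ["item 1", "item 2", "item 3"]
```

Note that the method recurses once per element, so very long sequences would need an iterative rewrite.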

One hitch: your fold and unfold are hundreds of times slower than Ruby’s built-in methods and classes, which are written in C. It reminds me of Newton programming. Apple gave us a really cool language, NewtonScript, for writing applications. Except the built-in applications were written in C, and the C compiler was only for Apple engineers.
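If you want to see the gap on your own machine, a rough timing sketch along these lines should do. The `seconds_for` helper, the `limit` and `rounds` values, and the choice of `Range#to_a` as the built-in comparison are all assumptions made for illustration; the ratio you get will depend heavily on the Ruby version and implementation.

```ruby
# The unfold definition from the post, repeated so this snippet runs alone.
class Object
  def unfold options = {}, &incrementor
    return [] unless options[:while].nil? || options[:while].to_proc.call(self)
    transformed = options[:map] && options[:map].to_proc[self] || self
    return [transformed] if options[:to] && options[:to].to_proc.call(self)
    incrementor.call(self).unfold(options, &incrementor).unshift(transformed)
  end
end

# Hypothetical helper: wall-clock seconds taken by a block (monotonic clock).
def seconds_for
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

limit  = 500   # kept small because unfold recurses once per element
rounds = 200   # repeat the work so the timings are not dominated by noise

unfolded = nil
built_in = nil

# Build [1, 2, ..., limit] with the Ruby-level unfold.
unfold_time = seconds_for do
  rounds.times { unfolded = 1.unfold(:to => lambda { |n| n >= limit }) { |n| n + 1 } }
end

# Build the same array with a built-in method.
built_in_time = seconds_for do
  rounds.times { built_in = (1..limit).to_a }
end

puts "unfold:     #{unfold_time} s"
puts "Range#to_a: #{built_in_time} s"
```

Both paths produce the same array, so the difference in the printed times is purely the cost of doing the work in Ruby rather than in the implementation's own language.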

The good news about Ruby is that you can write your own classes in C if you want to. But that is a significant barrier to entry for many programmers, shrinking the available pool of programmers who will enhance the language.

Having core libraries in C is a great choice for implementing a language that is to be used for other things like building web applications. But it is not a great choice for a language that is to be used to build other languages. And “building other languages” is exactly what Bottom-Up Programming or Meta-Linguistic Abstractions are all about. In other words, writing the core libraries in C is not a great choice for a language where programmers write their own abstractions.

Now for many things, the speed penalty of writing your own abstractions in Ruby is negligible. But not for everything. There is always going to be a class of things (and I think collection manipulation is one of them) where you need to be able to write stuff that is as good as what comes out of the box.

If new stuff is an order of magnitude slower, you might be able to use it for non-critical things, but your chances of persuading anyone else to use it are very low. Which means that the language as a whole progresses slowly because real progress can only happen in areas where performance doesn’t matter. Like database-bound web applications.

Having an implementation where the built-in stuff is on the same footing as your stuff opens up the doors for actual progress. It forces the language itself to be Good Enough, and it makes it possible for every Ruby programmer to improve the language.
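To make "the same footing" concrete, here is a minimal sketch of what collect might look like if it were written in Ruby, mirroring the C function shown earlier step by step. This is not the Rubinius source; the method name `ruby_collect` is a made-up name chosen so that it doesn't clobber the real `collect`.

```ruby
# Hypothetical pure-Ruby counterpart to rb_ary_collect (not the Rubinius code).
class Array
  def ruby_collect
    # Without a block, return a shallow copy, like the rb_ary_new4 branch in C.
    return Array.new(self) unless block_given?

    collected = []
    i = 0
    while i < length
      # Equivalent of rb_ary_push(collect, rb_yield(RARRAY(ary)->ptr[i]))
      collected << yield(self[i])
      i += 1
    end
    collected
  end
end

# Same example as the doc comment in the C source.
a = ["a", "b", "c", "d"]
shouted = a.ruby_collect { |x| x + "!" }
# shouted is ["a!", "b!", "c!", "d!"], and a is unchanged
```

Anyone who can read this can also change it, which is the point: no C compiler, no extension API, just an ordinary method definition.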

what we can learn from java, whoops smalltalk

Java has this incredibly powerful and popular IDE, Eclipse. It is so powerful that many people feel it is impossible to write production Java code without it. Why is it so powerful?

One of the major reasons it is both powerful and popular is the availability of plug ins. It seems to support the language, UML diagrams, source code control, and everything up to (I’m pretty sure) time tracking for client billing. Naturally, the plug ins are written in Java, just like almost all of the built-in functionality.

Clara Creative can write her own plug in and it won’t be a second-class citizen. And since it is a tool for Java programmers, Clara Creative already knows how to write plug ins, she doesn’t need to drop into another language.

Wow, that is neat. And it does help explain why Eclipse has so many plug ins, and why they are popular: there is no low-level language barrier, and they are all on an equal footing with each other.

This shouldn’t surprise you. Eclipse evolved from IBM VisualAge Micro, which was written by Smalltalk programmers in Smalltalk. And of course, Smalltalk is a language where almost everything is written in Smalltalk itself. Smalltalk programmers expect to be able to extend the language and environment without penalty.

In the end, the choice of whether to implement core features in C or Ruby will always be difficult. The temptation to optimize for speed will always be strong, especially when the language is fighting for mind share. But extensibility and variety are also a win, and Ruby fights with its libraries and features as much as with its performance.

Perhaps we won’t all be using fold and unfold instead of collect, select, and detect. But if we aren’t, it ought to be because we prefer the originals, not because the replacements are crippled in comparison, or because the kind of person who likes inventing new tools prefers to write them in Ruby instead of in C.

I’m looking forward to hearing more from the Rubinius team. I really think they hold the key to the future. Thanks, Ezra, for your comment.

Labels: ruby

5:05 PM

Comments on “Why Rubinius Matters to Ruby's Future”:

Actually, Rubinius isn't the only project working to rewrite the core Ruby libs in pure Ruby. The JRuby project for one has reimplemented most of the core. There are a few others too, but I can't remember their names off the top of my head.

A point of mild interest is that Ruby scripts running on JRuby run just as fast as (if not faster than) they do on Ruby 1.8, despite the fact that JRuby's core libs are implemented in Ruby rather than Java.

# posted by Daniel Spiewak : 7:03 PM

Actually JRuby doesn't implement the core libraries in Ruby; it implements them in Java.

Rubinius implements core classes like Array, String and Hash in Ruby, turtles all the way down so to speak. The other implementations like JRuby and IronRuby implement the core classes in Java or C# instead of C like the current 1.8.x branch of MRI. So they are just as opaque as C Ruby.

Rubinius is important because it is the only alternate implementation that will be compatible with standard MRI C extensions and is not tied to Java or .NET for its runtime. I think it has the best chance of becoming the de facto Ruby implementation of any of the alternatives.

In fact that's why we (engineyard.com) have hired a full-time staff of 5 people to work solely on Rubinius. Expect a lot of progress over the next few months.

# posted by Ezra Zygmuntowicz : 7:28 PM

I think it is possible to have a quite reflective language only if you do your homework on the GC and interpretation stuff. For this you need time and money (basically to buy neurons). Java was very slow at the beginning.

I'm far from being a "forgiver" of things like "not everybody needs speed". I want as much speed as possible out-of-the-box, because I don't want to run into slowdown problems. Moreover, I cannot know *when I write* whether I'm in a speed-critical section or not. And the big thing is you don't know your load before putting your app online (if you're in innovation).

I'm not speaking about premature optimization (my work), I'm speaking about basic infrastructure inherent speed (my provider's work).

# posted by nraynaud : 8:35 PM

I'd like to leave my opinion on Rubinius, however short. :-)

Java and .NET can be quite complex beasts, with several layers of abstraction in their core. On top of that, they have some smart optimizations for their core needs. And on top of that, they have their core design goals which can be pretty serious, like O.S. systems programming. And on top of that, they fight each other so we can comfortably say that they are two different worlds.

That is, trying to use a higher level language on top of either of them can be a daunting task, but that's not deterrent in and of itself. If there's a case for higher level languages, the demand will be taken care of one way or another.

Rubinius, on the other hand, has its own core design goals, which can be sort of selfish indeed. It wants to tackle Ruby first and foremost; that is, it starts with a high-level language already, which can make it unsuitable for the kind of O.S.-level tasks that Java and .NET theoretically could take on. Where I think Rubinius will be a little "too" selfish is that it will find most of its end-user use cases by powering server-side applications at first. But at the front-end we have browsers, Flash/Flex/AIR, Silverlight/WPF, CLI, GTK+/QT, so it's not a problem necessarily.

Rubinius, while a research project, has an achievable goal in practice.

I guess one way to "beat the Ruby language" is to get to know it further and to expand on it should it be possible. Rubinius also motivates the other Ruby implementers, be it in MRI, YARV, JRuby, IronRuby... Rubinius employs Ruby hackers too! Those guys help with Ruby libraries and implementations!

Who knows how Rubinius could end up being used 10 years from now? Ruby has brought us this far, but where is it going to take us? Could Google embed Rubinius in their Android platform? Could Adobe embed it in their future Flash enhancements once "broadband" becomes more commonplace?

It's all exciting. Hopefully the old-guard will wake up to the opportunities that lie ahead.

# posted by Joao Pedrosa : 8:51 PM

Joao Pedrosa - I think the more likely scenario is that Rubinius evolves into two/three (partially-)separate, valuable projects:
* a VM
* ruby core classes, written in ruby
* specs for the ruby language

The specs are extremely valuable to the Ruby community, which until this point has just had MRI as the spec.

The core classes written in Ruby may also be valuable to other projects. For instance, I believe the JRuby team is at least considering trying to swap out some Ruby-in-Java classes with the Rubinius classes.

Lastly the VM (and I have a pretty shaky understanding of the details) has the potential to be the best way of running Ruby for those without a need for Java (JRuby) or .NET (if IronRuby can deliver) integration.

Really the most exciting thing about Rubinius to me is the community involvement and cooperation: EngineYard sponsoring Rubinius, Sun chipping in some too, the JRuby/Rubinius teams working together on the same specs and code, etc. It's making Ruby look much more viable long-term as a language and platform than when it was just Matz and a couple of guys seeming like the "man behind the curtain" of Ruby/MRI.

BTW, here's Enumerable#collect from Rubinius source

# posted by crayz : 10:24 PM

The biggest factor behind Rubinius, and I say this without a lot of experience and 0 contribution (just been on their IRC channel for some time) is IMHO the community drive.

So I think this is really the key difference - the community.

# posted by Anonymous : 3:32 PM

The elephant in the room is (and has been for quite some time), why anybody in their right mind would consider writing domain specific languages in a language as slow as Ruby. You want to write a DSL with a scripting language: go with the fastest scripting language, Lua. You need a scripting language with OO and other "batteries included": go with Ruby (or Python). No matter how much people nod their heads in agreement with "the right tool for the job", they still jump on the bandwagon and proclaim some new language as the silver bullet.

# posted by George Jempty : 11:37 PM
