raganwald
Monday, January 28, 2008
  ruby.reddit.com
Reddit is now experimenting with user-generated reddits, and one of the new ones is reddit for ruby hackers. For the three people who have never heard of reddit, this is a link aggregation site where you can vote on ruby articles, something like Digg and Dzone.



The Ruby reddit is important, and if you patronize link aggregators like this, I encourage you to please post good Ruby links, vote for the links you like, and perhaps even participate in the comments.

There is already a programming subreddit, but the Ruby reddit is appears to be an entirely distinct “space.” Posts can be in programming.reddit com and in ruby.reddit.com, and there are separate posts and comments. What this means is that Ruby programmers can vote for links and exchange comments without getting into tiresome flame-wars with people who do not care for Ruby.

Of course, you are still free to defend Ruby against Smug Haskell Weenies over in programming.reddit.com. It isn’t an all-or-nothing choice. I look forward to seeing you there!

Labels:

 

Thursday, January 24, 2008
  Object#andand & Object#me in Ruby
At Mobile Commons, we have to write a lot of code under time constraints. We also work on a fairly distributed basis, which means that writing readable code is not a luxury, it’s a necessity. We can’t afford to be clever for clever’s sake, but if a new idiom helps make our code easier for our colleagues to grasp in a hurry, we embrace it. Here’re some idioms we’ve recently created:

Object#andand

Ruby programmers are familiar with the two guarded assignment operators &&= and ||=. The typical use for them is when you have a variable that might be nil. For example:

@first_name &&= @first_name.trim
@phone ||= '612-777-9311'

You are trimming the first name provided it isn’t nil, and you are assigning ‘612-777-9311’ to the phone if it is nil (or false, but that isn’t important right now). The other day we were discussing the guards and we agreed that we wished there was a guarded method invocation operator. Here’s an example of when you would use it:

@phone = Location.find(:first, ...elided... )&&.phone

Meaning, search the location table for the first record matching some criteria, and if you find a location, get its phone. If you don’t, get nil. (Groovy provides this exact functionality, although Groovy uses ?. instead of &&.) However, &&. won’t work because &&. is not a real Ruby operator. So what do we write instead?
A feature is “powerful” when at least one of the following holds:
  1. It can be used to implement something trivial in an pointlessly complicated way.
  2. It can cause a lot of damage.
Seriously, it seems like 85% percents of the contexts where something is called “powerful,” it really means “useless and dangerous.”
—Yossi Kreinin, Fun at the Turing Tar Pit

How about:

@phone = (loc = Location.find(:first, ...elided... )) && loc.phone

Note that we need a local variable to make it work without extending the Object class. Note further that although we only need the local variable for one line of code, it is in method scope and we must be careful that we don’t clobber an existing loc variable. And when reading this code, we now have to look and see whether loc is used again to understand what this line does.

This is a scoping problem. if you are going to use variables, you want to restrict their scope as much as possible. It feels like half of what we do when programming is manage side effects like overwriting a variable you are using elsewhere, and the more you can restrict those side effects, the easier programming becomes. Java supports block scoping natively, and languages like Scheme solve this by providing a block scope macro, let. And you can roll your own in languages like Javascript that provide anonymous functions. But this seems to get the natural left-to-right order wrong:

@phone = lambda { |loc| loc && loc.phone }.call(Location.find(:first, ...elided... ))

Well, we are tinkerers and toolsmiths. So let’s roll our own. We can’t use &&. without rewriting the Ruby parser, and that seems drastic. So let’s use “andand” in the hope that our non-English-speaking colleagues will forgive us. We want to write:

@phone = Location.find(:first, ...elided... ).andand.phone

And get the same effect as @phone = ->(loc){ loc && loc.phone }.call(Location.find(:first, ...elided... )). This would be bloody trivial if we have a macro facility, but “If wishes were horses then beggars would ride.” Instead, we need to write a method that does some conditional evaluation for us. If the receiver (in this case the result of Location.find(:first, ...elided... )) is truthy, like an ActiveRecord model, we want andand to return it so we can use it. But if it is falsy, we want andand to somehow return something that takes any message you send it and returns itself without doing anything. In other words, we want a mock object of sorts. So that’s what we’ll do:

class Object
def andand (p = nil)
self or MockReturningMe.new(self)
end
end

class MockReturningMe
instance_methods.reject { |m| m =~ /^__/ }.each { |m| undef_method m }
def initialize(me)
@me = me
end
def method_missing(*args)
@me
end
end

Actually, there is more to it than this, for example the MockReturningMe class is enclosed in a module to minimize namespace pollution. Consult the source and RSpec (links below) for the latest particulars.

Note that because you accept any method using Ruby’s method invocation syntax, you can accept methods with parameters and/or blocks:

list_of_lists.detect { ...elided... }.andand.inject(42) { ...elided ... }

Some of the other solutions to this problem that use send change Ruby’s syntax. This solution emphasizes syntactic regularity: the goal was to make an “&&.” operation that worked like &&=. &&= looks just like normal assignment, you can use any expression on the RHS, only the semantics are different. The andand method also works just like a normal method invocation, only the semantics are modified.

Enhanced Object#tap

(Formerly called Object#me)

Ruby 1.9 includes Object#tap:

Location.find(:first, ...elided... ) do |location|
puts location.inspect
end
=> location

Ruby On Rails includes a similar feature, the returning idiom:

returning(Location.find(:first, ...elided... )) do |location|
...elided...
end
=> location

Both return the location, not whatever happens inside the block. This is very useful, especially when chaining methods in a pipeline. Sometimes you want to do something for side effects, but continue on with the chain. For example:

returning([1, 2, 3, 4, 5]) do |arr|
arr.pop
end.map { |n| n * 2 }
=> [2, 4, 6, 8]

There are going to be times that using a method instead of a function makes the code more consistent. So let’s imagine what that would look like:

Location.find(:first, ...elided... ).tap do |location|
...elided...
end…

This is much nicer when you are “injecting” some extra code into the middle of an expression, such as when adding some poor-man’s-debugging puts statements.

Would we ever want to go beyond a block and write Location.find(:first, ...elided... ).tap.some_method instead of Location.find(:first, ...elided... ).tap do ...elided... end? Of course we would! Quite often, we want to send several methods to the same receiver. In Smalltalk, the semicolon does this explicitly. In Ruby, we have to be very careful to make sure our methods return self to enable chaining. For example:

[1, 2, 3] << 4 << 5
=> [1, 2, 3, 4, 5]

But if a method doesn’t return the receiver, chaining doesn’t work. That’s why we had to use “returning” in the array example above:

[1, 2, 3, 4, 5].pop.map { |n| n * 2 }
=> NoMethodError: undefined method `map' for 5:Fixnum

That’s because [1, 2, 3, 4, 5].pop => 5, not [1, 2, 3, 4]. So instead, we write:

[1, 2, 3, 4, 5].tap.pop.map { |n| n * 2 }
=> [2, 4, 6, 8] # or we write...
[1, 2, 3, 4, 5].tap { |arr| arr.pop }.map { |n| n * 2 }
=> [2, 4, 6, 8] # both work

When you examine the source, you’ll see that tap looks a lot like andand. No surprise, we are doing similar things with slightly different semantics.

This implementation of Object#tap has two advantages over the implementation in Ruby 1.9: first, it works in Ruby 1.8. Second, it adds the ability to call another method (like pop) and not have to include a block.

Summary: andand & enhanced tap

andand and tap both provide the same two benefits: First, they either eliminate (in the case of a method call) or limit in scope (in the case of a block) a local variable. Second, they let you maintain a natural left-to-right order when writing pipelined expressions. They are simple and obvious enough that I feel any unfamiliarity for the reader will be more than outweighed by the cleaner, easier-to-digest code that results from their liberal use.

My suggestion is that these two benefits make it worth your while to add these methods to the Object class and use them regularly in your code.

Installation

sudo gem install andand. For the source and more information, http://andand.rubyforge.org.


Similar solutions:
  1. Kernel#ergo in Ruby Facets
  2. send_with_default
  3. maybe
  4. _?
  5. if_not_nil (via François Beausoleil)
  6. Groovy’s Safe Navigation Operators via Call by Name: "Yo Elvis"
  7. try() and A better try



postscript: Inline Rescues

I have now seen two people asking about using an inline rescue instead of andand:

Location.find( …elided… ).phone rescue nil

Also, one of the alternate solutions seems to use this technique. Obviously, it works for many of the cases you want to try, and you don’t need to add a new method to Object. However, it is—to paraphrase Chalain—“Sneaking up on the Interpreter.”

It achieves what we want almost by accident, namely that it rescues NoMethodError, which by coïncidence gives us the result we want. However, it does not communicate our intent. It says that we want to transmute all exceptions into a value of nil. When someone glances at the code, will they think of NoMethodError? Or will they assume that we are trying to handle exceptions that the .phone method might throw?

And what if .phone actually throws something serious? In Rails, what happens if a migration is screwed up and there is no phone column? I think we actually want a NoMethodError in that case. The rescue clause will accidentally swallow the exceptions we are trying to catch.

I think if you are absolutely certain that you do not care about other exceptions and also if your team uses this idiom extensively so that it does not miscommunicate its intent, you can live a long and happy life using it. However, I would be hesitant to recommend it as a general-purpose solution. andand and some of the other solutions obviously require the code reader to learn a new method name, but after that they offer the exact functionality we need and clearly communicate their intent.

Labels:

 

Tuesday, January 22, 2008
  Ruby is soooooo 2002
I first heard about Ruby when Matz spoke at LL2 in November, 2002. I experimented with it on and off over the next few years, and I liked it. I also experimented with C#, Groovy, and a few others, but Ruby reminded me of Smalltalk, a language I used in the 80s and loved, so I stuck with it even though I seemed to be one of three people using Ruby without using Rails.

Lately, the internet echo chamber is souring on Ruby and moving from embracing it to hating it. I suspect that although people are quoting all sorts of technical reasons for their dislike, it comes down to cultural forces. Shrug.

I am reminded of the time I showed up in high school one September on a skateboard. To give you an idea of where skateboards were in those days, my model had clay wheels designed for wood-rink roller-skates, not the polyurethane we have today. Any ways, everyone laughed at me. The next Summer, everyone had one, and they were laughing at the fact that I didn't ride the latest model, even though I was a more accomplished free-styler and slalom rider. Then, the fad passed, and I was the iconoclast again, still riding my board while everyone embraced mopeds.

Guess what? There is no villain in the story. I was happy riding with my friends. The people who jumped on and off fads were happy, and they were learning important life skills about participating in human culture. And the fad was a big win: Early boards really weren’t very good, and the popularity brought money and manufacturing scale to skateboarding that improved the technology.

Paul Graham has said that the real world is nothing like high school. I’m not so sure.



I often use Ruby for code examples. What do you think: Should I continue using Ruby, or should I use another language for examples? If I continue using Ruby, should I try to show idiomatic Ruby or should I try to focus on general ideas that can be applied elsewhere?

Code examples should be:
Ruby-specific
Anything but Ruby
Ruby, but general
A mix of specific and general ideas
  
pollcode.com free polls

Labels:

 

Friday, December 28, 2007
  Why Rubinius Matters to Ruby's Future
I am a long-time fan of self-hosted languages. In that post I listed the reasons I thought that a language should be mostly or entirely written in itself. Here’s another reason writing a language in itself is important: If a language’s core libraries and frameworks are written in that language, it is possible for every programmer to improve on them.

Ruby’s core libraries are written in C. Here’s the source for Ruby’s collect method:

/*
* call-seq:
* array.collect {|item| block } -> anarray
* array.map {|item| block } -> an
array
*

* Invokes block once for each element of self. Creates a
* new array containing the values returned by the block.
* See also Enumerable#collect.
*

* a = [ “a”, “b”, “c”, “d” ]
* a.collect {|x| x + “!” } #=> [“a!”, “b!”, “c!”, “d!”]
* a #=> [“a”, “b”, “c”, “d”]
*/
static VALUE
rb_ary_collect(ary)
VALUE ary;
{
long i;
VALUE collect;

if (!rb_block_given_p()) {
return rb_ary_new4(RARRAY(ary)->len, RARRAY(ary)->ptr);
}

collect = rb_ary_new2(RARRAY(ary)->len);
for (i = 0; i < RARRAY(ary)->len; i++) {
rb_ary_push(collect, rb_yield(RARRAY(ary)->ptr[i]));
}
return collect;
}

Perhaps you like working with Haskell-style fold and unfold rather than the Smalltalk-style collect, select, and detect. No problem, you can hack your own in Ruby, like this:

class Object
def unfold options = {}, &incrementor
return [] unless options[:while].nil? || options[:while].to_proc.call(self)
transformed = options[:map] && options[:map].to_proc[self] || self
return [transformed] if options[:to] && options[:to].to_proc.call(self)
incrementor.call(self).unfold(options, &incrementor).unshift(transformed)
end
end

One hitch: your fold and unfold are hundreds of times slower than Ruby’s built-in-C methods and classes. It reminds me of Newton programming. Apple gave us a really cool language—NewtonScript—for writing applications. Except, the built-in applications were written in C, and the C compiler was only for Apple engineers.

The good news about Ruby is that you can write your own classes in C if you want to. But that is a significant barrier to entry for many programmers, shrinking the available pool of programmers who will enhance the language.

Having core libraries in C is a great choice for implementing a language that is to be used for other things like building web applications. But it is not a great choice for a language that is to be used to build other languages. And “building other languages” is exactly what Bottom-Up Programming or Meta-Linguistic Abstractions are all about. In other words, writing the core libraries in C is not a great choice for a language where programmers write their own abstractions.

Now for many things, the speed penalty of writing your own abstractions in Ruby is negligible. But not everything. So there is always going to be this class of things—and I think collection manipulation is one of those things—where you need to be able to write stuff that is as good as what comes out-of-the-box.

If new stuff is an order of magnitude slower, you might be able to use it for non-critical things, but your chances of persuading anyone else to use it are very low. Which means that the language as a whole progresses slowly because real progress can only happen in areas where performance doesn’t matter. Like database-bound web applications.

Having an implementation where the built-in stuff is on the same footing as your stuff opens up the doors for actual progress. It forces the language itself to be Good Enough, and it makes it possible for every Ruby programmer to improve the language.

what we can learn from java, whoops smalltalk

Java has this incredibly powerful and popular IDE, Eclipse. It is so powerful that many people feel it is impossible to write production Java code without it. Why is it so powerful?

One of the major reasons it is both powerful and popular is the availability of plug ins. It seems to support the language, UML diagrams, source code control, and everything up to (I’m pretty sure) time tracking for client billing. Naturally, the plug ins are written in Java, just like the almost all of the built-in functionality.

Clara Creative can write her own plug in and it won’t be a second-class citizen. And since it is a tool for Java programmers, Clara Creative already knows how to write plug ins, she doesn’t need to drop into another language.

Wow, that is neat. And it does help explain why Eclipse has so many plug ins, and why they are popular: there is no low-level language barrier, and they all on an equal footing with each other.

This shouldn’t surprise you. Eclipse evolved from IBM VisualAge Micro, which was written by Smalltalk programmers in Smalltalk. And of course, Smalltalk is a language where almost everything is written in Smalltalk itself. Smalltalk programmers expect to be able to extend the language and environment without penalty.

In the end, the choice of whether to implement core features in C or Ruby will always be difficult. The temptation to optimize for speed will always be strong, especially when the language is fighting for mind share. But extensibility and variety is also a win, and Ruby fights with its libraries and features as much as with its performance.

Perhaps we won’t all be using fold and unfold instead of collect, select, and detect. But if we aren’t, it ought to be because we prefer the originals, not because the replacements are crippled in comparison, or because the kind of person who likes inventing new tools prefers to write them in Ruby instead of in C.


I’m looking forward to hearing more from the Rubinius team. I really think they hold the key to the future. Thanks, Ezra, for your comment.

Labels:

 

Saturday, December 15, 2007
  The Grinch Who Stole the Ruby Jobs
A reader was kind enough to point out that making statements about jobs being advertised could be misconstrued as a statement about the policies of the companies doing the hiring.

I do not speak for any hiring company, and I’m sure that hiring companies do not particularly want me articulating their policies for them.

Therefore, my policy hereforward is that when I wish to describe a job, I will provide links to open positions or name companies hiring. If I quote something, it is partially or fully information provided by the company and it will be indicated as such in quote marks.

I know, what a bother. Sheesh. But there you have it, that seems to be the way the world works. Roll with it and let’s move on.



mdlogix, a rapidly growing medical research software company is seeking Ruby on Rails developers for our downtown Toronto office.

We're looking for passionate developers who love Rails and want to make the world a better place by supporting clinical research!

For more information, visit http://www.mdlogix.com/aboutUs/jobSoftwareEngineerToronto.html or speak to us at the next Rails pub night!


Hewitt Associates is looking for a “P/A with Ruby on Rails experience,” possibly in Toronto (No link: they use an HR application for job listings that supports nice searches but mightily resists bookmarking).

GigPark is a Toronto-based web startup that's growing fast
Built with Ruby on Rails and hosted on EC2
We develop iteratively and release weekly
We care about quality - our code is well-designed and tested
We work hard, but smart. We believe less code is better and simple UI design is key We're not religious about technology. We believe in using the right tool for the job

http://www.gigpark.com/page/jobs


Mobile Commons is looking for “junior and senior hackers”.

Labels:

 

Monday, December 10, 2007
  Utility Belt
Utility Belt is a grab-bag of tricks, tools, techniques, trifles, and toys for IRB, including convenience methods, language patches, and useful extensions. It also includes a couple command-line widgets. Its primary inspirations were an awesome gem called Wirble and a blog post by Amy Hoy called “Secrets Of The Rails Console Ninjas”.

INSTALL

sudo gem install utility_belt

FEATURES




Who can resist publicizing a grab-bag gem that just happens to contain some code you snarfed from someone else? Not me! It’s like one of those vanity books of poetry: your poem is included provided you buy twenty copies for your friends and family…

Labels:

 

Tuesday, November 27, 2007
  Fun with Symbol#to_proc
If you want a golden rule that will fit everything, this is it: Have nothing in your houses that you do not know to be useful or believe to be beautiful.
William Morris

Most people love Symbol#to_proc sugar in Ruby:

%w[dsf fgdg fg].map(&:capitalize)
→ ["Dsf", "Fgdg", "Fg"]
And Googling around, all of the examples I could find were for unary methods like this. But what about binary methods like :merge? If Symbol#to_proc supported binary methods, you could do things like:

(1..5).inject &:+
→ 15
Or:

[{ :foo => 1 }, { :bar => 2 }, { :blitz => 3 }].inject &:merge
→ {:foo=>1, :bar=>2, :blitz=>3}
That would be brilliant.1 And the implementation can’t be that hard in a language like Ruby, can it? Let’s have a look at the source code (I don’t have the Ruby 1.9 source, but I do have Symbol.rb from the Rails Core Extensions):

class Symbol
# Turns the symbol into a simple proc, which is especially useful for enumerations. Examples:
#
# # The same as people.collect { |p| p.name }
# people.collect(&:name)
#
# # The same as people.select { |p| p.manager? }.collect { |p| p.salary }
# people.select(&:manager?).collect(&:salary)
def to_proc
Proc.new { |*args| args.shift.__send__(self, *args) }
end
end
Would you look at that, Symbol#to_proc already handles methods with arguments. And indeed, the examples above work right out of the box in Rails. I expect them to work in Ruby 1.9 as well.

You still can’t do some of String#to_proc’s other point-free tricks like collection.map(&'*2'), but quite frankly this is a really nice feature that deserves more publicity. So… don’t forget that Symbol#to_proc handles your unary and binary methods.

Cheers!

  1. I know what you’re thinking, but try to remember that the other meaning of the word “brilliant” is shiny and sparkling.


p.s. You’re wondering where I get the time to think up stuff like this? Actually, I’m lucky enough use stuff like this for a living. I needed to fold a collection of hashes together, and my first thought was that I’d like to write .inject(&:merge) to do it. This is the principle of least astonishment in action…

Labels:

 

Saturday, November 24, 2007
  How I've been spending my time off work lately
The weather being fine, and in the flagrant pursuit of happiness, I’ve managed to get a few holes in this week (using Ruby of course). Here’s my scorecard:

palindrome = (+(+[] | +[-:_] | ((-'[0]' >> -:ends) & (-'[1..-2]' >> -:p)
& (-'[-1]' >> -:ends) & -:_)) >> (-:'{}' >> -:p)).call
(palindrome =~ []).should_not be_nil
(palindrome =~ [:fish]).should_not be_nil
(palindrome =~ [:fish, :dog]).should be_nil
(palindrome =~ [:fish, :dog, :fish]).should_not be_nil
(palindrome =~ [:fish, :fish, :fish, :fish]).should_not be_nil
(palindrome =~ [:fish, :fish, :fish, :fish, :fish]).should_not be_nil
(palindrome =~ [:fish, :dog, :dog, :fish]).should_not be_nil
(palindrome =~ [:fish, :dog, :cat, :dog, :fish]).should_not be_nil
(palindrome =~ [:fish, :dog, :cat, :fish]).should be_nil

Perhaps Guido has a point, indentation matters:

palindrome = (
+( # we'll start by creating a pattern
+[] | # match the empty list, or:
+[-:_] | # match a list containing anything
# (-:_ matches any one entity), or:
( # alternative three:
(-'[0]' >> -:ends) & # extract the first element and
# bind it to 'ends'
(-'[1..-2]' >> -:p) & # and everything from the second to the
# second-last will be matched against a
# pattern :p
(-'[-1]' >> -:ends) & # and the last element must match whatever
# was bound to 'ends' above.
# it's equally effective to include
# -'_[0].eql?(_[1])', same thing.
-:_ # if all that matches, match the entire array
# so that the result is the entire array, not
# whatever part of the array we were just
# matching
) # therein ends the pattern
) >> (-:'{}' >> -:p) # pass the pattern to a closure-maker (-:'{}')
# and then bind it to 'p' so that the pattern
# is now recursive: this pattern matches an
# empty list, a list with one element, or
# a list with the first and last elements eql?
# to each other and whatever is in-between them
# matching the palindrome pattern bound to 'p'.
).call

Hmmm. Still doesn’t make any sense. I guess home-brew programs are like babies: they appear to be delightful and impart a sense of wonder in the mysteries of the universe to their creators, but are noisy and obstreperous to everyone else.

f = (
+(
(-'==0' >> -'0') |
(-'==1' >> -'1') |
(-'n>1' >> -'n * f[n-1]') ) >> (-:'{}' >> -:f)).call
f[0].should == 0
f[1].should == 1
f[2].should == 2
f[3].should == 6
f[4].should == 24
f[5].should == 120

Labels:

 

Friday, November 09, 2007
  Really useful anamorphisms in Ruby
Really simple anamorphisms in Ruby introduced a very simple unfold. Its chief characteristics were that it generated an Array from a value of some sort, and it did so by applying an incrementor block to its seed recursively until it generated nil. For example:

10.class.unfold(&:superclass)
=> [Fixnum, Integer, Numeric, Object]
A very simple modification allows us to separate the two blocks with a :while or :to pseudo-keyword, and to add a :map keyword for transforming the state into the desired result. Thus, this really simple unfold:

1.unfold(&'_+1 unless _==10').map(&'**2')
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Can also be expressed as:

1.unfold(:to => '==10', :map => '**2', &'_+1')
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
The latter form is helpful once unfolds become larger and more complex than these simple one-liners.

(There is another style of writing :unfold, using method chaining and lazy evaluation to eliminate lambda keywords, but we will save that for another time: it is a great examination of syntax but does not change :unfold’s fundamental behaviour.)

Let’s turn it up a notch

These trivial examples are not particularly compelling. Unfold is touted as the complement to :inject. So you would expect :unfold to be as useful as :inject. And :inject is very, very useful—you “reduce” lists of things to values all the time.

But how often do you need to turn a value into a list? How often do you need to turn ‘10’ into ‘[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]’? And if you do, what’s wrong with using (1..10).map(&'**2')?

Remember that :unfold can be applied to objects with a lot more information to them. The thing that had me stuck when I first saw :unfold was thinking of it as the opposite of :inject. Or at least, the opposite of how I used :inject. I tended to use :inject in a way that reduced information. For example:

[7, 6, 10, 3, 9, 4, 8, 5, 2, 1].inject(&'+')
This gives us the sum of the numbers from one to ten, as it happens. It also gives us a value that is considerably simpler than the list we used to generate the number. Information is lost when we use :inject to “reduce” a list to a very simple value. So my first reaction to :unfold was to think of ways to use :unfold on very simple values, like numerics.

But :unfold doesn’t have to work with simple values. It can work with arbitraily complex data structures. Consider:

def zip(*lists)
lists.unfold(
:while => '.first',
:map => '.map(&".first")',
&'_.reject(&".length < 2").map(&"[1..-1]")')
end
Zip is a function that takes two (or more, but let’s just say two for now) lists, and produces a list of pairs of items. So:

zip([:a, :b, :c], [1, 2, 3])
=> [[:a, 1], [:b, 2], [:c, 3]]
How does :unfold do it? First, of course, it makes a single list of lists. It then performs an unfold on this single data structure. The incrementor successively reduces each sublist by removing the first items. So the output of the successive incrementor operations is:

[
[[:a, :b, :c], [1, 2, 3]],
[[b, :c], [2, 3]]
[[:c], [3]]
]
The :map then extracts the first items from each sublist and presents them as a list:

[
[:a, 1],
[:b, 2],
[:c, 3]
]
Neat. But why do we care about zip? Well, if you’ll notice, we already have a bunch of really useful things we can do with lists, like :map, :select, :reject, :detect, and so on. What would you do if you had two lists and needed to do something with each pair in the list, like… A list of first names and surnames that need to be catenated together?

zip(first_names, surnames).map(&'"#{_[0]} #{_[1]}"')
Zip is useful when we have a bunch of parallel lists and there’s something we want to do with each tuple from the lists.

Generalized iteration

We recognize this “pattern,” it’s one of the most powerful in programming. Zip was one algorithm, a way of iterating over several lists simultaneously. The other algorithm was "#{_[0]} #{_[1]}", a recipe for what to do with the successive tuples of values.




The Ruby Way is the perfect second Ruby book for serious programmers. The Ruby Way contains more than four hundred examples explaining how to do everything from distribute Ruby with Rinda to functional programming techniques just like these.

The powerful idea was to separate the mechanics of turning a data structure into a linear series of values—iterating—from what we want to actually do with each value. (In OO-style programming, we would define a method for lists of lists that returns an iterator over the tuples of values. Same thing, proving that how you do it is not as important as understanding why you do it.)

Unfold has other uses, but this one alone is worth the trouble to understand the pattern even if you aren’t rushing to implement this exact unfold method: Converting a single data structure to a list is one way to implement iteration: for any data structure, you can use unfold to define a linear iteration. You can then use :each or :map or :inject just as our parents before us would have used DO or FOR loops.

Consider this (inelegant, but I’m writing this rather late at night) unfold:

[[1, 2, 3], [4, 5, 6, [7, 8]], 9, 10].unfold(
:while => '.first',
:map => lambda { |first|
first = first.first while first.kind_of?(Array)
first
}
) { |state|
state = state.first + state[1..-1] while state.first.kind_of?(Array)
state[1..-1]
}
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
This is the same idea, only we convert a tree into a list representing a depth-first search of a simple tree. You may recognize it as Array’s :flatten method. Once again, it’s really a way of iterating over the elements of a tree. So one way to think of this :unfold is that it is an iterator over a tree’s leaves:

def flatten(arr)
arr.unfold(
:while => '.first',
:map => lambda { |first|
first = first.first while first.kind_of?(Array)
first
}
) { |state|
state = state.first + state[1..-1] while state.first.kind_of?(Array)
state[1..-1]
}
end
But I already know how to write Zip and Flatten methods, honest I do.

Zip and Flatten are relatively common, that’s why :flatten and :zip can both be found in Ruby’s standard Array class. And if there’s a data structure that needs regular unfolding, you ought to weigh the advantages and disadvantages of writing an :unfold for it or using more humdrum ways of writing an iterator.




The Haskell School of Expression is a terrific and relatively jargon-free introduction to the language that popularized fold, unfold, and all of the other functional programming idioms. As Eric Kidd says, it will make your head explode. Recommended!

However, what do you do when you only need to unfold something once? For example, perhaps you have code that obtains some data in JSON format, and having used a library to parse the JSON into a one-off list or hash, you want to iterate through it.

With unfold, you can write your one-time, specific iterator right in place. This is no different than using blocks and lambdas in Ruby for one-off functions that really don’t need the cermony and weight of being implemented as methods.

When you want to iterate through something, and you want to separate the mechanism for iterating through the data from what you do with the data, :unfold should be in your tool box.

Unfold and the bio-sciences. Not really.

I like to think of :unfold like unfolding a protein molecule. When you stare at a data structure, it’s dense, opaque. But you supply an unfold algorithm, and what looked like a messy ball of twine unravels into a long filament made up of simple elements. You can then operate on the simple elements, without getting what you want to do en-snarled in how you iterate over the data structure.

So there you have it. Unfold can be really useful if we see it as a standardized way to write iterators for data structures.


Update: The under-appreciated unfold.

Labels: , ,

 

Wednesday, November 07, 2007
  Really simple anamorphisms in Ruby
Anamorphisms are functions that map from some object to a more complex structure containing the type of the object. They are the dual—a fancy word for complement—of Catamorphisms, functions that map from some complex structure down to a simpler object.

In simpler terms, a Catamorphism is a function like inject: In Ruby, Inject takes a collection and produces something simpler. It is also called fold or reduce in other languages. Inject can do something like produce the sum of a list:

(1..5).inject &'+' => 15

Unfold does the reverse: it takes a single value and turns it into a collection. Now, a proper unfold can be configured with a seed value, a transformation, a stopping predicate, maybe a distinction between states and output values, and even the type of structure you want to create. A proper unfold would even work with lazy lists if it didn’t have a stopping condition.

But sometimes you want something really simple. The state and the output value could be the same thing, eliminating a transformation from state to output. The stopping predicate could simple, it could stop when it reaches nil. And it could always return Arrays. So you could use such a simple unfold whenever you have a seed value and some sort of function (expressed as a block) that returns nil when it has no more values.

For example:

10.unfold { |n| n-1 unless n == 1 }.inspect => [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
10.class.unfold(&:superclass).inspect => [Fixnum, Integer, Numeric, Object]

Hey, what happens if you combine really simple anamorphisms with really simple catamorphisms?

5.unfold(&'_-1 unless _==1').inject(&'*') => 120

Here is the code unfold.rb. (If overloading core classes with too much responsibility is not to your taste, it is trivial to express unfold as a function taking a seed and a block as arguments).

Now that I’ve whet your appetite, here’re Really useful anamorphisms in Ruby. And for another implementation, everything you ever wanted to know about implementing a true unfold in Ruby.

Enjoy!

(Code courtesy of Mobile Commons. Thanks!)


Here’s an interesting email from Hugh Sasse:

I was reading Code Complete 2 last night and Steve McConnell says that he wouldn’t hire a programmer who wrote a recursive factorial function. [I’m not sure that would be such a crime in languages like Lua with proper tail recursion, but still. (I’m probably wrong, now I’ve written that! “Open mouth, insert foot, echo internationally” as they used to say on Fidonet :-))] I thought “that seems a bit harsh”, especially since in real life they’d get one from a library most of the time, anyway.

So, as I have often run out of stack on Ruby, I’ve tried to rewrite unfold as an iterative function. It seems to be working. The script as modified blows up at 5000 on my system with the recursive version, but the iterative version succeeds at 5000.

Thanks for this blog entry. There’s something REALLY nice about this idea, which I can’t put my finger on. Might be something to do with loops terminating when they should, and The Pragmatic Programmers’ article about “cook until done”

http://www.pragprog.com/articles/cook-until-done

Unfold feels like a “do until finished” loop.

class Object
# As above, but iterative, rather than recursive.
def unfold2 &block
result = [self]
x = block.call(self)
while not x.nil?
result.push x
x = block.call(x)
end
return result
end
end

Labels: , ,

 

Saturday, October 27, 2007
  String#to_proc
Breaking news! The irb enhancement gem Utility Belt includes String#to_proc

String#to_proc is an addition to Ruby’s core String class to enable point-free hylomorphisms

I’ll start again. String#to_proc adds a method to Ruby’s core String class to make lots of mapping and reducing operations more compact and easier to read by removing boilerplate and focusing on what is to be done. In many cases, the existing black syntax is just fine. But in a few cases, String#to_proc can make an expression even simpler.

String#to_proc is a port of the String Lambdas from Oliver Steele’s Functional Javascript library. I have modified the syntax to reflect how String#to_proc works in Ruby.

We’ll start with the examples from String Lambdas so you can see what is actually going on. Then we’ll look at how to use the & coercion to make working with arrays really simple.

to_proc creates a function from a string that contains a single expression. This function can then be applied to an argument list, either immediately:

'x+1'.to_proc[2];
→ 3
'x+2*y'.to_proc[2, 3];
→ 8
or (more usefully) later:

square = 'x*x'.to_proc;
square(3);
→ 9
square(4);
→ 16
Explicit parameters

If the string contains a ->, this separates the parameters from the body.

'x y -> x+2*y'.to_proc[2, 3];
→ 8
'y x -> x+2*y'.to_proc[2, 3];
→ 7
Otherwise, if the string contains a _, it’s a unary function and _ is name of the parameter:

'_+1'.to_proc[2];
→ 3
'_*_'.to_proc[3];
→ 9
Implicit parameters

If the string doesn’t specify explicit parameters, they are implicit.

If the string starts with an operator or relation besides -, or ends with an operator or relation, then its implicit arguments are placed at the beginning and/or end:

'*2'.to_proc[2];
→ 4
'/2'.to_proc[4];
→ 2
'2/'.to_proc[4];
→ 0.5
'/'.to_proc[2, 4];
→ 0.5
’.’ counts as a right operator:

'.abs'.to_proc[-1];
→ 1
Otherwise, the variables in the string, in order of occurrence, are its parameters.

'x+1'.to_proc[2];
→ 3
'x*x'.to_proc[3];
→ 9
'x + 2*y'.to_proc[1, 2];
→ 5
'y + 2*x'.to_proc[1, 2];
→ 5
Chaining

Chain -> to create curried functions.

'x y -> x+y'.to_proc[2, 3];
→ 5
'x -> y -> x+y'.to_proc[2][3];
→ 5
plus_two = 'x -> y -> x+y'.to_proc[2];
plus_two[3]
→ 5
Using String#to_proc in Idiomatic Ruby

Ruby on Rails popularized Symbol#to_proc, so much so that it will be part of Ruby 1.9.

If you like:

%w[dsf fgdg fg].map(&:capitalize)
→ ["Dsf", "Fgdg", "Fg"]
then %w[dsf fgdg fg].map(&'.capitalize') isn’t much of an improvement.

But what about doubling every value in a list:

(1..5).map &'*2'
→ [2, 4, 6, 8, 10]
Or folding a list:

(1..5).inject &'+'
→ 15
Or having fun with factorial:

factorial = "(1.._).inject &'*'".to_proc
factorial[5]
→ 120
String#to_proc, in combination with & coercing a value into a proc, lets you write compact maps, injections, selections, detections (and many others!) when you only need a simple expression.

Caveats: String#to_proc uses eval. Cue the chorus of people—pounding away on quad 3Ghz systems—complaining about the performance. You’re an adult. Decide for yourself whether this is an issue. After mankying things about to deduce the parameters, String#to_proc evaluates its expression in a different binding than where you wrote the String. This matters if you include free variables. My thinking is that it ceases to be a simple, easy-to-understand hack and becomes a cyrptic nightmare once you get too fancy.

You know that Voight-Kampff test of yours… did you ever take that test yourself?
—Rachael, Blade Runner


I have been using Functional Javascript for quite some time now, and I use the String Lambdas a lot. However, Ruby and Javascript are very different languages. Once you get out of the browser’s DOM, Javascript is a lot cleaner and more elegant than Ruby. For example, you don’t need to memorize the difference between a block, a lambda, and a proc. Javascript just has functions.

However, Javascript is more verbose: Whereas in Ruby you can write [1, 2, 3].map { |x| x*2 }, if Javascript had a map method for arrays, you would still have to write [1, 2, 3].map(function (x) { return x*2; }). So it’s a big win to make Javascript less verbose: code is easier to read at a glance when you don’t have to wade through jillions of function keywords.

Nevertheless, I still find myself itching for the String Lambdas when I’m writing Ruby code. It may be a matter of questionable taste, but for certain extremely simple expressions, I vastly prefer the point-free style. (-3..3).map &:abs is shorter than (-3..3).map { |x| x.abs }.

It is also cleaner to me. abs is a message, especially in a language like Ruby that supports the sending arbitrary messages named by symbols. Writing (-3..3).map &:abs looks very much like sending the abs message to everything in the list. I don’t need an x in there to tell me that.

Thus, I obviously like (-3..3).map &'.abs'. But I like (1..5).map &'*2' for the same reason. It isn’t just shorter, it hides a temporary variable that really doesn’t mean Jack to me when I’m reading the code. And quite honestly, (1..10).inject { |acc, mem| acc + mem } raises more questions than it answers about what inject does and how it does it. (1..10).inject &'+' gets right down to business for me. I’d prefer that it be called “fold,” but the raw, naked + seems to describe what I want done instead of how I want the computer to do it.

Symbol#to_proc also supports named parameters, either through implication (&'x+y') or with the arrow ('x y -> x*y'). I haven’t thought of a case where that would be a win over using a Ruby block: { |x, y| x*y }.

I’m divided about the underscore notation. It seems like a good compromise for expressions where there is a single parameter and it doesn’t fall on the left or the right side of an expression. Standardizing on an unusual variable name is, I think, a win. Underscore often means a “hole” in an expression or a computation, so it feels like a good fit. I would honestly much rather see something like: &'(1/_)+1' than &'(1/x)+1'. The underscore jumps out in an obvious way, and it wouldn’t be magically clearer to write { |x| (1/x)+1 }.

That being said, I haven’t actually written an underscore expression yet in actual code, so far I’m getting by using the point-free expressions to simplify things and using Ruby blocks for everything else.

RSpec

describe "String to Proc" do

before(:all) do
@one2five = 1..5
end

it "should handle simple arrow notation" do
@one2five.map(&'x -> x + 1').should eql(@one2five.map { |x| x + 1 })
@one2five.map(&'x -> x*x').should eql(@one2five.map { |x| x*x })
@one2five.inject(&'x y -> x*y').should eql(@one2five.inject { |x,y| x*y })
'x y -> x**y'.to_proc()[2,3].should eql(lambda { |x,y| x**y }[2,3])
'y x -> x**y'.to_proc()[2,3].should eql(lambda { |y,x| x**y }[2,3])
end

it "should handle chained arrows" do
'x -> y -> x**y'.to_proc()[2][3].should eql(lambda { |x| lambda { |y| x**y } }[2][3])
'x -> y z -> y**(z-x)'.to_proc()[1][2,3].should eql(lambda { |x| lambda { |y,z| y**(z-x) } }[1][2,3])
end

it "should handle the default parameter" do
@one2five.map(&'2**_/2').should eql(@one2five.map { |x| 2**x/2 })
@one2five.select(&'_%2==0').should eql(@one2five.select { |x| x%2==0 })
end

it "should handle point-free notation" do
@one2five.inject(&'*').should eql(@one2five.inject { |mem, var| mem * var })
@one2five.select(&'>2').should eql(@one2five.select { |x| x>2 })
@one2five.select(&'2<').should eql(@one2five.select { |x| 2<x })
@one2five.map(&'2*').should eql(@one2five.map { |x| 2*x })
(-3..3).map(&'.abs').should eql((-3..3).map { |x| x.abs })
end

it "should handle implied parameters as best it can" do
@one2five.inject(&'x*y').should eql(@one2five.inject(&'*'))
'x**y'.to_proc()[2,3].should eql(8)
'y**x'.to_proc()[2,3].should eql(8)
end

end
Go ahead, download the source code for yourself.

Update: Reg smacks himself in the head!

I had a look at the source code for Symbol#to_proc:

class Symbol
# Turns the symbol into a simple proc, which is especially useful for enumerations. Examples:
#
# # The same as people.collect { |p| p.name }
# people.collect(&:name)
#
# # The same as people.select { |p| p.manager? }.collect { |p| p.salary }
# people.select(&:manager?).collect(&:salary)
def to_proc
Proc.new { |*args| args.shift.__send__(self, *args) }
end
end
Look at that: Although the examples are all of unary messages like .name, the lambdas created handle methods with arguments. And since almost everything in Ruby is a method, including operators like +… You can use Symbol#to_proc to do some of the point-free stuff I like:

[1, 2, 3, 4, 5].inject(&:+)
→ 15
[{ :foo => 1 }, { :bar => 2 }, { :blitz => 3 }].inject &:merge
→ {:foo=>1, :bar=>2, :blitz=>3}

Labels: , ,

 

Friday, October 26, 2007
  map(&"(1.._).inject(&'*')")
Some working Ruby code:

(1..3).map(&'*2') => [2, 4, 6]
(1..3).map(&'[-1, _, 0]') => [[-1, 1, 0], [-1, 2, 0], [-1, 3, 0]]
(1..3).map(&'x -> y -> x * y').map(&'[2]') => [2, 4, 6]
(1..5).select(&'>2') => [3, 4, 5]
(1..3).map(&'x -> y -> x * y').map(&'.call(2)') => [2, 4, 6]
[5].map(&"(1.._).inject(&'*')") => [120]


See String#_to_proc.

Labels: ,

 

Thursday, September 27, 2007
  Java is the right answer to the wrong question, Ruby is the wrong answer to the right question
Large code bases are the problem, not the language they’re written in. Find a way to break/decompose big code bases into little ones.
Bill de hÓra, calming a heated debate


Can we please stop calling each other silly names over the languages we choose and move on to calling each other silly names over our choices for writing literate programs, eliminating redundancy, decoupling modules from each other, and so forth?

The very next time you find yourself being drawn into a debate about Java vs. Ruby, try saying something obscure, such as, Java is the right answer to the wrong question, Ruby is the wrong answer to the right question and see if you can redirect the energy towards exploring new ground instead of stagnating in the same old tired place.

Do not follow in the foot steps of the Sages. Seek what they sought.

When we argue Java vs. Ruby, or testing vs. type checking, we are arguing about foot steps. Take the debate to the next level. What is it we are trying to accomplish with those features? What are we seeking?

My favourite example of this is the subject of automatic refactoring. My current text editor doesn’t perform automatic refactoring (ducks). Are you about to blast a flaming comment or perhaps an entire blog post explaining what a dunderhead I am for not using an IDE that can move a method with a single dialog box?

Stop for a moment. Why do we move methods? What are we trying to accomplish? What is it about our code after we have refactored that makes the code better? Now you have a really juicy comment to make, a deep and insightful point of view to bring to a discussion. Let’s talk about that quality of good code first, and then we can talk about how to get there later, if ever.

So the next time you read that “Strong, static typing is superior because it enables the IDE to provide automatic refactoring support,” take a deep breath. Don’t nod and agree. Don’t shake your head and disagree. Ask a polite question: What are we trying to accomplish with automatic refactoring? And when told that “Unit tests are superior to compiler type checking for catching bugs,” don’t settle for “catching bugs” as axiomatic. Ask what quality really means for code.

Nothing is going to save us from talking ourselves to death. But at least we can debate at a higher level of abstraction.

Labels:

 

Sunday, September 16, 2007
  Breaking News: Write a Ruby function. Win a $100 Bounty!
I just spotted this on the Toronto Ruby on Rails mailing list:



/me thinks it’s too quiet here. Time for a Sunday afternoon ruby
challenge!

$100 for the best implementation, good until 6pm tonight (September 16, 2007).

THE PROBLEM

Marcel’s Amazon S3 gem presents files stored in a bucket as an object
which exposes a key attribute that conceptually map to the idea of a
file in a filesystem. You can fake a folder hierarchy by uploading 0
byte files with a key ending in a / (eg. folder1/ )



The Ruby Way is the perfect second Ruby book for serious programmers. The Ruby Way contains more than four hundred examples explaining how to do everything from distribute Ruby with Rinda to dynamic programming techniques just like these.

Your mission is to convert the objects array of an S3 bucket to a
Hash, so that we can present a tree view to the user. It should
support an arbitrary depth, and make good use of Ruby best practices.

THE TEST DATA

Let’s assume you have a bunch of items in a bucket, and if you iterate
through the item keys, you’ll get this list:

folder1/
folder2/
folder2/folder4/
folder2/folder5/
folder2/folder5/temp6.txt
folder2/temp3.txt
folder2/temp4.txt
folder2/temp5.txt
folder3/
folder3/temp7.txt
temp1.txt
temp2.txt

This covers all cases; a hierarchy of files in folders, including
files in the root and empty folders.

THE SOLUTION

results = {
"temp1.txt" => #<AWS::S3::S3Object:0x31ab7bc>,
"temp2.txt" => #<AWS::S3::S3Object:0x31ab7bc>,
"folder1" => {},
"folder2" => {
"temp3.txt" => #<AWS::S3::S3Object:0x31ab7bc>,
"temp4.txt" => #<AWS::S3::S3Object:0x31ab7bc>,
"temp5.txt" => #<AWS::S3::S3Object:0x31ab7bc>,
"folder4" => {},
"folder5" => {
"temp6.txt" => #<AWS::S3::S3Object:0x31ab7bc>
},
},
"folder3" => {
"temp7.txt" => #<AWS::S3::S3Object:0x31ab7bc>
}
}

In other words, for each item in the hash, the name is the key without
any preceding path, and the value is the AWS::S3::S3Object that the
key points to. (I just pasted the same object in the example.)

And… go!

Pete Forde, unspace.ca



Do you want to win the $100? Pete told me that your solution is eligible to win provided you post it to the TorROR mailing list by 6PM, September 16th, 2007 (EST). For more details, see the TorROR Google Group.

Labels:

 

Friday, April 13, 2007
  From Abstraction to Zipf
Alpha…

Damien Katz raised an interesting question the other day: he wondered whether elements of computer programs followed a Zipf distribution. In other words, he wondered whether the most frequently used item occurred a lot more than the next most frequent, which occurs a lot more than the third most frequent, and so forth.

That’s an interesting question. Is it true? And if it is, what does that tell us about composing programs?

Looking at computer programs, some things seem to follow Zipf’s law and others don’t. In C++ programs, loop indices called i probably are an order of magnitude more common than those called j. Unless you use Apps Hungarian, of course. But are for loops that much more common than if statements? That might vary from place to place within a single program, but overall they might be roughly equal in frequency.

If we step up a level and look at idioms, my gut tells me that there is a strong Zipf distribution. Although each program will differ, some programs might make heavy use of lists, others of maps. Some might do a lot with closures, others with objects. These are fairly generic, of course.

If we have a look at the more domain-specific items in a program, we really see a Zipf distribution. Some things pop up again and again and again in a program, others are rare.

Why Abstract?1

The most important tool we have for composing programs is abstraction. When we create an abstraction for something, we get one and a half wins (a sesqui-win).



The Ruby Way is the perfect second Ruby book for serious programmers. The Ruby Way contains more than four hundred examples explaining how to implement advanced ideas like metaprogramming.

The full win is that we get to take a common element and place its responsibility in once place. For example, if your language gives you a map function or method, there is one place for the code that applies a function to each element of a collection and produces a collection of the results. Unlike a “pattern,” you don’t have to re-implement map every time you use it, you just use it. The centralization of responsibility in a single place is very powerful way to separate concerns.

The half win is that we don’t need to understand its implementation, we only need to understand its interface. I honestly don’t know whether Ruby’s built-in map method applies itself to a collection one at a time from the start to the end or in some other order. I suspect it’s from start to end for ordered collections, but I don’t know and I don’t have to know.

This is only half a win, because for anything too complicated to explain on a PowerPoint slide where you are promising the newest silver bullet, abstractions have a nasty habit of leaking. Leaky abstractions force you to understand the implementation to use them.

There are some drawbacks to abstractions. As noted, sometimes the abstraction leaks. In that case, you often have to look up what an abstraction does. If there was no abstraction and the implementation was right there in the middle of your code, you wouldn’t have to look it up. So when you see:

class BlogPost << ActiveRecord::Base
has_many :comments
# ...
end

You obviously to need to know that has_many creates methods like comments, comment_ids= and comments<< automagically for you. In fact, that is kind of the point of has_many. But if you are doing anything non-trivial, you also need to know whether has_many actually defined comments, comment_ids= and comments<< or whether it has merely modified BlogPost to handle those messages. In that sense, the abstraction leaks.

You get much the same thing if you build a Java class hierarchy that is umpteen levels deep festooned with abstract base classes, factories, and dependency injection. In one sense you get a really tight abstraction. In another sense, you have a big leak: you need to know its internals to get any real work done.

Another drawback to abstractions fall out of their strength. If you haven’t seen a particular abstraction before, you don’t know what it does and how it works. Contrary to popular hype, self-documenting names are not enough. If you are a Rails user, raise your hand if you can tell me what all of the various options for has_many are and what they do. You have to learn each abstraction’s interface, just as you had to learn what a for loop does for you.

The win (not having to know how it works) is also a loss (not knowing what it does).

An abstraction is an abstraction is an abstraction

I think most people agree that named subroutines (or functions, or procedures) are an excellent abstraction. First-class functions and anonymous functions seem to be an acquired taste (just one taste, and your programs will quickly acquire them). Objects—in the strong sense of polymorphic programs—are well-regarded abstractions.

Macros, DSLs, and other means of writing your own mini-language provoke the wildest claims of productivity and equally baseless fears of unreadable, fragile, “magic” code built out of gossamer dreams.


What these all have in common is that these functional abstractions live happily within the syntax provided by the host programming language. They abstract function without abstracting syntax. For the most part, non-syntactic abstraction is uncontroversial.

There seems to be debate over syntactic abstraction. Macros, DSLs, and other means of writing your own mini-language provoke the wildest claims of productivity and equally baseless fears of unreadable, fragile, “magic” code built out of gossamer dreams. The basic “problem” with syntactic abstractions like has_many in Rails or let in Scheme is that syntactic abstractions force you to learn their interface.

They are right in your face, especially if they are very brief. Consider:

BlogPost.find_or_create_by_name('From Abstraction to Zipf').comments << Comment.new('bleh!', 'anonymous')

Don’t bother searching for the find_or_create_by_name method or the comments<< method in the code base.

All the same, syntactic abstractions are just like functional abstractions. You get some wins, and you pay some costs in understanding and potential “leakiness.”

Zipf to the rescue

So should we zealously abstract everything we can? Or should we conservatively avoid “magic” like syntactic abstractions?

Every abstraction has a fixed learning cost. That cost is amortized over all of the instances we’ll encounter. So you need to learn how has_many works just once, and then you benefit every time you read it or write it. This is why abstractions built into a programming language are big wins: you amortize their cost over every program written in the language. Isn’t that why Java and C# look so much like C++ which looks so much like C? They were offering abstractions that have been paid up for years.

Zipf’s law tells us that each application only needs a very few new kinds of abstractions to get most of the benefits. There are only a few low-hanging fruits, but each is plump and juicy.


An abstraction built into a framework like Rinda or MapReduce is amortized over fewer programs. A domain-specific abstraction for a single organization is amortized over a very few programs, and an abstraction built into one program is only amortized over instances in the one code base. Unsurprisingly, opportunities for abstraction follow the Zipf distribution.



The Art of the Metaobject Protocol is one of the most important books ever written on the subject of building abstractions with objects. If the idea that there is more than one way to structure objects and classes seems surprising, then learning about the Meta-Object protocol will teach you what your language has kept secret.

Of course, that distribution is self-similar: the abstractions within a program follow the same distribution as the abstractions within an organization. So even if you aren’t inventing a new programming language, Zipf can help.

Quite simply, it pays to aggressively abstract the few things that are encountered the most frequently. In a CRUD web application, schema relationships like belongs_to are incredibly frequent. There’s a big win from creating a syntactic abstraction, even if the learning curve looks daunting to newcomers. Creating a domain-specific language for database schema changes is also a big win.

Note that we aren’t saying that has_one, belongs_to, and has_many appear more than 50% of the time. They may be quite infrequent. The point is that they are far more frequent than something else you could abstract, and the other thing will take just as much time to learn to read but will pay off far less often.

But should you create a DSL for list comprehensions in your CRUD application or just use the language’s built-in collections? I would say, you would need an application that uses an awful lot of lists before a syntactic abstraction for lists is a win.

It might be, but the nice thing is, it probably won’t be close: Zipf’s law tells us that each application only needs a very few new kinds of abstractions to get most of the benefits. There are only a few low-hanging fruits, but each is plump and juicy.

…Omega

I think it is a win to use aggressive abstraction techniques like metaprogramming, but only if you restrain yourself to abstracting the very few things that appear most frequently in your code base. The things that are used less frequently should be abstracted using less aggressive techniques.

The resulting program may not be magically 50% shorter than an unabstracted program, because the “long tail” of idioms and patterns are simply not worth abstracting away even though we have the tools for doing so.

You can probably tell whether a program has the proper level of abstraction just by checking its distribution.

If you find a big abstraction framework for one thing while frequently used idioms are expressed using patterns instead of abstractions, you are probably looking at an improperly abstracted program. You have to wade through verbiage to read everything, and the one time you don't need to abstract the code away, you have to chase definitions through indirection.

And in a well-written program, I would expect that the things that occur most frequently are expressed at the highest and most powerful levels of abstraction, while things that happen infrequently are expressed in a more direct and explicit way. Such a program would be easy to read throughout.



  1. Why Abstract? is a delightful and wonderfully brief book that explains the appeal of abstract art in a simple and direct way. It’s one of the treasures on my bookshelf and I urge you to find a copy through a used book seller.

  2. Suprised? There is no footnote two. But if there was, it would be to mention that Paul Graham’s incredible book On LISP: Advanced Techniques for Common LISP is freely available online. This is a must read for anyone interested in composing syntactic abstractions in any language.

Labels: ,

 

Friday, March 16, 2007
  An Approach to Composing Domain-Specific Languages in Ruby
Whoa! This looks like a long post with a lot of code snippets. Am I going to have to do a lot of hard thinking, or can I just relax and enjoy a good rambling essay?

This is a bit long, probably (like all my posts) 200% longer than necessary. If you just want to see a neat DSL that implements Haskell and Python’s List Comprehensions written in Ruby, just scroll to the bottom.

If I do bother to read it all, will I learn some neat hacks?

Yes, but you could learn them just as well by reading the source code directly.

So the benefit of reading the whole thing is...?

The List Comprehensions DSL is the what. The source code is the how. But the essay is the why.

Reading the whole thing will take you through some of the pitfalls of writing DSLs and explain why I chose my particular workarounds.

Furthermore, there are a lot of corners in Ruby where you can easily assume that things work one way, but really they don’t. If you actually try the snippets on your computer, you’ll have a much better chance of remembering where the pitfalls are. That’s why I tried to give a working example for every point, rather than just explaining things in words.

Of course, if you have no interest in writing your own Domain Specific Languages in Ruby just yet... this isn’t meant as a popular essay, rather it’s meant as an experience report for fellow practitioners. And honestly, there’s a world market for maybe five tools for writing DSLs in Ruby.

But since you’re here, the essay starts below!



An Approach to Composing Domain-Specific Languages in Ruby

Ruby is often touted as a good language for writing Domain-Specific Languages (“DSLs”). There are a few arguments in favour of writing a DSL as part of an application.

The first argument that comes to mind is that if the application’s domain experts have a specific natural language or jargon of their own, writing a DSL makes it easy for programmers and domain experts to collaborate. While it is rare to find substantial applications entirely written by non-programmers at this time in any language, it is quite feasible for non-programmers to write or validate portions of an application representing its “business rules” or domain logic, while programmers maintain its infrastructure.

    include StarbucksDSL
order = latte venti, half_caf, non_fat, no_foam, no_whip
print order.prepare

Building Domain Specific Languages in Ruby


Another argument in favour of a DSL is that even when non-programmers are not involved directly in coding an application, the programmers themselves often have a jargon of their own to describe entities, algorithms and data structures in the application. Having portions of the application written in a language closely resembling the programmer’s own jargon makes it easy for them to read each other’s work and understand its intent.

Successful examples of DSLs embedded within existing languages and frameworks include Ruby on Rails’ ActiveRecord, where statements such as:

    has_and_belongs_to_many :Bar
validates_presence_of :blitz
some_bars = Bar.find_by_tavern_license(license_number)

Are self-documenting to anyone familiar with relational models.

The final argument I’ll repeat here is that a DSL is a very effective way to separate the what from the how of an algorithm. Separation of concerns is a desirable property of good programs, and DSLs provide this separation very clearly. In the ActiveRecord examples above, the exact mechanisms of relating tables, validating records, and performing searches is “abstracted away” from the code where the programmer declares how she would like the results used.

Freedom is Slavery

DSLs can be hacked together quickly in Ruby (whether they can be made sufficiently robust for your production needs may require considerably more care). Hacking a DSL together with little effort is a benefit, especially when prototyping: sometimes the best way to design a DSL is to try to use it, so you can discover what you need to express.



The Ruby Way is the perfect second Ruby book for serious programmers. The Ruby Way contains more than four hundred examples explaining how to do everything from distribute Ruby with Rinda to dynamic programming techniques just like these.

Some developers have raised the concern that extensive use of “magic” features leads to code that cannot be understood or maintained.1 My own feeling is that DSLs lead to code that is easier to understand, not more difficult to understand. This leaves an argument about maintenance. Some techniques for meta-programming, such as extending core classes like Array, have what you might call “non-local effects.”

For example, two different pieces of code might try to extend the same core class, interfering with each other. Each works in isolation and passes all of its unit tests. But when plugged into a larger application that uses them together, they break.


Lispers are among the best grads of the Sweep-It-Under-Someone-Else’s-Carpet School of Simulated Simplicity.

—Larry Wall


Another problem occurs with extending the Kernel class or creating “top level” methods to be used as verbs in a DSL. You end up with name space crowding: you must be very careful that you do not redefine en existing method.

To fix this problem, the code that implements the DSL needs to be contained so that it does not interfere with other code. We can still implement verbs as methods, but we must implement those methods in separate objects, classes, or modules.

Zen in the Art of Program Maintenance

An established technique for implementing methods in objects is to define the methods and then execute a block of code using instance_eval so that it has access to the object’s methods.


I’m trying to get the Zen of building DSLs using Ruby. After reading a dozen or so pieces referenced by my favourite search engine, I have a feeling I’m still not quite getting it.

Don Box


You know, code expresses an idea better than words express an idea… when the idea is about coding. Please try this example in irb. Don’t just skim the text and nod: there’s a powerful learning mechanism at work when you physically do things as you’re learning, even if it’s just copying, pasting, comparing the result in one window to the text in another, and so on:

def bjarne
'Barney'
end

dsl = Object.new
def dsl.phred
'Fred'
end

plus = ' plus '

print dsl.instance_eval {
phred + plus + bjarne
}
##### "Fred plus Barney"

What does this show? Well, we have created a way to use a method defined in our dsl object, a local variable plus, and a top-level method bjarne. We can imagine scaling this up to defining a rich DSL in our DSL object and being able to mix verbs from the DSL with instance variables and other methods as we please.

Touching back on the subject of containment, we have defined bjarne in Kernel. Now bjarne is essentially global. If we already defined bjarne somewhere else, we just clobbered it. And if we later run a piece of code that defines bjarne, we’ll clobber our own version. phred is different. It’s defined inside of an object, and it doesn’t conflict with any other phred we define elsewhere.

Great! So… Can we cite a few examples of this technique in action (such as Jamis’ post where he calls phred and bjarne examples of Sandboxing and Top-level methods) and end the post here?

No. The code above looks fine. But there is a hidden problem with this sandboxing technique:

MyDsl = Object.new

def MyDsl.phred
'Fred'
end

class ClientCode

def bjarne
'Barney'
end

def friends
plus = ' plus '
MyDsl.instance_eval { phred + plus + bjarne }
end

end

ClientCode.new.friends
##### -:15:in `friends': undefined local variable or method `bjarne' for # (NameError) from -:15:in `friends' from -:20

WTF?! This looks just like our top-level example, but we’ve placed our code inside of a ClientCode method. And bjarne is a method in ClientCode: this way we can continue to separate concerns, keeping phred inside our DSL and bjarne inside of the class where we are using the DSL. But it doesn’t work.

Why instance_eval breaks (in tedious detail)

As you know, everything in Ruby is either a variable or a method (how it figures out the difference is a major irritation). When you invoke a method, you are actually sending a message to a receiver.2 Sometimes you name the receiver (some_object.a_method), and there is no ambiguity.

But when you just name the method (like bjarne), Ruby tries to find the method for itself. It does so by looking to see whether it is an instance method, in which case it behaves like self.bjarne. If not, it looks to see whether bjarne is top-level, in which case it calls that method in the Kernel. See for yourself:

def foo
'top level foo'
end

def bar
'top level bar'
end

class Test
def bar
'instance method bar'
end
def test
p foo
p bar
end
end

Test.new.test
##### "top level foo" "instance method bar"

See? It looks for instance methods and then for top-level methods if it can’t find anything. (Again, we are hand-waving over the pesky problem with instance variables in the case where we don’t use ()). What’s the problem? Well, I actually mis-described what happens. Here it is again, with more precision:

It looks for methods defined in the object self, and then for top-level methods if it can’t find anything. Of course, self is the current object. Unless it isn’t: That’s what instance_eval does: it evaluates a block but it changes self to point to its receiver instead of the object where the code is executing. Everything else stays the same. One more example to show the mechanism:

def foo
'top level foo'
end

def bar
'top level bar'
end

class Test
def bar
'instance method bar'
end
def blitz
'current object blitz'
end
def test
p foo
p bar
o = Object.new
def o.blitz
'redefined self blitz'
end
p o.instance_eval { blitz }
p o.instance_eval { 'bar within o gives: ' + bar }
end
end

Test.new.test
##### "top level foo" "instance method bar" "redefined self blitz" "bar within o gives: top level bar"

Now we see: when we use instance_eval, we route around our current object and all of our methods are ignored within the block. Ruby really only has two levels of scope: whatever belongs to self and whatever belongs to Kernel.

This state of affairs is unsatisfactory: we would like to introduce a DSL in such a way that we retain access to all of our methods without kludges (like storing the current object in an instance variable).

Nesting Scopes



The Seasoned Schemer is devoted to the myriad uses of first class functions. This book is approachable and a delight to read, but the ideas are provocative and when you close the back cover you will be able to compose programs from functions in powerful new ways.

You can think of the current scope as being nested inside of the top-level scope. instance_eval doesn’t change the scope for things like local variables, it just points self elsewhere.

What we want is a new scope for our DSL nested inside of the current scope. So when we search for a method, we should check the DSL. If we don’t find it there, check the current object’s scope. If we don’t find it there, check the top-level.


Those who do not learn from the History of Lisp are doomed to repeat it.

Oops. John McCarthy called from 1960. He wants Lisp’s dynamic scoping back. Yes, our new feature is almost fifty years old. This is why either a through grounding in CS theory or a hobbyist’s interest in the history of programming are important for programming: much of what we want to do has already been done before, and sometimes in unexpected contexts. Who would have thought that a technique for helping programmers collaborate with Bond Traders has roots in Lisp 1.5?

Here’s an implementation of a nested scope construct that does exactly what we want. You declare a new class that extends DomainSpecificLanguage, and then you can use methods from your DSL, from your current object, and from the top-level (if you so choose). For example:

require 'dsl'

class MyDSL < DomainSpecificLanguage

def bjarne
'Barney'
end

end

class TheGreat

def phred
'Fredrick'
end

def test
plus = ' plus '
MyDSL.eval { p phred + plus + bjarne }
end

end

TheGreat.new.test
##### "Fredrick plus Barney"

This does exactly what we want with methods.

There's also a single extension to kernel, the method with. with replaces the eval method so you can also say:

with MyDSL do
p phred + plus + bjarne
end


The eval method creates a new instance of your DSL class, so you can track state within an evaluation. For example:

class Censor < DomainSpecificLanguage
attr_reader :ok_on_tv

def initialize (given_binding)
super(given_binding)
@ok_on_tv = true
end

def say something
something.split.each do |word|
@ok_on_tv = false if ['feces', 'urine', 'love', 'pudendum', 'fellator', 'oedipus', 'mammaries'].include?(word)
end
end

end

class GeorgeCarlin
def test
Censor.eval {
say "People much wiser than I have said, I'd rather have my son watch a film with two people making love than two people trying to kill one other."
say "And I of course agree. I wish I know who said it first, and I agree with that."
ok_on_tv
}
end
end

p GeorgeCarlin.new.test
##### "false"

let

The first obvious drawback of this approach is that the blocks we pa