Really useful anamorphisms in Ruby

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Friday, November 09, 2007

Really simple anamorphisms in Ruby introduced a very simple unfold. Its chief characteristics were that it generated an Array from a value of some sort, and it did so by applying an incrementor block to its seed recursively until it generated nil. For example:


10.class.unfold(&:superclass)
    => [Fixnum, Integer, Numeric, Object]

A very simple modification allows us to separate the two blocks with a :while or :to pseudo-keyword, and to add a :map keyword for transforming the state into the desired result. Thus, this really simple unfold:


1.unfold(&'_+1 unless _==10').map(&'**2')
    => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Can also be expressed as:


1.unfold(:to => '==10', :map => '**2', &'_+1')
    => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The latter form is helpful once unfolds become larger and more complex than these simple one-liners.

(There is another style of writing :unfold, using method chaining and lazy evaluation to eliminate lambda keywords, but we will save that for another time: it is a great examination of syntax but does not change :unfold’s fundamental behaviour.)

Let’s turn it up a notch

These trivial examples are not particularly compelling. Unfold is touted as the complement to :inject. So you would expect :unfold to be as useful as :inject. And :inject is very, very useful—you “reduce” lists of things to values all the time.

But how often do you need to turn a value into a list? How often do you need to turn ‘10’ into ‘[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]’? And if you do, what’s wrong with using (1..10).map(&'**2')?

Remember that :unfold can be applied to objects with a lot more information to them. The thing that had me stuck when I first saw :unfold was thinking of it as the opposite of :inject. Or at least, the opposite of how I used :inject. I tended to use :inject in a way that reduced information. For example:


[7, 6, 10, 3, 9, 4, 8, 5, 2, 1].inject(&'+')

This gives us the sum of the numbers from one to ten, as it happens. It also gives us a value that is considerably simpler than the list we used to generate the number. Information is lost when we use :inject to “reduce” a list to a very simple value. So my first reaction to :unfold was to think of ways to use :unfold on very simple values, like numerics.

But :unfold doesn’t have to work with simple values. It can work with arbitraily complex data structures. Consider:


def zip(*lists)
  lists.unfold(
      :while => '.first', 
      :map => '.map(&".first")',
      &'_.reject(&".length < 2").map(&"[1..-1]")')
end

Zip is a function that takes two (or more, but let’s just say two for now) lists, and produces a list of pairs of items. So:


zip([:a, :b, :c], [1, 2, 3])
    => [[:a, 1], [:b, 2], [:c, 3]]

How does :unfold do it? First, of course, it makes a single list of lists. It then performs an unfold on this single data structure. The incrementor successively reduces each sublist by removing the first items. So the output of the successive incrementor operations is:


[
    [[:a, :b, :c], [1, 2, 3]],
    [[b, :c],      [2, 3]]
    [[:c],         [3]]
]

The :map then extracts the first items from each sublist and presents them as a list:


[
    [:a,           1], 
    [:b,           2], 
    [:c,           3]
]

Neat. But why do we care about zip? Well, if you’ll notice, we already have a bunch of really useful things we can do with lists, like :map, :select, :reject, :detect, and so on. What would you do if you had two lists and needed to do something with each pair in the list, like… A list of first names and surnames that need to be catenated together?


zip(first_names, surnames).map(&'"#{_[0]} #{_[1]}"')

Zip is useful when we have a bunch of parallel lists and there’s something we want to do with each tuple from the lists.

Generalized iteration

We recognize this “pattern,” it’s one of the most powerful in programming. Zip was one algorithm, a way of iterating over several lists simultaneously. The other algorithm was "#{_[0]} #{_[1]}", a recipe for what to do with the successive tuples of values.

The Ruby Way is the perfect second Ruby book for serious programmers. The Ruby Way contains more than four hundred examples explaining how to do everything from distribute Ruby with Rinda to functional programming techniques just like these.

The powerful idea was to separate the mechanics of turning a data structure into a linear series of values—iterating—from what we want to actually do with each value. (In OO-style programming, we would define a method for lists of lists that returns an iterator over the tuples of values. Same thing, proving that how you do it is not as important as understanding why you do it.)

Unfold has other uses, but this one alone is worth the trouble to understand the pattern even if you aren’t rushing to implement this exact unfold method: Converting a single data structure to a list is one way to implement iteration: for any data structure, you can use unfold to define a linear iteration. You can then use :each or :map or :inject just as our parents before us would have used DO or FOR loops.

Consider this (inelegant, but I’m writing this rather late at night) unfold:


[[1, 2, 3], [4, 5, 6, [7, 8]], 9, 10].unfold(
  :while => '.first',
  :map => lambda { |first|
    first = first.first while first.kind_of?(Array)
    first
  }
) { |state|
  state = state.first + state[1..-1] while state.first.kind_of?(Array)
  state[1..-1]
}
    => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This is the same idea, only we convert a tree into a list representing a depth-first search of a simple tree. You may recognize it as Array’s :flatten method. Once again, it’s really a way of iterating over the elements of a tree. So one way to think of this :unfold is that it is an iterator over a tree’s leaves:


def flatten(arr)
  arr.unfold(
    :while => '.first',
    :map => lambda { |first|
      first = first.first while first.kind_of?(Array)
      first
    }
  ) { |state|
    state = state.first + state[1..-1] while state.first.kind_of?(Array)
    state[1..-1]
  }
end

But I already know how to write Zip and Flatten methods, honest I do.

Zip and Flatten are relatively common, that’s why :flatten and :zip can both be found in Ruby’s standard Array class. And if there’s a data structure that needs regular unfolding, you ought to weigh the advantages and disadvantages of writing an :unfold for it or using more humdrum ways of writing an iterator.

The Haskell School of Expression is a terrific and relatively jargon-free introduction to the language that popularized fold, unfold, and all of the other functional programming idioms. As Eric Kidd says, it will make your head explode. Recommended!

However, what do you do when you only need to unfold something once? For example, perhaps you have code that obtains some data in JSON format, and having used a library to parse the JSON into a one-off list or hash, you want to iterate through it.

With unfold, you can write your one-time, specific iterator right in place. This is no different than using blocks and lambdas in Ruby for one-off functions that really don’t need the cermony and weight of being implemented as methods.

When you want to iterate through something, and you want to separate the mechanism for iterating through the data from what you do with the data, :unfold should be in your tool box.

Unfold and the bio-sciences. Not really.

I like to think of :unfold like unfolding a protein molecule. When you stare at a data structure, it’s dense, opaque. But you supply an unfold algorithm, and what looked like a messy ball of twine unravels into a long filament made up of simple elements. You can then operate on the simple elements, without getting what you want to do en-snarled in how you iterate over the data structure.

So there you have it. Unfold can be really useful if we see it as a standardized way to write iterators for data structures.

Update: The under-appreciated unfold.

Labels: lispy, popular, ruby

¶ 10:46 PM

Comments on “Really useful anamorphisms in Ruby”:

hmm and is this needed?

# posted by

Anonymous : 2:35 PM

is this needed

What do you mean by needed? My understanding is that:

1. All Turing-equivalent languages are equally powerful.
2. Some of the world's largest and most profitable web applications are written in PHP.
3. And one of the others is written in .NET.
4. Some of the world's largest and most powerful business applications are still written in COBOL.
5. And most of the others are written in PL/SQL.
6. And Java.

So... You ask whether this is needed. And I ask, by whom? And for what?

# posted by

Reginald Braithwaite : 2:39 PM

reginald: regarding 'inject' and 'unfold' being opposites, in fact I think there is a way to formalize this in terms of category theory. In a certain sense, these operations are 'dual' to each other.

# posted by

Slava Pestov : 6:21 PM

At what point does the ugliness and inefficiency become the wrong answer and a different language the right one?

# posted by

Anonymous : 1:40 AM

At what point does the ugliness and inefficiency become the wrong answer and a different language the right one?

First, of course, you can make that argument in favour of moving from Ruby to something that supports unfolds and other operations elegantly and efficiently.

And you can also make that argument in favour of moving from something else that supports unfold--like Hakell--back to Ruby or even Java and eschewing all attempts to build higher-order functionality out of duct tape and baling wire (Greenspunning).

I honestly can't answer the question. People have been going at that since the days when people argued Lisp vs. C, Smalltalk vs. C, Java vs. C++, almost everything vs. Java...

# posted by

Reginald Braithwaite : 8:02 AM

Slava:

Yes, I knew they were foldr and foldl's dual. From "Really simple anamophisms in Ruby":

Anamorphisms are functions that map from some object to a more complex structure containing the type of the object. They are the dual—a fancy word for complement—of Catamorphisms, functions that map from some complex structure down to a simpler object.

But I leave the category theory to people who actually know what they are doing.

People mistakingly think this is a blog about programming. It's actually a job about passion, about being excited and deeply motivated by programming. So... if someone says, "Cool!!!" And is deeply engaged, engaged enough to go elsewhere and read more and play with unfold, maybe find a new way to think about how to do things...

That works for me.

# posted by

Reginald Braithwaite : 8:11 AM

Which version of Ruby are you using?
I have 1.86 but get a type error (expected proc).
Seems this might be the shove I needed to upgrade to 1.9

# posted by

Anonymous : 7:14 PM

With these examples, you need String#to_proc to handle all the &'==10' expressions.

If you find these "butt ugly," you can substitute lambdas. So instead of &'==10', use lambda { |n| n==10 }. Same thing.

# posted by

Reginald Braithwaite : 7:54 PM

All that tasty String#to_proc sugar is starting to make me salivate.

# posted by

Sofal : 10:07 AM

I find unfold is particularly useful for expressing numerical solutions for differential equations. In Common Lisp I have an implementation of unfold that lets me write things like the example given in the docstring at http://paste.lisp.org/display/52918

# posted by

jmbr : 7:38 AM

In relation to your molecule analogy (end of post), the term "denaturing" is what comes to mind.

# posted by

david : 7:13 PM

I think you only defined one anamorphism. You could also define them on arbitrary inductive datatypes. There was a toy language called Charity that was based on category theory and its basic control structures were catamorphisms and anamorphisms (general recursion was banned). You can then zip binary trees easily, for instance.

# posted by

Robert : 10:23 PM

Robert:

a. You’re right, I only defined one type of anamorphism, and...

b. Anamorphisms on inductive data types would be really useful.

# posted by

Reginald Braithwaite : 7:57 AM

Hi, awesome post. You mention that inject "loses information." While this is can be true in the 1 argument use, inject can also behave like "fold left" and take an accumulator of any data type, along with a function that takes the "built value" so far, and the current item in the original array. Here's an example using your first problem, getting the squares of the numbers 1 to 10 in an array, where in |xs,y| xs is the 'start' value and y is the original array value:

(1..10).inject([]){|xs,y| [y*y]+xs }.reverse #prepending, o(1) if ruby's array is a singly linked list?

(1..10).inject([]){|xs,y| xs+[y*y]} #appending, shorter, maybe slower

You can't deny the power of unfolding, but I've found that inject by itself is pretty powerful when used in this form. Thanks for the great writeup and code snippets.

# posted by

Alan : 5:54 PM

<< Home