Really useful anamorphisms in Ruby
Really simple anamorphisms in Ruby introduced a very simple unfold. Its chief characteristics were that it generated an Array from a value of some sort, and it did so by applying an incrementor block to its seed recursively until it generated nil. For example:
10.class.unfold(&:superclass)
=> [Fixnum, Integer, Numeric, Object]
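For anyone who wants to see the moving parts, here is a minimal sketch of that behaviour written as a plain method. It is not the implementation from the earlier post: the name simple_unfold is invented for illustration, and it takes the seed as an argument instead of extending every object.

```ruby
# Hypothetical helper for illustration; not the code from the earlier post.
# Collects the seed, then each successive result of the block,
# and stops as soon as the block returns nil.
def simple_unfold(seed)
  result = []
  state = seed
  until state.nil?
    result << state
    state = yield(state)
  end
  result
end

# Start from Integer directly, since Fixnum no longer exists on current Rubies.
ancestry = simple_unfold(Integer, &:superclass)
counting = simple_unfold(1) { |n| n + 1 unless n == 10 }
```

On Ruby 2.4 and later, ancestry comes out as [Integer, Numeric, Object, BasicObject]; the Fixnum result shown above reflects the Ruby of the time.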
A very simple modification allows us to separate the termination test from the incrementor with a :while or :to pseudo-keyword, and to add a :map keyword for transforming the state into the desired result. Thus, this really simple unfold:
1.unfold(&'_+1 unless _==10').map(&'**2')
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Can also be expressed as:
1.unfold(:to => '==10', :map => '**2', &'_+1')
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
The latter form is helpful once unfolds become larger and more complex than these simple one-liners.
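One way the keyword form could work is sketched below. Only the option names (:while, :to, :map) come from the examples above; the method signature, the use of lambdas instead of String#to_proc strings, and the exact stopping rules are assumptions made so the sketch runs on a stock Ruby.

```ruby
# Hypothetical sketch of a keyword-style unfold. Assumed semantics:
#   :while  keep a state only while this test is truthy (default: state is not nil)
#   :to     keep the state that satisfies this test, then stop
#   :map    convert each kept state into an element of the result
def unfold(seed, options = {}, &incrementor)
  keep_going = options[:while] || ->(state) { !state.nil? }
  last_one   = options[:to]    || ->(_state) { false }
  transform  = options[:map]   || ->(state) { state }

  result = []
  state = seed
  while keep_going.call(state)
    result << transform.call(state)
    break if last_one.call(state)
    state = incrementor.call(state)
  end
  result
end

squares = unfold(1, :to => ->(n) { n == 10 }, :map => ->(n) { n**2 }) { |n| n + 1 }
small   = unfold(1, :while => ->(n) { n < 4 }) { |n| n + 1 }
```

Keeping the three concerns in separate arguments is what lets each one stay a one-liner even when the state gets more interesting than a number.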
(There is another style of writing :unfold, using method chaining and lazy evaluation to eliminate lambda keywords, but we will save that for another time: it is a great examination of syntax but does not change :unfold’s fundamental behaviour.)
Let’s turn it up a notch
These trivial examples are not particularly compelling. Unfold is touted as the complement to :inject. So you would expect :unfold to be as useful as :inject. And :inject is very, very useful: you “reduce” lists of things to values all the time.
But how often do you need to turn a value into a list? How often do you need to turn ‘10’ into ‘[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]’? And if you do, what’s wrong with using (1..10).map(&'**2')?
Remember that :unfold can be applied to objects with a lot more information to them. The thing that had me stuck when I first saw :unfold was thinking of it as the opposite of :inject. Or at least, the opposite of how I used :inject. I tended to use :inject in a way that reduced information. For example:
[7, 6, 10, 3, 9, 4, 8, 5, 2, 1].inject(&'+')
This gives us the sum of the numbers from one to ten, as it happens. It also gives us a value that is considerably simpler than the list we used to generate the number. Information is lost when we use :inject to “reduce” a list to a very simple value. So my first reaction to :unfold was to think of ways to use :unfold on very simple values, like numerics.
But :unfold doesn’t have to work with simple values. It can work with arbitrarily complex data structures. Consider:
def zip(*lists)
  lists.unfold(
    :while => '.first',
    :map => '.map(&".first")',
    &'_.reject(&".length < 2").map(&"[1..-1]")')
end
Zip is a function that takes two (or more, but let’s just say two for now) lists, and produces a list of pairs of items. So:
zip([:a, :b, :c], [1, 2, 3])
=> [[:a, 1], [:b, 2], [:c, 3]]
How does :unfold do it? First, of course, it makes a single list of lists. It then performs an unfold on this single data structure. The incrementor successively reduces each sublist by removing the first items. So the output of the successive incrementor operations is:
[
[[:a, :b, :c], [1, 2, 3]],
[[:b, :c], [2, 3]],
[[:c], [3]]
]
The :map then extracts the first items from each sublist and presents them as a list:
[
[:a, 1],
[:b, 2],
[:c, 3]
]
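The same two stages can be spelled out with ordinary blocks. This sketch uses Enumerator.produce (Ruby 2.7 and later) as a stand-in for the incrementor, with take_while and map playing the parts of :while and :map. It is an analogue of the zip above, not the same code, and the name zip_by_unfolding is made up.

```ruby
# Sketch only: Enumerator.produce (Ruby 2.7+) generates the successive states.
def zip_by_unfolding(*lists)
  states = Enumerator.produce(lists) do |remaining|
    # Incrementor: discard sublists with nothing left after their head,
    # then drop the head of each surviving sublist.
    remaining.reject { |list| list.length < 2 }.map { |list| list[1..-1] }
  end

  states
    .take_while { |remaining| remaining.first }   # the :while test
    .map { |remaining| remaining.map(&:first) }   # the :map step
end

pairs  = zip_by_unfolding([:a, :b, :c], [1, 2, 3])
ragged = zip_by_unfolding([1, 2], [3])
```

Because exhausted sublists are rejected, lists of unequal length produce shorter tuples once the short list runs out: ragged is [[1, 3], [2]], where Ruby's built-in Array#zip would pad with nil.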
Neat. But why do we care about zip? Well, if you’ll notice, we already have a bunch of really useful things we can do with lists, like :map, :select, :reject, :detect, and so on. What would you do if you had two lists and needed to do something with each pair in the list, like… A list of first names and surnames that need to be catenated together?
zip(first_names, surnames).map(&'"#{_[0]} #{_[1]}"')
Zip is useful when we have a bunch of parallel lists and there’s something we want to do with each tuple from the lists.
Generalized iteration
We recognize this “pattern”; it’s one of the most powerful in programming. Zip was one algorithm, a way of iterating over several lists simultaneously. The other algorithm was "#{_[0]} #{_[1]}", a recipe for what to do with the successive tuples of values.
The powerful idea was to separate the mechanics of turning a data structure into a linear series of values (iterating) from what we want to actually do with each value. (In OO-style programming, we would define a method for lists of lists that returns an iterator over the tuples of values. Same thing, proving that how you do it is not as important as understanding why you do it.)
Unfold has other uses, but this one alone is worth the trouble to understand the pattern even if you aren’t rushing to implement this exact unfold method: Converting a single data structure to a list is one way to implement iteration: for any data structure, you can use unfold to define a linear iteration. You can then use :each or :map or :inject just as our parents before us would have used DO or FOR loops.
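As a rough illustration of "unfold first, then use the usual list tools", here is a breadth-first walk over a small tree. The Node struct and the sample tree are invented for the example, and Enumerator.produce (Ruby 2.7 and later) again stands in for unfold, so treat it as a sketch of the idea and not as the :unfold method itself.

```ruby
# Hypothetical tree type and data, for illustration only.
Node = Struct.new(:value, :children)

tree = Node.new(1, [
  Node.new(2, [Node.new(4, [])]),
  Node.new(3, [])
])

# The state is a queue of nodes still to visit.
# Each step drops the node at the front and appends its children.
breadth_first = Enumerator.produce([tree]) { |queue| queue.drop(1) + queue.first.children }
  .take_while { |queue| queue.first }    # stop once the queue is empty
  .map { |queue| queue.first.value }     # each state contributes its front node

# Once the tree is a plain list, the familiar list methods apply.
total = breadth_first.inject(:+)
```

Here breadth_first is [1, 2, 3, 4] and total is 10. The traversal order lives entirely in the incrementor; swapping it for one that pushes children onto the front of the queue would give a depth-first walk without touching the inject.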
Consider this (inelegant, but I’m writing this rather late at night) unfold:
[[1, 2, 3], [4, 5, 6, [7, 8]], 9, 10].unfold(
  :while => '.first',
  :map => lambda { |first|
    first = first.first while first.kind_of?(Array)
    first
  }
) { |state|
  state = state.first + state[1..-1] while state.first.kind_of?(Array)
  state[1..-1]
}
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
This is the same idea, only we convert a tree into a list representing a depth-first search of a simple tree. You may recognize it as Array’s :flatten method. Once again, it’s really a way of iterating over the elements of a tree. So one way to think of this :unfold is that it is an iterator over a tree’s leaves:
def flatten(arr)
  arr.unfold(
    :while => '.first',
    :map => lambda { |first|
      first = first.first while first.kind_of?(Array)
      first
    }
  ) { |state|
    state = state.first + state[1..-1] while state.first.kind_of?(Array)
    state[1..-1]
  }
end
But I already know how to write Zip and Flatten methods, honest I do.
Zip and Flatten are relatively common; that’s why :flatten and :zip can both be found in Ruby’s standard Array class. And if there’s a data structure that needs regular unfolding, you ought to weigh the advantages and disadvantages of writing an :unfold for it or using more humdrum ways of writing an iterator.
However, what do you do when you only need to unfold something once? For example, perhaps you have code that obtains some data in JSON format, and having used a library to parse the JSON into a one-off list or hash, you want to iterate through it.
With unfold, you can write your one-time, specific iterator right in place. This is no different than using blocks and lambdas in Ruby for one-off functions that really don’t need the ceremony and weight of being implemented as methods.
When you want to iterate through something, and you want to separate the mechanism for iterating through the data from what you do with the data, :unfold should be in your tool box.
Unfold and the bio-sciences. Not really.
I like to think of :unfold like unfolding a protein molecule. When you stare at a data structure, it’s dense, opaque. But you supply an unfold algorithm, and what looked like a messy ball of twine unravels into a long filament made up of simple elements. You can then operate on the simple elements, without getting what you want to do ensnarled in how you iterate over the data structure.
So there you have it. Unfold can be really useful if we see it as a standardized way to write iterators for data structures.
Update: The under-appreciated unfold.