No Detail Too Small

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Sunday, January 20, 2008

Expression-oriented programming (also known as functional or side-effect-free programming, although the three things are related, not synonymous) is a wonderful way to make calculations easier to understand and maintain. However, sometimes deeply nested function calls or mixing function calls with method invocations can make an expression difficult to understand at a glance. Here is a tip for refactoring your expressions so they are easier to read.

Expressions naturally form a tree, with values at the leaves and function calls or method invocations at each node. In this post, I’ll be talking about the simplest form of expression, a pipeline. A pipeline is an expression that does not branch: a value (or often a collection of values) is transformed by two or more function calls or method invocations in succession. Here’s a slightly obfuscated example of a pipeline working with collections from one of our Rails applications:


widget.fizz.fizz_buzzes.select { |fizz_buzz| 
 fizz_buzz.widgets_column_name =~ /^special_.*/ 
}.map { |fizz_buzz|
 widget.attribute_present?(fizz_buzz.widgets_column_name) &&
   { fizz_buzz.label => widget.send(fizz_buzz.widgets_column_name) } ||
   {}
}.inject({}, &:merge)

While the details don’t make much sense out of context, the overall pattern ought to be familiar as an example of the MapReduce pattern (without the distributed processing, of course).
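For readers without the Rails context, here’s a minimal sketch of the same select, map, inject shape using plain hashes. The names attributes and labels and their contents are invented for illustration; they are stand-ins for the widget and its fizz_buzzes, not the real application code.

```ruby
# Hypothetical stand-ins for the widget's attributes and the fizz_buzz labels.
attributes = { "special_color" => "red", "special_size" => nil, "weight" => 10 }
labels     = { "special_color" => "Color", "special_size" => "Size", "weight" => "Weight" }

summary = labels.keys.select { |column|
  column =~ /^special_.*/            # filter: keep only the "special" columns
}.map { |column|
  # map: a one-entry hash when the attribute is present, an empty hash otherwise
  attributes[column] ? { labels[column] => attributes[column] } : {}
}.inject({}, &:merge)                # reduce: fold the small hashes into one

# summary == { "Color" => "red" }
```

Only "special_color" survives both the filter and the presence check, so the reduce step ends up merging one real entry with one empty hash.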

Pipelines read from right-to-left, left-to-right, or both. For example, this set of three nested function calls reads from right-to-left:


sum_numbers.call(square_numbers.call(odd_numbers.call(1..100)))

If I try to read it from left-to-right, it sounds like a caricature of speech: “The sum of the squares of the odd numbers from one to one hundred.” You can’t figure it out unless you build an abstract syntax tree in your head and then evaluate it with a stack machine. Having to emulate a computer to figure out what something means is not a good sign. It reads much more easily from right-to-left: “Take the numbers from one to one hundred. Select the odd ones. Square them. And finally, take the sum.”
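To make the nested version runnable, here’s one way the three lambdas might be defined. These definitions are assumptions for the sake of the example; the names come from the expression above, but their bodies are my guess at the obvious implementation.

```ruby
# Assumed definitions: the bodies of these three lambdas are illustrative guesses.
odd_numbers    = lambda { |numbers| numbers.select { |n| n.odd? } }
square_numbers = lambda { |numbers| numbers.map { |n| n * n } }
sum_numbers    = lambda { |numbers| numbers.inject(0) { |sum, n| sum + n } }

# Evaluation starts at the innermost call and works outward,
# which is why the expression reads from right to left.
total = sum_numbers.call(square_numbers.call(odd_numbers.call(1..100)))

# total == 166650
```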

Popular languages like Ruby make it easy to write expressions that read from left-to-right directly: here’s an example from Ruby 1.9 (or with Symbol#to_proc):


(1..100).select(&:odd?).map { |n| n*n }.inject(&:+)
    => 166650

Object orientation’s emphasis on nouns at the expense of verbs has its issues. But when a computation really is a step-wise transformation of data, I find that chaining methods makes code a lot easier to understand than nesting functions. On the other hand, I prefer nesting functions when the expression has more of a tree form.

But whichever direction you prefer, I find it very difficult to read code that mixes directions in the same expression:


square_elements = lambda { ... } # content elided
square_elements.call((1..100).select(&:odd?)).inject(&:+)

You go left-to-right to select odd members, then back left to square them, then right to sum them. I find this much more confusing than either the nested functions or the chained method calls.

Object#into

A little while ago, I saw John Carter define factorial in Ruby as a method in the Integer class:


class Integer
  def factorial
    return 1 if self <= 1
    self * (self-1).factorial
  end
end

My first reaction was to think that adding factorial as a method was an idea from another planet:¹ why should integers know how to answer their own factorials? This seemed like a classic case of a function that should not be an object method. But nevertheless, having calculations be methods instead of functions lets you write a certain type of expression consistently from left-to-right (5.succ.factorial.succ.odd?) instead of mixing directions (factorial.call(5.succ).succ.odd?).
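Here’s a small sketch that puts the two spellings side by side. The factorial lambda is an assumption: a plain recursive stand-in for a free-standing factorial function, defined only so the mixed-direction form can run.

```ruby
# Assumed for illustration: a recursive lambda standing in for a
# free-standing factorial function.
factorial = lambda { |n| n <= 1 ? 1 : n * factorial.call(n - 1) }

# The method version, as defined on Integer above.
class Integer
  def factorial
    return 1 if self <= 1
    self * (self - 1).factorial
  end
end

mixed   = factorial.call(5.succ).succ.odd?   # start in the middle, go left, then right
chained = 5.succ.factorial.succ.odd?         # left to right throughout

# Both are true: 6! + 1 is 721, which is odd.
```

The two expressions compute the same thing; the only difference is the order in which your eye has to travel.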

All the same, there are good reasons why we don’t overload numeric classes with every possible calculation and formula. So what can we do? How about:


class Object
    def into expr = nil
        expr.nil? ? yield(self) : expr.to_proc.call(self)
    end
end

Now, snarfing Charles Duan’s code, we can write:


y = proc { |generator|
    proc { |x|
        proc { |*args|
            generator.call(x.call(x)).call(*args)
        }
    }.call(proc { |x|
        proc { |*args|
            generator.call(x.call(x)).call(*args)
        }
    })
}

factorial = y.call(proc { |callback|
    proc { |arg|
        if arg.zero?  then 1
        else          arg * callback.call(arg - 1)
        end
    }
})

Which lets us write:


5.succ.into(factorial).succ.odd?
    => true

I read this as “Start with five, get its successor, put that into the factorial proc, take the result’s successor, and answer whether it is odd.” The whole thing reads in one consistent style: you aren’t mixing left-to-right method chaining with right-to-left function nesting. I wouldn’t go crazy with Object#into in a program, but if you have an expression that is predominantly chaining methods, Object#into can make it consistent and improve its readability.
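Because Object#into calls to_proc on its argument and falls back to a block when there is no argument, it should accept a proc, a symbol, or a block. Here’s a minimal sketch of all three; the double lambda is a hypothetical helper invented for the example.

```ruby
class Object
  def into(expr = nil)
    expr.nil? ? yield(self) : expr.to_proc.call(self)
  end
end

double = lambda { |n| n * 2 }            # hypothetical helper

with_proc   = 3.into(double)             # Proc#to_proc returns the proc itself: 6
with_symbol = 3.into(:succ)              # Symbol#to_proc sends :succ to 3: 4
with_block  = 3.into { |n| n + 10 }      # no argument, so the block is yielded to: 13

# Mixing into with ordinary method chaining keeps everything left-to-right.
chained = 4.succ.into(double).succ.odd?  # 5, then 10, then 11, which is odd
```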

Function Composition

There is more than one way to skin a cat. If f(g(h(value))) is too constricting, we can compose functions instead of nesting them. So we can write:


class Proc
  def self.compose(f, g)
    lambda { |*args| f[g[*args]] }
  end
  def *(g) # Tom's original composition operator
    Proc.compose(self, g)
  end
  def |(g) # The reverse composition operator, mimicking a pipe
    Proc.compose(g, self)
  end
end

plus1 = lambda { |n| n + 1 }
squared = lambda { |n| n * n }
minus1 = lambda { |n| n - 1 }

This allows us to write (minus1 | squared | plus1).call(5), which puts almost everything in left-to-right order. Hey, remember Object#into? Why don’t we try it?


5.into(minus1 | squared | plus1)

That saves us from writing 5.into(minus1).into(squared).into(plus1) if we find three instances of “into” a little noisy. Composing functions using * lets us maintain right-to-left order and composing functions with | lets us create left-to-right order when we are making a “pipeline” of expressions.
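To see the difference in ordering concretely, here’s a self-contained sketch that repeats the Proc and Object definitions from this post and runs the same three lambdas through both operators. The variable names piped, nested, and stepwise are mine.

```ruby
class Proc
  def self.compose(f, g)
    lambda { |*args| f[g[*args]] }
  end
  def *(g) # right-to-left: self is applied last
    Proc.compose(self, g)
  end
  def |(g) # left-to-right: self is applied first
    Proc.compose(g, self)
  end
end

class Object
  def into(expr = nil)
    expr.nil? ? yield(self) : expr.to_proc.call(self)
  end
end

plus1   = lambda { |n| n + 1 }
squared = lambda { |n| n * n }
minus1  = lambda { |n| n - 1 }

piped    = 5.into(minus1 | squared | plus1)             # (5 - 1) squared, plus 1: 17
nested   = 5.into(minus1 * squared * plus1)             # (5 + 1) squared, minus 1: 35
stepwise = 5.into(minus1).into(squared).into(plus1)     # same as piped: 17
```

The same three procs in the same textual order give 17 with | and 35 with *, because the two operators apply them in opposite directions.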

Summary

In the end, this is a very trivial idea: When an expression can be written so that it reads consistently from left-to-right or consistently from right-to-left, do so. The code will be easier to read.

¹ Uh, yes, I am familiar with Smalltalk. I’m thinking that my opinion of my ability to make a joke far exceeds my actual ability: the phrase is meant as a pun on Edgar Rice Burroughs’s Barsoomian Tales, featuring the Warlord John Carter. But all that being said, regardless of how OO you want to get, I am not convinced that objects are responsible for every operation that can possibly be performed on them.

¶ 8:50 AM

Comments on “No Detail Too Small”:

This is where I'm really jealous of Haskell's easy functional composition:

let z = (+1) >>> fact >>> (+1) >>> odd
z 5

And I'm sure there's a way to get 5 in front of that, I just don't know what it is.

# posted by Bill Mill : 10:45 AM

Bill:

Thanks for your comment! Does Haskell's point-free syntax permit even more remarkably concise composition?

# posted by Reginald Braithwaite : 11:37 AM

Yeah, there's some really neat things you can do with it; I learned a lot from this reddit thread.

Another neat point is that (+1) is not a +1 function as commonly defined in lisp; it's the *actual* plus operator curried with a 1. Which is super cool.

What I really want is a duck-typed language with functional composition abilities like Haskell; I'm cooking up a blog on that.

# posted by Bill Mill : 11:40 AM

Yeah, you can also write the following:

let z = odd . (+ 1) . fact . (+ 1)
z 5

(The dot meaning functional composition in Haskell, just in case you didn't know)

# posted by cypher : 12:51 PM

So to play Smalltalk's advocate: why *shouldn't* we "overload numeric classes with every possible calculation and formula"? It drives me nuts that in Ruby, for example, numbers know how to square themselves (10 ** 2) but not how to take their own square root (Math.sqrt(100)).

Purely from a pragmatic software engineering standpoint: when you have so many different data types (small integers, large integers, floats, scaled decimals, complex numbers, amounts of money...) which respond to the same set of operations, it seems foolish not to take advantage of method dispatch to allow a different/optimized implementation of each operation for each type of numeric value.

Instead, what happens? You get awful hacks like this from complex.rb:

module Math
  alias sqrt! sqrt
  # Redefined to handle a Complex argument.
  def sqrt(z)
    if Complex.generic?(z)
      if z >= 0
        sqrt!(z)
      else
        Complex(0,sqrt!(-z))
      end
    else
      if z.image < 0
        sqrt(z.conjugate).conjugate
      else
        r = z.abs
        x = z.real
        Complex( sqrt!((r+x)/2), sqrt!((r-x)/2) )
      end
    end
  end
end

Oh good - now what happens when someone else wants to redefine it, too?

# posted by Avi : 4:09 PM

Avi:

I see a case that if we only have single-dispatch, and if the only dispatch mechanism is a method invocation, then classes should be heavyweight because the alternatives are worse.

And given that Ruby apes Smalltalk in many ways, I wouldn’t stay up nights being an angry blogger if factorial were added to Integer in the standard library.

But yes, I would prefer a more elegant way to put the less commonly used Integer functions elsewhere so that its “intellectual surface area” is smaller.

# posted by Reginald Braithwaite : 4:24 PM

Reginald:

Things definitely change in a multiple-dispatch world; if we're talking about CLOS or Dylan, the design space looks quite different. And if we're talking about Java, which is IMO broken by not having open classes, then we're in trouble. But at least in the context of Ruby, Smalltalk, C# and Objective-C, it seems that maybe we agree :).

I'm a little suspicious of a concept like the "intellectual surface area" of Integer, though, because I think that is (and should be) a moving target depending on what packages you have loaded. The more relevant surface area seems to me to be at the package level: if I load a package, I have to know what new classes and methods it introduces, regardless of the class it adds them to. If I don't load that package (say I don't load a math package because I don't need factorial or sqrt) then I don't need to know about those methods at all.

# posted by Avi : 7:01 PM

Avi:

I agree 100% with your suggestion that the important thing is which packages you load.

It is trivial to build a 'package' in Ruby that opens Integer and adds math methods to it. That gives you some of the advantages you mentioned, such as leveraging implementation optimizations, without turning the standard library into a swiss army knife of capabilities.

I have an unfinished post where I laud open classes while lamenting their "globality." I would love to be able to decorate Hash, Integer, Object, Proc, and lots of other things but only in the context of a specific class.

Take adding * to Proc for composition. What happens if my code does that but someone else's code adds * to Proc meaning produce the Cartesian product of the results of the two Procs?

I wish that it was possible for both pieces of code to happily co-exist.

# posted by Reginald Braithwaite : 7:18 PM

It seems to me that this issue is solved fairly neatly in Scala via their "implicit conversion" feature.

“What happens if my code does that but someone else's code adds * to Proc meaning produce the Cartesian product of the results of the two Procs?”

Your code would define an implicit conversion to a "RegProc" which has a "*-for-composition" operator, and Avi's code would have the other implicit conversion to an "AviProc" whose * means the Cartesian product.

“I wish that it was possible for both pieces of code to happily co-exist.”

At least in this case, they could co-exist without further ado in separate parts of your application. In order to get them to coexist in the same scope, a little more work might be required (changing half of the call sites to disambiguate the intent).

# posted by Douglas : 8:28 PM

Dave Thomas has another solution doing pipelines using Ruby1.9's fiber: http://pragdave.blogs.pragprog.com/pragdave/2007/12/pipelines-using.html

part2: http://pragdave.blogs.pragprog.com/pragdave/2008/01/pipelines-using.html

# posted by Apple : 2:05 AM

The Smalltalker's answer would be that using * is a bad idea in both of those cases - much better to be explicit and use #compose: in the one case and #cartesianProduct: in the other. And if you end up with two packages that both define #cartesianProduct: in conflicting ways on the same class, well, then you have a bigger problem than namespacing will solve.

# posted by Avi : 3:04 AM

Avi:

The issue of * vs. "compose" is interesting, but a much larger discussion of readability and trade-offs. Let's stick to the semantics. Even if both methods are named "compose" or "cartesianProduct," you can have two people write them such that they are broadly the same but differ in minor details, causing conflicts.

This will certainly be the case for anything non-trivial. For example, Ruby's Symbol#to_proc allows you to use a symbol representing a method with any arity, however all of the documented examples I found concerned methods taking no parameters.

It is included as part of Ruby on Rails and will be in Ruby 1.9, but otherwise if you want it you must copy and paste or roll your own.

What happens if someone rolls their own based on the informal examples and documents on the web? Their version would be incomplete and clash with anyone using &:merge or &:+.

We can blame them for a buggy implementation, but that bothers me: their implementation works for all of the code they wrote, how are we supposed to coördinate everyone's requirements for a method in an open class?

I don't have an answer, but I do stand by the thought that this is a problem with open classes: their global nature. Most of the time when I extend an open class, I am making changes that are really private to my use.

# posted by Reginald Braithwaite : 7:35 AM
