Scientists Announce Empirical Evidence for Greenspun's Tenth Rule
It's
Friday, so let's have some fun.
Sinclair Schuller posted an interesting article,
How to Rewrite Standard Recursion through a State Stack & Iteration (the original page has been removed). In his own words:
Recursion, as a construct, is quite beautiful. It offers an elegant means of achieving an algorithmic goal and is used in everything from mathematics to text processing and data structure manipulation. The problem is, using it in practice through today's popular languages (such as my favorite, C#) can prove to be a disaster.
The problem, quite simply, is that many
flavours of Blub popular languages have a limitation on how deeply you can make function calls before returning. That limit is far smaller than the limitations placed on the size of data set you can manipulate.
Recursive functions often have the property that they require storage O
n. Therefore, if the number of function calls you can make before returning is fixed, the size of data set you can manipulate is fixed.
Yes, I said data set. That's very interesting, especially when you consider that Sinclair's example, the factorial function, doesn't really manipulate what most people would think of as a data set. But it really is, because if you were to think of
100!
as a macro instead of a function, you would end up with
1 x 2 x 3 x 4 x 5...
x 99 x 100
. So there really is a set of 100 datums to manipulate.
Now back to Sinclair's example. He shows how to emulate a programming language's call stack. This is
Greenspunning at its finest:
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
Sinclair's transformation implements a better programming language (like Lisp) that does not have an arbitrary limit on the size of the stack it uses for function calls. This used to be a neat programming challenge. Old-timers will remember when certain languages (BASIC, FORTRAN) did not support re-entrant functions or procedures. So if you wanted to do something recursively, you had to Greenspun. (re-)Discovering this technique in order to write a "Towers of Hanoi" program was an early "Aha!" moment for me.
I call such a transformation a pseudo-iteration. It isn't really iterative, as the presence of a stack shows. Instead, it is emulating or interpreting a recursive algorithm. Sinclair also shows how to optimize the algorithm so that it always executes in constant space (O
1).
I consider this a variation on Greenspunning where instead of implementing half of Common Lisp, we implement half of
Scheme. Scheme is a member of the Lisp family that can optimize tail-recursive functions. (So can most Common Lisp implementations. Maybe. As Wikipedia says:
The Scheme standards documents require tail-call optimization, which the CL standard does not. Most CL implementations do offer tail-call optimization, although often only when the programmer uses an optimization directive.)
Let's look at what's going on with some Ruby. The first thing we need is a factorial function:
def factorial n # whole numbers only
(([0,1].include? n) && 1) || (n * (factorial (n - 1)))
end
(If you aren’t familiar with Ruby, what you need to know is that the
&&
operator returns its last argument if all of its arguments are truthy (neither
nil
nor
false
) and the
||
operator returns its first truthy argument.)
Okay, so the first thing we do is convert this factorial function into tail recursive form:
def factorial n, acc = 1
((0 == n) && acc) || (factorial (n - 1), (n * acc))
end
As you already know (or have learned by following the links to an explanation of
tail call optimization), this version of factorial has its call in
tail position. That means that every time it calls itself (just once), it returns the result of the call directly. The first version of the function didn't return the result directly; it called itself and then performed a multiply on the result.
Because it does no further work with the result of the recursive call, there is
no need to store the calling parameters or the result on a stack. What you do is this. You set up a loop. Inside each iteration you perturb the parameters you have received. When the loop is done, you return some expression on the final set of parameters.
As it happens, factorial is easy in this respect because the final return is simply the accumulator we had been carrying around as a parameter in the tail recursive algorithm.
Let's perform the conversion, using this approach to omit the stack:
def factorial n
acc = 1
until 0 == n
acc *= n
n -= 1
end
acc
end
We have progressed beyond half of Common Lisp into half of Scheme. Now, like a properly optimizing environment, we have greater performance and we require constant memory.
Only we have done by hand the work that a machine (you know, a labour-saving device) could have done for us, had we used a programming language of 1970s vintage.
How quaint.
Update:
C# and Java Leak
Why is this kind of manual transformation necessary? In the end, it is because the so-called modern programming languages are so obsessed with copying popular languages that came before them (like C and C++) that they copied the bugs as well as the features.
The present an abstraction, function calling, but the abstraction leaks badly: you can't actually use it for non-trivial data sets. To me, this is kith and kin with the leaky abstraction known as primitive data types.
The only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don't save us time learning.
And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.
Joel Spolsky
I am deeply dissappointed that programmers need to do this kind of work by hand in this day and age. We have known how to release function calls from the tyranny of a fixed stack for decades. We have known how to optimize tail calls for decades.
No matter how simple and obvious these transformations are, every time we do this by hand we are sidetracked from ading real value through innovation. Every time we do this by hand, we add some risk of defect to our software. Every time we do this by hand, we obfuscate the core expression of our intent behind the
leaky abstraction.