raganwald
(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Monday, July 07, 2008
  Separating the concern of "what to do" from "how to do it quickly"


Over the week-end, I put together a rudimentary rewrite-by-example feature for the rewrite gem (please be patient, I won’t be adding this feature to the gem until it has gone through a few more design iterations). The first thing I tried to do with it was simulating unhygienic macros:

Unhygienic.from {

__to_receiver.andand.__to_message(__splat_parameters)

}.to {

lambda { |andand_temp|
andand_temp.__to_message(__splat_parameters) if andand_temp
}.call(__to_receiver)

}

What this code produces is a sexp-rewriter that does a global search-and-replace. It looks for code like this:

Person.find(:first, ...).andand.friends(true)

And replaces it inline with:

lambda { |andand_temp|
andand_temp.friends(true) if andand_temp
}.call(Person.find(:first, ...))

Declarative rewriting by example is a darn sight better than hand-written sexp manipulation:

def process_call(exp)
exp.shift
receiver_sexp = exp.first
if matches_andand_invocation(receiver_sexp)
exp.shift
mono_parameter = Rewrite.gensym()
s(:call,
s(:iter,
s(:fcall, :lambda),
s(:dasgn_curr, mono_parameter),
s(:if,
s(:call, s(:dvar, mono_parameter), :nil?),
s(:nil),
begin
s(:call,
s(:dvar, mono_parameter),
*(exp.map { |inner| process_inner_expr inner })
)
ensure
exp.clear
end
)
),
:call,
s(:array,
process_inner_expr(receiver_sexp[1])
)
)
else
begin
s(:call,
*(exp.map { |inner| process_inner_expr inner })
)
ensure
exp.clear
end
end
end

But back to rewriting by example:

Unhygienic.from {

__to_receiver.andand.__to_message(__splat_parameters)

}.to {

lambda { |andand_temp|
andand_temp.__to_message(__splat_parameters) if andand_temp
}.call(__to_receiver)

}

As you can deduce, it is using some magic symbols. In the “from” part of the definition, __to_receiver and __to_message mean “Match something here and name the result receiver and message respectively,” while __splat_parameters means “Match a list of things here and name the result parameters.”

In the “to” part of the definitions, those magic symbols insert whatever was matched in the from. This is a crude approximation of how regular expressions capture things with () and insert them with $1..$n, only there is no way to capture any arbitrary sub-pattern. You can use any names you want, as long as they have the magic prefixes __to_ or __splat_. (Magic symbols are a blight upon all that is right and good with code, suggestions for a better way to express capturing by name gratefully solicited!)



Speaking of select, inject, and other higher-order functions, The Seasoned Schemer is devoted to the myriad uses of first class functions. This book is approachable and a delight to read, but the ideas are provocative and when you close the back cover you will be able to compose programs from functions in powerful new ways.

The example above is very much like a Lisp macro. In Lisp, everything is an sexp, so of course macro invocations are sexps just as function calls are sexps. In Ruby, some things look like function calls, some things look like method invocations. In this particular case, “Person.find(:first, …).andand.friends(true)” looks like a method invocation, but it actually behaves like a macro invocation.

Although it looks like a method, the rewriter version of #andand is not polymorphic. You can’t override #andand in any classes, just as you can’t override other syntactic constructions like “&&” or “!”. This bothers OO purists, however I am a member of the OO radical fringe who take purity to another level, a level where overriding functionality is not allowed to violate Liskov Equivalence.

(I personally do not care for the idea that something like #andand sometimes means one thing and sometimes means another, just as many people would completely freak out if !foo didn’t always mean “not foo.”)

beyond macros

Rewriting goes beyond adding new functions and verbs (I personally consider #andand to be an adverb). Basically, what we have here is an extremely weak version of XSLT for Ruby code. Match this, turn it into that. IMO, XSLT transformations is a much better analogy than macro expansion. Rewriters can match and replace fairly arbitrary expressions, not just implement things that look like function calls and method calls.

Consider this hypothetical example:

Unhygienic.from {

__to_receiver.select { |__to_x|
__to_select_body
}.inject(__to_seed) { |__to_acc, __to_y|
__to_inject_body
}

}.to {

__to_receiver.inject(__to_seed) { |__to_acc, __to_x|
if __to_select_body
__to_y = __to_x
__to_inject_body
else
__to_acc
end
}

}

This would transformation code like this:

heads_of_state = locations.select { |any_loc|
zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
heads + loc_in_state.head_count
}

Into this:

heads_of_state = locations.inject({}) { |heads, any_loc|
if zips_in_this_state.include? any_loc.zip_code
loc_in_state = any_loc
heads + loc_in_state.head_count
else
heads
end
}

This is an example of an optimization.

For a large class of expressions chaining a select and an inject (the ones that don’t rely on side effects), this transformation retains the original semantics while rewriting the code to only traverse the collection once and to eliminate the creation of an intermediate collection.

Of course, compilers do this kind of thing all the time for many types of optimizations, so it’s tempting to wait for someone else to write a sufficiently smart compiler that can figure these things out. There are two troubles with waiting for someone else to do it. First, we might be that someone—maybe we’re the one who ought to Just Do It, and waiting for someone else won’t work because nobody else is going to do it.

Second, many problems like this are intractable in the general case. It’s hard (in the mathematical sense) to know when select {…. }.inject { … } can be transformed like this in an imperative language without accidentally stepping on some hidden side-effect. But just because it’s hard in the general case doesn’t mean it isn’t easy in the specific case. For example, you might be the sort of person who never knowingly relies on side effects in select and inject expressions.

So you could use this optimization, while a compiler-level optimization would be a disaster: even if 99.9% of the code out there wouldn’t break, the programmers behind the 0.1% of the broken programs would be furiously blogging about how Ruby wasn’t ready for the Enterprise.

optimizing your code

The joy and the pain of optimizing your code is that you don’t need rewrite to perform that optimization. The joy is that if you discover chaining select and inject is a performance hog somewhere in your code, you simply rewrite the code yourself.

The pain is that the code is no longer in the form you originally decided best represents its intent. In the trivial example I gave above, a rewritten version looks reasonable, especially if you rewrite it with #each in an imperative style:

heads_of_state = 0
locations.each { |any_loc|
if zips_in_this_state.include? any_loc.zip_code
heads_of_state += any_loc.head_count
end
}

Now I’m not going to say that this is necessarily more or less readable than:

heads_of_state = locations.select { |any_loc|
zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
heads + loc_in_state.head_count
}

Some people actively dislike using #select and #inject, so they might feel the #each version is better. For once, let’s talk about something other than a bike shed. Let’s focus on the fact that foo.select {…}.inject {…} says one thing: “Filter this collection using this predicate, and then fold the result as follows.” Whereas foo.each {…} says “Iterate over this collection doing the following thing with each element.”

If you wrote this as a #select/#inject pair, you might have a good reason for doing so. Perhaps most of your program is written in a functional style. Perhaps you like to signal that the there are no side effects in those snippets of code and your team share this understanding of how selects and injects are written.

Granting that you believe that #select and #inject do a better job of communicating the code’s intent to your fellow team members, it’s a win to optimize the code (in the compiler or using a rewriter) behind the scenes rather than rewrite it yourself. The code retains its semantics and the form you have decided best expresses its intent, while using less memory and running faster.

separation of concerns

What we have just done with our trivial example is separated two concerns: The concerns of how to best express an algorithm and the concern of how to best implement an algorithm. If for whatever reason—furious hand-waving to avoid arguing how to write loops—we believe that the best way to express a certain algorithm for readability is not the best way to express a certain algorithm for performance, we have two separate concerns: How to write readable code and how to write performant fast code.

Doesn’t it make sense to separate those concerns? So that the code explaining what the algorithm is supposed to do is in one place and the code expressing how to make such things go fast is in another?

This is pure speculation here, but I am conjecturing that being able to rewrite arbitrary snippets of code could be used like a compiler optimization directive. When debugging, you don’t rewrite the code. But when things have stabilized and you need to tweak performance, instead of rewriting the code, you use rewriters to do it for you, separating the concern of “what to do” from the concern of “how to do it quickly.”
 

Comments on “Separating the concern of "what to do" from "how to do it quickly":
Are you sure you wouldn't rather just be programming in Haskell? First you come up with andand which is just the Maybe monad and now you're reinventing rewrite rules. Of course I'm stuck with C++ at work and tend to keep cramming approximations of things I use in Haskell into my C++ code so maybe I shouldn't be throwing stones.
 
Brett:

Andand is not the Maybe monad, Maybe is considerably more generalized.

Now, I am quite aware of rewrite rules in Haskell, I tried to do a fairly comprehensive review of everything that looked remotely like code generation features. It is no coïncidence that this gem is called rewrite.

That being said, I am not particularly interested in switching to Haskell for most of the same reasons I don’t return to Scheme every time I find something irritating about Ruby.
 
The optimization you are describing is called deforestation and was pioneered by Philip Wadler. There's plenty of related work, Positive supercompilation, Lightweight fusion by fixed point promotion and many others.

Notice that the shortcut fusion in GHC (implemented through the rewrite rules) is a different beast, but it also removes some intermediate structures.
 
Peter:

Thanks, I recall reading about deforestation some time ago, and then forgotten all about it.

As described here, it is fairly simple to implement using rewriting. I am very, very interested in building rewrite up to handle much more complicated transformations.

In my hand-written rewriters there is a facility for merging let lambdas, something like a special-case of lambda=lifting. I would like to be able to express things like lambda-lifting using a declarative be example" style rather than writing sexp transformations imperatively.
 
Jeez, I kind of come across like a jerk in that earlier comment. Sorry about that. Guess I shouldn't be trying to fire off a quick comment while waiting for a compilation to finish since C++ sometimes makes me a bit cranky.

I agree that Maybe in Haskell is more generalized than andand but I always thought that andand captured the essence of what Maybe is good for in the monadic sense, i.e. chaining a bunch of Maybe returning functions together and short-circuiting when you get Nothing (or nil in andand's case).

Another thing that your rewriting above does is eliminate a space leak. Try running

[1,2,3,4,5,6].select{|x| puts "s#{x}"; x % 2 == 0}.map{|x| puts "m#{x}"; x + 1}

in irb and you get all the s1, s2, etc. messages before you get any of the m2, m4 messages printed out. So select has to generate the entire list before map can consume it. By fusing select and map you not only avoid unneeded function calls but you avoid generating the intermediate list as well. I suppose another solution would be to add lazy versions of select and map to Enumerable that in turn return a proxy object that applies the filtering and mapping lazily in order to avoid generating the intermediate list. But then the list interface is probably fat enough already without adding lazy analogs for most of the methods.
 
Brett:

You did not come across as a jerk! In fact, I feel bad that I did not quote more of the things I have read and absorbed over the years.

I think what your message (and Peter's message) points out is the importance of research, by which I mean, reading what has already been written.

Going off on a tangent, what I expect from university graduates is to know about things like Monads and Deforestation and Lambda Lifting and what-not when they arrive for their first day on the Java Job.

Our industry has a bad habit of either ignoring its own history or repeating it, badly.

So thanks for the input!

Now to the optimization. Yes, the issue of eliminating the intermediate list is very important:

For a large class of expressions chaining a select and an inject (the ones that don’t rely on side effects), this transformation retains the original semantics while rewriting the code to only traverse the collection once and to eliminate the creation of an intermediate collection.

You are right that the semantics of expressions with side effects are changed because of the interleaved ordering. Correct me if I’m wrong, but isn’t this related to the Array Monad?
 
Oops, missed that bit about "to eliminate the creation of an intermediate collection". Oh well, now you've got an example of what that means for free.

Hmmm, not sure that I know of an Array monad. There's the list monad but that deals with non-deterministic computations and doesn't seem to have much bearing here. There's also the ST monad that allows strict stateful computations including operating on mutable arrays (here's and example). In the ST monad, and by extension the IO monad which is built on top of the of the ST monad, there's a world object that is passed around in the background creating a data dependency between the monad actions. This makes sure that they are run in sequence, not something you can take for granted in a lazy by default language. The updating of the world object, actually each action creates a new world object that is passed to the next action, allows the use of mutable variables in a language that doesn't actually support them. The upshot of all this structure is that you can restrict side-effects to actions run in the ST monad and know that the rest of your program is side-effect free, unsafePerformIO and its ilk notwithstanding. This would fix your rewrite issues above since you would be sure when it was OK to modify things without breaking side-effect reliant code. Of course retrofitting something like that into Ruby would a tall order at this point.
 




<< Home
Reg Braithwaite


Recent Writing
Homoiconic Technical Writing / raganwald.posterous.com

Books
What I‘ve Learned From Failure / Kestrels, Quirky Birds, and Hopeless Egocentricity

Share
rewrite_rails / andand / unfold.rb / string_to_proc.rb / dsl_and_let.rb / comprehension.rb / lazy_lists.rb

Beauty
IS-STRICTLY-EQUIVALENT-TO-A / Spaghetti-Western Coding / Golf is a good program spoiled / Programming conventions as signals / Not all functions should be object methods

The Not So Big Software Design / Writing programs for people to read / Why Why Functional Programming Matters Matters / But Y would I want to do a thing like this?

Work
The single most important thing you must do to improve your programming career / The Naïve Approach to Hiring People / No Disrespect / Take control of your interview / Three tips for getting a job through a recruiter / My favourite interview question

Management
Exception Handling in Software Development / What if powerful languages and idioms only work for small teams? / Bricks / Which theory fits the evidence? / Still failing, still learning / What I’ve learned from failure

Notation
The unary ampersand in Ruby / (1..100).inject(&:+) / The challenge of teaching yourself a programming language / The significance of the meta-circular interpreter / Block-Structured Javascript / Haskell, Ruby and Infinity / Closures and Higher-Order Functions

Opinion
Why Apple is more expensive than Amazon / Why we are the biggest obstacles to our own growth / Is software the documentation of business process mistakes? / We have lost control of the apparatus / What I’ve Learned From Sales I, II, III

Whimsey
The Narcissism of Small Code Differences / Billy Martin’s Technique for Managing his Manager / Three stories about The Tao / Programming Language Stories / Why You Need a Degree to Work For BigCo

History
06/04 / 07/04 / 08/04 / 09/04 / 10/04 / 11/04 / 12/04 / 01/05 / 02/05 / 03/05 / 04/05 / 06/05 / 07/05 / 08/05 / 09/05 / 10/05 / 11/05 / 01/06 / 02/06 / 03/06 / 04/06 / 05/06 / 06/06 / 07/06 / 08/06 / 09/06 / 10/06 / 11/06 / 12/06 / 01/07 / 02/07 / 03/07 / 04/07 / 05/07 / 06/07 / 07/07 / 08/07 / 09/07 / 10/07 / 11/07 / 12/07 / 01/08 / 02/08 / 03/08 / 04/08 / 05/08 / 06/08 / 07/08 /