Separating the concern of "what to do" from "how to do it quickly"
Over the week-end, I put together a rudimentary
rewrite-by-example feature for the rewrite gem (please be patient, I won’t be adding this feature to the gem until it has gone through a few more design iterations). The first thing I tried to do with it was simulating
unhygienic macros:
Unhygienic.from {
__to_receiver.andand.__to_message(__splat_parameters)
}.to {
lambda { |andand_temp|
andand_temp.__to_message(__splat_parameters) if andand_temp
}.call(__to_receiver)
}
What this code produces is a sexp-rewriter that does a global search-and-replace. It looks for code like this:
Person.find(:first, ...).andand.friends(true)
And replaces it inline with:
lambda { |andand_temp|
andand_temp.friends(true) if andand_temp
}.call(Person.find(:first, ...))
Declarative rewriting by example is a darn sight better than hand-written sexp manipulation:
def process_call(exp)
exp.shift
receiver_sexp = exp.first
if matches_andand_invocation(receiver_sexp)
exp.shift
mono_parameter = Rewrite.gensym()
s(:call,
s(:iter,
s(:fcall, :lambda),
s(:dasgn_curr, mono_parameter),
s(:if,
s(:call, s(:dvar, mono_parameter), :nil?),
s(:nil),
begin
s(:call,
s(:dvar, mono_parameter),
*(exp.map { |inner| process_inner_expr inner })
)
ensure
exp.clear
end
)
),
:call,
s(:array,
process_inner_expr(receiver_sexp[1])
)
)
else
begin
s(:call,
*(exp.map { |inner| process_inner_expr inner })
)
ensure
exp.clear
end
end
end
But back to rewriting by example:
Unhygienic.from {
__to_receiver.andand.__to_message(__splat_parameters)
}.to {
lambda { |andand_temp|
andand_temp.__to_message(__splat_parameters) if andand_temp
}.call(__to_receiver)
}
As you can deduce, it is using some magic symbols. In the “from” part of the definition,
__to_receiver and
__to_message mean “Match something here and name the result
receiver and
message respectively,” while
__splat_parameters means “Match a list of things here and name the result
parameters.”
In the “to” part of the definitions, those magic symbols insert whatever was matched in the from. This is a crude approximation of how regular expressions capture things with () and insert them with $1..$n, only there is no way to capture any arbitrary sub-pattern. You can use any names you want, as long as they have the magic prefixes
__to_ or
__splat_. (Magic symbols are a blight upon all that is right and good with code, suggestions for a better way to express capturing by name gratefully solicited!)
Speaking of select, inject, and other higher-order functions, The Seasoned Schemer is devoted to the myriad uses of first class functions. This book is approachable and a delight to read, but the ideas are provocative and when you close the back cover you will be able to compose programs from functions in powerful new ways.
The example above is very much like a Lisp macro. In Lisp, everything is an sexp, so of course macro invocations are sexps just as function calls are sexps. In Ruby, some things look like function calls, some things look like method invocations. In this particular case, “Person.find(:first, …).andand.friends(true)” looks like a method invocation, but it actually behaves like a macro invocation.
Although it looks like a method, the rewriter version of #andand is not polymorphic. You can’t override #andand in any classes, just as you can’t override other syntactic constructions like “&&” or “!”. This bothers OO purists, however I am a member of the OO radical fringe who take purity to another level, a level where
overriding functionality is not allowed to violate Liskov Equivalence.
(I personally do not care for the idea that something like #andand sometimes means one thing and sometimes means another, just as many people would completely freak out if !foo didn’t always mean “not foo.”)
beyond macrosRewriting goes beyond adding new functions and verbs (I personally consider #andand to be an adverb). Basically, what we have here is an extremely weak version of XSLT for Ruby code. Match this, turn it into that. IMO, XSLT transformations is a much better analogy than macro expansion. Rewriters can match and replace fairly arbitrary expressions, not just implement things that look like function calls and method calls.
Consider this hypothetical example:
Unhygienic.from {
__to_receiver.select { |__to_x|
__to_select_body
}.inject(__to_seed) { |__to_acc, __to_y|
__to_inject_body
}
}.to {
__to_receiver.inject(__to_seed) { |__to_acc, __to_x|
if __to_select_body
__to_y = __to_x
__to_inject_body
else
__to_acc
end
}
}
This would transformation code like this:
heads_of_state = locations.select { |any_loc|
zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
heads + loc_in_state.head_count
}
Into this:
heads_of_state = locations.inject({}) { |heads, any_loc|
if zips_in_this_state.include? any_loc.zip_code
loc_in_state = any_loc
heads + loc_in_state.head_count
else
heads
end
}
This is an example of an
optimization.
For a large class of expressions chaining a select and an inject (the ones that don’t rely on side effects), this transformation retains the original semantics while rewriting the code to only traverse the collection once and to eliminate the creation of an intermediate collection.
Of course, compilers do this kind of thing all the time for many types of optimizations, so it’s tempting to wait for someone else to write a
sufficiently smart compiler that can figure these things out. There are two troubles with waiting for someone else to do it. First, we might be that someone—maybe we’re the one who ought to Just Do It, and waiting for someone else won’t work because nobody else is going to do it.
Second, many problems like this are intractable in the general case. It’s hard (in the mathematical sense) to know when select {…. }.inject { … } can be transformed like this in an imperative language without accidentally stepping on some hidden side-effect. But just because it’s hard in the general case doesn’t mean it isn’t easy in the specific case. For example, you might be the sort of person who
never knowingly relies on side effects in select and inject expressions.
So
you could use this optimization, while a compiler-level optimization would be a disaster: even if 99.9% of the code out there wouldn’t break, the programmers behind the 0.1% of the broken programs would be furiously blogging about how Ruby wasn’t ready for the Enterprise.
optimizing your codeThe joy and the pain of optimizing your code is that you don’t need rewrite to perform that optimization. The joy is that if you discover chaining select and inject is a performance hog somewhere in your code, you simply rewrite the code yourself.
The pain is that the code is no longer in the form you originally decided best represents its intent. In the trivial example I gave above, a rewritten version looks reasonable, especially if you rewrite it with #each in an imperative style:
heads_of_state = 0
locations.each { |any_loc|
if zips_in_this_state.include? any_loc.zip_code
heads_of_state += any_loc.head_count
end
}
Now I’m not going to say that this is necessarily more or less readable than:
heads_of_state = locations.select { |any_loc|
zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
heads + loc_in_state.head_count
}
Some people actively dislike using #select and #inject, so they might feel the #each version is better. For once, let’s talk about something other than a
bike shed. Let’s focus on the fact that foo.select {…}.inject {…} says one thing: “Filter this collection using this predicate, and then fold the result as follows.” Whereas foo.each {…} says “Iterate over this collection doing the following thing with each element.”
If you wrote this as a #select/#inject pair, you might have a good reason for doing so. Perhaps most of your program is written in a functional style. Perhaps you like to signal that the there are no side effects in those snippets of code and your team share this understanding of how selects and injects are written.
Granting that you believe that #select and #inject do a better job of communicating the code’s intent to your fellow team members, it’s a win to optimize the code (in the compiler or using a rewriter) behind the scenes rather than rewrite it yourself. The code retains its semantics and the form you have decided best expresses its intent, while using less memory and running faster.
separation of concernsWhat we have just done with our trivial example is separated two concerns: The concerns of how to best express an algorithm and the concern of how to best implement an algorithm. If for whatever reason—furious hand-waving to avoid arguing how to write loops—we believe that the best way to express a certain algorithm for readability is not the best way to express a certain algorithm for performance, we have two separate concerns: How to write readable code and how to write
performant fast code.
Doesn’t it make sense to separate those concerns? So that the code explaining what the algorithm is supposed to do is in one place and the code expressing how to make such things go fast is in another?
This is pure speculation here, but I am conjecturing that being able to rewrite arbitrary snippets of code could be used like a compiler optimization directive. When debugging, you don’t rewrite the code. But when things have stabilized and you need to tweak performance, instead of rewriting the code, you use rewriters to do it for you, separating the concern of “what to do” from the concern of “how to do it quickly.”