Separating the concern of "what to do" from "how to do it quickly"

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Monday, July 07, 2008

Over the week-end, I put together a rudimentary rewrite-by-example feature for the rewrite gem (please be patient, I won’t be adding this feature to the gem until it has gone through a few more design iterations). The first thing I tried to do with it was simulating unhygienic macros:


Unhygienic.from {

    __to_receiver.andand.__to_message(__splat_parameters)

}.to {

    lambda { |andand_temp|
      andand_temp.__to_message(__splat_parameters) if andand_temp
    }.call(__to_receiver)

}

What this code produces is a sexp-rewriter that does a global search-and-replace. It looks for code like this:


Person.find(:first, ...).andand.friends(true)

And replaces it inline with:


lambda { |andand_temp|
    andand_temp.friends(true) if andand_temp
}.call(Person.find(:first, ...))
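
To see what the expansion buys you, here is a minimal sketch that runs the rewritten form by hand. The `Person` struct and the `friends_of` helper are hypothetical stand-ins for illustration; they are not part of the rewrite gem, and the lambda is typed out here rather than produced by a rewriter.

```ruby
# Hypothetical stand-in for an ActiveRecord-style model.
Person = Struct.new(:name, :buddies) do
  def friends(_reload = false)
    buddies
  end
end

# The expanded form, written out by hand: send #friends only when the
# receiver is truthy, otherwise evaluate to nil.
def friends_of(person)
  lambda { |andand_temp|
    andand_temp.friends(true) if andand_temp
  }.call(person)
end

alice = Person.new("Alice", ["Bob"])
found = friends_of(alice)   # the receiver exists, so #friends is called
missing = friends_of(nil)   # nil receiver: no NoMethodError, just nil
```

Because the receiver expression is passed to the lambda as an argument, it is evaluated exactly once, even though `andand_temp` appears twice in the body.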

Declarative rewriting by example is a darn sight better than hand-written sexp manipulation:


def process_call(exp)
  exp.shift
  receiver_sexp = exp.first
  if matches_andand_invocation(receiver_sexp)
    exp.shift
    mono_parameter = Rewrite.gensym()
    s(:call, 
      s(:iter, 
        s(:fcall, :lambda), 
        s(:dasgn_curr, mono_parameter), 
        s(:if, 
          s(:call, s(:dvar, mono_parameter), :nil?), 
          s(:nil), 
          begin
            s(:call, 
              s(:dvar, mono_parameter), 
              *(exp.map { |inner| process_inner_expr inner })
            )
          ensure
            exp.clear
          end
        )
      ), 
      :call, 
      s(:array, 
        process_inner_expr(receiver_sexp[1])
      )
    )
  else
    begin
      s(:call,
        *(exp.map { |inner| process_inner_expr inner })
      )
    ensure
      exp.clear
    end
  end
end

But back to rewriting by example:


Unhygienic.from {

    __to_receiver.andand.__to_message(__splat_parameters)

}.to {

    lambda { |andand_temp|
      andand_temp.__to_message(__splat_parameters) if andand_temp
    }.call(__to_receiver)

}

As you can deduce, it is using some magic symbols. In the “from” part of the definition, __to_receiver and __to_message mean “Match something here and name the result receiver and message respectively,” while __splat_parameters means “Match a list of things here and name the result parameters.”

In the “to” part of the definitions, those magic symbols insert whatever was matched in the from. This is a crude approximation of how regular expressions capture things with () and insert them with $1..$n, only there is no way to capture any arbitrary sub-pattern. You can use any names you want, as long as they have the magic prefixes __to_ or __splat_. (Magic symbols are a blight upon all that is right and good with code, suggestions for a better way to express capturing by name gratefully solicited!)
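
The capture-and-insert idea can be sketched on plain nested arrays standing in for sexps. This is only an illustration under stated assumptions: the `capture_name`, `match`, and `substitute` helpers and the array shapes are hypothetical, a splat is only recognized as the last element of a list, and none of this is how the rewrite gem is actually implemented.

```ruby
# Toy capture-and-substitute over nested arrays standing in for sexps.
# Hypothetical sketch; the rewrite gem's real matcher may differ.

# Returns the capture name for a magic symbol, or nil for anything else.
def capture_name(node, prefix)
  return nil unless node.is_a?(Symbol) && node.to_s.start_with?(prefix)
  node.to_s.sub(prefix, "").to_sym
end

# Returns a hash of bindings when pattern matches sexp, otherwise nil.
def match(pattern, sexp, bindings = {})
  if (name = capture_name(pattern, "__to_"))
    return bindings.merge(name => sexp)
  end
  return (pattern == sexp ? bindings : nil) unless pattern.is_a?(Array)
  return nil unless sexp.is_a?(Array)

  if (rest = capture_name(pattern.last, "__splat_"))
    # A trailing splat captures every remaining element as a list.
    fixed = pattern[0...-1]
    return nil if sexp.length < fixed.length
    bound = match(fixed, sexp.first(fixed.length), bindings)
    return bound && bound.merge(rest => sexp.drop(fixed.length))
  end

  return nil unless pattern.length == sexp.length
  pattern.zip(sexp).reduce(bindings) { |acc, (pat, node)| acc && match(pat, node, acc) }
end

# Rebuilds the template, inserting whatever the match captured.
def substitute(template, bindings)
  if (name = capture_name(template, "__to_"))
    return bindings.fetch(name)
  end
  return template unless template.is_a?(Array)

  template.flat_map do |part|
    if (rest = capture_name(part, "__splat_"))
      bindings.fetch(rest)            # splice the captured list in place
    else
      [substitute(part, bindings)]    # wrap so nested lists stay nested
    end
  end
end

from = [:call, [:call, :__to_receiver, :andand], :__to_message, :__splat_parameters]
to   = [:call,
        [:lambda, :temp, [:if, :temp, [:call, :temp, :__to_message, :__splat_parameters]]],
        :call, :__to_receiver]

code      = [:call, [:call, [:const, :Person], :andand], :friends, [:true]]
bindings  = match(from, code)
rewritten = substitute(to, bindings)
```

Here `bindings` names the receiver, the message, and the parameter list, and `rewritten` is the template with those three pieces dropped back in.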

The example above is very much like a Lisp macro. In Lisp, everything is an sexp, so of course macro invocations are sexps just as function calls are sexps. In Ruby, some things look like function calls, some things look like method invocations. In this particular case, “Person.find(:first, …).andand.friends(true)” looks like a method invocation, but it actually behaves like a macro invocation.

Although it looks like a method, the rewriter version of #andand is not polymorphic. You can’t override #andand in any classes, just as you can’t override other syntactic constructions like “&&” or “!”. This bothers OO purists; however, I am a member of the OO radical fringe who take purity to another level, a level where overriding functionality is not allowed to violate Liskov Equivalence.

(I personally do not care for the idea that something like #andand sometimes means one thing and sometimes means another, just as many people would completely freak out if !foo didn’t always mean “not foo.”)

beyond macros

Rewriting goes beyond adding new functions and verbs (I personally consider #andand to be an adverb). Basically, what we have here is an extremely weak version of XSLT for Ruby code. Match this, turn it into that. IMO, XSLT transformation is a much better analogy than macro expansion. Rewriters can match and replace fairly arbitrary expressions, not just implement things that look like function calls and method calls.

Consider this hypothetical example:


Unhygienic.from {

    __to_receiver.select { |__to_x| 
        __to_select_body 
    }.inject(__to_seed) { |__to_acc, __to_y| 
        __to_inject_body
    }

}.to {

    __to_receiver.inject(__to_seed) { |__to_acc, __to_x|
        if __to_select_body
            __to_y = __to_x
            __to_inject_body
        else
            __to_acc
        end
    }

}

This would transform code like this:


heads_of_state = locations.select { |any_loc| 
    zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
    heads + loc_in_state.head_count
}

Into this:


heads_of_state = locations.inject(0) { |heads, any_loc|
    if zips_in_this_state.include? any_loc.zip_code
        loc_in_state = any_loc
        heads + loc_in_state.head_count
    else
        heads
    end
}

This is an example of an optimization.

For a large class of expressions chaining a select and an inject (the ones that don’t rely on side effects), this transformation retains the original semantics while rewriting the code to only traverse the collection once and to eliminate the creation of an intermediate collection.
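
A quick way to convince yourself is to run both forms side by side on side-effect-free blocks. This is a minimal sketch: the `Location` struct and the sample data are made up for illustration, and the fused version is typed out by hand rather than produced by a rewriter.

```ruby
# Hypothetical data, just enough to exercise both forms.
Location = Struct.new(:zip_code, :head_count)

locations = [
  Location.new("10001", 3),
  Location.new("94103", 5),
  Location.new("10002", 4)
]
zips_in_this_state = ["10001", "10002"]

# Original form: builds an intermediate array, then folds it.
chained = locations.select { |any_loc|
  zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
  heads + loc_in_state.head_count
}

# Rewritten form: one pass, no intermediate array.
fused = locations.inject(0) { |heads, any_loc|
  if zips_in_this_state.include? any_loc.zip_code
    loc_in_state = any_loc
    heads + loc_in_state.head_count
  else
    heads
  end
}
```

Both `chained` and `fused` come out to 7 here, the sum of the head counts for the two matching zip codes.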

Of course, compilers do this kind of thing all the time for many types of optimizations, so it’s tempting to wait for someone else to write a sufficiently smart compiler that can figure these things out. There are two troubles with waiting for someone else to do it. First, we might be that someone—maybe we’re the one who ought to Just Do It, and waiting for someone else won’t work because nobody else is going to do it.

Second, many problems like this are intractable in the general case. It’s hard (in the mathematical sense) to know when select { … }.inject { … } can be transformed like this in an imperative language without accidentally stepping on some hidden side-effect. But just because it’s hard in the general case doesn’t mean it isn’t easy in the specific case. For example, you might be the sort of person who never knowingly relies on side effects in select and inject expressions.

So you could use this optimization, while a compiler-level optimization would be a disaster: even if 99.9% of the code out there wouldn’t break, the programmers behind the 0.1% of the broken programs would be furiously blogging about how Ruby wasn’t ready for the Enterprise.
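
Here is a contrived sketch of the kind of code that would break. The `seen` counter is a made-up side effect: the select block bumps it and the inject block reads it, so the answer depends on whether the two blocks run one after the other or interleaved.

```ruby
numbers = [1, 2, 3, 4]

# Chained form: select visits every element before inject starts, so
# inject reads the final value of `seen` on every step.
seen = 0
chained = numbers.select { |n|
  seen += 1
  n.even?
}.inject(0) { |acc, n|
  acc + n * seen
}

# Fused form: the two blocks interleave, so `seen` is still growing
# while the sum is being accumulated.
seen = 0
fused = numbers.inject(0) { |acc, n|
  seen += 1
  if n.even?
    acc + n * seen
  else
    acc
  end
}
```

The chained form yields 24 (2 * 4 + 4 * 4) while the fused form yields 20 (2 * 2 + 4 * 4), so applying the rewrite blindly would silently change this program's result.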

optimizing your code

The joy and the pain of optimizing your code is that you don’t need rewrite to perform that optimization. The joy is that if you discover chaining select and inject is a performance hog somewhere in your code, you simply rewrite the code yourself.

The pain is that the code is no longer in the form you originally decided best represents its intent. In the trivial example I gave above, a rewritten version looks reasonable, especially if you rewrite it with #each in an imperative style:


heads_of_state = 0
locations.each { |any_loc|
    if zips_in_this_state.include? any_loc.zip_code
        heads_of_state += any_loc.head_count
    end
}

Now I’m not going to say that this is necessarily more or less readable than:


heads_of_state = locations.select { |any_loc| 
    zips_in_this_state.include? any_loc.zip_code
}.inject(0) { |heads, loc_in_state|
    heads + loc_in_state.head_count
}

Some people actively dislike using #select and #inject, so they might feel the #each version is better. For once, let’s talk about something other than a bike shed. Let’s focus on the fact that foo.select {…}.inject {…} says one thing: “Filter this collection using this predicate, and then fold the result as follows.” Whereas foo.each {…} says “Iterate over this collection doing the following thing with each element.”

If you wrote this as a #select/#inject pair, you might have a good reason for doing so. Perhaps most of your program is written in a functional style. Perhaps you like to signal that there are no side effects in those snippets of code and your team shares this understanding of how selects and injects are written.

Granting that you believe that #select and #inject do a better job of communicating the code’s intent to your fellow team members, it’s a win to optimize the code (in the compiler or using a rewriter) behind the scenes rather than rewrite it yourself. The code retains its semantics and the form you have decided best expresses its intent, while using less memory and running faster.

separation of concerns

What we have just done with our trivial example is separate two concerns: the concern of how to best express an algorithm and the concern of how to best implement an algorithm. If for whatever reason (furious hand-waving to avoid arguing how to write loops) we believe that the best way to express a certain algorithm for readability is not the best way to express a certain algorithm for performance, we have two separate concerns: how to write readable code and how to write ~~performant~~ fast code.

Doesn’t it make sense to separate those concerns? So that the code explaining what the algorithm is supposed to do is in one place and the code expressing how to make such things go fast is in another?

This is pure speculation here, but I am conjecturing that being able to rewrite arbitrary snippets of code could be used like a compiler optimization directive. When debugging, you don’t rewrite the code. But when things have stabilized and you need to tweak performance, instead of rewriting the code, you use rewriters to do it for you, separating the concern of “what to do” from the concern of “how to do it quickly.”

¶ 7:45 AM

Comments on “Separating the concern of "what to do" from "how to do it quickly"”:

Are you sure you wouldn't rather just be programming in Haskell? First you come up with andand which is just the Maybe monad and now you're reinventing rewrite rules. Of course I'm stuck with C++ at work and tend to keep cramming approximations of things I use in Haskell into my C++ code so maybe I shouldn't be throwing stones.

# posted by brett : 5:27 PM

Brett:

Andand is not the Maybe monad, Maybe is considerably more generalized.

Now, I am quite aware of rewrite rules in Haskell, I tried to do a fairly comprehensive review of everything that looked remotely like code generation features. It is no coïncidence that this gem is called rewrite.

That being said, I am not particularly interested in switching to Haskell for most of the same reasons I don’t return to Scheme every time I find something irritating about Ruby.

# posted by Reginald Braithwaite : 6:20 PM

The optimization you are describing is called deforestation and was pioneered by Philip Wadler. There's plenty of related work, Positive supercompilation, Lightweight fusion by fixed point promotion and many others.

Notice that the shortcut fusion in GHC (implemented through the rewrite rules) is a different beast, but it also removes some intermediate structures.

# posted by Peter : 7:57 AM

Peter:

Thanks, I recall reading about deforestation some time ago, and then forgot all about it.

As described here, it is fairly simple to implement using rewriting. I am very, very interested in building rewrite up to handle much more complicated transformations.

In my hand-written rewriters there is a facility for merging let lambdas, something like a special case of lambda-lifting. I would like to be able to express things like lambda-lifting using a declarative "by example" style rather than writing sexp transformations imperatively.

# posted by Reginald Braithwaite : 8:36 AM

Jeez, I kind of come across like a jerk in that earlier comment. Sorry about that. Guess I shouldn't be trying to fire off a quick comment while waiting for a compilation to finish since C++ sometimes makes me a bit cranky.

I agree that Maybe in Haskell is more generalized than andand but I always thought that andand captured the essence of what Maybe is good for in the monadic sense, i.e. chaining a bunch of Maybe returning functions together and short-circuiting when you get Nothing (or nil in andand's case).

Another thing that your rewriting above does is eliminate a space leak. Try running

[1,2,3,4,5,6].select{|x| puts "s#{x}"; x % 2 == 0}.map{|x| puts "m#{x}"; x + 1}

in irb and you get all the s1, s2, etc. messages before you get any of the m2, m4 messages printed out. So select has to generate the entire list before map can consume it. By fusing select and map you not only avoid unneeded function calls but you avoid generating the intermediate list as well. I suppose another solution would be to add lazy versions of select and map to Enumerable that in turn return a proxy object that applies the filtering and mapping lazily in order to avoid generating the intermediate list. But then the list interface is probably fat enough already without adding lazy analogs for most of the methods.

# posted by brett : 12:46 PM

Brett:

You did not come across as a jerk! In fact, I feel bad that I did not quote more of the things I have read and absorbed over the years.

I think what your message (and Peter's message) points out is the importance of research, by which I mean, reading what has already been written.

Going off on a tangent, what I expect from university graduates is to know about things like Monads and Deforestation and Lambda Lifting and what-not when they arrive for their first day on the Java Job.

Our industry has a bad habit of either ignoring its own history or repeating it, badly.

So thanks for the input!

Now to the optimization. Yes, the issue of eliminating the intermediate list is very important:

For a large class of expressions chaining a select and an inject (the ones that don’t rely on side effects), this transformation retains the original semantics while rewriting the code to only traverse the collection once and to eliminate the creation of an intermediate collection.

You are right that the semantics of expressions with side effects are changed because of the interleaved ordering. Correct me if I’m wrong, but isn’t this related to the Array Monad?

# posted by Reginald Braithwaite : 1:05 PM

Oops, missed that bit about "to eliminate the creation of an intermediate collection". Oh well, now you've got an example of what that means for free.

Hmmm, not sure that I know of an Array monad. There's the list monad but that deals with non-deterministic computations and doesn't seem to have much bearing here. There's also the ST monad that allows strict stateful computations including operating on mutable arrays (here's an example). In the ST monad, and by extension the IO monad which is built on top of the ST monad, there's a world object that is passed around in the background creating a data dependency between the monad actions. This makes sure that they are run in sequence, not something you can take for granted in a lazy by default language. The updating of the world object, actually each action creates a new world object that is passed to the next action, allows the use of mutable variables in a language that doesn't actually support them. The upshot of all this structure is that you can restrict side-effects to actions run in the ST monad and know that the rest of your program is side-effect free, unsafePerformIO and its ilk notwithstanding. This would fix your rewrite issues above since you would be sure when it was OK to modify things without breaking side-effect reliant code. Of course retrofitting something like that into Ruby would be a tall order at this point.

# posted by brett : 11:55 PM