Macros, Hygiene, and Call By Name in Ruby

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Sunday, June 22, 2008

Never send a macro to do a function’s job.

Sound advice. However, just because functions (or methods) are better than macros for the things both can do, that doesn’t mean functions can do everything macros can do. Let’s look at andand for a moment. When you write:


foo().andand.bar(blitz())

With the andand gem, Ruby evaluates this roughly as:


temp1 = foo()
temp2 = temp1.andand
temp3 = blitz()
temp2.bar(temp3)

As it happens, if you call nil.andand.bar(blitz()), it will return nil. But it will still evaluate blitz() before returning nil. What I would expect from something named andand is that if foo() is nil, Ruby will never evaluate blitz(). Something like:


temp1 = foo()
if temp1.nil?
    nil
else
    temp2 = blitz()
    temp1.bar(temp2)
end

What we want is for blitz() to be evaluated only if andand actually needs it. The trouble is, you cannot write an andand method in Ruby that delivers these semantics.
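To see why, here is a minimal sketch of an andand-style method. It is a hypothetical stand-in, not the andand gem’s real implementation: `maybe` and `NilSwallower` are names invented for this illustration. A nil receiver hands back a proxy that answers every message with nil, yet Ruby still evaluates blitz() before the message ever reaches the proxy.

```ruby
LOG = []

def blitz
  LOG << :blitz # record that the argument expression was evaluated
  42
end

# Hypothetical proxy: answers every message with nil.
class NilSwallower < BasicObject
  def method_missing(_name, *_args)
    nil
  end
end

# Hypothetical andand-style method (not the andand gem's code).
class Object
  def maybe
    self
  end
end

class NilClass
  def maybe
    NilSwallower.new
  end
end

# blitz() runs first, because its value must exist before #bar is sent.
result = nil.maybe.bar(blitz())
```

The result is nil, as intended, but the log shows that blitz() ran anyway.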

Let’s hand-wave over the difference between methods and functions for a moment and just look at calling functions. We’ll consider writing “our_and,” a function that emulates the short-circuit evaluation behaviour of Ruby’s “&&” and “and” operators. Ruby (and most other languages in use these days) uses call-by-value when it passes parameters to functions. In other words, when you write:


our_and(foo(), blitz())

Ruby turns that into something that looks like this:


temp1 = foo()
temp2 = blitz()
our_and(temp1, temp2)

It doesn’t matter whether our_and uses blitz() internally or not: blitz() is evaluated before our_and is called, and its value is passed to our_and. By contrast, the “if” statement in the previous example evaluates “blitz()” only when “foo()” is not nil.
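A plain Ruby method shows the same eagerness. In this sketch, foo, blitz, and the CALLED log are hypothetical helpers that record when they run:

```ruby
CALLED = []

def foo
  CALLED << :foo
  nil
end

def blitz
  CALLED << :blitz
  :blitz_value
end

# An ordinary method: both arguments are evaluated before the body runs,
# even though the body never looks at y when x is falsy.
def our_and(x, y)
  x ? y : x
end

result = our_and(foo(), blitz())
```

The result is nil, but CALLED shows that blitz() was evaluated anyway.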

Well, well, well. The inescapable conclusion is that there are some sequences of expressions in Ruby that cannot be represented as functions or methods. That’s right, functions and methods can’t do everything that Ruby code can do.

Macros and code rewriting can do an awful lot. The implementation of andand in the rewrite gem does rewrite code. When you write:


with(andand) do
    # ...
    foo().andand.bar(blitz())
    # ...
end

Rewrite rewrites your code in place to look something like:


# ...
lambda do |__121414053598468__|
    if __121414053598468__.nil?
        nil
    else
        __121414053598468__.bar(blitz())
    end
end.call(foo())
#...

And for that reason when you write foo().andand.bar(blitz()) using the rewrite gem instead of the andand gem, blitz() is not evaluated if foo() is nil. Big difference!! So it looks like one way to get around call-by-value is to rewrite your Ruby code. Excellent. Or is it?
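The expanded form can be run by hand to check that claim. This sketch writes out the expansion directly, with hypothetical foo and blitz methods that log when they run; the gensym’d parameter is renamed receiver for readability:

```ruby
CALLS = []

def foo
  CALLS << :foo
  nil
end

def blitz
  CALLS << :blitz
  42
end

# Hand-written equivalent of the rewritten code above.
result = lambda { |receiver|
  if receiver.nil?
    nil
  else
    receiver.bar(blitz()) # only reached when foo() is not nil
  end
}.call(foo())
```

Only foo() appears in the log: because blitz() sits inside the else branch, it never runs when foo() is nil.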

What’s wrong with rewrite

Right now, the rewrite gem supports writing sexp processors. These are objects that encapsulate a way of transforming sexps. For example, here is the code that transforms expressions like “foo().andand.bar(blitz())”:


  def process_call(exp)
    exp.shift
    receiver_sexp = exp.first
    if matches_andand_invocation(receiver_sexp)
      exp.shift
      mono_parameter = Rewrite.gensym()
      s(:call, 
        s(:iter, 
          s(:fcall, :lambda), 
          s(:dasgn_curr, mono_parameter), 
          s(:if, 
            s(:call, s(:dvar, mono_parameter), :nil?), 
            s(:nil), 
            begin
              s(:call, 
                s(:dvar, mono_parameter), 
                *(exp.map { |inner| process_inner_expr inner })
              )
            ensure
              exp.clear
            end
          )
        ), 
        :call, 
        s(:array, 
          process_inner_expr(receiver_sexp[1])
        )
      )
    else
      begin
        s(:call,
          *(exp.map { |inner| process_inner_expr inner })
        )
      ensure
        exp.clear
      end
    end
  end

And that’s just a third of andand: there is another method that handles expressions like “foo().andand { |x| x.bar(blitz()) }” and a third that handles “foo().andand(&bar_proc)”. Brutal.

Now, rewriting code has many other uses. One on my wish list is a rewriter that transforms expressions like “foo.select { |x| … }.map { |y| … }.inject { |z| … }” into one big inject as an optimization. So I’m not ready to throw rewrite in the trash can just yet. But there’s no way I want to write all that out by hand every time I want a function that works around call-by-value semantics.
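For a concrete sense of that wished-for optimization, here is the transformation written out by hand for one example. It assumes a seeded inject (inject(0)); fusing an unseeded inject needs extra care for the first element. The fused version walks the array once and allocates no intermediate arrays:

```ruby
numbers = (1..10).to_a

# Three passes, two intermediate arrays.
chained = numbers.select { |x| x.even? }.map { |y| y * y }.inject(0) { |acc, z| acc + z }

# One pass: the select, map, and inject steps folded into a single inject.
fused = numbers.inject(0) do |acc, x|
  next acc unless x.even? # the select step
  y = x * x               # the map step
  acc + y                 # the inject step
end
```

Both expressions sum the squares of the even numbers from 1 to 10, giving 220.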

What about macros?

Why can’t I write:


def_macro our_and(x,y)
    ((temp = x) ? (y) : (temp))
end

…And have it automatically expand my code such that when I write:


# ...
foo = our_and(bar(), blitz())
# ...

The macro expander rewrites it as:


# ...
foo = ((temp = bar()) ? blitz() : temp)
# ...

Wouldn’t that work? Maybe. Then again, maybe not.

The problem given above—working around call-by-value—is just one small problem. A macro implementation would solve that problem, but there’s an awful lot of overhead required to make the implementation work, and whatever you do ends up being an incredibly leaky abstraction.

Take our example above. What happens if we have our own variable named temp? Does expanding our_and clobber it? Or do we rename temp? Or do some automagic jiggery-pokery with scopes?
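Ruby has no macro facility, so this sketch fakes one with string templates and eval purely to illustrate variable capture. expand_naive, expand_hygienic, fetch_value, and the Gensym module are all hypothetical. The naive expansion reuses the literal name temp and captures the caller’s variable; the gensym version picks a fresh name per expansion, the same trick Rewrite.gensym() plays in the sexp processor above:

```ruby
# Hypothetical template-based "macro" for our_and (unhygienic: fixed name temp).
NAIVE_OUR_AND = "((temp = (%<x>s)) ? (%<y>s) : temp)"

def expand_naive(x, y)
  format(NAIVE_OUR_AND, x: x, y: y)
end

# Hand out a fresh, unlikely variable name for every expansion.
module Gensym
  @counter = 0

  def self.next_name
    @counter += 1
    "__gensym_#{@counter}"
  end
end

def expand_hygienic(x, y)
  t = Gensym.next_name
  "((#{t} = (#{x})) ? (#{y}) : #{t})"
end

def fetch_value
  1
end

# The caller's own temp, which the second operand refers to.
temp = :mine
naive_result = eval(expand_naive("fetch_value", "temp"), binding)
naive_temp_after = temp # the expansion overwrote the caller's temp

temp = :mine
hygienic_result = eval(expand_hygienic("fetch_value", "temp"), binding)
hygienic_temp_after = temp # untouched
```

The naive expansion returns 1 and leaves the caller’s temp set to 1; the gensym expansion returns :mine and leaves temp alone.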

Getting macros right is very tricky. I don’t personally plan to try my hand at implementing macros until I’m an expert on the subject of variable capture and can hold forth on the design trade-offs inherent in different schemes for implementing hygienic macros. But that’s just me.

Perhaps there are other ways to work around call-by-value without diving into a full-blown macro facility?

Lambdas and blocks

Indeed there are other ways. Ruby already has one-and-a-half of them: blocks and lambdas. Using blocks and lambdas, you can control evaluation precisely. The andand gem actually does support short-circuit semantics using a block. When you write:


nil.andand { |x| x.foo(blitz()) }

It does not evaluate blitz(). This alternate way of using andand supports the semantics we want by explicitly placing the code that should not be eagerly evaluated in a block. Given patience and a taste for squiggly braces, you can create non-standard evaluation without resorting to macros.
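Here is a minimal sketch of why the block form short-circuits. andand_block is a hypothetical name chosen to avoid clashing with the gem’s own andand, and this is not the gem’s code: on nil it returns nil without yielding, so nothing inside the block runs.

```ruby
EVALUATED = []

def blitz
  EVALUATED << :blitz
  42
end

class Object
  # Hypothetical: hand the receiver to the block.
  def andand_block
    yield(self)
  end
end

class NilClass
  # Hypothetical: never yield, so the block (and blitz) is skipped.
  def andand_block
    nil
  end
end

skipped = nil.andand_block { |x| x.foo(blitz()) }
used = 10.andand_block { |x| x + blitz() }
```

skipped is nil and blitz() runs only once, for the non-nil receiver, where used is 52.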

We said at the beginning that we cannot use functions and methods to represent everything we can write in code because Ruby uses call-by-value to pass parameters to functions. One way to work around that: instead of passing the value of each expression to a function, we can pass the expression itself, wrapped up in its own lambda.

Then, when the function needs the value, it can call the lambda. This technique has a name: it is called thunking.

We could implement our_and as follows:


our_and = lambda { |x, y|
    if (temp = x.call)  # evaluate the first thunk
        y.call          # only now evaluate the second
    else
        temp
    end
}

Then when we call it, we could wrap our parameters in lambdas:


our_and.call(
    lambda { a() },
    lambda { b() }
)
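A small harness for checking this, with hypothetical a and b methods that log when they run. With a returning nil, only a is ever evaluated:

```ruby
THUNK_LOG = []

def a
  THUNK_LOG << :a
  nil
end

def b
  THUNK_LOG << :b
  :b_value
end

our_and = lambda { |x, y|
  if (temp = x.call)
    y.call
  else
    temp
  end
}

result = our_and.call(
  lambda { a() },
  lambda { b() }
)
```

result is nil and THUNK_LOG contains only :a; the lambda wrapping b() is never called.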

Verify for yourself that this produces the behaviour we want, without the worry of our local variables messing things up for the calling code. Let’s go further: we can implement functions with a variable number of arguments using an enumeration of thunks. For example, we could write:


def try_these(*clauses)
    clauses.each { |clause| return clause.call rescue nil }
    nil
end

And call our function like this:


try_these(
    lambda { http_util.fetch(url, :login_as => :anonymous) },
    lambda { http_util.fetch(url, :login_as => ['user', 'password']) },
    lambda { default_value() }
)

We have just implemented the Try.these function from the Prototype JavaScript library.
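A runnable version, with the HTTP calls replaced by hypothetical stubs (fetch_anonymously, fetch_with_login, default_value) so the example needs no network. The rescue is spelled out with begin/rescue to make the control flow explicit; like the rescue modifier, it catches StandardError:

```ruby
ATTEMPTS = []

def try_these(*clauses)
  clauses.each do |clause|
    begin
      return clause.call # first clause that doesn't raise wins
    rescue StandardError
      # This clause raised; fall through to the next one.
    end
  end
  nil
end

# Hypothetical stand-ins for the fetches in the text.
def fetch_anonymously
  ATTEMPTS << :anonymous
  raise IOError, "anonymous access refused"
end

def fetch_with_login
  ATTEMPTS << :login
  "page body"
end

def default_value
  ATTEMPTS << :default
  "default"
end

result = try_these(
  lambda { fetch_anonymously },
  lambda { fetch_with_login },
  lambda { default_value }
)

all_failed = try_these(lambda { raise "nope" })
```

The second clause succeeds, so default_value never runs; when every clause raises, try_these returns nil.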

This technique gets us almost all of what we want for the common case of working around call-by-value semantics. As you might surmise from the fact that it has a name, it is not some newfangled shiny toy idea: it goes back to ALGOL 60, whose call-by-name parameters were implemented with thunks. (PHP has something called “Call By Name,” but it has a lot more in common with C++ references than it does with ALGOL parameter passing.)
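What makes call-by-name more than mere delay is that the callee re-evaluates the expression every time it uses the parameter. The classic ALGOL 60 demonstration is Jensen’s device. Here is a sketch of it with thunks; jensen_sum is a hypothetical name, and the by-name variable i is simulated by a setter lambda:

```ruby
# set_i simulates assigning to the caller's by-name variable;
# term is a thunk re-evaluated on every iteration.
def jensen_sum(set_i, lo, hi, term)
  total = 0
  lo.upto(hi) do |k|
    set_i.call(k)
    total += term.call
  end
  total
end

i = 0
sum_of_squares = jensen_sum(->(v) { i = v }, 1, 4, -> { i * i })
sum_of_values = jensen_sum(->(v) { i = v }, 1, 4, -> { i })
```

The same jensen_sum computes 1 + 4 + 9 + 16 = 30 or 1 + 2 + 3 + 4 = 10, depending only on the expression passed by name.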

The application of call-by-name as a substitute for full-blown macros isn’t novel either. Joel Klein pointed out that Call by need is a poor man’s macro. Another suggestion along similar lines is to rethink macros in Arc.
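The difference between call-by-name and call-by-need is easy to see with thunks: a call-by-name thunk re-runs its expression on every call, while a call-by-need thunk runs it once and caches the result. by_need is a hypothetical helper written for this sketch; with a side-effecting expression, the two give different answers:

```ruby
# Wrap a block in a memoizing thunk (call-by-need).
def by_need(&expr)
  evaluated = false
  value = nil
  lambda do
    unless evaluated
      value = expr.call
      evaluated = true
    end
    value
  end
end

counter = 0
by_name_thunk = lambda { counter += 1 }  # re-evaluated on each call
by_need_thunk = by_need { counter += 1 } # evaluated at most once

twice = ->(thunk) { [thunk.call, thunk.call] }

name_results = twice.call(by_name_thunk)
need_results = twice.call(by_need_thunk)
```

name_results is [1, 2] and need_results is [3, 3].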

thunks: ugly name, ugly code

Our thunking approach solves a lot of our problems, but the implementation severely protrudes into the interface! We could argue that since our call-by-name functions have different behaviour than ordinary functions or methods, they ought to have different syntax.

That’s a reasonable point of view, and that’s exactly how languages like Smalltalk work: everything that involves delaying evaluation in some way uses blocks, even the if statements, which are methods that take blocks as arguments. So in Smalltalk, everything is consistent.

Ruby, OTOH, is not consistent. Operators like “&” and “|” are actually methods with call-by-value semantics, while operators like “&&” and “||” are special forms with short-circuit semantics. Likewise, if you only need to delay one expression you can use a block, but if you need to delay two or more, you need at least one lambda. So another reasonable point of view is that we should follow Ruby’s philosophy of making the common case easy to use and not become reductionists trying to build everything out of five axiomatic forms.
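A quick illustration of that inconsistency, with a hypothetical blitz that logs when it runs:

```ruby
SIDE_EFFECTS = []

def blitz
  SIDE_EFFECTS << :blitz
  true
end

method_result = false & blitz()    # FalseClass#& is a method: blitz() runs
operator_result = false && blitz() # && is a special form: blitz() is skipped
```

Both results are false, but blitz() appears in SIDE_EFFECTS only once, from the & version.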

So we have one approach—rewriting—that is crazy-hard to write but produces nicely readable code. And we have another approach—thunking—that is easy to write but produces unsightly boilerplate.

Maybe what we want is a rewriter, but we want an easier way to write rewriters for this simple case?

Called by name

Here’s how we could define and use a call-by-name function called “our_and”:


with(
    called_by_name(:our_and) { |x,y|
        if temp = x
            y
        else
            temp
        end
    }
) do
    # ...
    foo = our_and(bar(), blitz()) # method-like syntactic sugar
    # ...
end

What we just did is manufacture a rewriter without any sexps. Instead of getting rid of sexps, we’re treating them like assembler and using a declarative language to write the assembler for us. Our rewriter dutifully rewrites our code to look something like:


our_and = lambda { |x, y|
    if (temp = x.call)
        y.call
    else
        temp
    end
}
# ...
foo = our_and.call(
    lambda { bar() },
    lambda { blitz() }
)
# ...

We can define a rewriter for functions with splatted parameters too:


with(
    called_by_name(:try_these) { |*clauses|
        clauses.each { |clause| return clause rescue nil }
        nil
    }
) do
    # ...
    try_these(
        http_util.fetch(url, :login_as => :anonymous),
        http_util.fetch(url, :login_as => ['user', 'password']),
        default_value()
    )
    # ...
end

Becomes something like:


try_these = lambda { |*clauses|
    clauses.each { |clause| return clause.call rescue nil }
    nil
}
# ...
try_these.call(
    lambda { http_util.fetch(url, :login_as => :anonymous) },
    lambda { http_util.fetch(url, :login_as => ['user', 'password']) },
    lambda { default_value() }
)
# ...

It goes, boys!

—Lynn Hill after becoming the first person of either sex to climb The Nose of El Capitan, all free.

As of now, the rewrite gem supports called_by_name. You can write your own functions with call-by-name semantics using called_by_name just as you see here. As is standard with the rewrite gem, only the code in the do… end block is affected by your change.

call-by-name, in summary

To summarize, with the rewrite gem you can write functions that have call-by-name semantics without wrestling sexps into submission or encumbering your code with a lot of superfluous lambdas and calls:


with(
    called_by_name(:try_these) { |*clauses|
        clauses.each { |clause| return clause rescue nil }
        nil
    },
    called_by_name(:our_and) { |x,y|
        if temp = x
            y
        else
            temp
        end
    }
) do
    # ...
    try_these(
        http_util.fetch(url, :login_as => :anonymous),
        http_util.fetch(url, :login_as => ['user', 'password']),
        default_value()
    )
    # ...
    foo = our_and(bar(), blitz())
    # ...
end

This is a win when you don’t want your code encumbered with more lambdas than business logic. It may be a matter of taste, but part of what I like about Ruby having a special case for blocks is that they act as a huge hint that an expression is temporary: a block after #map suggests we are only using that expression in one place. Whereas when I see “Proc.new” or “lambda,” I expect that the expression will be passed around and used elsewhere.

Functions with call-by-name semantics communicate the same thing as blocks: the expressions are to be consumed by the function. When I see a lambda being passed to a function, I automatically expect it to be saved and possibly used elsewhere. For that reason, I prefer call-by-name semantics when an expression is not meant to be persisted beyond the function invocation.

Now, called_by_name is not a replacement for macros. There are lots of things macros can do that called_by_name cannot do (not to mention that there are lots of things code rewriting can do that macros cannot do). But just as Ruby’s blocks are a deliberate attempt to make a common case for anonymous functions easy to write, called_by_name makes a common case for macros easy to write and safe from variable capture problems.

Of course, called_by_name does so with lots of anonymous functions, and that is a much more expensive implementation than using a hygienic macro to rewrite code inline. But it feels like a move in an interesting direction: if it is a win to sometimes meta-program Ruby’s syntax with DSLs, it ought to also be a win to sometimes meta-program Ruby’s semantics with call-by-name functions.

afterword

So… Is this merely a way to replicate things that are already built into Ruby but do them fifty times slower?

I don’t know how to answer that question. When I heard Matz talk about Ruby at LL1, I didn’t catch the part of his speech where he described how to use metaprogramming to build a really neat web development framework. When you first see a new tool, you naturally start by applying it to problems you already know how to solve with your existing tools in the same way you have always solved such problems.

Only later, after this tool becomes perfectly natural to you, do you start to think of entirely new ways to use the tool. I’m not there yet, but my experience tells me that it’s always a win to have more freedom, to have fewer things you can’t do with a language.

If just one person—maybe it’s me, maybe it’s somebody else—leans forward one day and sees a new way of solving a problem with call-by-name semantics, I’ll consider working on this feature time well spent.

It probably won’t be something trivial like replicating short-circuit boolean operators. But it will be interesting, and I’m looking forward to finding out what it is.

9:25 PM

Comments on “Macros, Hygiene, and Call By Name in Ruby”:

This is great! Thank you!

I'm not much of a Ruby user, but I know some Lisp. While I still prefer using defmacro (variable capture and all!), a function that automatically wraps all actual parameters in "thunks" would go a long way toward making control abstraction easier. Ruby would be all the more powerful for it.

It might not be like Ruby's usual style of dynamic metaprogramming and looks (conceptually) like a more static approach, but it's not necessarily an unwelcome direction.

Posted by Mohamed Samy at 6:39 PM

It looks like you want lazy evaluation in Ruby, a la Haskell.

Posted by Mark Cidade at 8:22 PM

looks like you want lazy evaluation in Ruby

been there, done that ;-)

Also, call-by-need is not equivalent to call-by-name in a language with side effects. You often want the optimization of call-by-need, but sometimes you want to write something that explicitly evaluates a parameter expression more than once.

Posted by Reginald Braithwaite at 9:54 AM

I believe that andand() should be called nil_or(). As in:

return xxxx.nil_or yyyy;

But before your nice trick this was purely academic because the lack of progressiveness in the "or" clause made the whole construction useless.

That's no longer the case and I am impressed with what you achieved. Congratulations.

Posted by JeanHuguesRobert at 7:12 AM

Apparently the D programming language directly supports call-by-name semantics with its lazy storage class:

http://www.digitalmars.com/d/2.0/lazy-evaluation.html

I don't think I've seen this kind of thing elsewhere. They mention "parallels" with Lisp macros there, although the feature only gives you the equivalent of "hygienic" macros that don't inspect the structure of their input sexprs and simply conditionally evaluate them.

Posted by Yossi Kreinin at 8:45 AM

http://www.digitalmars.com/d/2.0/lazy-evaluation.html

Thanks for the link!

Yes, call-by-name is not the same thing as a rewrite or macro facility, but it is an interesting alternative in some of the places where a macro is a blunt instrument.

Posted by Reginald Braithwaite at 10:46 AM

RLisp is Ruby, and it easily supports macros like that. In simple cases like this, variable hygiene is quite straightforward: you simply use (gensym) to generate a fresh variable name. Variable hygiene is only a problem in much more complicated scenarios.

rlisp> (defmacro andand-send (obj meth . args) (let var (gensym)) `(do (let ,var ,obj) (if ,var (send ,var ',meth ,@args) ,var)))
#<Proc:0x00316884@STDIN:1>
rlisp> (andand-send nil - 8)
nil
rlisp> (andand-send 12 - 8)
4
rlisp> (macroexpand '(andand-send (+ 2 2) - 8))
(do (let #:G15 (+ 2 2)) (if #:G15 (send #:G15 (quote -) 8) #:G15))

Posted by taw at 3:35 PM
