String#to_proc

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Saturday, October 27, 2007

Breaking news! The irb enhancement gem Utility Belt includes String#to_proc

String#to_proc is an addition to Ruby’s core String class to enable point-free hylomorphisms…

I’ll start again. String#to_proc adds a method to Ruby’s core String class to make lots of mapping and reducing operations more compact and easier to read by removing boilerplate and focusing on what is to be done. In many cases, the existing block syntax is just fine. But in a few cases, String#to_proc can make an expression even simpler.

String#to_proc is a port of the String Lambdas from Oliver Steele’s Functional Javascript library. I have modified the syntax to reflect how String#to_proc works in Ruby.

We’ll start with the examples from String Lambdas so you can see what is actually going on. Then we’ll look at how to use the & coercion to make working with arrays really simple.

to_proc creates a function from a string that contains a single expression. This function can then be applied to an argument list, either immediately:


'x+1'.to_proc[2];
     → 3
'x+2*y'.to_proc[2, 3];
     → 8

or (more usefully) later:


square = 'x*x'.to_proc;
square[3];
     → 9
square[4];
     → 16

Explicit parameters

If the string contains a ->, this separates the parameters from the body.


'x y -> x+2*y'.to_proc[2, 3];
     → 8
'y x -> x+2*y'.to_proc[2, 3];
     → 7

Otherwise, if the string contains a _, it’s a unary function and _ is the name of the parameter:


'_+1'.to_proc[2];
     → 3
'_*_'.to_proc[3];
     → 9
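
To make the two explicit forms concrete, here is a minimal sketch of how a string could be turned into a proc with eval. The helper name explicit_lambda is made up for illustration, and this is an assumption about the general shape of the technique, not the actual String#to_proc source.

```ruby
# Hypothetical helper, not the library's code: build a proc from a string
# with explicit parameters, either "params -> body" or an underscore body.
def explicit_lambda(source)
  if source.include?('->')
    params, body = source.split(/\s*->\s*/, 2)
    # "x y" becomes the block parameter list "x, y"
    eval("Proc.new { |#{params.split.join(', ')}| #{body} }")
  elsif source =~ /\b_\b/
    # A lone underscore names the single parameter
    eval("Proc.new { |_| #{source} }")
  else
    raise ArgumentError, "no explicit parameters in #{source.inspect}"
  end
end

explicit_lambda('x y -> x+2*y')[2, 3]  # → 8
explicit_lambda('y x -> x+2*y')[2, 3]  # → 7
explicit_lambda('_*_')[3]              # → 9
```

The sketch only handles a single arrow; chained arrows would need the split and the wrapping to be repeated for each section.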

Implicit parameters

If the string doesn’t specify explicit parameters, they are implicit.

If the string starts with an operator or relation besides -, or ends with an operator or relation, then its implicit arguments are placed at the beginning and/or end:


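'*2'.to_proc[2];
     → 4
'/2'.to_proc[4];
     → 2
'2/'.to_proc[4];
     → 0
'/'.to_proc[2, 4];
     → 0

(Both arguments are Integers, so Ruby performs integer division: 2/4 is 0, not the 0.5 you would get in Javascript.)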

’.’ counts as a right operator:


'.abs'.to_proc[-1];
     → 1
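
The section rule can be sketched in a few lines. The helper name section_lambda and the parameter names left and right are assumptions made for this illustration, and the operator lists are deliberately shorter than a complete implementation would need.

```ruby
# Hypothetical helper, for illustration only: treat a leading or trailing
# operator as a "section" and fill the gap with a generated parameter.
def section_lambda(source)
  params = []
  expr = source.dup
  # Leading operator (anything but a minus sign) or a leading dot
  if expr =~ /\A\s*[+*\/%.<>=]/
    params << 'left'
    expr = "left#{expr}"
  end
  # Trailing operator
  if expr =~ /[+\-*\/%<>=]\s*\z/
    params << 'right'
    expr = "#{expr}right"
  end
  eval("Proc.new { |#{params.join(', ')}| #{expr} }")
end

section_lambda('*2')[2]      # → 4
section_lambda('2/')[4]      # → 0 (integer division)
section_lambda('/')[2, 4]    # → 0 (integer division)
section_lambda('.abs')[-1]   # → 1
```

A bare operator such as '/' matches both tests, which is why it ends up with two parameters.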

Otherwise, the variables in the string, in order of occurrence, are its parameters.


'x+1'.to_proc[2];
     → 3
'x*x'.to_proc[3];
     → 9
'x + 2*y'.to_proc[1, 2];
     → 5
'y + 2*x'.to_proc[1, 2];
     → 5
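
A rough sketch of the "order of occurrence" rule follows. The helper name implied_lambda is hypothetical, and a fuller implementation would need to skip method names, constants and string literals before collecting names; this one does not.

```ruby
# Hypothetical helper: collect identifiers in order of first appearance
# and use them as the parameter list.
def implied_lambda(source)
  params = source.scan(/[a-z_][a-z_\d]*/i).uniq
  eval("Proc.new { |#{params.join(', ')}| #{source} }")
end

implied_lambda('x + 2*y')[1, 2]  # → 5  (x = 1, y = 2)
implied_lambda('y + 2*x')[1, 2]  # → 5  (y = 1, x = 2)
implied_lambda('x*x')[3]         # → 9
```

Note that the names themselves carry no meaning: whichever variable appears first in the string receives the first argument.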

Chaining

Chain -> to create curried functions.


'x y -> x+y'.to_proc[2, 3];
     → 5
'x -> y -> x+y'.to_proc[2][3];
     → 5
plus_two = 'x -> y -> x+y'.to_proc[2];
plus_two[3]
     → 5
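
For comparison, here is what the chained arrow corresponds to in plain Ruby, using only core features. The variable names are chosen for this example; Proc#curry has been in core Ruby since 1.9.

```ruby
# Plain-Ruby equivalent of 'x -> y -> x+y': a lambda that returns a lambda.
add = lambda { |x| lambda { |y| x + y } }
add[2][3]        # → 5

plus_two = add[2]
plus_two[3]      # → 5

# Proc#curry derives the same shape from a two-argument lambda.
curried = lambda { |x, y| x + y }.curry
curried[2][3]    # → 5
```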

Using String#to_proc in Idiomatic Ruby

Ruby on Rails popularized Symbol#to_proc, so much so that it will be part of Ruby 1.9.

If you like:


%w[dsf fgdg fg].map(&:capitalize)
    → ["Dsf", "Fgdg", "Fg"]

then %w[dsf fgdg fg].map(&'.capitalize') isn’t much of an improvement.

But what about doubling every value in a list:


(1..5).map &'*2'
    → [2, 4, 6, 8, 10]

Or folding a list:


(1..5).inject &'+'
    → 15

Or having fun with factorial:


factorial = "(1.._).inject &'*'".to_proc
factorial[5]
    → 120

String#to_proc, in combination with & coercing a value into a proc, lets you write compact maps, injections, selections, detections (and many others!) when you only need a simple expression.
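
The & coercion is not special to strings or symbols: Ruby calls to_proc on whatever follows the ampersand. A small sketch, using a class invented for this example:

```ruby
# Hypothetical class: any object that responds to to_proc can follow '&'.
class Doubler
  def to_proc
    lambda { |x| x * 2 }
  end
end

(1..5).map(&Doubler.new)   # → [2, 4, 6, 8, 10]
```

This is the hook that a String#to_proc definition plugs into.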

Caveats: String#to_proc uses eval. Cue the chorus of people, pounding away on quad 3GHz systems, complaining about the performance. You’re an adult. Decide for yourself whether this is an issue. After mankying things about to deduce the parameters, String#to_proc evaluates its expression in a different binding than where you wrote the String. This matters if you include free variables. My thinking is that it ceases to be a simple, easy-to-understand hack and becomes a cryptic nightmare once you get too fancy.
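
The binding caveat is easy to reproduce with a stand-in. The helper string_lambda below is hypothetical and much simpler than String#to_proc, but it shows the same effect: the eval runs inside the helper, so a free variable from the calling code is invisible unless a binding is handed over.

```ruby
# Hypothetical helper standing in for String#to_proc: the eval happens
# inside this method, so the caller's local variables are out of reach
# unless a binding is passed in explicitly.
def string_lambda(source, scope = nil)
  code = "Proc.new { |x| #{source} }"
  scope ? eval(code, scope) : eval(code)
end

bar = 10

begin
  string_lambda('x + bar')[1]
rescue NameError => e
  puts "free variable not visible: #{e.message}"
end

string_lambda('x + bar', binding)[1]   # → 11
```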

You know that Voight-Kampff test of yours… did you ever take that test yourself?

—Rachael, Blade Runner

I have been using Functional Javascript for quite some time now, and I use the String Lambdas a lot. However, Ruby and Javascript are very different languages. Once you get out of the browser’s DOM, Javascript is a lot cleaner and more elegant than Ruby. For example, you don’t need to memorize the difference between a block, a lambda, and a proc. Javascript just has functions.

However, Javascript is more verbose: Whereas in Ruby you can write [1, 2, 3].map { |x| x*2 }, if Javascript had a map method for arrays, you would still have to write [1, 2, 3].map(function (x) { return x*2; }). So it’s a big win to make Javascript less verbose: code is easier to read at a glance when you don’t have to wade through jillions of function keywords.

Nevertheless, I still find myself itching for the String Lambdas when I’m writing Ruby code. It may be a matter of questionable taste, but for certain extremely simple expressions, I vastly prefer the point-free style. (-3..3).map &:abs is shorter than (-3..3).map { |x| x.abs }.

It is also cleaner to me. abs is a message, especially in a language like Ruby that supports sending arbitrary messages named by symbols. Writing (-3..3).map &:abs looks very much like sending the abs message to everything in the list. I don’t need an x in there to tell me that.

Thus, I obviously like (-3..3).map &'.abs'. But I like (1..5).map &'*2' for the same reason. It isn’t just shorter, it hides a temporary variable that really doesn’t mean Jack to me when I’m reading the code. And quite honestly, (1..10).inject { |acc, mem| acc + mem } raises more questions than it answers about what inject does and how it does it. (1..10).inject &'+' gets right down to business for me. I’d prefer that it be called “fold,” but the raw, naked + seems to describe what I want done instead of how I want the computer to do it.

String#to_proc also supports named parameters, either through implication (&'x+y') or with the arrow ('x y -> x*y'). I haven’t thought of a case where that would be a win over using a Ruby block: { |x, y| x*y }.

I’m divided about the underscore notation. It seems like a good compromise for expressions where there is a single parameter and it doesn’t fall on the left or the right side of an expression. Standardizing on an unusual variable name is, I think, a win. Underscore often means a “hole” in an expression or a computation, so it feels like a good fit. I would honestly much rather see something like: &'(1/_)+1' than &'(1/x)+1'. The underscore jumps out in an obvious way, and it wouldn’t be magically clearer to write { |x| (1/x)+1 }.

That being said, I haven’t actually written an underscore expression yet in actual code. So far I’m getting by using the point-free expressions to simplify things and using Ruby blocks for everything else.

RSpec


describe "String to Proc" do

  before(:all) do
    @one2five = 1..5
  end

  it "should handle simple arrow notation" do
    @one2five.map(&'x -> x + 1').should eql(@one2five.map { |x| x + 1 })
    @one2five.map(&'x -> x*x').should eql(@one2five.map { |x| x*x })
    @one2five.inject(&'x y -> x*y').should eql(@one2five.inject { |x,y| x*y })
    'x y -> x**y'.to_proc()[2,3].should eql(lambda { |x,y| x**y }[2,3])
    'y x -> x**y'.to_proc()[2,3].should eql(lambda { |y,x| x**y }[2,3])
  end

  it "should handle chained arrows" do
    'x -> y -> x**y'.to_proc()[2][3].should eql(lambda { |x| lambda { |y| x**y } }[2][3])
    'x -> y z -> y**(z-x)'.to_proc()[1][2,3].should eql(lambda { |x| lambda { |y,z| y**(z-x) } }[1][2,3])
  end

  it "should handle the default parameter" do
    @one2five.map(&'2**_/2').should eql(@one2five.map { |x| 2**x/2 })
    @one2five.select(&'_%2==0').should eql(@one2five.select { |x| x%2==0 })
  end

  it "should handle point-free notation" do
    @one2five.inject(&'*').should eql(@one2five.inject { |mem, var| mem * var })
    @one2five.select(&'>2').should eql(@one2five.select { |x| x>2 })
    @one2five.select(&'2<').should eql(@one2five.select { |x| 2<x })
    @one2five.map(&'2*').should eql(@one2five.map { |x| 2*x })
    (-3..3).map(&'.abs').should eql((-3..3).map { |x| x.abs })
  end

  it "should handle implied parameters as best it can" do
    @one2five.inject(&'x*y').should eql(@one2five.inject(&'*'))
    'x**y'.to_proc()[2,3].should eql(8)
    'y**x'.to_proc()[2,3].should eql(8)
  end

end

Go ahead, download the source code for yourself.

Update: Reg smacks himself in the head!

I had a look at the source code for Symbol#to_proc:


class Symbol
  # Turns the symbol into a simple proc, which is especially useful for enumerations. Examples:
  #
  #   # The same as people.collect { |p| p.name }
  #   people.collect(&:name)
  #
  #   # The same as people.select { |p| p.manager? }.collect { |p| p.salary }
  #   people.select(&:manager?).collect(&:salary)
  def to_proc
    Proc.new { |*args| args.shift.__send__(self, *args) }
  end
end

Look at that: Although the examples are all of unary messages like .name, the lambdas created handle methods with arguments. And since almost everything in Ruby is a method, including operators like +… You can use Symbol#to_proc to do some of the point-free stuff I like:


[1, 2, 3, 4, 5].inject(&:+)
     → 15
[{ :foo => 1 }, { :bar => 2 }, { :blitz => 3 }].inject &:merge
     → {:foo=>1, :bar=>2, :blitz=>3}
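
The same one-liner can be tried without reopening Symbol. The method name message_proc is invented for this sketch; the body is the Proc.new expression from the listing above with self replaced by a parameter.

```ruby
# Stand-alone version of the same idea: the first argument receives the
# message, the remaining arguments become the message's arguments.
def message_proc(name)
  Proc.new { |*args| args.shift.__send__(name, *args) }
end

message_proc(:+).call(1, 2)                  # → 3
[1, 2, 3, 4, 5].inject(&message_proc(:+))    # → 15
[-1, 2, -3].map(&message_proc(:abs))         # → [1, 2, 3]
```

With inject, the accumulator arrives as the first argument and the current element as the second, so the accumulator is the receiver of the message.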

Labels: lispy, popular, ruby

¶ 11:22 AM

Comments on “String#to_proc”:

Now write it for us in Java ;)

Very nice though. I'm always happy to trade performance for readability as long as the former isn't already scarce

# posted by

crayz : 4:00 PM

Now write it for us in Java

You have to write for your audience. I would never write (1..5).map &'*2' in Java when I could write ListFactoryFactory.getListFactoryFromResource(new ResourceName('com.javax.magnitudes.integers').setLowerBound(1).setUpperBound(5).setStep(1).applyFunctor(new Functor () { public void eval (x) { return x * 2; } }))

I'm simplifying, of course, I've left out the security and logging wrappers.

# posted by

Reginald Braithwaite : 7:26 PM

This is really cool, but why do you use

"".respond_to? :to_proc

when you could use

public_method_defined? :to_proc

not necessarily a criticism - I don't know if pmd? is a Rails-ism, or how it is performance-wise - it just seems semantically cleaner to compare method existence rather than popping out of the abstraction stack to an instance method.

I might be splitting hairs.

# posted by

Giles Bowkett : 10:27 PM

also, I refactored it:

http://pastie.caboo.se/111620

but being an RSpec n00b I'm getting "0 examples 0 failures" on the spec for some reason.

# posted by

Giles Bowkett : 10:41 PM

I submitted it to refactormycode.com, which will make it a whole lot easier for anyone to suggest modifications :-) You just need an OpenID.
See http://refactormycode.com/codes/114-string-to_proc-by-reginald-braithwaite

# posted by

Mathieu Martin : 2:30 AM

Stripping out "arguments" (in the long regex) before determining the list of parameter names looks like an artifact from Functional Javascript that's not necessary in ruby.

# posted by

brett : 2:39 AM

to be honest... this actually looks complicated :(

# posted by

Anonymous : 9:50 AM

two more things: I actually disagree about the use of underscore, because underscore is already a special variable in IRB, and because it looks Perlish. and if you want inject to be called fold, just alias fold inject. anyway, I have this in my .irbrc, but I skipped the underscore part, so I'm just going to test what happens.

oh, ok, it works, because of the string interpolation, but it still makes me antsy. might have to change it later on.

# posted by

Giles Bowkett : 11:24 AM

I actually disagree about the use of underscore, because underscore is already a special variable in IRB

Well, to each their own. Decide for yourself whether you like the Perlisms in Ruby or the Smalltalkisms or the Lispisms.

# posted by

Reginald Braithwaite : 2:07 PM

Stripping out "arguments" (in the long regex) before determining the list of parameter names looks like an artifact from Functional Javascript that's not necessary in ruby.

True, and I'm experimenting with removing the entire first regex.

# posted by

Reginald Braithwaite : 2:08 PM

it just seems semantically cleaner to compare method existence rather than popping out of the abstraction stack to an instance method.

Whatever floats your boat, help yourself.

# posted by

Reginald Braithwaite : 2:13 PM

You seem to have made a minor mistake in your example:

----
Otherwise, if the string contains a _, it’s a unary function and _ is name of the parameter:

'_ -> _+1'.to_proc[2];
→ 3
'_ -> _*_'.to_proc[3];
→ 9
----

These should be:
'_+1'.to_proc[2];
'_*_'.to_proc[3];
...in order to match the sentence :)

# posted by

Porges : 5:07 PM

Kudos! Really cool! I'm adding it to my Ruby kit. Thanks a lot, man :)

# posted by

Aquarius : 5:44 PM

Mmmmmm .... Ruby Injection.

# posted by

Anonymous : 12:49 AM

"After manking things"? I'm guessing typo...

Great article (as usual), though.

# posted by

Adam C. : 8:11 AM

manking => should have been mankying or mankeying.

Thanks!

# posted by

Reginald Braithwaite : 10:16 AM

(1..5).map &'*2'
→ [2, 4, 6, 8, 10]

vs.

(1..5).map {|x| 2*x}
→ [2, 4, 6, 8, 10]

C'mon. The second is clearer and cleaner.

You're solving a problem that's just not there. If this solves a real problem, post some code that makes that point.

# posted by

Anonymous : 12:12 PM

Dear anonymous:

(1..5).map {|x| 2*x}
→ [2, 4, 6, 8, 10]

This says: "Map a function over a collection. The function sends the message '* argument' to the number two."

(1..5).map &'*2'
→ [2, 4, 6, 8, 10]

This says: "Map the message '*2' over a collection."

Or if you prefer functions, it maps "the function *2" over a collection. One function, conceptually.

That is a very different thing than lambda { |x| x*2 }. Forget the keyword, it isn't required when you're using a block. But { |x| x*2 } is NOT the function *2. It's a function that takes an argument and THEN invokes *2 on it. There are two functions, *2 and a wrapper around it.

This is much more strongly demonstrated with the reductions like (1..5).inject &'*'. This discards all parameters and wrappers and gets right down to business: Apply the raw operator * to the collection. Again, (1..5).inject { |acc, mem| acc*mem } has two functions: the wrapper and the *. Don't you want to just talk about the *?

If you called me on the telephone and described the algorithm, wouldn't you say "The product of the numbers from one to five?" Would you really say "apply a function that takes two arguments and returns the product of the arguments?" Product already is a function that returns the product of its arguments!

Is that a problem? Not to you, it isn't. So there is no need for you to use this approach.

But for my sense of aesthetic, the illusion of discarding the wrapper is cleaner and more direct.

# posted by

Reginald Braithwaite : 12:26 PM

Nice trick! However as always the idea was discussed before :-)

http://blog.codahale.com/2006/08/01/stupid-ruby-tricks-stringto_proc/

(Less powerful and more simple implementation)

# posted by

martink : 2:18 PM

Because my diseased brain wants to store Procs in a database:
http://pastie.caboo.se/112718

Now to tie these two ideas together... like peanut butter and jelly in the same jar.

# posted by

andrew : 10:13 AM

The only reason why I'm hesitant to use this is that the correct binding context is only set when a block is passed.

The only way I've found to get the caller's binding is through a continuation hack that obviously won't work in JRuby or Ruby 1.9. Matz has hinted that this will be easy in the future.

Until then, this is just a tad bit leaky for me. I'm sold on the idea and I know it already works in most of the situations I would use it in. However, when I use it I want to be able to apply it with reckless abandon (within the limits of good taste and readability, of course).

# posted by

Sofal : 2:07 PM

Sofal:

I thought about that for a while, and I included a hack so that you can pass the current binding to the to_proc method if you really want.

However, I didn't include support for Binding.of_caller. I considered making it work if you have Binding.of_caller defined, but after some thought, I dropped it.

This is a bit Python-ish, but it forces you to use String lambdas when you have something really short and simple, and to use blocks or lambdas when you need something more complex.

I would like to be able to write stuff like foo.map &'+ bar', and perhaps Ruby 1.9 will make that possible one day.

But for now, this does work for simple cases.

# posted by

Reginald Braithwaite : 2:15 PM

This is fantastic. I've missed the simplicity of HOF present in other languages but not in Ruby and this makes it mostly better.

# posted by

Code Monkey : 4:21 PM
