raganwald
(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Saturday, October 27, 2007
  String#to_proc


Breaking news! The irb enhancement gem Utility Belt includes String#to_proc

String#to_proc is an addition to Ruby’s core String class to enable point-free hylomorphisms

I’ll start again. String#to_proc adds a method to Ruby’s core String class to make lots of mapping and reducing operations more compact and easier to read by removing boilerplate and focusing on what is to be done. In many cases, the existing block syntax is just fine. But in a few cases, String#to_proc can make an expression even simpler.

String#to_proc is a port of the String Lambdas from Oliver Steele’s Functional Javascript library. I have modified the syntax to reflect how String#to_proc works in Ruby.

We’ll start with the examples from String Lambdas so you can see what is actually going on. Then we’ll look at how to use the & coercion to make working with arrays really simple.

to_proc creates a function from a string that contains a single expression. This function can then be applied to an argument list, either immediately:

'x+1'.to_proc[2];
→ 3
'x+2*y'.to_proc[2, 3];
→ 8
or (more usefully) later:

square = 'x*x'.to_proc;
square[3];
→ 9
square[4];
→ 16
Explicit parameters

If the string contains a ->, this separates the parameters from the body.

'x y -> x+2*y'.to_proc[2, 3];
→ 8
'y x -> x+2*y'.to_proc[2, 3];
→ 7
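
To make the arrow rule concrete, here is a minimal sketch of how such a string could be turned into a Proc. The helper name arrow_lambda is hypothetical and this is not the library's actual code; it only assumes that the text before the arrow is a whitespace-separated parameter list.

```ruby
# Hypothetical helper (an assumption for illustration, not the library's code):
# split 'params -> body' at the first arrow, build Ruby source for a Proc,
# and eval that source.
def arrow_lambda(source)
  params, body = source.split(/\s*->\s*/, 2)
  raise ArgumentError, "expected 'params -> body'" if body.nil?
  eval("Proc.new { |#{params.split.join(', ')}| #{body} }")
end

puts arrow_lambda('x y -> x+2*y')[2, 3]  # x=2, y=3: prints 8
puts arrow_lambda('y x -> x+2*y')[2, 3]  # y=2, x=3: prints 7
```

The order of the names before the arrow decides which argument lands in which variable, which is why the two calls give different results.
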
Otherwise, if the string contains a _, it’s a unary function and _ is the name of the parameter:

'_+1'.to_proc[2];
→ 3
'_*_'.to_proc[3];
→ 9
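
The underscore rule can be sketched the same way. The helper underscore_lambda is a hypothetical name, and the approach (wrapping the expression in a one-parameter Proc whose parameter is literally called _) is an assumption about how this could work rather than a quote from the real source.

```ruby
# Hypothetical helper: wrap the expression in a Proc whose single
# parameter is named _, so every _ in the string refers to the argument.
def underscore_lambda(source)
  raise ArgumentError, 'no _ placeholder found' unless source =~ /\b_\b/
  eval("Proc.new { |_| #{source} }")
end

puts underscore_lambda('_+1')[2]  # prints 3
puts underscore_lambda('_*_')[3]  # prints 9
```
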
Implicit parameters

If the string doesn’t specify explicit parameters, they are implicit.

If the string starts with an operator or relation besides -, or ends with an operator or relation, then its implicit arguments are placed at the beginning and/or end:

'*2'.to_proc[2];
→ 4
'/2'.to_proc[4];
→ 2
'2/'.to_proc[4.0];
→ 0.5
'/'.to_proc[2.0, 4];
→ 0.5
'.' counts as a right operator:

'.abs'.to_proc[-1];
→ 1
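
One way to picture the operator rule: if the expression starts or ends with an operator, splice a generated parameter name onto that side. The sketch below assumes exactly that; section_lambda and the parameter names left and right are hypothetical, and the regular expressions cover only a handful of operators.

```ruby
# Hypothetical sketch of operator sections (not the library's code).
def section_lambda(source)
  params = []
  expr = source.dup
  # Leading operator other than '-': the missing operand goes on the left.
  if expr =~ /\A\s*(?:[+*\/%&|^.<>=]|!=)/
    params << 'left'
    expr = "left#{expr}"
  end
  # Trailing operator: the missing operand goes on the right.
  if expr =~ /[+\-*\/%&|^.<>=!]\s*\z/
    params << 'right'
    expr = "#{expr}right"
  end
  raise ArgumentError, 'not an operator section' if params.empty?
  eval("Proc.new { |#{params.join(', ')}| #{expr} }")
end

puts section_lambda('*2')[2]        # 2*2, prints 4
puts section_lambda('2/')[4.0]      # 2/4.0, prints 0.5
puts section_lambda('/')[2.0, 4]    # 2.0/4, prints 0.5
puts section_lambda('.abs')[-1]     # -1.abs, prints 1
```

Note that Ruby's / on two Integers is integer division, so the float arguments matter here: section_lambda('2/')[4] would give 0.
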
Otherwise, the variables in the string, in order of occurrence, are its parameters.

'x+1'.to_proc[2];
→ 3
'x*x'.to_proc[3];
→ 9
'x + 2*y'.to_proc[1, 2];
→ 5
'y + 2*x'.to_proc[1, 2];
→ 5
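
A naive sketch of the "order of occurrence" rule: scan the string for identifiers, keep the first occurrence of each, and use them as the parameter list. The helper implicit_lambda is hypothetical, and it deliberately treats every identifier as a parameter; a real implementation would need to skip method names, constants, and string literals.

```ruby
# Hypothetical, deliberately naive helper: every identifier in the string
# becomes a parameter, in the order it first appears.
def implicit_lambda(source)
  params = source.scan(/[a-z_][a-z_\d]*/i).uniq
  eval("Proc.new { |#{params.join(', ')}| #{source} }")
end

puts implicit_lambda('x*x')[3]         # one parameter, prints 9
puts implicit_lambda('x + 2*y')[1, 2]  # x=1, y=2, prints 5
puts implicit_lambda('y + 2*x')[1, 2]  # y=1, x=2, prints 5
```

The last two calls give the same answer because the first argument always binds to whichever variable appears first in the string, not to x.
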
Chaining

Chain -> to create curried functions.

'x y -> x+y'.to_proc[2, 3];
→ 5
'x -> y -> x+y'.to_proc[2][3];
→ 5
plus_two = 'x -> y -> x+y'.to_proc[2];
plus_two[3]
→ 5
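
Chained arrows can be pictured as a right-to-left fold over the sections of the string: the last section is the body, and each earlier section wraps what has been built so far in another Proc. The sketch below assumes that approach; chained_lambda is a hypothetical name, not the library's API.

```ruby
# Hypothetical helper: fold the arrow-separated sections from right to left,
# wrapping the accumulated body in one more Proc per parameter section.
def chained_lambda(source)
  sections = source.split(/\s*->\s*/)
  code = sections.reverse.inject do |body, params|
    "Proc.new { |#{params.split.join(', ')}| #{body} }"
  end
  eval(code)
end

plus_two = chained_lambda('x -> y -> x+y')[2]
puts plus_two[3]                                     # prints 5
puts chained_lambda('x -> y z -> y**(z-x)')[1][2, 3] # 2**(3-1), prints 4
```
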
Using String#to_proc in Idiomatic Ruby

Ruby on Rails popularized Symbol#to_proc, so much so that it will be part of Ruby 1.9.

If you like:

%w[dsf fgdg fg].map(&:capitalize)
→ ["Dsf", "Fgdg", "Fg"]
then %w[dsf fgdg fg].map(&'.capitalize') isn’t much of an improvement.

But what about doubling every value in a list:

(1..5).map &'*2'
→ [2, 4, 6, 8, 10]
Or folding a list:

(1..5).inject &'+'
→ 15
Or having fun with factorial:

factorial = "(1.._).inject &'*'".to_proc
factorial[5]
→ 120
String#to_proc, in combination with & coercing a value into a proc, lets you write compact maps, injections, selections, detections (and many others!) when you only need a simple expression.
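
The & coercion is not specific to strings or symbols: when the object after & is not already a Proc, Ruby calls to_proc on it and passes the result as the block. A small illustration in plain Ruby, using a made-up object called doubler so that no core class is touched:

```ruby
# Any object that responds to to_proc can follow the & in a method call.
doubler = Object.new
def doubler.to_proc
  lambda { |x| x * 2 }
end

doubled = (1..5).map(&doubler)
p doubled  # prints [2, 4, 6, 8, 10]
```
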

Caveats: String#to_proc uses eval. Cue the chorus of people, pounding away on quad 3GHz systems, complaining about the performance. You’re an adult. Decide for yourself whether this is an issue. After mankying things about to deduce the parameters, String#to_proc evaluates its expression in a different binding than where you wrote the String. This matters if you include free variables. My thinking is that it ceases to be a simple, easy-to-understand hack and becomes a cryptic nightmare once you get too fancy.
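
The binding caveat is easy to reproduce with plain eval. In this sketch, build_proc and build_proc_in are hypothetical helpers standing in for any eval-based string lambda: the first evaluates the string inside the helper's own scope, the second accepts a Binding from the caller.

```ruby
# eval here runs in build_proc's scope, so the caller's locals are invisible.
def build_proc(source)
  eval("Proc.new { |x| #{source} }")
end

# Passing a Binding lets eval resolve the caller's local variables.
def build_proc_in(source, scope)
  eval("Proc.new { |x| #{source} }", scope)
end

bar = 10

begin
  build_proc('x + bar')[1]
rescue NameError => e
  puts "free variable not found: #{e.class}"
end

puts build_proc_in('x + bar', binding)[1]  # prints 11
```
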

You know that Voight-Kampff test of yours… did you ever take that test yourself?
—Rachael, Blade Runner


I have been using Functional Javascript for quite some time now, and I use the String Lambdas a lot. However, Ruby and Javascript are very different languages. Once you get out of the browser’s DOM, Javascript is a lot cleaner and more elegant than Ruby. For example, you don’t need to memorize the difference between a block, a lambda, and a proc. Javascript just has functions.

However, Javascript is more verbose: Whereas in Ruby you can write [1, 2, 3].map { |x| x*2 }, if Javascript had a map method for arrays, you would still have to write [1, 2, 3].map(function (x) { return x*2; }). So it’s a big win to make Javascript less verbose: code is easier to read at a glance when you don’t have to wade through jillions of function keywords.

Nevertheless, I still find myself itching for the String Lambdas when I’m writing Ruby code. It may be a matter of questionable taste, but for certain extremely simple expressions, I vastly prefer the point-free style. (-3..3).map &:abs is shorter than (-3..3).map { |x| x.abs }.

It is also cleaner to me. abs is a message, especially in a language like Ruby that supports sending arbitrary messages named by symbols. Writing (-3..3).map &:abs looks very much like sending the abs message to everything in the list. I don’t need an x in there to tell me that.

Thus, I obviously like (-3..3).map &'.abs'. But I like (1..5).map &'*2' for the same reason. It isn’t just shorter, it hides a temporary variable that really doesn’t mean Jack to me when I’m reading the code. And quite honestly, (1..10).inject { |acc, mem| acc + mem } raises more questions than it answers about what inject does and how it does it. (1..10).inject &'+' gets right down to business for me. I’d prefer that it be called “fold,” but the raw, naked + seems to describe what I want done instead of how I want the computer to do it.

String#to_proc also supports named parameters, either through implication (&'x+y') or with the arrow ('x y -> x*y'). I haven’t thought of a case where that would be a win over using a Ruby block: { |x, y| x*y }.

I’m divided about the underscore notation. It seems like a good compromise for expressions where there is a single parameter and it doesn’t fall on the left or the right side of an expression. Standardizing on an unusual variable name is, I think, a win. Underscore often means a “hole” in an expression or a computation, so it feels like a good fit. I would honestly much rather see something like: &'(1/_)+1' than &'(1/x)+1'. The underscore jumps out in an obvious way, and it wouldn’t be magically clearer to write { |x| (1/x)+1 }.

That being said, I haven’t actually written an underscore expression yet in actual code. So far I’m getting by using the point-free expressions to simplify things and using Ruby blocks for everything else.

RSpec

describe "String to Proc" do

  before(:all) do
    @one2five = 1..5
  end

  it "should handle simple arrow notation" do
    @one2five.map(&'x -> x + 1').should eql(@one2five.map { |x| x + 1 })
    @one2five.map(&'x -> x*x').should eql(@one2five.map { |x| x*x })
    @one2five.inject(&'x y -> x*y').should eql(@one2five.inject { |x,y| x*y })
    'x y -> x**y'.to_proc()[2,3].should eql(lambda { |x,y| x**y }[2,3])
    'y x -> x**y'.to_proc()[2,3].should eql(lambda { |y,x| x**y }[2,3])
  end

  it "should handle chained arrows" do
    'x -> y -> x**y'.to_proc()[2][3].should eql(lambda { |x| lambda { |y| x**y } }[2][3])
    'x -> y z -> y**(z-x)'.to_proc()[1][2,3].should eql(lambda { |x| lambda { |y,z| y**(z-x) } }[1][2,3])
  end

  it "should handle the default parameter" do
    @one2five.map(&'2**_/2').should eql(@one2five.map { |x| 2**x/2 })
    @one2five.select(&'_%2==0').should eql(@one2five.select { |x| x%2==0 })
  end

  it "should handle point-free notation" do
    @one2five.inject(&'*').should eql(@one2five.inject { |mem, var| mem * var })
    @one2five.select(&'>2').should eql(@one2five.select { |x| x>2 })
    @one2five.select(&'2<').should eql(@one2five.select { |x| 2<x })
    @one2five.map(&'2*').should eql(@one2five.map { |x| 2*x })
    (-3..3).map(&'.abs').should eql((-3..3).map { |x| x.abs })
  end

  it "should handle implied parameters as best it can" do
    @one2five.inject(&'x*y').should eql(@one2five.inject(&'*'))
    'x**y'.to_proc()[2,3].should eql(8)
    'y**x'.to_proc()[2,3].should eql(8)
  end

end
Go ahead, download the source code for yourself.

Update: Reg smacks himself in the head!

I had a look at the source code for Symbol#to_proc:

class Symbol
  # Turns the symbol into a simple proc, which is especially useful for enumerations. Examples:
  #
  #   # The same as people.collect { |p| p.name }
  #   people.collect(&:name)
  #
  #   # The same as people.select { |p| p.manager? }.collect { |p| p.salary }
  #   people.select(&:manager?).collect(&:salary)
  def to_proc
    Proc.new { |*args| args.shift.__send__(self, *args) }
  end
end
Look at that: Although the examples are all of unary messages like .name, the lambdas created handle methods with arguments. And since almost everything in Ruby is a method, including operators like +… You can use Symbol#to_proc to do some of the point-free stuff I like:

[1, 2, 3, 4, 5].inject(&:+)
→ 15
[{ :foo => 1 }, { :bar => 2 }, { :blitz => 3 }].inject &:merge
→ {:foo=>1, :bar=>2, :blitz=>3}
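
To see why this works, it helps to call such a proc directly: the first argument becomes the receiver and the remaining arguments travel along with the message. The sketch below uses a hypothetical stand-alone method, message_proc, that mirrors the body shown above without reopening Symbol.

```ruby
# Hypothetical stand-in that mirrors the to_proc body quoted above.
def message_proc(name)
  Proc.new { |*args| args.shift.__send__(name, *args) }
end

plus = message_proc(:+)
puts plus.call(1, 2)                 # 1.__send__(:+, 2), prints 3
puts [1, 2, 3, 4, 5].inject(&plus)   # prints 15

merged = message_proc(:merge).call({ :foo => 1 }, { :bar => 2 })
puts merged == { :foo => 1, :bar => 2 }  # prints true
```

With inject, the accumulator is the receiver and the next element is the argument, so any binary operator or two-object method such as merge fits.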


 

Comments on “String#to_proc”:
Now write it for us in Java ;)

Very nice though. I'm always happy to trade performance for readability as long as the former isn't already scarce
 
Now write it for us in Java

You have to write for your audience. I would never write (1..5).map &'*2' in Java when I could write ListFactoryFactory.getListFactoryFromResource(new ResourceName('com.javax.magnitudes.integers').setLowerBound(1).setUpperBound(5).setStep(1).applyFunctor(new Functor () { public void eval (x) { return x * 2; } }))

I'm simplifying, of course, I've left out the security and logging wrappers.
 
This is really cool, but why do you use

"".respond_to? :to_proc

when you could use

public_method_defined? :to_proc

not necessarily a criticism - I don't know if pmd? is a Rails-ism, or how it is performance-wise - it just seems semantically cleaner to compare method existence rather than popping out of the abstraction stack to an instance method.

I might be splitting hairs.
 
also, I refactored it:

http://pastie.caboo.se/111620

but being an RSpec n00b I'm getting "0 examples 0 failures" on the spec for some reason.
 
I submitted it to refactormycode.com, which will make it a whole lot easier for anyone to suggest modifications :-) You just need an OpenID.
See http://refactormycode.com/codes/114-string-to_proc-by-reginald-braithwaite
 
Stripping out "arguments" (in the long regex) before determining the list of parameter names looks like an artifact from Functional Javascript that's not necessary in ruby.
 
to be honest... this actually looks complicated :(
 
two more things: I actually disagree about the use of underscore, because underscore is already a special variable in IRB, and because it looks Perlish. and if you want inject to be called fold, just alias fold inject. anyway, I have this in my .irbrc, but I skipped the underscore part, so I'm just going to test what happens.

oh, ok, it works, because of the string interpolation, but it still makes me antsy. might have to change it later on.
 
I actually disagree about the use of underscore, because underscore is already a special variable in IRB

Well, to each their own. Decide for yourself whether you like the Perlisms in Ruby or the Smalltalkisms or the Lispisms.
 
Stripping out "arguments" (in the long regex) before determining the list of parameter names looks like an artifact from Functional Javascript that's not necessary in ruby.

True, and I'm experimenting with removing the entire first regex.
 
it just seems semantically cleaner to compare method existence rather than popping out of the abstraction stack to an instance method.

Whatever floats your boat, help yourself.
 
You seem to have made a minor mistake in your example:

----
Otherwise, if the string contains a _, it’s a unary function and _ is name of the parameter:


'_ -> _+1'.to_proc[2];
→ 3
'_ -> _*_'.to_proc[3];
→ 9
----

These should be:
'_+1'.to_proc[2];
'_*_'.to_proc[3];
...in order to match the sentence :)
 
Kudos! Really cool! I'm adding it to my Ruby kit. Thanks a lot, man :)
 
Mmmmmm .... Ruby Injection.
 
"After manking things"? I'm guessing typo...

Great article (as usual), though.
 
manking => should have been mankying or mankeying.

Thanks!
 
(1..5).map &'*2'
→ [2, 4, 6, 8, 10]

vs.

(1..5).map {|x| 2*x}
→ [2, 4, 6, 8, 10]

C'mon. The second is clearer and cleaner.

You're solving a problem that's just not there. If this solves a real problem, post some code that makes that point.
 
Dear anonymous:

(1..5).map {|x| 2*x}
→ [2, 4, 6, 8, 10]

This says: "Map a function over a collection. The function sends the message '* argument' to the number two."

(1..5).map &'*2'
→ [2, 4, 6, 8, 10]

This says: "Map the message '*2' over a collection."

Or if you prefer functions, it maps "the function *2" over a collection. One function, conceptually.

That is a very different thing than lambda { |x| x*2 }. Forget the keyword, it isn't required when you're using a block. But { |x| x*2 } is NOT the function *2. It's a function that takes an argument and THEN invokes *2 on it. There are two functions, *2 and a wrapper around it.

This is much more strongly demonstrated with the reductions like (1..5).inject &'*'. This discards all parameters and wrappers and gets right down to business: Apply the raw operator * to the collection. Again, (1..5).inject { |acc, mem| acc*mem } has two functions: the wrapper and the *. Don't you want to just talk about the *?

If you called me on the telephone and described the algorithm, wouldn't you say "The product of the numbers from one to five?" Would you really say "apply a function that takes two arguments and returns the product of the arguments?" Product already is a function that returns the product of its arguments!

Is that a problem? Not to you, it isn't. So there is no need for you to use this approach.

But for my sense of aesthetic, the illusion of discarding the wrapper is cleaner and more direct.
 
Nice trick! However as always the idea was discussed before :-)

http://blog.codahale.com/2006/08/01/stupid-ruby-tricks-stringto_proc/

(Less powerful and more simple implementation)
 
Because my diseased brain wants to store Procs in a database:
http://pastie.caboo.se/112718

Now to tie these two ideas together... like peanut butter and jelly in the same jar.
 
The only reason why I'm hesitant to use this is that the correct binding context is only set when a block is passed.

The only way I've found to get the caller's binding is through a continuation hack that obviously won't work in JRuby or Ruby 1.9. Matz has hinted that this will be easy in the future.

Until then, this is just a tad bit leaky for me. I'm sold on the idea and I know it already works in most of the situations I would use it in. However, when I use it I want to be able to apply it with reckless abandon (within the limits of good taste and readability, of course).
 
Sofal:

I thought about that for a while, and I included a hack so that you can pass the current binding to the to_proc method if you really want.

However, I didn't include support for Binding.of_caller. I considered making it work if you have Binding.of_caller defined, but after some thought, I dropped it.

This is a bit Python-ish, but it forces you to use String lambdas when you have something really short and simple, and to use blocks or lambdas when you need something more complex.

I would like to be able to write stuff like foo.map &'+ bar', and perhaps Ruby 1.9 will make that possible one day.

But for now, this does work for simple cases.
 
This is fantastic. I've missed the simplicity of HOF present in other languages but not in Ruby and this makes it mostly better.
 




Reg Braithwaite

