Interview questions I have never been asked, Episode I

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Tuesday, June 24, 2008

Compare and contrast:


some_array.any? { |n| n > 100 }

And:


!!some_array.detect { |n| n > 100 }

Which do you think is easier to read?

Are classes and modules easier to read and understand when they have lots and lots of specific methods like #any? or are they easier to read and understand with a small number of axiomatic methods that can be composed and combined like #fold and #unfold? When you design a module or class, do you write lots of convenience methods in advance? Or do you refactor code by writing convenience methods when you find yourself repeating the same code?
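To make the "axiomatic" side of that question concrete, here is a minimal sketch of #any? composed from a fold (Ruby's inject). The method name any_via_fold is hypothetical and used only for illustration. Unlike the built-in, it walks the whole collection.

```ruby
# Hypothetical helper: #any? rebuilt from a fold (Ruby's inject).
# Once a match is found, `found ||` skips the predicate, but inject still
# visits every remaining element; it never exits early.
def any_via_fold(collection)
  collection.inject(false) { |found, element| found || (yield(element) ? true : false) }
end

numbers = [3, 50, 101, 7]
any_via_fold(numbers) { |n| n > 100 }  # => true
numbers.any? { |n| n > 100 }           # => true
any_via_fold([]) { |n| n > 100 }       # => false
```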

If you do refactor code to eliminate duplication, is there an amount of duplication that is too small to matter, like "!!"? Or is there an underlying principle of documenting intent that you wish to make explicit?

Is:


do_something() if some_array.detect { |n| n > 100 }

Misleading because it doesn’t actually use the element it detects? Or is it a reasonable idiom to test for an element’s existence without using a specific method like #any or #nil?
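For the numeric predicate above, detect works as an existence test because every Integer is truthy in Ruby. The sketch below shows where the idiom breaks. When the element being searched for is itself nil or false, detect finds it and returns it, so the `if` sees a falsy value.

```ruby
# detect returns the element it finds, or nil when nothing matches.
# As an existence test, that is ambiguous when the match itself is nil or false.
values = [1, nil, 3]

found_by_detect = values.detect { |v| v.nil? }  # => nil (the match IS nil)
found_by_any    = values.any? { |v| v.nil? }    # => true

puts "detect says: #{found_by_detect ? 'present' : 'absent'}"  # prints "detect says: absent"
puts "any? says: #{found_by_any ? 'present' : 'absent'}"       # prints "any? says: present"
```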

Are applications easier to read and understand when they make use of lots and lots of specific methods like #any or are they easier to read and understand when they compose and combine a smaller number of axiomatic methods so that you aren’t constantly looking things up?

Do you think applications should have large or small vocabularies?

This example can easily be translated into the language du jour; the underlying principle applies to programming in general.

¶ 10:45 AM

Comments on “Interview questions I have never been asked, Episode I”:

A larger vocabulary is clearly better for communicating the programmer's intent. If that weren't the case, you'd just use a for loop and the list size to do everything, keeping the smallest possible interface on the list. Using detect without using the result is deceptive; you're lying to the reader and trying to confuse him.

# posted by Ramon Leon : 11:19 AM

I'll argue that Ruby does make a difference here; there's a semantic difference between detect and what I expect out of any?. It can be demonstrated in a very brief irb session:

>> x = [nil, false, true]
=> [nil, false, true]
>> x.detect {|y| y == nil}
=> nil
>> !!x.detect {|y| y == nil}
=> false
>> x.any? {|y| y == nil}
=> true

# posted by Bill Mill : 11:48 AM

Again though, it could just be Python influencing my intuitions:

>>> x = [True, False, None]
>>> any(y == None for y in x)
True

# posted by Bill Mill : 11:55 AM

Reading over my comments, I see I should clarify why this matters to me. Of course, you are using detect to look for integers, where it will work perfectly well. (It wouldn't in Python, because zero is falsish.)

My problem with doing so is that it's not thinking axiomatically. In my head, I would look for a function that always returned a truish object upon finding it, and detect wouldn't fit the bill (as shown above).

Thus, my argument is that you have not presented an example of a situation where a convenience method may be useful instead of a manipulation of an axiom, but instead a place where there is clear necessity for a further axiom.

(Side note #1: What would be your favorite modification of the "detect" idiom to be correct for all values of x? I'd use x.index I think?)
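This is a minimal sketch of the index-based approach suggested in side note #1. It assumes the goal is an existence test that is correct for every element value. The helper name contains_match? is hypothetical.

```ruby
# Hypothetical helper: an existence test built on Array#index.
# #index with a block returns the position of the first match, or nil if none.
# Positions are Integers, and every Integer (including 0) is truthy in Ruby,
# so the result is unambiguous even when the matching element is nil or false.
def contains_match?(array, &predicate)
  !array.index(&predicate).nil?
end

x = [nil, false, true]
contains_match?(x) { |y| y.nil? }      # => true  (index returns 0)
contains_match?(x) { |y| y == false }  # => true  (index returns 1)
contains_match?(x) { |y| y == 42 }     # => false (index returns nil)
```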

(Side note #2: I forgot to mention the very clean Python idiom for checking for an element's existence in an array:

>>> x = [True, False, None]
>>> None in x
True
)
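For comparison, Ruby's counterpart to Python's `in` operator is #include?. It compares elements with == and always returns true or false:

```ruby
# Membership test: compares each element with == and returns a strict boolean.
x = [true, false, nil]
x.include?(nil)  # => true
x.include?(0)    # => false (0 == false is false in Ruby)
```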

# posted by Bill Mill : 12:08 PM

I agree that since you're not using the result, the version with detect does not as clearly show your intent.

Without meaning to stir up arguments about premature optimization, I'd add that efficiency is a concern as well. If you use some exotic structure (e.g. a Bloom filter, or "ropes" instead of strings), a method like any? may be able to take advantage of something "clever" that would be impossible to exploit through a more "axiomatic" interface.
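Here is a sketch of that point, under stated assumptions. The hypothetical SortedList below gets every generic Enumerable method from a single #each. It also overrides #include? to use a binary search, which a caller limited to #each or #detect could not do. It uses Array#bsearch, so it assumes Ruby 2.0 or later. The example uses membership rather than #any?, because an arbitrary block gives the structure nothing to exploit.

```ruby
# Hypothetical class: a collection kept in sorted order.
class SortedList
  include Enumerable

  def initialize(items)
    @items = items.sort
  end

  # The one "axiomatic" method Enumerable needs; #detect, #any?, etc. build on it.
  def each(&block)
    @items.each(&block)
    self
  end

  # Specific method: O(log n) membership via Array#bsearch in find-any mode.
  # The block returns positive for items below the target, 0 on a match,
  # and negative for items above it.
  def include?(target)
    !@items.bsearch { |item| target <=> item }.nil?
  end
end

list = SortedList.new([42, 7, 350, 101, 3])
list.include?(101)         # => true  (binary search)
list.include?(100)         # => false
list.any? { |n| n > 100 }  # => true  (generic Enumerable#any?, linear scan)
```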

# posted by titivillus : 12:22 PM

The language that makes up the technical base is the foundation for expressing your specific programming solution. It is best to completely encapsulate that technical level with a higher problem-domain-specific one.

My reasoning is that you have to create a higher-level vocabulary anyway, so the most readable solution doesn't flip-flop between the two levels (doubling the vocabulary in size); it sits entirely at the upper level.

There is nothing wrong with framing the problem with your own primitives, but within that context it must be consistent and used consistently to be easily readable.
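One way to read this advice is sketched below with a hypothetical Order class and an assumed price threshold. Callers ask a question in the problem domain's own terms, and the collection-level vocabulary stays inside the class.

```ruby
# Hypothetical domain class: callers use the domain word (contains_large_item?),
# while the technical vocabulary (any?, a numeric threshold) stays encapsulated.
class Order
  LARGE_ITEM_PRICE = 100 # assumed threshold, purely illustrative

  def initialize(item_prices)
    @item_prices = item_prices
  end

  def contains_large_item?
    @item_prices.any? { |price| price > LARGE_ITEM_PRICE }
  end
end

Order.new([20, 150]).contains_large_item?  # => true
Order.new([20, 30]).contains_large_item?   # => false
```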

I'm working on a new post to dig a bit deeper into readability. Hopefully it will be finished in a day or two.

Paul.
http://theprogrammersparadox.blogspot.com

# posted by Paul W. Homer : 1:18 PM

Personally, I think that any language feature which encourages the use of the bangbang operator (!!) is a misfeature. English classes discourage double negatives for a reason.
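For readers unfamiliar with the idiom: in Ruby only nil and false are falsy, so `!!value` negates twice to turn any value into exactly true or false. A quick illustration:

```ruby
# Only nil and false are falsy in Ruby; 0, "" and [] are all truthy.
# Double negation coerces each value to a strict true/false.
coerced = [nil, false, 0, "", [], 101].map { |v| !!v }
# => [false, false, true, true, true, true]
```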

# posted by Avdi Grimm : 2:39 PM

It's 'any?', not 'any'.

That to me makes a big difference, because the ? signals a boolean test.

I love how Ruby does that with blah? and modify! methods; nothing would make me happier than for the language to enforce it, the same way C++ enforces const member functions.

Anyway, I'm firmly on the side of large vocabularies. They let you write your code more as a sentence or statement for a human to read, rather than a list of instructions for a computer to execute.

With small languages everyone ends up reinventing the wheel and writing a zillion helper functions to do things like .any?, so apart from the disk space required to ship a larger standard library, there's no downside to having a bigger vocab, IMHO.

# posted by Orion Edwards : 4:50 PM

AFAICT, the rule in Ruby is that "?" does signal a predicate method (like foo-p in CL), but "!" is only used to disambiguate the destructive and non-destructive versions of the same method (like reverse and reverse!, or compact and compact!).
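Here is a small illustration of both conventions using Array#compact and Array#compact!. Note that compact! returns nil when it has nothing to remove, so bang methods are not always safe to chain.

```ruby
# "?" marks a predicate; "!" marks the destructive twin of a same-named method.
words = ["b", nil, "a"]

words.compact        # => ["b", "a"]  returns a new array; words is unchanged
words.include?(nil)  # => true        the nil is still there
words.compact!       # => ["b", "a"]  removes nils in place and returns self
words.include?(nil)  # => false
words.compact!       # => nil         nothing left to remove
```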

# posted by Reginald Braithwaite : 5:10 PM

and here I was typing:

some_array.inject(false) { |found, el| found || el.test }

at least today I know I learned something.
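An inject-based "any" and #any? agree on the answer, but they do different amounts of work. This sketch counts how many times each block runs. The counters exist only for illustration.

```ruby
numbers = [150, 1, 2, 3, 4]

# inject folds over every element, even after the answer is known.
inject_checks = 0
via_inject = numbers.inject(false) do |found, n|
  inject_checks += 1
  found || n > 100
end

# any? stops at the first element for which the block is truthy.
any_checks = 0
via_any = numbers.any? do |n|
  any_checks += 1
  n > 100
end

via_inject     # => true
via_any        # => true
inject_checks  # => 5
any_checks     # => 1
```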

# posted by Nathan : 5:36 PM

I think that problem spaces have a certain vocabulary size and that source code solving problems in that space should scale relative to that.

There may be ways to trivially disprove this assertion (e.g. a problem with a small vocabulary that requires an enormous vocabulary to solve, like an NP-complete problem solved in P).

# posted by John : 6:11 PM

I'm mostly with Bill Mill. detect was a bad choice.

But you do mention fold, which I think is the same as reduce/inject.

I was going to mention the code Nathan posted as the alternative using inject.

But now I'm going to go off on a tangent.

I often transform an array into a hash keyed by some property of its elements. For example, I might want to group a list of people by last name. I generally do:

people.inject(Hash.new { |hash, key| hash[key] = [] }) { |h, p| h[p.lastname] << p; h }

or

people.inject({}) { |h, p| (h[p.lastname] ||= []) << p; h }

Once I abstracted that away and created something like

array.hash_by { |n| n.foo }

But I never remember to use it.

I think using inject here shows the intent more clearly, because everybody knows how inject works, but no one knows hash_by, probably not even me, a few weeks in the future.
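Here is a minimal sketch of what a hash_by helper might look like, assuming it should behave like the inject versions above. The method is hypothetical and is monkey-patched into Enumerable here for illustration. It uses each_with_object, which assumes Ruby 1.9 or later. The Person struct is invented for the example.

```ruby
# Hypothetical helper: Enumerable#hash_by groups elements into a Hash of arrays
# keyed by the block's result. The default block stores each new array in the
# hash; a bare Hash.new { [] } would return a fresh array without storing it.
module Enumerable
  def hash_by
    each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |element, groups|
      groups[yield(element)] << element
    end
  end
end

Person = Struct.new(:firstname, :lastname)
people = [
  Person.new("Ada", "Lovelace"),
  Person.new("Alan", "Turing"),
  Person.new("Byron", "Lovelace")
]

by_lastname = people.hash_by(&:lastname)
by_lastname["Lovelace"].map(&:firstname)  # => ["Ada", "Byron"]
by_lastname["Turing"].map(&:firstname)    # => ["Alan"]

# Modern Ruby ships the same idea built in, as Enumerable#group_by:
by_lastname == people.group_by(&:lastname)  # => true
```

Whether to keep such a helper is the question above: the built-in group_by has the advantage that other Rubyists already know it.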

What I guess I'm trying to get at is this: you can build a great language with just map/filter/reduce (collect/select/inject, etc.). But part of that great language will be the helper methods you write with these building blocks. The same code you started with could be implemented in many ways with them:

list.map { |n| n > 100 }.inject { |a, b| a || b }
list.select { |n| n > 100 }.length != 0

and the aforementioned reduction by Nathan.
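This sketch checks these spellings against each other, including the empty-list edge case. The reduction here combines with || (with && it would compute "all elements match" instead). It is seeded with false, so an empty list yields false rather than nil. The alternatives method is hypothetical.

```ruby
# Hypothetical helper: evaluate several spellings of
# "is any element greater than 100?" on the same list.
def alternatives(list)
  [
    list.any? { |n| n > 100 },
    !!list.detect { |n| n > 100 },
    list.select { |n| n > 100 }.length != 0,
    list.map { |n| n > 100 }.inject(false) { |a, b| a || b }
  ]
end

alternatives([5, 200, 7])  # => [true, true, true, true]
alternatives([5, 6])       # => [false, false, false, false]
alternatives([])           # => [false, false, false, false]
```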

And yet, I'm all for #any? as a language feature. Not because it performs better than inject by exiting early (detect/find do that too), but because it's pretty damn obvious.

In the end I just think your example shot itself in the foot because detect is really bad there, and any is really great. Instead, how do you feel about my hash_by?

# posted by Caio : 9:29 PM

Of course, the any? version is clearer; it's also less likely to hide a bug. But even if it weren't, !! brings me out in hives. It's pure noise: there's absolutely no reason to go coercing a result into a boolean in such a simple-minded fashion, because everything that deals with truthy values already knows what is truthy.

What seeing !! in code tells me is that the author hasn't got over static typing yet.

# posted by bofh.org.uk : 9:27 AM

user.has_photo?
user.photos.count > 0

What do you think about this more subtle difference?
Do you think both can live together in a single codebase?
If so, how do we make programmers use the right one?
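One plain-Ruby way to let both live together is sketched below with a hypothetical User class. The predicate is defined once in terms of the collection, and the counting expression becomes an implementation detail. This ignores any database or ORM concerns.

```ruby
# Hypothetical model: has_photo? names the question once; callers use it
# instead of repeating "photos.count > 0" throughout the codebase.
class User
  attr_reader :photos

  def initialize(photos = [])
    @photos = photos
  end

  def has_photo?
    photos.any?
  end
end

User.new(["beach.jpg"]).has_photo?  # => true
User.new.has_photo?                 # => false
```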

# posted by Alisey : 6:08 AM
