Programming conventions as signals

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Monday, November 12, 2007

Some time ago, I spent a great deal of my waking and sleeping hours thinking about Contract Bridge. I especially thought about bidding systems. Bidding in bridge is a very hard problem: you are trying to coöperatively seek a maximal payoff contract using an incredibly limited vocabulary that is a scarce resource: the pool of legal bids shrinks with each bid. And that’s without considering the fact that two opponents are competing for the same resource to try to seek your minimal payoff and frustrate your attempts to communicate.

Anyhow.

Given the limited information available, it is critical in bridge to use every possible bidding sequence productively. Therefore, if there are two ways to say the same thing, bidding systems are designed such that there is a clear understanding that one of the two ways means something subtly different than the other. Although each bid means exactly the same literal thing—Five Diamonds always means eleven tricks with Diamonds as trumps—players can draw elaborate inferences of what a player bidding Five Diamonds holds based on the sequence of bids leading up to that moment.

If you and your partner play that a 1NT opening shows a balanced hand with 15-17 high card points, what do you make of it when your partner opens One Diamond and rebids No Trumps later? A balanced hand, of course, but you know she doesn’t have 15-17 high card points, because she would have opened 1NT if she did. And if you play five card majors, you know she doesn’t have five or more Hearts or Spades.

So for programming languages, I believe the same thing. When there is more than one way to do it, don’t randomly choose which way to do it based on the phase of the moon. Don’t straitjacket yourselves by appointing some martinet to decide which way to do it on each project. Instead, use the different idioms as signals, as ways to provide additional information to programmers.

Signaling with syntax

Consider blocks in Ruby. Some people use do … end when the block needs multiple lines and { … } when it fits on one line. A popular (AFAIK) and superior idea is to use do..end and {…} to disambiguate between blocks that are executed chiefly for their side effects and blocks that are executed for their return values:


foo.map { |x|
    # I care about the result
}

foo.each do |x|
    # I care about the side effects
end

The computer doesn’t care, of course, but it signals an extra piece of information, the fact that you are chiefly interested in side effects in one case and chiefly interested in the result in another case.
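Here is a minimal runnable sketch of that convention. The prices array and the log are made up for illustration; nothing about them comes from a real project.

```ruby
prices = [5, 12, 8]

# Braces: the block is evaluated for its return value.
doubled = prices.map { |price| price * 2 }

# do...end: the block is run for its side effect (appending to a log).
log = []
prices.each do |price|
  log << "saw #{price}"
end
```

Swap the delimiters and the program behaves identically. Only the hint to the reader changes.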

In Java, the final keyword is a gold mine of signals, especially for variable declarations. If you mandate that every parameter and every local should be declared final unless you plan to modify it, this is like having a bidding system where a 1NT opening means 15-17 points and a balanced hand. The moment you see a variable that isn’t final, you know that the code is expected to mutate it, just as bridge players know that a rebid of 1NT means that their partner doesn’t have 15-17 points and a balanced hand.

TIMTOWTDI

If you look at any language where there is more than one way to do it, you will trip over opportunities like this to use the code itself to communicate. In Ruby, if you want to sort an array of something, there are two ways to indicate the sort order. First, you can define the Boat Operator (<=>) for the values being sorted. Second, you can provide a block to the :sort method telling it how to order the values.

Which is better?

My preference is that when a value has a natural, default sort order, it’s best to define the Boat Operator for it (and brave all of the challenges inherent in using methods for object comparisons). When you see a sort without a block, you know that the values are being sorted in a natural, obvious order.

Then, if you see a sort with a block, you are immediately alerted to the fact that this sort is an exception to whatever obvious, natural order exists for that object. The computer doesn’t care, but it makes the code that extra bit easier to read: it makes the exceptional cases… exceptional.


class Value
  def <=> other
    # ...
  end
end

values.sort # uses the <=> operator: I want the natural order

values.sort { |a,b| ... } # eschews the boat: this block MUST sort by something unusual
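To make that concrete, here is a small sketch with a hypothetical Version class, assuming that "major, then minor" is the order everyone would consider natural for it:

```ruby
# Hypothetical class for illustration only.
class Version
  attr_reader :major, :minor, :label

  def initialize(major, minor, label)
    @major = major
    @minor = minor
    @label = label
  end

  # Natural order: compare by major number, then by minor number.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

versions = [
  Version.new(2, 0, "beta"),
  Version.new(1, 4, "stable"),
  Version.new(1, 10, "alpha")
]

# No block: the reader can assume the natural order.
natural = versions.sort

# A block: the reader is warned that this ordering is the exception.
by_label = versions.sort { |a, b| a.label <=> b.label }
```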

Multi-paradigm programming

There is a special case of the idea that “When there is more than one way to do it, each way should be used differently.” Some languages, such as Ruby, are deliberately multi-paradigm: not only are there several ways to do things, there are several entirely different philosophies the language supports. In Ruby’s case, for example, you find the usual assortment of imperative structured programming suspects like for, while, and until. But you also find some of the more obvious functional programming suspects, like lambdas as first-class values and Enumerable’s collection methods :map, :select, and :detect.


I think the idea that doing the same thing in different ways signals different intents scales up to the paradigms you use. If you have a list of phone numbers (say for people you'd like to call), and you wish to remove certain numbers from it (say a do-not-call list), in Ruby I think you should always use some kind of functional approach like combining :reject and :include? or :detect. If you write this out using for loops or :each, it signals that there are side effects going on.
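A sketch of the phone number example, with made-up numbers. The first version is the one I would expect to see. The second produces the same list but, under this convention, sends the reader hunting for side effects that are not there.

```ruby
# Hypothetical data for illustration.
call_list   = ["555-0100", "555-0101", "555-0102", "555-0103"]
do_not_call = ["555-0101", "555-0103"]

# Functional: computed strictly for its result; call_list is left untouched.
allowed = call_list.reject { |number| do_not_call.include?(number) }

# Imperative: same result, but the loop and the mutation of allowed_loop
# suggest that something stateful is going on.
allowed_loop = []
call_list.each do |number|
  allowed_loop << number unless do_not_call.include?(number)
end
```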

By making functional things look functional, you clearly signal that you are performing a calculation strictly for the result. Using imperative syntax like for and while should be reserved for the times you need to mutate variables or cause other side effects. Naturally, you may need to break this so-called rule from time to time for performance purposes. But you’re an adult, you know that just because a rule needs to be broken here and there for pragmatic reasons doesn’t mean you toss it out and write all of your code using whatever keyword is lying about at the time.

Likewise, OO is a great paradigm when you are modelling entities in the real world or when you need long-term persistent state. Languages like Ruby force you to use objects behind the scenes, but your code shouldn’t look OO unless you are trying to signal your colleagues that there are entities involved. For this reason, you should use Proc.new when you want objects and lambda when you want functions.
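A small sketch of the two spellings side by side. The names square and greeter are hypothetical. Ruby itself treats the two slightly differently, which arguably reinforces the signal: a lambda is strict about its arguments the way a method is, while a Proc.new block is lenient.

```ruby
# Both are instances of Proc; which constructor you reach for is the signal.
square  = lambda { |x| x * x }                    # reads as "a function"
greeter = Proc.new { |name| "hello, #{name}" }    # reads as "an object I hand around"

both_procs = square.is_a?(Proc) && greeter.is_a?(Proc)
kinds = [square.lambda?, greeter.lambda?]

nine = square.call(3)

# A lambda checks its argument count and raises if it is wrong...
arity_error = begin
  square.call
rescue ArgumentError => e
  e
end

# ...while a Proc.new block quietly fills missing arguments with nil.
lenient = greeter.call
```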

The benefits of traveling far while you build a life at home

People sometimes ask why they should bother to learn other languages or other programming styles, when they do not have an opportunity to use those languages at their 9-5. I believe that if you embrace the idea of multi-paradigm programming, these other languages can teach you useful techniques such as using :unfold to model iteration over data structures.
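Ruby's standard library has no method called unfold, so the helper below is a hypothetical sketch of the idea as it appears in functional languages, not an existing API: start from a seed, and keep producing a value and a next seed until the block says stop.

```ruby
# Hypothetical helper, not part of Ruby's standard library.
# The block receives the current seed and returns [value, next_seed],
# or nil (or false) to stop.
def unfold(seed)
  result = []
  while (pair = yield(seed))
    value, seed = pair
    result << value
  end
  result
end

# Digits of a number, least significant first.
digits = unfold(1234) { |n| n.zero? ? nil : [n % 10, n / 10] }

# Walking a data structure: the superclass chain of Integer.
chain = unfold(Integer) { |klass| klass && [klass, klass.superclass] }
```

In both calls the iteration is expressed as a calculation of a list, with no loop counter or accumulator visible to the caller.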

Of course, people will ask why you are using a Haskell idiom in Ruby, or a Lisp idiom in Java. One of the answers is that it may be more expressive. Another answer could be that by making functional things look functional, by making OO things look OO, by making distributed things look distributed, and so on, by borrowing paradigms from languages where those ideas are fully exploited, you can make your Ruby and Java code signal its intent more clearly.

Say what you mean and mean what you say

In the end, this is a really simple idea: when you have several different ways to do something, your first choice should be the way that signals what you are trying to accomplish in a natural and obvious way. That may mean borrowing an idiom from a language that expresses that idea more succinctly or more directly.

And when you eschew the natural and obvious idiom, it should be to signal that you have a different intent, that your exception to the standard idiom reflects the code’s exceptional nature.

¶ 5:38 PM

Comments on “Programming conventions as signals”:

Good article, man. Meaning what you say and saying what you mean sounds like it can be more easily accomplished with a declarative programming language than with an imperative one. How might Java, C, and co. deal with this problem when what they want their code to mean and say is buried beneath too many details and not enough abstraction?

# posted by

Erock : 8:28 PM

Is it December already, Reg?

Great post, BTW.

# posted by

Matt : 10:01 PM

This is also why having if & unless, and being able to use them in two different places is a good thing(TM).

x = 1 if foo

implies that foo will almost always evaluate to true

if foo then x = 1 end

implies that the condition is more important/variable.

# posted by

Anonymous : 6:38 AM

This seems like a really bad idea. On teams of more than 2 or 3 it is really, really hard to agree on simple coding standards (like how many spaces between parens, commas and binary operators). Secondly, it is hard to remember complicated rules.

Coding standards/conventions are best when drearily simple. There is nothing worse than code that lies to you ... and if you rely on some archaic convention then you are asking for trouble.

Also - Links that show a preview when rolled over??? Is that an ad or do you really think that is a good thing?

# posted by

Anonymous : 3:59 PM

Undocumented, unvalidated conventions? No thanks. What if people start assuming your code sticks to them? Bridge auctions are hardly a communication model to aspire to. :)

That said, I don't mind the idea of using it for paradigm hints, since paradigms are a lot fuzzier and more fluid than the attributes you were denoting earlier on.

# posted by

Greg : 8:46 PM

"Undocumented, unvalidated conventions? No thanks."

The best statement of the best objection.

If it did provide significant value, it would only balkanize a language's programmers or result in them having to learn scads of systems. The time required for one or two systems to take over is probably on the same scale as the lifetime of the language.

Using a paradigm to signal intent is even worse -- it requires ignoring the fact that in many situations, one paradigm is much less awkward or more performant than others.

# posted by

Anonymous : 8:31 AM

Undocumented

Who said you can't document your conventions? Isn't that what teams do now when they establish a coding standard for a project?

What is the difference between hiring an architecture astronaut who declares that indents shall always consist of two (2) spaces, never tabs, and having a team agree that {} means one thing and do..end means another?

Likewise, there are zillions of conventions in place today. What are ExtremelyLongCamelCaseFactoryFactory names, if not unvalidated conventions? Or using Bang! method names to indicate state changes (sort vs. sort!)? Or Python's convention that verbs are used for imperatives and past participles are used for expressions (sort vs. sorted)?

Unvalidated conventions are all around us. If the particular examples are bad, so be it. But I find it hard to believe that there are ANY non-trivial projects that run without programming conventions of one sort or another, documented or not.

# posted by

Reginald Braithwaite : 9:00 AM

I hate the so-called word "Performant." It is a convention, a way of signaling that you belong to a particular subculture of nerds.

# posted by

Reginald Braithwaite : 9:01 AM

in many situations, one paradigm is much less awkward or [faster] than others.

Please re-read the part of the post where I assumed that you are an adult and know that you may need to deviate from the so-called rule for performance reasons. I said pretty much that exact thing. Was I not clear???

In fact, this is an argument IN FAVOUR of conventions. When you see someone using nested for loops with mutable variables and what-not instead of a simple functional approach, you are alerted to the fact that there is something exceptional about that code, such as THE NEED FOR HIGH PERFORMANCE.

If your team has no understanding that functional things should look functional and side-effect things should look imperative, some of your code will look functional, some imperative. Randomly.

And you will never know the difference between Joe Basic programmer who wrote a for loop because that's all he knows and Henrietta Haskeller who used functional code because she believes in the purity of the code's essence.

Yes, this is going to be hard work to evangelize and enforce. You might want to--I don't know if this suggestion is too radical for your team--have code reviews where you discuss issues like this? Where you ask someone to rewrite their monadic creation into a loop so that it communicates its mutable state intent more clearly, even though it passes all tests?

# posted by

Reginald Braithwaite : 9:08 AM

In this video on code quality (http://video.yahoo.com/video/play?vid=529579) Douglas Crockford talks about enforcing code conventions across the entire organization, and states that these conventions add considerable value to the organization, since they contribute to code quality. Ignoring conventions on large teams is asking for trouble, not adhering to them.

# posted by

Erock : 4:43 PM

People too often complain that they can't tell what some source code is doing, but I claim that the true obstacle is determining why the code does something. Mandating a single, canonical, "Right" expression removes the ability to put that information in the formal language, requiring comments to make it explicit (with extra verbiage that dilutes important information and with the potential for the comments to fall out of sync with the code).

# posted by

Mark P Sullivan : 2:24 AM

I've most recently been working with a language that bolted on a number of more or less functional "array_()" functions. One particular task, filtering associative array keys based on a pattern, required the use of 3 of these functions: array_filter, array_intersect_key and array_flip. These wound up being close to twice as slow as using a for loop, so there *are* other reasons. And before I hear the chorus of "premature optimization": my findings came from profiling a platform to be integrated with some of the most heavily visited sites in the world.

# posted by

George Jempty : 5:59 PM

The bulk of the article is fine, but the use of final in Java is simply preposterous!

Had java provided *another* keyword to mean "constant", say "const" and you had advocated its use then your article would be entirely fine.

But as things are, final in Java *also* signals that you have created a "read-only closure", and seeing "final" in that context used for anything else is worse than a distraction: it means scanning all over the method's code to locate which anonymous class instance, if any, is referring to it!

Please, don't spread *utterly bad* advice.

Thanks

# posted by

verec : 6:29 PM

Verec:

final in Java and const in C++ are completely different things. Final is no more and no less than saying you will not cause the contents of a variable--which is usually a reference--to change.

Thanks to Java's implementation of anonymous classes, only final parameters and local variables of the enclosing method may be referenced from them. That's true.

So it's also true that you could restrict the keyword to those variables you intend to use in anonymous classes. That's fine with me. If that signal works for you, I won't argue.

In fact, I think you just want to use a different signal than I, just as some Bridge players use 1 club to signal a very strong opening hand, and some use it to signal an indifferent hand.

As for whether using final to signal you don't intend to rebind to different reference types...

While I like your suggestion, I honestly think you're way off base to suggest that final should not be used as it was originally intended and in such a straightforward manner.

Declaring something final does not signal the creation of a read-only closure unless you agree in advance that that is how you intend to use it.

# posted by

Reginald Braithwaite : 7:26 PM

I find the example of {} versus do/end interesting. To me, it shows that language designers who throw in numerous aliases for the same thing WASTE THE OPPORTUNITY to use the syntax to establish conventions that will span the whole language community.

On the other hand, when the language itself pushes you one way or the other, using the appropriate construct is just good readability. I would almost never put a side-effect causing operation in a Python list comprehension because you're using a feature that is side-effect-free 99% of the time for its side effects: that's just going to be confusing. You can't do assignments directly in list comprehensions so the feature Obviously Wasn't Designed For Side Effects.

But conversely in Python, single-quoted strings and double-quoted strings are the same. So where in Ruby I might establish some convention of using double-quoted strings only where I expect interpolation, in Python I would not. I seem to recall that Tim Berners-Lee said at one point that languages like Python and XML that use single quotes and double quotes interchangeably "waste" the opportunity to use them for something more useful.

To knock on Python again, one could argue that if Guido had been more strict in separating the abilities of lists and tuples then it would be clearer when to use which and that would in turn make code clearer. You could imagine a universe in which lists could not be unpacked and tuples could not be iterated. (I'm not seriously advocating that those benefits would outweigh the practicality costs, though...)

I tend to think that it would be more hassle than it was worth to establish these kinds of conventions for a single team because you'd have to turn on and off the convention-filter whenever you read the work of someone outside of the team. But of course circumstances will vary.

# posted by

Paul Prescod : 5:56 AM

I also find what you said to make inherently more sense than just arbitrarily choosing which way to do something. Of course, commenting your code is something every programmer should be doing from the start.

# posted by

Alex Nagy : 7:18 PM
