Abbreviation, Accidental Complexity, and Abstraction

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Tuesday, July 31, 2007

Modern programming languages provide a variety of mechanisms for translating a relatively short program into a huge number of instructions for the computer’s CPU. It is tempting to think that the purpose of “high level languages” like Java, C#, Smalltalk, Ruby, or even Lisp is to be a kind of decompression algorithm: you type 147 lines of code, and the compiler elaborates each line of code, producing several megabytes of executable.

If it were as simple as that, we would say that the “highest level language” is the one that allows us to express our programs in the smallest source code, perhaps in fewer symbols or lines of code. For example, we would say that (1 x shift) !~ /^1?$|^(11+?)\1+$/' is superior to:

function (x) {
        y = Math.floor(Math.sqrt(x));
        var a = new Array();

        a.push(2);
        for (i = 3; i <= x; i+=2) {
                a.push(i)
        }
        a.reverse();
        var primes = new Array();
        while ((current_prime = a.pop()) < y) {
                primes.push(current_prime);
                for (index in a) {
                        if (a[index] % current_prime == 0) {
                                a.splice(index,1);
                        }
                }
        }
        return a[0] == x;
}

Strictly because it is smaller by all obvious metrics (source).¹

Twenty years later, The Mythical Man-Month: Essays on Software Engineering is still one of the most important books ever written about developing software, from the small to the large. It’s time to read the book that spawned the expression, “there is no silver bullet.”

This explanation is clearly wrong. Even if both examples produced exactly the same result, the former is almost impossible to use by mortals: its form obfuscates its output. Clearly, writing the smallest possible program is not the goal.

Writing smaller programs is also not an anti-goal: longer programs are not automatically “easier to read and understand.” One of the problems with longer programs is that they often are longer by virtue of containing accidental complexity, swathes of yellow code.

abbreviations

Some shorter programs are shorter merely because they contain shorter constructs. For example, if you perform some regular expression pattern matching in Ruby, you can use / characters to delimit your regular expression. That’s an abbreviation for the more tedious Regexp.new(). And there are some global variables that are automatically set for you. For example, $1, $2 and so on are set to the matching groups.

So if you write /(fu|foo)(bar|blitz)/ =~ 'my program went fubar', $1 is automatically assigned the string 'fu' and $2 is assigned the string 'bar'.

Compare and contrast this to: my_matcher = Regexp.new('(fu|foo)(bar|blitz)').match('my program went fubar'). Now you can use my_matcher[1] and my_matcher[2] to extract 'fu' and 'bar'.

Obviously, the former expression is shorter, and quite handy. And while it may look a little cryptic to someone raised on Java’s one-size-fits-all syntax of everything is a .message, it really isn’t an obfuscation. It’s an abbreviation, nothing more. It makes programs shorter without changing their meaning in any substantial way.

accidental complexity

We mentioned earlier that longer programs are sometimes longer by virtue of containing accidental complexity. There’s a good point of comparison. If a shorter program is shorter by virtue of having less accidental complexity, it’s better. It has a higher ratio of signal to noise.

For example, here is one of the new for loops in Java:

for (Account account: customer.getAccountList()) {
    // do something
}

This is shorter than:

Iterator iAccount = customer.getAccountList().iterator();
while (iAccount.hasNext()) {
    final Account account = (Account) iAccount.next();
    // do somethinmg
}

It also removes some of the accidental complexity of the iterator. The new for loop removes some accidental complexity, raising the signal by eliminating noise. To continue with the same example, let’s look at an old for loop:

for (int i = 0, i < customer.getAccountList().size(); ++i) {
    final Account account = customer.getAccountList().get(i);
    // do something
}

This has even more accidental complexity, a loop index variable. Eliminating the loop index is a decent win, it eliminates fence post errors. But there is a bigger win in moving from an index-based loop to an iterator based loop or a new for loop: we have abstracted away the notion that the collection must be indexed by consecutive integers.

abstractions

The iterator (and the new for loops) work with all kinds of collections, including linked lists and sets. Moving from a loop index variable to an iterator does more than just abbreviate the code, it does more than hide some accidental complexity, it provides a general-purpose abstraction for operations on collections.

How do the experts solve difficult problems in software development? In Beautiful Code, leading computer scientists offer case studies that reveal how they found unusual, carefully designed solutions to high-profile projects. You will be able to look over the shoulder of major coding and design experts to see problems through their eyes.

This is not simply another design patterns book, or another software engineering treatise on the right and wrong way to do things. The authors think aloud as they work through their project’s architecture, the tradeoffs made in its construction, and when it was important to break rules. Beautiful Code is an opportunity for master coders to tell their story.

So here is another point of comparison: does the shorter program provide us with a useful abstraction? Some programs are shorter through mere abbreviation, some are shorter through hiding accidental complexity, and some are shorter by providing useful abstractions.

The difference between a new for loop and an index variable for loop may seem subtle. So let’s bring out a canonical example, one we touched on earlier: regular expressions. Can anyone seriously doubt that /(fu|foo)(bar|blitz)/ provides a powerful abstraction compared to a stack of loops and indexOf method calls?

There is more than abbreviation involved, more than hiding the accidental complexity of indexOf, there is a whole new mental model involved. A regular expression is declarative, it specifies what you what to find, and leaves the how to the language implementation. It is shorter, yes. But it is also much more powerful because it provides the programmer with a huge mental lever.

Active Record provides a very useful abbreviation that eliminates a large chunk of accidental complexity, dynamic finders. You can write User.find_all_by_street_and_city(street, cities). I won’t say what it returns, I trust it’s obvious.

You could easily write a find_all_by_street_and_city method in any language you care to name. Agreed. But if you write one yourself, you have to write one for ever different kind of query you need to make. And if you write it, you trust it.

But if you are maintaining someone else’s code, do you really trust it without reading it? Or do you have a peek to see whether there’s some weird business logic in there, like some special case treating the abbreviation “Hogtown” as a substitute for “Toronto”? Repeat this process for each search abbreviation method in the code base. What if one has a bug? Or another has a specific eager loading behaviour?

If you are using an ORM like ActiveRecord, once you’ve learned how dynamic finders work, you know how they all work. Furthermore, you have an abstraction you understand, you don’t have to peek under the hood to see what’s going on. Abstractions are better than abbreviations.

abstractions are not abbreviations

Abbreviations are useful. They can make code more readable by putting the all of the essential workings in one visible chunk. But they aren’t as powerful as constructs that remove accidental complexity or provide abstractions.

And some times, abbreviations are even harmful. If the programmer reading code must understand what is being abbreviated in order to understand the code, then the abbreviation merely forces the programmer to jump around the code to figure anything out. When programs are written like this as a matter of course, the poor programmer is forced to rely on powerful IDEs that can jump to method definitions or find references quickly. She has to have these tools, because she must read all of the code to understand what it does.

The abbreviations have introduced complexity, not removed it.

Where do such programs come from, programs where the abbreviations are not useful abstractions? From those same IDEs, of course, from mindlessly refactoring to eliminate duplicate code without stopping to design the program’s mental model.

This is not a knock against powerful IDEs, far from it. But we should realize that all the same arguments raised about powerful programming languages (“operator overloading is dangerous in the hands of mediocre programmers,” “macros enable people to write unreadable programs,” and so forth) apply to tools that shuffle code around, especially when the same tools seem to make it easy to navigate the shuffled program.

When composing our own programs, when using these tools, it is not enough to merely seek to eliminate duplication. We must be mindful of the distinction between abbreviation, removing accidental complexity, and introducing useful abstractions.

It is not wrong to eliminate redundancy in code. But when we do so, we mustn’t follow the path of least resistance and mindlessly perform the refactorings suggested by our tools. This argument exactly parallels the argument about making code shorter for its own sake. Code brevity in and of itself is not desirable, well-abstracted code with a minimum of accidental complexity is desirable, and brevity follows when these goals are attained.

Likewise, elimination of redundancy is not desirable in and of itself. But it serves to warn us of the need to seek useful abstractions and to remove accidental complexity. When we work with those goals in mind, the redundancy likewise melts away, and we are able to use the tools to improve our code.²

Abbreviations might be good.
Removing Accidental Complexity is better.
And providing Useful Abstractions is best of all.

And there's another difference: is 121 a prime number?
Thanks, jbstjohn!

Labels: popular

¶ 10:19 AM

Comments on “Abbreviation, Accidental Complexity, and Abstraction”:

Mark Jason dominus calls non informative but necessary code "structural code".
Even Perl 5 has a lot of structural code.
It needs a lot of code to process arguments due to the too low level abstraction @_ as an array of argument. Perl 6 will fix that and is partially inspired by Ruby. It will go way further because it will use arguments types for multimethod dispatching.

# posted by

cognominal : 11:28 AM

Very nice article.

What do you mean by the comment about 121 being prime? Surely the regex will match it with a string of eleven 1s, repeated eleven times?

(I assume the Ruby implementation is correct...)

# posted by

Laurie Cheers : 12:01 PM

Laurie:

Thank you!

Two bad assumptions, though: First, it isn't Ruby, it's Javascript :-)

So let's go to the source page and try it. The original program finds all the primes up to a certain number.

Fine, put 125 in the box. It tells us 121 is prime!

# posted by

Reginald Braithwaite : 12:06 PM

Good article, except I'd say not everything is rosy in the abstraction camp either. See "Leaky Abstraction". While the dynamic finders in ActiveRecord are a great abstraction, they can also lead people down a path of creating non-performant queries. If I was a DBA, I suspect I might be railing (heh) against usage of such abstractions on MY databases.

# posted by

Patrick Mueller : 2:44 PM

The Javascript function looks like it has a bug. I don't know Javascript all that well, but a quick look at the function tells me that he might want "while ((current_prime = a.pop()) <= y)" instead of "while ((current_prime = a.pop()) < y)" The sieve is suppose to go up to the square root if n, and not remain strictly less than the square root of n.

# posted by

Benjamin Cutler : 11:53 PM

Benjamin:

That is exactly the bug, nice job being the first to find it.

# posted by

Reginald Braithwaite : 11:58 PM

This reminds me of an article I saw quite some time ago (I tried to Google for it, but I couldn't seem to remember enough of the details to perform a successful search) where a Java programmer (or C# programmer I can't remember which) was comparing Ruby's method of loop abstraction (.each) with with the corresponding version in his language. His conclusion was that his favorite language was was superior because "the Ruby version was doing stuff under the hood" which kind of struck me as odd, because both versions do stuff under the hood. As you mentioned, all higher level languages do. There's a lot of assembly under the hood that's being abstracted away.

Heck that's why we don't program in assembly language (if we don't have to), we want to abstract the details that don't matter away. I want my language to do stuff under the hood, I want it to provide me with useful abstractions that allow me to express what I mean without the extra fluff.

# posted by

Benjamin Cutler : 12:27 AM

Nice entry!
I enjoyed the exposition of declarative programming (what vs. how) in yout article.
I once read somewhere a nice statement about readability and the shortness of code:
'be concise, but not to the point of obfuscation'

I think that's pretty compliant to your propositions :o)

# posted by

Mario Gleichmann : 2:46 PM

Another very good article and it brings to mind a quote from Edsger Dijkstra, who else:

In this connection it might be worthwhile to point out that the purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

So Reg, when are you going to take the next step and put all your knowledge and exploration of language semantics and usability into practice and create your own language?

Regards,
Norbert

# posted by

Norbert Winklareth : 8:40 AM

Shortening code for the sake of proving a point or reducing file size has been around since the early days.

The clever programmer is the one who knows when shorter code may be more elegant but actually adds processing time and avoids using it.

The great programmers are the ones who consider short code, processing time and updatability with every line of code they write.

The hack programmer is the one who makes short code to show off and uses variables like v1..v300 in a pathetic attempt at job security.

And finally the idiot hack programmer is the one who uses v1..v300 and throws away his crib sheet that lists what those ridiculous variables stand for. (Yes, I had the pleasure to work with such a brainiac; thankfully I was managing the system and not programming on it.)

# posted by

Michele : 5:02 PM

"...Code brevity in and of itself is not desirable, well-abstracted code with a minimum of accidental complexity is desirable..."

I agree. It is a lot like complexity of manufactured items. More parts = more cost and less reliability.

http://www.tech4d.com/blog/2008/01/01/the-cost-of-complexity/

# posted by

Frank : 3:30 PM

Beautiful point, pointedly made. I pose a question about a little side-matter: the bit about powerful auto-refactoring, auto-completing, easy-navigating IDEs in the wrong hands.

What positive balancing effect do we get from the extent to which these IDEs simply encourage and enable novice programmers to learn about code and coding? I might contend that since we get at what we do all the time, we might actually learn refactorings and code smells better and faster through these tools. We might also learn about design generally more quickly.

Finally, we might try more and more times, like the intrepid climber, because the experiments are cheap. We might be become more courageous, and eventually better at seeing these key distinctions you describe.

That's very different from the learning path we go down when we use drag-and-drop and auto-generate exclusively...

# posted by

patrickwilsonwelsh : 4:59 PM

<< Home