Abbreviation, Accidental Complexity, and Abstraction
Modern programming languages provide a variety of mechanisms for translating a relatively short program into a huge number of instructions for the computer’s CPU. It is tempting to think that the purpose of “high level languages” like Java, C#, Smalltalk, Ruby, or even Lisp is to be a kind of decompression algorithm: you type 147 lines of code, and the compiler elaborates each line of code, producing several megabytes of executable.
If it were as simple as that, we would say that the “highest level language” is the one that allows us to express our programs in the smallest source code, measured perhaps in symbols or lines of code. For example, we would say that
(1 x shift) !~ /^1?$|^(11+?)\1+$/
is superior to:
function isPrime(x) {
    var y = Math.floor(Math.sqrt(x));
    var a = new Array();
    a.push(2);
    for (var i = 3; i <= x; i += 2) {
        a.push(i);
    }
    a.reverse();
    var primes = new Array();
    var current_prime;
    while ((current_prime = a.pop()) < y) {
        primes.push(current_prime);
        for (var index in a) {
            if (a[index] % current_prime == 0) {
                a.splice(index, 1);
            }
        }
    }
    return a[0] == x;
}
strictly because it is smaller by every obvious metric (source).[1]
This explanation is clearly wrong. Even if both examples produced exactly the same result, the former is almost impossible for mere mortals to read: its form obscures what it computes. Clearly, writing the smallest possible program is not the goal.
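To make the comparison concrete, here is a sketch in Ruby, the language of the later examples, of the regular-expression trick alongside a plain trial-division test. The method names are made up for illustration. The regexp works by writing n in unary (n “1” characters) and asking whether that string can be split into two or more equal groups of at least two characters each, which is exactly what it means for n to be composite.

```ruby
# Hypothetical helper names; a sketch, not a library API.

# The Perl one-liner transliterated to Ruby. n is written in unary as n "1"s.
#   \A1?\z          matches 0 and 1, which are not prime.
#   \A(11+?)\1+\z   matches when the unary string splits into two or more
#                   equal groups of length >= 2, i.e. when n is composite.
def prime_by_regexp?(n)
  ('1' * n) !~ /\A1?\z|\A(11+?)\1+\z/
end

# Longer, but it says what it means: try every divisor up to sqrt(n).
def prime_by_trial_division?(n)
  return false if n < 2
  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

primes = (1..30).select { |n| prime_by_regexp?(n) }
p primes                           # => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
p prime_by_regexp?(121)            # => false (121 = 11 * 11)
p prime_by_trial_division?(121)    # => false
```

Both versions give the same answers; only one of them can be read without a decoder ring.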
Writing smaller programs is also not an anti-goal: longer programs are not automatically “easier to read and understand.” One of the problems with longer programs is that they are often longer by virtue of containing accidental complexity, swathes of yellow code.
Abbreviations
Some shorter programs are shorter merely because they contain shorter constructs. For example, if you perform some regular expression pattern matching in Ruby, you can use / characters to delimit your regular expression. That’s an abbreviation for the more tedious Regexp.new(). And there are some global variables that are automatically set for you. For example, $1, $2, and so on are set to the matching groups.
So if you write /(fu|foo)(bar|blitz)/ =~ 'my program went fubar', $1 is automatically assigned the string 'fu' and $2 is assigned the string 'bar'.
Compare and contrast this with: my_matcher = Regexp.new('(fu|foo)(bar|blitz)').match('my program went fubar'). Now you can use my_matcher[1] and my_matcher[2] to extract 'fu' and 'bar'.
Obviously, the former expression is shorter, and quite handy. And while it may look a little cryptic to someone raised on Java’s one-size-fits-all syntax, where everything is a .message(), it really isn’t an obfuscation. It’s an abbreviation, nothing more: it makes programs shorter without changing their meaning in any substantial way.
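A small script can check the claim that the two forms mean the same thing. This sketch uses only core Ruby; $~ is the special variable holding the MatchData from the last successful match, and $1 and $2 are shorthand for $~[1] and $~[2].

```ruby
text = 'my program went fubar'

# Abbreviated form: a regexp literal, =~, and the special variables $1 and $2.
short_form = nil
if /(fu|foo)(bar|blitz)/ =~ text
  # $~ now holds the MatchData for this match; $1 is shorthand for $~[1].
  short_form = [$1, $2]
end

# Longhand form: an explicit Regexp object and the MatchData it returns.
my_matcher = Regexp.new('(fu|foo)(bar|blitz)').match(text)
long_form = [my_matcher[1], my_matcher[2]]

p short_form              # => ["fu", "bar"]
p long_form               # => ["fu", "bar"]
p short_form == long_form # => true
```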
Accidental complexity
We mentioned earlier that longer programs are sometimes longer by virtue of containing accidental complexity. That gives us a good point of comparison: if a shorter program is shorter by virtue of having less accidental complexity, it’s better. It has a higher ratio of signal to noise.
For example, here is one of the new for loops in Java:
for (Account account : customer.getAccountList()) {
    // do something
}
This is shorter than:
Iterator iAccount = customer.getAccountList().iterator();
while (iAccount.hasNext()) {
    final Account account = (Account) iAccount.next();
    // do something
}
It also removes the accidental complexity of managing the iterator, raising the signal by eliminating noise. To continue with the same example, let’s look at an old-style for loop:
for (int i = 0; i < customer.getAccountList().size(); ++i) {
    final Account account = customer.getAccountList().get(i);
    // do something
}
This has even more accidental complexity: a loop index variable. Eliminating the loop index is a decent win, because it eliminates fence-post errors. But there is a bigger win in moving from an index-based loop to an iterator-based loop or a new for loop: we have abstracted away the notion that the collection must be indexed by consecutive integers.
Abstractions
The iterator (and the new for loop) works with all kinds of collections, including linked lists and sets. Moving from a loop index variable to an iterator does more than abbreviate the code, and more than hide some accidental complexity: it provides a general-purpose abstraction for operations on collections.
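The same distinction exists in Ruby. In this sketch (the method names and balance figures are illustrative only), the index-based version quietly assumes positional access, while the each-based version relies only on each, so it works for an Array and a Set alike.

```ruby
require 'set'

# Index-based: assumes the collection can be addressed as 0, 1, 2, ...
# The manual bookkeeping (i, the < bound, i += 1) is where fence-post
# errors such as writing <= instead of < creep in.
def total_by_index(balances)
  total = 0
  i = 0
  while i < balances.size
    total += balances[i]
    i += 1
  end
  total
end

# Iterator-based: only needs #each, so any Enumerable collection works.
def total_by_each(balances)
  total = 0
  balances.each { |balance| total += balance }
  total
end

accounts_array = [100, 250, 75]
accounts_set = Set[100, 250, 75]

p total_by_index(accounts_array) # => 425
p total_by_each(accounts_array)  # => 425
p total_by_each(accounts_set)    # => 425
# total_by_index(accounts_set) would fail: a Set offers no positional indexing.
```

In everyday Ruby you would simply write balances.sum, since Enumerable#sum is the same abstraction taken one step further.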
So here is another point of comparison: does the shorter program provide us with a useful abstraction? Some programs are shorter through mere abbreviation, some are shorter through hiding accidental complexity, and some are shorter by providing useful abstractions.
The difference between a new for loop and an index-variable for loop may seem subtle. So let’s bring out a canonical example, one we touched on earlier: regular expressions. Can anyone seriously doubt that /(fu|foo)(bar|blitz)/ provides a powerful abstraction compared to a stack of loops and indexOf method calls?
There is more than abbreviation involved, and more than hiding the accidental complexity of indexOf: there is a whole new mental model involved. A regular expression is declarative: it specifies what you want to find, and leaves the how to the language implementation. It is shorter, yes. But it is also much more powerful, because it provides the programmer with a huge mental lever.
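To see the difference in mental model, here is a sketch that finds the same match twice: once declaratively with a regexp, and once imperatively with nested loops and substring comparisons standing in for a stack of indexOf calls. The method names are hypothetical. Notice that the hand-written version has to spell out the search order the regexp engine supplies for free: leftmost position first, then the alternatives in the order they are written.

```ruby
# Declarative: describe what to find and let the regexp engine work out how.
def find_with_regexp(text)
  /(fu|foo)(bar|blitz)/.match(text)&.captures
end

# Imperative: spell out how to search, position by position.
def find_by_hand(text)
  prefixes = %w[fu foo]
  suffixes = %w[bar blitz]
  (0...text.length).each do |pos|
    prefixes.each do |prefix|
      next unless text[pos, prefix.length] == prefix
      after = pos + prefix.length
      suffixes.each do |suffix|
        # Returns from find_by_hand as soon as a full match is found.
        return [prefix, suffix] if text[after, suffix.length] == suffix
      end
    end
  end
  nil
end

p find_with_regexp('my program went fubar') # => ["fu", "bar"]
p find_by_hand('my program went fubar')     # => ["fu", "bar"]
p find_by_hand('fooblitz')                  # => ["foo", "blitz"]
p find_by_hand('no match here')             # => nil
```

Changing the pattern means editing one line in the first method, but restructuring loops in the second.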
Active Record provides a very useful abbreviation that eliminates a large chunk of accidental complexity: dynamic finders. You can write User.find_all_by_street_and_city(street, cities). I won’t say what it returns; I trust it’s obvious.
Granted, you could easily write a find_all_by_street_and_city method in any language you care to name. But if you write one yourself, you have to write one for every different kind of query you need to make. And if you write it, you trust it.
But if you are maintaining someone else’s code, do you really trust it without reading it? Or do you have a peek to see whether there’s some weird business logic in there, like a special case treating the abbreviation “Hogtown” as a substitute for “Toronto”? Repeat this process for each search abbreviation method in the code base. What if one has a bug? What if another has its own eager-loading behaviour?
If you are using an ORM like ActiveRecord, once you’ve learned how dynamic finders work, you know how they all work. Furthermore, you have an abstraction you understand, so you don’t have to peek under the hood to see what’s going on. Abstractions are better than abbreviations.
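To show why learning the rule once is enough, here is a toy, in-memory sketch of the dynamic-finder idea in plain Ruby. It is not how ActiveRecord implements dynamic finders; ToyModel, its create method, and the rule that an Array argument means “any of these values” are assumptions made for illustration. A single method_missing hook parses the method name, so every find_all_by_..._and_... query behaves the same way, and there is no per-query code to go and read.

```ruby
# A toy, in-memory stand-in for a model class. NOT ActiveRecord's
# implementation; it only illustrates the idea behind dynamic finders.
class ToyModel
  Record = Struct.new(:name, :street, :city, keyword_init: true)
  FINDER = /\Afind_all_by_(\w+)\z/

  # Each subclass keeps its own list of records (a class-level ivar).
  def self.records
    @records ||= []
  end

  def self.create(**attributes)
    record = Record.new(**attributes)
    records << record
    record
  end

  # One generic rule handles every find_all_by_<a>_and_<b>... name.
  def self.method_missing(name, *args, &block)
    match = FINDER.match(name.to_s)
    return super unless match

    attributes = match[1].split('_and_').map(&:to_sym)
    unless args.size == attributes.size
      raise ArgumentError, "#{name} expects #{attributes.size} argument(s)"
    end

    records.select do |record|
      attributes.zip(args).all? do |attribute, value|
        # Assumption for this sketch: an Array means "any of these values".
        value.is_a?(Array) ? value.include?(record[attribute]) : record[attribute] == value
      end
    end
  end

  def self.respond_to_missing?(name, include_private = false)
    FINDER.match?(name.to_s) || super
  end
end

class User < ToyModel; end

User.create(name: 'Ann', street: 'Queen St', city: 'Toronto')
User.create(name: 'Bob', street: 'Queen St', city: 'Hogtown')
User.create(name: 'Cid', street: 'King St', city: 'Toronto')

p User.find_all_by_street_and_city('Queen St', %w[Toronto Hogtown]).map(&:name)
# => ["Ann", "Bob"]
p User.find_all_by_city('Toronto').map(&:name) # => ["Ann", "Cid"]
```

Nothing in the finder knows that “Hogtown” and “Toronto” name the same city; that kind of special case has nowhere to hide.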
Abstractions are not abbreviations
Abbreviations are useful. They can make code more readable by putting all of the essential workings in one visible chunk. But they aren’t as powerful as constructs that remove accidental complexity or provide abstractions.
And sometimes abbreviations are even harmful. If the programmer reading the code must understand what is being abbreviated in order to understand the code, then the abbreviation merely forces the programmer to jump around the code to figure anything out. When programs are written like this as a matter of course, the poor programmer is forced to rely on powerful IDEs that can jump to method definitions or find references quickly. She has to have these tools, because she must read all of the code to understand what it does.
The abbreviations have introduced complexity, not removed it.
Where do such programs come from, programs where the abbreviations are not useful abstractions? From those same IDEs, of course: from mindlessly refactoring to eliminate duplicate code without stopping to design the program’s mental model.
This is not a knock against powerful IDEs, far from it. But we should realize that all the same arguments raised about powerful programming languages (“operator overloading is dangerous in the hands of mediocre programmers,” “macros enable people to write unreadable programs,” and so forth) apply to tools that shuffle code around, especially when the same tools seem to make it easy to navigate the shuffled program.
When we compose our own programs with these tools, it is not enough merely to seek to eliminate duplication. We must be mindful of the distinction between abbreviation, removing accidental complexity, and introducing useful abstractions.
It is not wrong to eliminate redundancy in code. But when we do so, we mustn’t follow the path of least resistance and mindlessly perform the refactorings suggested by our tools. This argument exactly parallels the argument about making code shorter for its own sake. Code brevity in and of itself is not desirable; well-abstracted code with a minimum of accidental complexity is desirable, and brevity follows when these goals are attained.
Likewise, eliminating redundancy is not desirable in and of itself. But redundancy serves to warn us of the need to seek useful abstractions and to remove accidental complexity. When we work with those goals in mind, the redundancy melts away as well, and we are able to use the tools to improve our code.[2]
Abbreviations might be good.
Removing Accidental Complexity is better.
And providing Useful Abstractions is best of all.
- [1] And there's another difference: is 121 a prime number?
- [2] Thanks, jbstjohn!