Abbreviation, Accidental Complexity, and Abstraction
Modern programming languages provide a variety of mechanisms for translating a relatively short program into a huge number of instructions for the computer’s CPU. It is tempting to think that the purpose of “high level languages” like Java, C#, Smalltalk, Ruby, or even Lisp is to be a kind of decompression algorithm: you type 147 lines of code, and the compiler elaborates each line of code, producing several megabytes of executable.
If it were as simple as that, we would say that the “highest level language” is the one that allows us to express our programs in the smallest source code, perhaps in fewer symbols or lines of code. For example, we would say that
the Perl expression (1 x shift) !~ /^1?$|^(11+?)\1+$/ is superior to:
function (x) {
    y = Math.floor(Math.sqrt(x));
    var a = new Array();
    a.push(2);
    for (i = 3; i <= x; i+=2) {
        a.push(i)
    }
    a.reverse();
    var primes = new Array();
    while ((current_prime = a.pop()) < y) {
        primes.push(current_prime);
        for (index in a) {
            if (a[index] % current_prime == 0) {
                a.splice(index,1);
            }
        }
    }
    return a[0] == x;
}
strictly because it is smaller by all obvious metrics (source).
This explanation is clearly wrong. Even if both examples produced exactly the same result,[1] the former is almost impossible to use by mortals: its form obfuscates its output. Clearly, writing the smallest possible program is not the goal.
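To see what the one-liner is actually doing, here is a rough restatement in Ruby. The method name prime_by_regex? is made up for this sketch; the pattern is the one quoted above. It writes n in unary, as a string of n ones, and then asks whether that string can be split into two or more equal groups of at least two ones.

```ruby
# Hypothetical helper, not from the original post: the same unary-regex
# primality test, restated in Ruby.
def prime_by_regex?(n)
  unary = '1' * n # n written as a run of n ones
  # /^1?$/ catches 0 and 1. /^(11+?)\1+$/ matches any run that splits into
  # two or more equal groups of at least two ones, i.e. a composite number.
  unary !~ /^1?$|^(11+?)\1+$/
end

puts prime_by_regex?(7)   # true
puts prime_by_regex?(121) # false, because 121 = 11 * 11
```

Spelled out like this it is still terse, but a name and two comments go a long way.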
Writing smaller programs is also not an anti-goal: longer programs are not automatically “easier to read and understand.” One of the problems with longer programs is that they often are longer by virtue of containing accidental complexity, swathes of yellow code.
abbreviations
Some shorter programs are shorter merely because they contain shorter constructs. For example, if you perform some regular expression pattern matching in Ruby, you can use / characters to delimit your regular expression. That’s an abbreviation for the more tedious Regexp.new(). And there are some global variables that are automatically set for you. For example, $1, $2 and so on are set to the matching groups.
So if you write /(fu|foo)(bar|blitz)/ =~ 'my program went fubar', $1 is automatically assigned the string 'fu' and $2 is assigned the string 'bar'.
Compare and contrast this to: my_matcher = Regexp.new('(fu|foo)(bar|blitz)').match('my program went fubar'). Now you can use my_matcher[1] and my_matcher[2] to extract 'fu' and 'bar'.
Obviously, the former expression is shorter, and quite handy. And while it may look a little cryptic to someone raised on Java’s one-size-fits-all syntax of everything is a .message, it really isn’t an obfuscation. It’s an abbreviation, nothing more. It makes programs shorter without changing their meaning in any substantial way.
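A quick way to convince yourself that the two forms mean the same thing is to run them side by side. This is only a sketch; the two method names are invented for the comparison, and the pattern and sample string are the ones used above.

```ruby
# Abbreviated form: literal delimiters plus the $1/$2 globals.
def captures_with_globals(text)
  /(fu|foo)(bar|blitz)/ =~ text
  [$1, $2]
end

# Long form: explicit constructor and an explicit MatchData object.
def captures_with_match_data(text)
  my_matcher = Regexp.new('(fu|foo)(bar|blitz)').match(text)
  my_matcher.nil? ? [nil, nil] : [my_matcher[1], my_matcher[2]]
end

text = 'my program went fubar'
p captures_with_globals(text)
p captures_with_match_data(text)
```

Both calls yield the same pair of strings, which is the whole point: the short form changes the spelling, not the meaning.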
accidental complexity
We mentioned earlier that longer programs are sometimes longer by virtue of containing accidental complexity. There’s a good point of comparison. If a shorter program is shorter by virtue of having less accidental complexity, it’s better.
It has a higher ratio of signal to noise.
For example, here is one of the new for loops in Java:
for (Account account: customer.getAccountList()) {
    // do something
}
This is shorter than:
Iterator iAccount = customer.getAccountList().iterator();
while (iAccount.hasNext()) {
    final Account account = (Account) iAccount.next();
    // do something
}
It also removes some of the accidental complexity of the iterator, raising the signal by eliminating noise. To continue with the same example, let’s look at an old for loop:
for (int i = 0; i < customer.getAccountList().size(); ++i) {
    final Account account = customer.getAccountList().get(i);
    // do something
}
This has even more accidental complexity: a loop index variable. Eliminating the loop index is a decent win, since it eliminates fence post errors. But there is a bigger win in moving from an index-based loop to an iterator-based loop or a new for loop: we have abstracted away the notion that the collection must be indexed by consecutive integers.
abstractions
The iterator (and the new for loops) work with all kinds of collections, including linked lists and sets. Moving from a loop index variable to an iterator does more than just abbreviate the code, it does more than hide some accidental complexity, it provides a general-purpose abstraction for operations on collections.
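The same idea is not specific to Java. As a rough sketch in Ruby (the helper name total_balance and the numbers are invented for illustration), one loop body can serve an array, a set, and a range, because it relies only on each and never on an integer index:

```ruby
require 'set'

# Hypothetical helper: sums values from any collection that responds to `each`.
def total_balance(accounts)
  total = 0
  accounts.each { |balance| total += balance }
  total
end

puts total_balance([100, 250, 50])          # an Array, which happens to be indexable
puts total_balance(Set.new([100, 250, 50])) # a Set, which has no integer index
puts total_balance(1..4)                    # a Range, which stores no elements at all
```

An index-based loop could only have handled the first of those three without extra work.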
So here is another point of comparison: does the shorter program provide us with a useful abstraction? Some programs are shorter through mere abbreviation, some are shorter through hiding accidental complexity, and some are shorter by providing useful abstractions.
The difference between a new for loop and an index variable for loop may seem subtle. So let’s bring out a canonical example, one we touched on earlier: regular expressions. Can anyone seriously doubt that /(fu|foo)(bar|blitz)/ provides a powerful abstraction compared to a stack of loops and indexOf method calls?
There is more than abbreviation involved, more than hiding the accidental complexity of indexOf, there is a whole new mental model involved. A regular expression is declarative: it specifies what you want to find, and leaves the how to the language implementation. It is shorter, yes. But it is also much more powerful because it provides the programmer with a huge mental lever.
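To make the contrast concrete, here is a sketch in Ruby of the same search written both ways. The method names and the two constant lists are invented for this illustration, and String#index stands in for indexOf. The hand-written version has to spell out how to search; the regular expression only says what to look for.

```ruby
PREFIXES = %w[fu foo].freeze
SUFFIXES = %w[bar blitz].freeze

# Imperative version: try every combination and keep the leftmost hit.
def find_by_hand(text)
  best = nil
  PREFIXES.each do |prefix|
    SUFFIXES.each do |suffix|
      position = text.index(prefix + suffix)
      next if position.nil?
      best = [position, prefix, suffix] if best.nil? || position < best[0]
    end
  end
  best && best[1..2]
end

# Declarative version: describe the pattern and let the regex engine search.
def find_by_regex(text)
  match = /(fu|foo)(bar|blitz)/.match(text)
  match && [match[1], match[2]]
end

p find_by_hand('my program went fubar')
p find_by_regex('my program went fubar')
```

Adding a third prefix means editing a list in one version and a pattern in the other, but only the second still reads as a description of the thing being sought.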
Active Record provides a very useful abbreviation that eliminates a large chunk of accidental complexity: dynamic finders. You can write User.find_all_by_street_and_city(street, cities). I won’t say what it returns, I trust it’s obvious.
You could easily write a find_all_by_street_and_city method in any language you care to name. Agreed. But if you write one yourself, you have to write one for every different kind of query you need to make. And if you write it, you trust it.
But if you are maintaining someone else’s code, do you really trust it without reading it? Or do you have a peek to see whether there’s some weird business logic in there, like some special case treating the abbreviation “Hogtown” as a substitute for “Toronto”? Repeat this process for each search abbreviation method in the code base. What if one has a bug? Or another has a specific eager loading behaviour?
If you are using an ORM like ActiveRecord, once you’ve learned how dynamic finders work, you know how they all work. Furthermore, you have an abstraction you understand, you don’t have to peek under the hood to see what’s going on. Abstractions are better than abbreviations.
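To illustrate why one rule beats many hand-written methods, here is a toy sketch in Ruby. It is emphatically not how Active Record implements dynamic finders; TinyTable is a hypothetical in-memory stand-in, and the sample rows are invented. It only shows that a single naming convention can answer every find_all_by_... query.

```ruby
# Hypothetical in-memory stand-in, NOT Active Record.
class TinyTable
  def initialize(rows)
    @rows = rows # an array of hashes with symbol keys
  end

  # One rule for all finders: split the method name into column names and
  # keep the rows whose columns equal the corresponding arguments.
  def method_missing(name, *args)
    return super unless name.to_s.start_with?('find_all_by_')
    columns = name.to_s.sub('find_all_by_', '').split('_and_').map(&:to_sym)
    wanted = columns.zip(args)
    @rows.select { |row| wanted.all? { |column, value| row[column] == value } }
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('find_all_by_') || super
  end
end

users = TinyTable.new([
  { name: 'Ann', street: 'Queen St', city: 'Toronto' },
  { name: 'Bob', street: 'Queen St', city: 'Ottawa' }
])
p users.find_all_by_street_and_city('Queen St', 'Toronto')
p users.find_all_by_city('Ottawa')
```

Once you have read those few lines, there is no per-finder business logic left to worry about, and no room for a “Hogtown” special case to hide.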
abstractions are not abbreviations
Abbreviations are useful. They can make code more readable by putting all of the essential workings in one visible chunk. But they aren’t as powerful as constructs that remove accidental complexity or provide abstractions.
And sometimes, abbreviations are even harmful. If the programmer reading code must understand what is being abbreviated in order to understand the code, then the abbreviation merely forces the programmer to jump around the code to figure anything out. When programs are written like this as a matter of course, the poor programmer is forced to rely on powerful IDEs that can jump to method definitions or find references quickly. She has to have these tools, because she must read all of the code to understand what it does.
The abbreviations have introduced complexity, not removed it.
Where do such programs come from, programs where the abbreviations are not useful abstractions? From those same IDEs, of course, from mindlessly refactoring to eliminate duplicate code without stopping to design the program’s mental model.
This is not a knock against powerful IDEs, far from it. But we should realize that all the same arguments raised about powerful programming languages (“operator overloading is dangerous in the hands of mediocre programmers,” “macros enable people to write unreadable programs,” and so forth) apply to tools that shuffle code around, especially when the same tools seem to make it easy to navigate the shuffled program.
When composing our own programs, when using these tools, it is not enough to merely seek to eliminate duplication. We must be mindful of the distinction between abbreviation, removing accidental complexity, and introducing useful abstractions.
It is not wrong to eliminate redundancy in code. But when we do so, we mustn’t follow the path of least resistance and mindlessly perform the refactorings suggested by our tools. This argument exactly parallels the argument about making code shorter for its own sake. Code brevity in and of itself is not desirable, well-abstracted code with a minimum of accidental complexity is desirable, and brevity follows when these goals are attained.
Likewise, elimination of redundancy is not desirable in and of itself. But it serves to warn us of the need to seek useful abstractions and to remove accidental complexity. When we work with those goals in mind, the redundancy likewise melts away, and we are able to use the tools to improve our code.
Abbreviations might be good.[2]
Removing Accidental Complexity is better.
And providing Useful Abstractions is best of all.
1. And there's another difference: is 121 a prime number?
2. Thanks, jbstjohn!
Haggling about the price
A few minutes ago I saw this interesting post:
Ajax Web Developer: $240k per year… with only one catch. If you want the executive summary, the job location is Iraq, a battle zone.
What I find interesting about jobs like this are the people who debate whether $240,000 is enough to compensate for the very real possibility of being killed:
Unfortunately, the whole 12/hr/day, 7/day/week deal works out to only about $52/hour…
I think that for $240k a year, I could shift my sleep schedule to iRaq time…
I think it would be a cool opportunity, but the 12 hour days, 7 days a week drops the money down to just over $50 an hour. If there is the possibility of being shot at, I need at least $80 per hour :-P
I would take 300k, though probably (?) not 200 to work in Iraq…
And lots of people were debating the tax-free status of this position, as if that’s the only deciding factor between staying or going. I’m sure many of the people commenting were being facetious; however, Winston Churchill really put things into perspective in this apocryphal exchange with a socialite:
Churchill: “Madam, would you sleep with me for five million pounds?”
Socialite: “My goodness, Mr. Churchill… Well, I suppose… we would have to discuss terms, of course…”
Churchill: “Would you sleep with me for five pounds?”
Socialite: “Mr. Churchill, what kind of woman do you think I am?!”
Churchill: “Madam, we’ve already established that. Now we are haggling about the price.”
If you’re going to sacrifice your life on principle, I commend you for your bravery.
But when it comes to money, either you are incorruptible, or you have decided that your life is a commodity, something that can be bought or sold, traded or bartered, played as a pawn and sacrificed so that some contracting company somewhere makes another billion dollars for its shareholders.
but I’m not working in a battle zone…
This all seems very surreal; I’m quite sure few or none of my blog readers are wearing combat fatigues and body armour right now.
But the principle applies elsewhere. If you ever find yourself being subjected to personal abuse, or manipulative behaviour by your manager, or asked to lie to clients, or outright cheat such as double billing, you are in the exact same position as someone asked to sacrifice their life. You have to decide whether you are sacrificing your self-esteem for a noble cause, or whether it’s about the money.
And trust me on this one, if it’s about the money, you are on the road of perpetual unhappiness, where every well is dry and every inn is full, and your journey never ends.
If you have a dream, and making that dream a reality involves tough business decisions, good for you. I commend you for having the courage to duke it out with VCs, lawyers, and whoever else will try to get you to mortgage your honesty to fill their coffers.
But if you are dragging yourself to work every day, if you hate what your boss makes you do, if it’s about the money and the vacations and the toys, but not about fundamental satisfaction with your job, you need to stop right now, sit down with the people you love and trust, and reëvaluate your choices.
Trust me on this one. If $100,000 isn’t enough, neither is $200,000, and $300,000 won’t do it either. We’ve established what they want you to become. All you’re doing is haggling about the price.
the sun also rises
The good news is that unless you have signed up for a two year tour of duty with the army, you can stop any time. You can get off the treadmill. You can just stop doing things that aren’t consistent with your values.
I’m not saying you shouldn’t work long hours, or compromise your technical integrity by using a for loop instead of a map function. Making tough choices is part of growing up.
But you do not need to compromise your integrity, ever. And the good news is, you can just say no. Tell your boss to handle all communications with the client if he doesn’t want you to mention that you only put in twenty hours on their project last week because he has you working for two clients at once.
Tell your client that no, you cannot submit an invoice for $2,500 but accept only $2,000 in payment, with her skimming the extra $500 as a secret commission.
It’s incredible, but this little word, no, it really works. The world does not stop. You don’t get dragged from the room and imprisoned without trial. If you are let go, you will find another job, a better one, a job with people who like you and respect you and want to make money in an honourable fashion.
We have a lot of freedoms. The point of those freedoms is to have an almost completely unrestrained ability to pursue happiness. All you have to do is make the right choices. And the first choice is to simply say no to anything you cannot abide.
That is the road that leads us into the sunshine.
Lies, Damned Lies, and...
Statistics—by which I mean collected evidence—is useful for making decisions in a probabilistic environment. A probabilistic environment is one in which we have several alternatives, the alternatives have different outcomes, and we cannot assure ourselves of selecting the most beneficial outcome with the evidence we are able to gather.
The best we can do is make a choice that gives us the greatest likelihood of a good outcome, or perhaps limits the likelihood of a bad outcome.
We humans like to use evidence of past outcome to guide ourselves when making future choices. We’re pattern-matching machines. If we ate a green banana and it hurt our stomach in the past, we avoid green bananas in the future. Almost everything we do in our lives and our careers is based on this principle, although we wrap it up in books full of formulae and impressive phrases like Bayesian filtering.
The trouble with this is that humans rarely apply even a modicum of common sense to probabilistic decisions. For example, one huge issue is called sampling bias. Consider hiring programmers, a popular topic. What does a certification with Sun or MSFT tell us about a candidate?
Well, let’s look at the evidence. Let’s pick 10,000 people randomly. Divide into two groups, programmers and non-programmers. What percentage of the programmers have certification? What percentage of the non-programmers have certification?
This evidence we just collected is exactly the same kind of evidence that spam classification systems use to determine whether emails are spam or not. So, can we apply this evidence to selecting people to interview for jobs as programmers?
The catch in this case is that people applying for a job as programmers is not the same kind of sample as the population at large. The samples are different. The filter (“select people with certification”) is most effective when the sample of people applying for the job most closely resembles the population at large, and it is least effective when the sample of people applying for the job does not resemble the population at large.
Imagine two companies, Alpha-Geeks (“A”), and BigCo (“B”). Alpha-Geeks is a start up working on something hip that you cannot explain to your Mother-in-Law. It is using one of the technologies covered by certification. BigCo is in the Consulting Industry, its clients are big corporations everyone has heard of.
The sample of people applying for jobs with Company A is very different than the sample of people applying for jobs with Company B. Don’t you think that Company B attracts many, many more submissions than Company A? And aren’t those submissions much more weighted to the average or even mediocre? (Weighted towards doesn’t mean that talented people don’t apply for jobs at Company B, please keep your cool).
The filter is going to be far more useful with Company B than Company A, because the sample of people applying for jobs at Company B is far more similar to the evidence sample than the sample of people applying for jobs at Company A.
And that’s the key to making good decisions based on evidence: your evidence sample must be very similar to your decision environment.
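A little arithmetic shows how much the sample matters. The sketch below applies Bayes' rule to the certification filter in Ruby. Every number in it is invented purely for illustration, and the method name is hypothetical; the only thing that changes between the two calls is how common programmers are in the pool being filtered.

```ruby
# Bayes' rule: P(programmer | certified) for a pool in which a fraction
# `prior` of the people are programmers. All figures are made up.
def probability_programmer_given_cert(prior, cert_rate_programmers, cert_rate_others)
  certified_programmers = prior * cert_rate_programmers
  certified_others = (1 - prior) * cert_rate_others
  certified_programmers / (certified_programmers + certified_others)
end

# Evidence sample: the population at large, where programmers are rare.
at_large = probability_programmer_given_cert(0.02, 0.30, 0.001)
# Decision environment: applicants, nearly all of whom already program.
applicants = probability_programmer_given_cert(0.95, 0.30, 0.001)

puts format('population at large: 0.02 -> %.3f', at_large)
puts format('applicant pool:      0.95 -> %.3f', applicants)
```

With these assumed figures the certificate is dramatic news about a random stranger and tells you almost nothing new about someone who has already applied, even though the measured certification rates are identical in both calls.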
(I am not going down the rabbit hole of saying that people with certification have some sort of personality that doesn’t match a start up or anything along those lines. Such a thing may explain this result, but explanation is not needed: the very fact that the samples of applicants are different from each other is enough to understand the principle of sample bias.)
but is it good enough
One argument is that even a flawed filter has some value. So even if certification is flawed, it’s still helpful. I’ve already taken a whack at certification, so let’s take a whack at the diametrically opposite filter to give this some balance: let’s talk about asking applicants to solve some sort of problem in an interview.
Why do people administer this kind of problem? Possibly, because it makes them feel smarter than the applicant. Possibly because their mentor Joe Furrybeard from MIT did it that way. And possibly because they performed the following simple experiment: they asked everyone in their company to solve the problem at an off-site retreat, where 90% got it right. They have observed interviewees struggle with it, and fewer than 50% get it right.
They know that Sturgeon’s Law applies to applicants (more here and here), so it looks like a winner: apply the test and throw the losers out on the spot.
Well, this is contentious. But let’s start with an easy issue. Remember that we tested people at a retreat? We could have the following phenomenon: most of the people at the company got it right because they were asked the problem in a relaxed setting, while many interviewees blow it under pressure. Our “evidence” that 90% of our staffers get it right is biased.
Selection bias happens when hiring in other ways: we could select by lifestyle (golfers need not apply at our start up unless they are disc golfers). True, everyone at the company is good at this problem. But that’s only because we hire people who are good at problems like this, not because there is a correlation between this specific talent and someone’s job performance.
Even if everyone at the company is extremely talented, that may say nothing about the value of this interview problem. Let’s see how. Let’s apply two filters to everyone we interview: one that is very good at predicting job performance, and one that is poor. Perhaps the good one is something obvious like previous performance under similar circumstances, while the bad one is whether they play a musical instrument.
If we use both filters, we will select far fewer people than if we just use the good one. But when we survey our employees, we discover that they are good and that they play musical instruments. So why don’t we drop the long interviews asking about past performance and simply pick those who play a musical instrument?
Let’s think about a spam filter: it gets thousands of emails, we classify them by hand into spam and not-spam, and it learns the relevance of various pieces of evidence like words.
In our case, we want to know if “plays musical instrument” is significant. We measure that everyone in the company plays a musical instrument. Very good. So we have a measure that 100% of the people we hired play musical instruments. But did we measure how many of the people we turned down for jobs played musical instruments? No. Maybe 50% or more of the rejects play musical instruments.
Well, if 100% of the hires play an instrument and 50% of the no-hires play an instrument, that’s still pretty useful, isn’t it? No way! Because, as we pointed out, our sample of employees is contaminated by the fact that we only hire musicians. Those statistics would only be relevant if there were no hidden correlation to our existing selection. But since our selection is contaminated, playing a musical instrument has about as much value as belonging to your college club has for fitness to rule the nation.
The problem with both of our sample bias problems is simple: although what we did looked a little like a Bayesian filter, we did not sample real job applicants for our company and train our filters based on trusted classification. We trained our filters based on inappropriate data sets like the population at large or existing employees.
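The contamination is easy to reproduce with a toy pool. In the Ruby sketch below, the applicant pool, its counts, and the helper names are all invented. The two traits are independent by construction, so playing an instrument predicts nothing, yet requiring it at hiring time makes it look like the signature of a good employee.

```ruby
Applicant = Struct.new(:strong_record, :plays_instrument)

# Invented pool of 100 applicants: 20% have a strong record whether or not
# they play an instrument, so the two traits are independent.
def sample_pool
  counts = { [true, true] => 10, [true, false] => 10,
             [false, true] => 40, [false, false] => 40 }
  counts.flat_map { |traits, n| Array.new(n) { Applicant.new(*traits) } }
end

def instrument_rate(people)
  people.count(&:plays_instrument).fdiv(people.size)
end

def strong_record_rate(people)
  people.count(&:strong_record).fdiv(people.size)
end

pool = sample_pool
# Apply both filters, the good one and the useless one.
hired, rejected = pool.partition { |a| a.strong_record && a.plays_instrument }
players, others = pool.partition(&:plays_instrument)

puts instrument_rate(hired)       # 1.0, but only because we required it
puts instrument_rate(rejected)
puts strong_record_rate(players)  # same rate as the non-players below
puts strong_record_rate(others)
```

Surveying only the hires measures our own hiring rule. The comparison that matters is the last pair of lines, and it can only be made on applicants, not on employees.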
little lies
Sampling bias is a pervasive problem. Another way it creeps into decision-making is through discounting. Humans have a way of filtering evidence before they use it to make decisions. Consider this situation: a drug company is putting a new Heart Medication through trials. 1,000 patients try it for two years after having bypass surgery. The company reports that 60% of the patients reported better-than-average blah-blah-blah (cholesterol, mood, blood pressure, take your pick) two years after surgery.
Great medication? It sounds like 600 patients got better. What’s the problem?
Don’t settle for the press release, let’s look at the study parameters. Well, look at that, they threw 200 patients out of the survey. It seems they died before the survey was completed. And what does improved blah-blah-blah mean? Well, there was a control group of 1,000 patients who didn’t take the drug. Five percent of them died, and the company threw them out using the same protocol as the test group. The survivors’ blah-blah-blah was measured, and the company is claiming that 60% of the people who survived the drug experience are better than the median blah-blah-blah of the control group survivors.
But when you copy and paste the tables from the PDF into your calculator and do a quick calculation, you discover that although 60% are better than median, only 40% of the survivors are better than the average blah-blah-blah of the control group. The data set is strongly skewed.
Well, that’s effing terrible. First, you have four times the mortality rate. True, 60% of the survivors are better than average, but what appears to be going on is that a lot of them are only a little bit better, and the ones that are worse are much, much worse. That explains why there were 150 more deaths in the test group.
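The figures in this example are worth working through once. Here is the arithmetic as a small Ruby sketch; the method and key names are invented, and the inputs are the numbers from the story above.

```ruby
# Recomputes the headline numbers from the trial described above.
# Both groups are assumed to be the same size, as in the story.
def trial_summary(group_size:, test_deaths:, control_death_percent:, improved_percent:)
  control_deaths = group_size * control_death_percent / 100
  survivors = group_size - test_deaths
  {
    survivors: survivors,
    improved: survivors * improved_percent / 100,
    mortality_ratio: Rational(test_deaths, control_deaths),
    extra_deaths: test_deaths - control_deaths
  }
end

summary = trial_summary(group_size: 1_000, test_deaths: 200,
                        control_death_percent: 5, improved_percent: 60)
puts summary[:survivors]        # 800
puts summary[:improved]         # 480, not the 600 the press release suggests
puts summary[:mortality_ratio]  # 4/1
puts summary[:extra_deaths]     # 150
```

The 60% only ever applied to the 800 survivors, so the press release's implied 600 was really 480, and that is before asking how much better "better" is.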
What happened here? The drug company discounted the problem results. It threw out the deaths and it chose to measure the median instead of the mean. Luckily for us, we don’t do this in software development.
Or do we?
How often do you hear someone report “Operation successful, but the patient died?” I hear it all the time. For example, people report that a project was problematic because the client kept changing requirements. I’m sure the client did change requirements, no word of a lie.
But what if we ask, “What’s your success rate with your software development methodology?” Will the response be, “it almost never works, it depends on fixed requirements and that’s not what we observe in the field?” or will the response be, “it works just fine?”
What happens here is that people believe it ought to work, so they discount the times it doesn’t work by blaming clients or programmers or managers. In effect, they are throwing the dead patients out of the study! I suspect what happens is that they have heard this worked at another company, with different people, different clients, different skill sets, everything different. But they ignore the obvious sample bias, they ignore the fact that what works at BigCo may not work for them (or vice versa), so they discount their failures.
It goes the other way too. Some people throw the survivors out of the study. Have you ever heard someone describe Ruby as a language that won’t work for large teams of mediocre programmers? Because, you know, they aren’t hackers? There may be some truth to it, but you’re also hearing someone discount evidence of success just as they discount evidence of failure.
My summary here is a short one: Statistics are only useful for decision-making when rigorously examined for fitness. You must be very, very certain that your evidence sample strongly resembles the decision-making environment, and you must be very careful that you don’t discount significant portions of your evidence.
I am going to give the final word on the subject to the Marketing Product Manager that steered JProbe to eight-figure revenues and a Jolt Award. (If you think software developers make decisions based on bad statistics, you will be amazed at what passes for evidence when marketing is discussed.)
The plural of anecdote is not data.
—Alan Armstrong
Ouch, that smarts
Smarts are about one tenth of what makes a business work. The rest is just shitty stuff like dealing with customers and partners and fixing bugs and reworking code and doing all sorts of lousy grunt work.
Friday Morning Surprise Essay Question
Compare and contrast:
Atwood's Law: any application that can be written in JavaScript, will eventually be written in JavaScript.
—Jeff Atwood
Lisp is a Black Hole: if you try to design something that’s not Lisp, but like Lisp, you’ll find that the gravitational forces on the design will suck it into the Black Hole, and it will become Lisp.
—Attributed to Guy Steele
To become popular, a programming language has to be the scripting language of a popular system.
And I thought I knew how low recruiters were prepared to go
... You tell them you already submitted your résumé and that they shouldn’t submit you again...
Well, I know of some agencies that once you give them this information, they will submit you—without your approval—just to get you disqualified so that their candidates have a better chance of getting the job.
Bill de hÓra on Designing for the Web
The relative verbosity of programming languages isn’t the interesting thing; nor is typing doctrine. What’s interesting is the culture of frameworks and what different communities deem valuable.
My sense of it is that on Java, too many web frameworks—think JSF, or Struts 1.x—consider the Web something you work around using software patterns. The goal is get off the web, and back into middleware. Whereas a framework like Django or Rails is purpose-built for the Web; integrating with the internal enterprise is a non-goal...
There are so many things frameworks like Rails/Django do ranging from architectural patterns around state management, to URL design, to testing, to template dispatching, to result pagination, right down to table coloring that the cumulative effect on productivity is startling. I suspect designing for the Web instead of around it is at least as important as language choice.
It’s hard to explain sometimes just how time-consuming it can be to get Web things done on some Java frameworks.
How to Run Javascript on the JVM in Just Fifteen Minutes
what and why
This post is about executing Javascript inside the JVM without using a browser. Besides the fact that people are talking about running Javascript on the server (again, and again), here’s why my colleagues and I used it on a recent project:
We have some logic that needs to run on the server and on the client, depending on when the application applies it. There is some incredibly complex form validation involved. Think of a loan application, for example. Zillions of rules like “at least five years at current location or at most three locations in ten years or owns current location for at least one year.” The whole thing forms a big logical expression that needs to be evaluated in such a way that we can report which pieces are missing or do not meet requirements (Declined because income is insufficient and does not state purpose of loan).
There are a couple of ways to handle this. One is to submit the form back to the server for validation. Another is to write everything in Java, but use a sophisticated tool to render the Java into Javascript. Naturally, our team chose a third option, The Rails Way (available for pre-order).
We have a Domain-Specific Language for describing the rules. Business users use the DSL, and another tool writes code from that. We could, in theory, write Java methods for the server and write Javascript for the client. We chose to start with Javascript, and we’ll write Java for the server if running Javascript on the server turns out to be slow.
In the mean time, we decided that having some Java make a simple function call to a Javascript function and process a simple result was a reasonable first step. As a side benefit, we run all our server-side Javascript unit tests in Java test suites alongside our Java unit tests.
And after some fiddling around, we got Javascript working on the JVM. My bet is that you can get it working too, and it won’t take more than fifteen minutes.
Care to try it?
step zero: the Java Virtual Machine (JVM)
You’ll need JDK 1.5 or 1.6 from Sun. If you already have this, move on to step one. Still reading? You’ll need to do a big install before we go further.
Go to the downloads page and download the latest thing they have on offer with the words “JDK” in it. You won’t need JEE (the framework formerly known as J2EE) for this exercise, but if you know what it is you know enough to decide whether to download it.
Right now, you want JDK 6u2. Go get it and suffer through the installation process.
step one: Bean Scripting Framework
Java6 has a new, built-in framework for running “scripting” languages. We’re not going to use it today, just because some of you may still need to make stuff work with JDK 1.5 in production. Instead, we’re going to go get the Jakarta Bean Scripting Framework (BSF).
You can download it here. We’ll need bsf.jar.
step two: fix gotchas
YMMV, but I found that I couldn’t get BSF working without including the Jakarta Commons-Logging jar. So if you don’t have this floating around, go here and download it. I experimented, and I could ignore everything except commons-logging-1.1.jar. If that was missing, BSF kakked.
step three: Rhino
Since we’re going to run Javascript, we need an interpreter. Rhino to the rescue. Download it. We’ll need js.jar.
step four: keeping things organized
Ready to code? Let’s start with a directory for all of our stuff. Call it hello_javascript. For the sake of keeping things simple, set up the sub-structure as follows:
hello_javascript
hello_javascript\lib
You may be using a fancy IDE, you may be using a text editor and have to graft your classpaths together with chicken wire. The important thing is that your classpath, besides including all of Java’s required stuff, and your own Java classes, also includes bsf.jar, commons-logging-1.1.jar and js.jar.
We’ll put all three in the lib subdirectory:
hello_javascript\lib\bsf.jar
hello_javascript\lib\commons-logging-1.1.jar
hello_javascript\lib\js.jar
step five: “Hello, Javascript”
Let’s write some Java: create the following subdirectories and put a file called HelloJavascript.java in them:
hello_javascript\com\raganwald\HelloJavascript.java
Let’s give it some code:
package com.raganwald;
import org.apache.bsf.BSFException;
import org.apache.bsf.BSFManager;
public class HelloJavascript {
    // BSFManager.eval declares the checked BSFException, so main declares it too
    public static void main (final String[] argv) throws BSFException {
        final BSFManager manager = new BSFManager();
        // arguments: language, source name, line number, column number, expression
        final Object jso = manager.eval("javascript", "(java)", 1, 1, "'hello, Javascript'");
        System.out.println(jso.toString());
    }
}
Run your new Java application. Did you see that? It interpreted some Javascript in the JVM without a browser. Check your watch. Did you need more than a quarter of an hour? I didn’t think so.
You can try more ambitious code:
manager.eval(
    "javascript", "(java)", 1, 1,
    "var f = function (what) { return 'hello, ' + what; }; f('Javascript');");
including other files is an exercise left for the reader
I didn’t find an easy way to get Javascript files to include other Javascript files. This isn’t the worst thing in the world, but you certainly don’t want to write anything substantial inside of Java strings. So try experimenting with reading Javascript files right off the classpath.
I created a subdirectory called javascript:
hello_javascript\javascript
And you can read Javascript into your Strings or StringBuffers with some fairly simple code, thanks to a utility built into BSF:
import java.io.FileReader;
import org.apache.bsf.util.IOUtils;
// ...
static String readScript(final String fileName) throws Exception {
    final FileReader in = new FileReader(fileName);
    try {
        // slurp the whole file into a String
        return IOUtils.getStringFromReader(in);
    } finally {
        in.close();
    }
}
That reads some script into a string. You can then prepend it to whatever you want to evaluate. Note that if you want to set up some sort of simple checking to make sure that you don’t “include” the same file twice, you will need to write yourself a little framework for that, perhaps using a Set to keep track of what you’ve already loaded.
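That “little framework” could be as small as the sketch below. It is only an illustration: ScriptSource and ScriptIncluder are hypothetical names that are not part of BSF, and the sketch assumes you supply your own way of reading a script by name (for example, something like the readScript method above).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction over "read this script's text"; not part of BSF. */
interface ScriptSource {
    String read(String name);
}

/** Hypothetical include guard: hands back each script's text at most once. */
class ScriptIncluder {
    private final ScriptSource source;
    private final Set<String> alreadyLoaded = new HashSet<String>();

    ScriptIncluder(final ScriptSource source) {
        this.source = source;
    }

    /** Returns the script text the first time a name is seen, "" afterwards. */
    String include(final String name) {
        // Set.add returns false when the name was already present
        if (!alreadyLoaded.add(name)) {
            return "";
        }
        return source.read(name);
    }
}
```

You would concatenate the results of several include calls and prepend them to the expression you hand to manager.eval. A repeated name contributes an empty string, so nothing gets defined twice.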
garbage in, garbage out


Prototype and Scriptaculous are the Javascript libraries that make slick transitions and UI effects easy one-liners. Prototype does more than just make an application look good: it adds Ruby and Smalltalk-like methods for handling Hashes, Arrays, and the DOM.
This book is one of the fastest ways to get up to speed on taking Javascript to the next level.
This is nice, and with a little work you could make a program that reads paths to Javascript files off the command line and executes them. But to make things really interesting, you want to find a way to get Java data into your JavaScript and do something useful with the results, not just print it as a String.
BSF provides a way to inject objects into the scripting language’s environment, so you could use that facility. When writing automated unit tests for that particular project, I chose a simpler route: I serialized the data into JSON and used that to call a Javascript function directly via BSF:
manager.eval("javascript", "(java)", 1, 1,
"myJavascriptFunction(" + myJSONString + ");");
This is a really bad idea if your JSON is handed to you from an insecure source, such as a public web page calling you back via XMLHttpRequest, but if you trust your source, this works wonderfully.
Now what do you do with the result? If you are generating something esoteric like a Javascript function, I have no idea. In my own case, I return all values as simple trees of Hashes (Javascript objects without any special methods) and Arrays. I convert those into Java trees of Maps and Lists:
import org.mozilla.javascript.NativeArray;
import org.mozilla.javascript.ScriptableObject;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// ...
// Convert a Rhino NativeArray into a List, unwrapping each element in turn
static List unwrapNativeArray (final NativeArray na) {
    return new ArrayList<Object> () {{
        for (int i = 0; i < na.getLength(); ++i) {
            add(unwrapNative(na.get(i, null)));
        }
    }};
}
// Convert an array-like object into a List by walking its consecutive integer ids from 0
static List unwrapPrototypeArray (final ScriptableObject sObj) {
    return new ArrayList<Object> () {{
        final List<Object> sObjIds = Arrays.asList(sObj.getAllIds());
        for (int i = 0; sObjIds.contains(i); ++i) {
            add(unwrapNative(sObj.get(i, null)));
        }
    }};
}
// Convert a Javascript object into a Map keyed by property name
static Map unwrapObject (final ScriptableObject sObj) {
    return new HashMap<String, Object> () {{
        for (Object id: sObj.getAllIds()) {
            put(id.toString(), unwrapNative(sObj.get(id.toString(), null)));
        }
    }};
}
// Dispatch on the kind of value Rhino handed back; anything else passes through unchanged
static Object unwrapNative (final Object obj) {
    if (obj instanceof NativeArray) {
        return unwrapNativeArray((NativeArray) obj);
    }
    else if (obj instanceof ScriptableObject) {
        final ScriptableObject sObj = (ScriptableObject) obj;
        final List<Object> sObjIds = Arrays.asList(sObj.getAllIds());
        if (sObjIds.contains("keys")) { // a prototype enumerable/hash
            return unwrapObject(sObj);
        }
        else if (sObjIds.contains("flatten")) { // a prototype enumerable/array
            return unwrapPrototypeArray(sObj);
        }
        else return unwrapObject(sObj);
    }
    else return obj;
}
Check your watch. Are you still under fifteen minutes? Great!
Is raganwald leasing space in Blogger's walled garden?
I currently use Blogger to compose this weblog and to handle comments. So yes, Google has a database somewhere with all of my posts and comments in it. But raganwald is not living inside a walled garden. Soon after I started writing, Joel Spolsky wrote to suggest that I not use the built-in hosting option but always host my own words on my own (as in leased from a hosting company, so I guess I mean relatively more my own) server with my own domain. I took his advice.
So what happens with the weblog is this: when a new page is published or an old one updated, Blogger actually renders the page to HTML and uploads it to my server by FTP. If I make a change to the template such as adding a new link to the “popular” section, Blogger re-uploads every single page in the weblog. If you add a comment, Blogger adds it to the bottom and re-renders the page, then re-uploads it. Every time.
So while I use the Blogger application to manage my weblog, I do not use Blogger to host my application in any meaningful way. If they shut down the service, I still have the pages. If I switch applications, I still have the pages. If you want to scrape data off my site, Blogger is powerless to block your web spider. If they change their business model and insist that every page carry their advertisements, I can GREP their ads out.
I apologize to everyone who finds the current application’s comment system inconvenient, and I will look into alternatives. But REST assured, this weblog supports the web’s basic open commons in a meaningful way. It is not leasing space in a walled garden.
I’m going back to work on actual software now. I thank you for your patience with these posts pretentiously pontificating on the potential for premature pizzlement [1] of our precious web...
[1] Philip K. Dick, Autofac
Thanks for submitting my post to programming.reddit.com
…or slashdot, del.icio.us, dzone, and all of the other “social bookmarking” sites. I really appreciate knowing somebody likes what I’ve written.
But you know, as much as I love these places, I’m getting a little worried by where we’re headed. The thing is, these aggregation sites are taking the value that the web provides and concentrating it in proprietary databases. They operate on the Internet, but not on the web.
Let me give you an example. A popular blog post can generate hundreds of comments. When those comments are attached to the post, you can read them right on the post. Anybody finding the post finds the comments. That’s value added to the post. Search engines can index them. If there are links in the comments, search engines can make inferences about the relationship between pages and ideas.
Likewise, if you follow up with a comment on your blog, your words are forever available to anyone reading your ideas. Search engines can make inferences about the relationship between pages and ideas. Words and links on the web add value to the web.
But when the comments are in programming.reddit.com instead of on the blog post, what happens? I hope that search engines are smart enough to index those comments as if they were attached to the post, but I’m not so sure they are. And what if the company owning the comments blocks search engines, or goes out of business? The value is lost forever.
Those comments are on the Internet, but they aren’t on the web. The web is composed of pages with contextually relevant links between them. Social bookmarking applications subvert this basic structure. They are unravelling the web itself.
This isn’t just a problem with the value of comments and discussion being redirected off the web and into Internet applications. It’s a problem with the basic link structure of the web itself. As much as I love the convenience of del.icio.us (I have a special tag just for links to fold into my weblog’s feed), it’s a culprit as well. Recently, I read something very interesting by Carl Lewis. I posted the link to my weblog feed and to programming.reddit.com. Traffic followed for Carl. Great.


In Code and Other Laws of Cyberspace, Lawrence Lessig argues that the Internet is being steadily regulated and controlled by businesses and governments. Are we going to sit on the sidelines whilst others decide its fate? Or will we get involved and participate in the creation of the future?
This is a must-read for anyone passionate about the Internet, the web, and its role in society and business.
But you know what isn’t great? There’s no context for these links. There’s one-time traffic to Carl’s post, but although that happened on the Internet, it didn’t happen on the web. There’s no permanent link to Carl’s post on my weblog. There are no words, no matter how brief, explaining what I thought of the post and why I liked it. Sure, search engine providers can add a page’s number of del.icio.us links to their formula for calculating popularity.
But now, the smarts about what a page represents and the relationships between pages have gone from being embedded in the pages themselves to the tags that reside in a proprietary Internet application. Once again, the inherent smarts of the web are being leeched away.
The web has its faults. But it’s open. Anyone can write a new search engine. Sure, the big “Do as we say when we say do no evil, not as we do” company has a tremendous competitive advantage in their massive S^Hk^Hy^Hn^He^Ht^H computing network. But in principle, all the data they’ve collected from pages and links can be collected by their competitors, or by you in your garage.
Not so with the tags in del.icio.us or the rankings in digg.com. All it takes is a EULA at the bottom of each page and they slam the door on us.
call to action
Naturally, keep using these applications. If they add value to your life, that’s great! But spare a thought for the web, for true freedom of information. There’s value there too. If you blog, don’t forget that old-fashioned links, the kind with HTML anchors (not blogrolls or JavaScript or what-have-you), are still what the web is made of. And a link with even a sentence explaining why you’re linking is even better.
And do you have something to say about a post? Why not put your comment right there on the post?
Or better still, why not publish your own blog post with your words and link back? I know all your buddies are on reddit or what-ever, but your words will be (relatively) immortal on the post.
It’s our web. Let’s keep it that way.
update: Thanks for some of the early feedback, I’ve tried to clarify something: I don’t want to hog your comments on my blog where I can moderate them or get Google-juice from your words or even sell books to people searching for your words. What I want is for your words and links to be freely available to everyone to read, search, index, and add value.
They’re your words, put them where you will (I obviously do). They’re your links, aggregate them where you will (again, I do). But words on the web are more open and more valuable than words in a social aggregation application. And links that make up the web are more open and more valuable than links in a social bookmarking site. I just want to encourage us all to keep growing the web.
Update II: After thinking about it, I see the issue as being congruent to the walled garden syndrome. It isn’t about taking content behind a log in or sign on, but it is about taking information that used to be open and distributed (like links and relationships) and putting it into proprietary Internet applications.
And it happens for the same reason: a true commons requires more work and has messier connectivity than a walled garden. When a city grows organically, it will always have icky bits and inconvenient bits. A “planned community” with a reasonably benevolent dictator will always offer a much more attractive vista, with more convenience and better looks. Naturally reddit has more convenient commenting. And also naturally, Condé Nast owns its database.
In Toronto, we have vibrant neighborhoods, and we have corporate spaces like shopping malls and nightclubs. Each has its place, but if I could only have one, I would have the public spaces. I don’t want to wake up one day and find that we can have any Internet we want, as long as it’s proprietary.
And update III: Is raganwald leasing space in Blogger’s walled garden?
Update IV: Joel Spolsky on why comments are bad for blogs, especially anonymous comments. Worth reading. This does not affect my thinking about the importance of having web pages link to web pages rather than using bookmarking sites.
Off Topic: Cuba, ¡si!
My brother and I are going to Cuba August 3rd to 11th for a week of rock climbing, scuba diving, salsa, cigars, and hanging out. We will be staying informally, in the Cuban equivalents of Bed and Breakfasts.
Update: Scuba diving off Isla la Juventud.
It's a great time to get cheap, last-minute flights down there. And there won’t be that many opportunities to visit before everything changes. For one thing, all those gorgeous old cars are going on eBay the very moment the trade embargo is lifted. An era is drawing to a close.
There is some of the world’s best climbing and reef diving to be had in Cuba, unspoiled by hordes of tourists. For example, the diving close to the big resorts is middling, but if you travel off the beaten track to Maria La Gorda, the Bay of Pigs, or the Isle of Youth, all reports are that the diving is spectacular.
I can report from experience that the climbing in Viñales is uncrowded and spectacular, with climbs from 5.6 to 5.13 on tufas, stalactites, and other features. It’s like climbing in Thailand, only you don’t have to fly twenty hours to get there and you are in the middle of coffee and cigar tobacco plantations.
If you have a fancy to try visiting and you want to hang out, please get in touch!
For me, visiting Cuba is absolutely not an endorsement of Castro’s policies, nor is it a criticism of the embargo and travel restrictions. If you are reading this and your country has laws or policies frowning upon visiting Cuba, please be very sure you are complying with your regulations before visiting.
It’s perfectly legal for Canadians, and it’s super easy for others to travel to Toronto or Mexico and fly direct from there, but the fact that it’s easy doesn’t make it right for you.
Can your type checking system do this?
I tend to work with dynamic languages. This is a personal preference at the moment. I have liked statically typed languages with powerful type checking systems in the past, and my mind is wide open to using them again in the future. Unfortunately, when you mention “static typing,” most people think you’re talking about Java. There is more to static typing than Java, a lot more. For example, Brian over at Enfranchised Mind pointed out that languages like OCaml have a static type checking system powerful enough to enforce constraints like:
- All possible variants of a type must be dealt with: for example, the case where a list has no elements and the case where it has at least one element must both be handled.
- The read_async function only takes file handles opened with the async flag set, while the close function can take file handles with the async flag set or not set.
- The buffer returned from read_async can not be accessed until the asynchronous read is completed.
- A Postgresql cursor can only be created and accessed within a transaction.
He’s absolutely right. Serious programming languages can do all of that.
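To make the first constraint a little more concrete, here is a minimal sketch in Java rather than OCaml. OCaml checks pattern matches for exhaustiveness natively. This sketch only approximates that with a visitor interface, so that a caller who forgets the empty case or the non-empty case gets a compile error. ConsList, ListVisitor, and Lists are hypothetical names invented for the illustration.

```java
/** Hypothetical visitor: one method per variant of the list type. */
interface ListVisitor<T, R> {
    R empty();
    R cons(T head, ConsList<T> tail);
}

/** Hypothetical immutable list that can only be inspected through match. */
abstract class ConsList<T> {
    abstract <R> R match(ListVisitor<T, R> cases);

    /** The empty list: dispatches to the visitor's empty case. */
    static <T> ConsList<T> nil() {
        return new ConsList<T>() {
            @Override
            <R> R match(final ListVisitor<T, R> cases) {
                return cases.empty();
            }
        };
    }

    /** A list with at least one element: dispatches to the cons case. */
    static <T> ConsList<T> cons(final T head, final ConsList<T> tail) {
        return new ConsList<T>() {
            @Override
            <R> R match(final ListVisitor<T, R> cases) {
                return cases.cons(head, tail);
            }
        };
    }
}

class Lists {
    /** Leaving out either method below would be rejected by the compiler. */
    static <T> int length(final ConsList<T> list) {
        return list.match(new ListVisitor<T, Integer>() {
            public Integer empty() {
                return 0;
            }
            public Integer cons(final T head, final ConsList<T> tail) {
                return 1 + length(tail);
            }
        });
    }
}
```

The design choice is that ConsList exposes no isEmpty or head accessors at all, so the only way to look inside a list is to say what happens in both cases.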
Joel Spolsky once explained why he likes using Apps Hungarian naming conventions: it helps avoid problems with various SQL and Javascript injection vulnerabilities. The idea is that you want to make it obvious when you’re violating certain constraints on the use of string values in your application. And Tom Moertel responded by showing how Haskell’s static type checking can completely eliminate the problem. Once again, a powerful type checking system can be used to enforce an important constraint.
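A much weaker version of the same idea can be sketched in Java. This is not Tom Moertel’s Haskell solution, only an illustration of using distinct types instead of naming conventions. UnsafeString, SafeHtml, and Page are hypothetical names, and the escaping shown is a simplified assumption about what “safe” means for HTML text.

```java
/** Raw, untrusted text, for example straight from a request parameter. */
final class UnsafeString {
    final String raw;

    UnsafeString(final String raw) {
        this.raw = raw;
    }
}

/** Text that has been escaped; the only way to obtain one is via escape. */
final class SafeHtml {
    private final String html;

    private SafeHtml(final String html) {
        this.html = html;
    }

    /** Escapes the characters that are significant in HTML text. */
    static SafeHtml escape(final UnsafeString input) {
        final StringBuilder sb = new StringBuilder();
        for (final char c : input.raw.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return new SafeHtml(sb.toString());
    }

    String asString() {
        return html;
    }
}

final class Page {
    /** Accepts only SafeHtml, so passing an UnsafeString will not compile. */
    static String paragraph(final SafeHtml body) {
        return "<p>" + body.asString() + "</p>";
    }
}
```

Where Apps Hungarian asks you to notice that an unsafe value is flowing into a safe context, here the compiler refuses Page.paragraph(new UnsafeString("...")) outright.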
My own pet peeve problem is the dreaded NullPointerException. Why, I whinge, must the compiler demand that I type boilerplate like:
final Map<String, String> dictionary = new HashMap<String, String> () {{
    // ...
}};
But it will not do a simple little thing for me in return like check all of my code paths to ensure that I never pass a parameter that might be null to a method that assumes the parameter can never be null? Wouldn’t it be nice if the compiler would deign to help me with these things?
What do all of these things have in common? They are semantic constraints. Constraints on the meaning of a program. The kind of type checking you find in popular programming languages that is based on interfaces, on the physical type of your values and variables, that kind of type checking is a step up from syntax, but not much further. I would call it grammar checking, checking that your “sentences” have the right parts of speech in the right places. There is no attempt to check that your sentences are meaningful.
the value of semantic validation
Powerful type systems (type systems like those found in Ocaml and Haskell are often called “expressive” type systems) can be put to work enforcing a degree of semantic correctness. (There are other ways to do it: there are static code analysis tools; for example, you can buy tools that check C/C++ code for memory leaks and buffer overflow vulnerabilities. And if you use a language with Design by Contract features, you have another tool in your toolbox.)

The Little MLer introduces ML (and Ocaml) through a series of entertaining and straightforward exercises leading up to the construction of the Y Combinator.
ML and OCaml introduce powerful strong typing and type inference. Both are great languages to learn: you will stretch your understanding of defining types and writing correct programs.
There is an argument that testing can replace type checking. Quite honestly, I believe that extensive automated testing almost completely replaces the need for “grammar checking” in the software I write.
Why? Because checking that you don’t send the wrong message to an object, or call a function with the wrong number of arguments, is something that seems to get found when my test suite reaches a certain tipping point in code path coverage. So far I have always found these things very quickly.
That being said, Theory P warns me that there are some I haven’t found, of course, but I have some faith that the result of such an error will not be catastrophic. With strongly typed languages like Lisp, this kind of failure is gracefully handled. We aren’t talking about a buffer overflow that can be exploited and turned into a privilege escalation.
The nice thing about grammar errors is that you don’t have to try very hard to catch them. If you have reasonable code coverage, you’ll find them. So you concentrate on your test cases, and while you are testing your logic you will validate your grammar.
So that’s why dynamic languages in conjunction with automated test suites are working for me. YMMV, so let’s not argue about whether it would work for you or not, because I want to move ahead and say that I don’t believe that extensive automated testing has quite the same value for semantic checking.
I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared with the conceptual errors in most systems.
Semantic errors, you see, are not covered “for free” while you exercise your program logic. If you want to rely on testing to find them, you must become very good at thinking up edge cases where things can go wrong. You must try situations where things fail and you get nulls. You must try to create race conditions in concurrent code. You must try really bad data, like setting up cases where an employee’s hire date is after their termination date, or there is a cycle in the graph of reports and managers, or someone has eight bosses.
You have to try empty lists, lists with one item, all of the cases. Grammar checkers only check that you are passing a list; you need semantic checking to ensure you have coded for every case.
what to do
Now, you do not get semantic checking for free if you start writing code in Ocaml or Haskell. Why? Because you will be a Real Programmer, someone who can write Fortran in any language.
A language that doesn’t affect the way you think about programming, is not worth knowing.
To take advantage of a language like Ocaml, you must actually think through the semantics of your program and think of ways to make the type system work for you. You must “think in types” instead of thinking in values. This is a valuable exercise, because it fundamentally changes the way you view programming an application.
In my experience, this thinking through the semantic constraints of your application may be the most valuable result of the exercise. You see this when people write a very database-centric application. They think about the entities and the relationships. They write constraints and triggers. There’s a great value in arguing all these cases out, in drawing the arrows on the white board and marking them with little question marks, plus signs, asterisks, and ones. It makes you think things through.
I was once asked in an interview to name a feature from C++ I wanted to use in Java. I didn’t name Templates (this was before generics), or Multiple Inheritance. I named the const keyword. I am a huge fan of const correctness, of using the C++ compiler to enforce this one little semantic constraint. It’s an enormous burden, because you literally have to go through your entire program, usage by usage, to make sure you are fully const correct and have correctly declared everything. It would be far easier if C++ had const inference.
But const correctness is valuable precisely because it forces you to think very hard about a semantic constraint: when should something change, and when should it be considered immutable? The act of bringing a code base into const correctness forces you to think about what the code does to its data, not just whether it has “spelling errors.”
I can see the same value in using a language like Ocaml or Haskell to enforce semantic constraints. Sure, the compiler will catch errors for you. But the act of designing your application’s types to enforce your application’s constraints will be a powerful force driving you to deeper understanding of your domain.
If your current programming language can’t do this, I encourage you to study ML, Ocaml, Haskell, or one of the other fine languages with powerful typing systems.
Update (April, 2008): 7 Actually Useful Things You Didn’t Know Static Typing Could Do: An Introduction for the Dynamic Language Enthusiast
Ancient Wisdom
I don’t trust any “SDK” made by a company that won’t use it themselves…
This is why I’ve never tried to program for any of the many, many systems Microsoft has tried to foist off on us over the years (Direct-stuff, Active-thing, C-sharp, .Net, Live-whatever)—because they don’t fucking use their own stuff. They write demo apps in them, sure, and tell us that their frameworks are going to be the basis of the next generation of wonderful applications, but in the end Microsoft’s OSes and their Office apps are just a bazillion lines of old C code, and the programmers who got duped into using Microsoft’s new untested frameworks realize that, surprisingly, untested frameworks never work.
Wil Shipley is taking Apple to task for their (possibly temporary) advice that developers develop “iPhone Applications” in AJAX. Note the quotes: in the nineties, I would have used my fingers to make that annoying “I’m being sarcastic” gesture, because a Safari application is not an iPhone application. Not that there’s anything wrong with web applications. Except there is something wrong with web applications on a device that willfully refuses to remember your identity.
Sorry, I’m rambling.
The point is, frameworks and libraries and toolkits and programming languages and SDKs are best when they are extracted from actual development projects, rather than manufactured out of whole cloth from the imagination of someone who thinks they know what will work and what will not.
This is no different than what works and doesn’t work for other kinds of software. The whole point of iterative development is to put working software in the hands of users so that they can guide its development and design from their experience, rather than from guesses and intuition. It’s an empirical approach.
(The most glorious example of this principle is a self-hosted programming language: when a language designer “eats her own dogfood” by using the language to build the language, the result is nearly always powerful and elegant. Decades after their invention, programmers still rave about Lisp’s metacircular evaluator and Smalltalk’s UI framework, both of which are written entirely in themselves.)
Now, AJAX in a browser is not exactly a figment of some Apple engineer’s imagination. For a very popular class of applications, it’s the cat’s pajamas. And if you are building such an application, you probably want to make sure it works well on an iPhone.
But what if you are asking yourself, “Self, what tool should I use to make an application for iPhone with the same capabilities and look and feel of Apple’s own applications that ship with iPhone?” I think the answer will be, “Do as Apple actually does, not as they say.”
And likewise, when someone suggests an SDK or a framework or any other tool, and they tell me how gosh-darn wonderful it is, my first question will always be, “What are you using it for yourself?”
It’s ancient wisdom, reeking of conservatism and cynicism. It doesn’t sound daring and adventurous. But boy, does it work.
Repost: The NewCo Business Idea
I wrote about abandoning my attempts to write a new kind of software project management application. Here is an excerpt from the Original Business Plan that I wrote in 2004 and very early 2005 (1.2MB PDF). Looking back on it, the fundamental premise was to encode expertise into the system in advance. I would later move to the idea of using collaborative filtering to make predictions about the outcome of software development projects.
Let me tell you a little story:
Threadalyzer
Some little time ago I was hired by Steve Rosenberg to build a product named Threadalyzer. A very bright product manager named Saeed Khan had thought up the name, and the feature list was compiled by the simple expedient of copying features from existing products.
Threadalyzer was part of a suite of products for analyzing the behaviour of running Java applications (along with Profiler, Memory Debugger, and Coverage). Although the suite was marketed as a collection of different products, in actuality Profiler, Memory Debugger, and Coverage all shared a lot of common code while Threadalyzer was built from the ground up with a different architecture.
The biggest difference was that the other three products collected information about a running Java application. The information was presented in a graph or table, and the programmer would use that information to figure out what was going on. For example, Profiler could tell you that your servlet spent 90% of its running time in methods belonging to the jdbc package.
Is that good? Bad? Only the programmer would know.
But Threadalyzer was, at its heart, an exercise in “Artificial Stupidity.” Threadalyzer collected data about the way threads interacted with each other, especially through synchronization and locks. But unlike its suite-mates, Threadalyzer then looked for specific patterns. And instead of dumping the data in your lap, Threadalyzer made a diagnosis.
For example, Threadalyzer maintained a waits-for graph where every thread is a node and arcs connected threads that were blocked by locks to threads that held those same locks. When a cycle appeared in the graph, Threadalyzer would announce that the program contained a deadlock.
(This is not rocket science, unless you want to do this in real time by combination of hooks into the rather primitive Java VM of the day and by injecting byte codes into classes on the fly.)
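The cycle check itself can be sketched in a few lines of Java. This is not Threadalyzer’s code: WaitsForGraph is a hypothetical class, threads are identified by plain Strings, and the graph is filled in by hand rather than by hooks into a running VM. It assumes nothing beyond a depth-first search for a cycle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical waits-for graph: an arc runs from a blocked thread to a lock holder. */
class WaitsForGraph {
    private final Map<String, List<String>> waitsFor = new HashMap<String, List<String>>();

    /** Records that blocked is waiting for a lock currently held by holder. */
    void addWait(final String blocked, final String holder) {
        List<String> holders = waitsFor.get(blocked);
        if (holders == null) {
            holders = new ArrayList<String>();
            waitsFor.put(blocked, holders);
        }
        holders.add(holder);
    }

    /** Returns true when some chain of waits leads back to where it started. */
    boolean hasDeadlock() {
        final Set<String> finished = new HashSet<String>();
        for (final String thread : waitsFor.keySet()) {
            if (reachesCycle(thread, new HashSet<String>(), finished)) {
                return true;
            }
        }
        return false;
    }

    /** Depth-first search; onPath holds the threads on the current chain of waits. */
    private boolean reachesCycle(final String thread, final Set<String> onPath,
                                 final Set<String> finished) {
        if (onPath.contains(thread)) {
            return true;  // we came back to a thread already on this chain
        }
        if (finished.contains(thread)) {
            return false; // already fully explored without finding a cycle
        }
        onPath.add(thread);
        final List<String> holders = waitsFor.get(thread);
        if (holders != null) {
            for (final String holder : holders) {
                if (reachesCycle(holder, onPath, finished)) {
                    return true;
                }
            }
        }
        onPath.remove(thread);
        finished.add(thread);
        return false;
    }
}
```

For instance, after addWait("A", "B") and addWait("B", "C") there is no deadlock, but adding addWait("C", "A") closes the loop and hasDeadlock() returns true.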
Threadalyzer displaying a deadlock
I had the privilege of hiring a very smart fellow whose Master’s Thesis in Distributed Computing was one of my reference papers for the design. He sifted through the possibilities and designed several other useful patterns. The smart fellow, a C++ wizard named John MacMillan, a promising Waterloo co-op named Jenny Lee, and I then built Threadalyzer with a plug-in architecture.
Although (to my knowledge) the product was never “unbundled” in such a way that customers could install analyzers after the fact, we built it as an engine that hosted many arbitrary little pattern-matching applications.
NewCo
Shipping software on time was and is a very hard problem. Actually, there are two hard problems involved. The first is knowing how to plan and manage development. The second is convincing stakeholders that your plan is optimal and that any interference on their part (be it feature creep, dictating overtime, advancing the ship date, whatever) will make things worse. Please don’t consider the second part as me just trying to make a funny to capture your interest. It is a very hard problem in the real world.


In my twenty years of business experience, Growing a Business is absolutely the best book on founding and running a business organically that I have ever read. And I read a lot of books. “Growing a Business” is not about scoring business coups or raising money. It is not about sales tactics or innovation. It is about growing a business step by step, customer by customer. It is about expanding at a sane rate and getting rich the old-fashioned way: one satisfied customer at a time.
Growing a Business is a must-read for anyone who wants to build a business rather than “do a deal.”
One of the things people like about project management applications and spreadsheets and just about anything that produces professional looking graphics for presentations and reports is that it makes your plan seem solid and unassailable.
If a stakeholder walks into your office and sees handwritten three by five cards, she may be tempted to haggle.
One of the things I learned right away is that project management software is useless for actually planning projects, and if your boss is wily enough to know that, she will still haggle with you: she will ask tough questions like, “How do you know that you will need that much time for QA? Won’t the programmers have written bug-free code?”, or perhaps “Since we are instituting a brand new change control process, and therefore there will be very few new requirements, can’t we cut the development time in Phase II in half?”
You can’t defend a GANTT chart if your interlocutor has ever shipped a real, live working piece of software. She knows that what the GANTT chart makes tangible and concrete—task dependencies and resource allocations—have almost nothing to do with the final ship date.
So I dreamed of a piece of software that would back up the kinds of things I actually say when planning a project or defending a plan, like
Well, we can produce code that requires less QA, but we will need to allocate time for that unit testing framework and continuous integration server you turned down.
Or,
Okay, you’re saying we should cut the estimate for requirements churn by 75%. Okay, we can try that, but if we get the usual number of requirements changes in Phase II, we’ll know that the fault was with the estimate, not with the team.
Or,
Overtime? No problem, but the numbers show that we get almost as much work done in seventy hours a week as we do in fifty, and so on.
Triangulation
Or if that’s too much of a stretch, let me ask you this question. The very first thing you learn about shipping software is the existence of the features–quality–time triangle (another variation on the same idea is the features, money, time triangle).
Okay, where is the triangle in a GANTT chart? How do you play what–if with the triangle?
Tools for managing projects ought to directly manipulate the ways we think about managing projects. I wanted such a tool.
The Hammer
When you’re holding a hammer, every problem looks like a nail. And having built Threadalyzer, my thought was, why can’t we have a project management application that has little analyzers that look at a plan and provide warnings of problems with the plan? And of course, we would ship analyzers that encapsulated project management expertise as patterns in advance.
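To make the idea concrete, here is a minimal sketch of what "little analyzers that look at a plan" might look like. Everything in it is hypothetical and invented for illustration: the `Plan` and `Task` shapes, the two analyzers, and especially the thresholds, none of which come from Threadalyzer or any real product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    name: str
    kind: str    # hypothetical categories: "dev" or "qa"
    days: float


@dataclass
class Plan:
    tasks: List[Task] = field(default_factory=list)
    hours_per_week: int = 40

    def days_of(self, kind: str) -> float:
        """Total days planned for one kind of task."""
        return sum(t.days for t in self.tasks if t.kind == kind)


# An analyzer is any callable that inspects a plan and returns warnings.
Analyzer = Callable[[Plan], List[str]]


def qa_ratio_analyzer(plan: Plan) -> List[str]:
    """Warn when QA time is a small fraction of development time."""
    dev, qa = plan.days_of("dev"), plan.days_of("qa")
    if dev > 0 and qa / dev < 0.25:  # threshold is an illustrative assumption
        return [f"QA is only {qa / dev:.0%} of development time"]
    return []


def overtime_analyzer(plan: Plan) -> List[str]:
    """Warn when the plan depends on sustained overtime."""
    if plan.hours_per_week > 50:  # threshold is an illustrative assumption
        return [f"Plan assumes {plan.hours_per_week}-hour weeks"]
    return []


def analyze(plan: Plan, analyzers: List[Analyzer]) -> List[str]:
    """The 'engine': run every plugged-in analyzer and collect the warnings."""
    warnings: List[str] = []
    for analyzer in analyzers:
        warnings.extend(analyzer(plan))
    return warnings


plan = Plan(
    tasks=[Task("Build feature X", "dev", 40), Task("Test feature X", "qa", 4)],
    hours_per_week=70,
)
for warning in analyze(plan, [qa_ratio_analyzer, overtime_analyzer]):
    print("WARNING:", warning)
```

The engine knows nothing about project management; each analyzer carries one piece of expertise, which is the same plug-in shape described for Threadalyzer above.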
It would be much, much later that I would realize the fundamental error in my thinking.
Certification? Bring It On!
Not too long ago, I was persuaded to give my résumé to a recruiter. He was trying to place a development manager for a growing company, and they wanted heavy Agile experience, deep management experience, and someone with some technical chops. Well, I figured that twenty-plus years of experience, with something like eight years of legitimate Agile, leading a team of twenty-plus, producing a product that won several Jolt awards and a JDJ Editor’s Choice award… I thought I was a lock for an interview.
But an email came right back:
The client is wondering where you got your degree.
Twenty years of experience, and they want to know how I spent my time in the mid-eighties.
This got me thinking about certification. It’s another long-running debate. And something funny has happened to me.
I’ve switched sides. I’m actually in favour of certifying software developers. Yes, I am in favour of disqualifying intelligent programmers from professional employment if they do not possess a little piece of paper from a certifying agency.
Deep breath. Wait for the room to stop spinning. Or is my head spinning around on my shoulders (heh)?
the catch
Like everyone else in favour of certification, I have my own ideas about what skills and knowledge you need to demonstrate to get your certification. Unlike everyone else, I think I would fail my own certification if I didn’t do a whole lot of studying. That’s because I think our industry is undemanding, very undemanding. I know a few people who would pass my certification without studying. But only a few.
Before I tell you what’s on the final examination, let’s talk about what isn’t:
- Object oriented programming: not on the exam. Use OO on a project, don’t use OO on a project, I don’t care. Throw in dependency injection, MVC, everything to do with industry standard architecture. I still don’t care, you can get certified without it.
- Functional programming, metaprogramming, macros: not on the exam. Be an academic, be a language weenie. Or not. I don’t care.
- Static typing, type inference, annotations, category theory: not on the exam. Save it for your blog post, I don’t care.
- Knowing who Alan Kay, John McCarthy, or Ted Nelson are: not on the exam. I think it’s great to know the history of our industry, but I don’t care.
- Design Patterns: See OO above. I really don’t care. Write spaghetti code. Be Mel. I don’t care.
- Agile Development, Waterfall, GTD, or anything else to do with Getting Things Done: I really don’t care. I will certify you whether you can execute or not.
- Deep knowledge of programming language syntax or semantics: I’ll certify PHP programmers. I’ll certify Java programmers that don’t understand generics. I don’t care if you know what a closure is. I don’t care if you believe that recursion is retrograde.
By now, you are thinking, “Raganwald, this certification is worthless. You are excluding just about everything we know about writing great software. What’s the point?”
Let me explain. My certification does not say you are any good at coding software. Let me repeat.
My certification does not say that you are any good at coding software. I’ll let the marketplace decide. I am not telling businesses, “Hire certified programmers, they are great coders.”
Why should I? Business is perfectly happy to hire programmers with Comp. Sci. degrees, and there seems to be little or no evidence that a Comp. Sci. degree says anything about your ability to deliver working software. So why should my certification raise that bar?
what’s on the exam
Now let’s talk about what is on the exam. Just one subject, but the exam goes into excruciating detail about that subject. If you don’t know this subject cold, I am not going to certify you. Period. No debate, no negotiation, no “equivalent experience.”
The one subject?
Testing and Quality Control. That’s right. All I care about is that if you are asked to make bulletproof software, you know how. I’m going to ask you about:
- Continuous integration.
- Black box testing.
- White box testing.
- Design for testing.
- Probabilistic testing.
- Testing tools.
- Testing metrics.
- Testing methodologies.
And more:
- Security concepts.
- Preventing vulnerabilities.
- Privacy and data management concepts.
- Encryption and verification concepts.
And a whole lot more on top of that. There is room for debate about whether to have separate testers or whether programmers should test themselves. You are not getting certified with me unless you know how to do it both ways, and can write a comprehensible essay describing the relative advantages and disadvantages of both. I am not going to require you to know the latest programming frameworks, but you are not getting certified unless you demonstrate up-to-date knowledge of the latest continuous integration tools.


Test Driven Development: By Example is THE book that ignited a revolution in software development practices. Whether you are developing in an Agile environment or working from a telephone book specification in a Waterfall project, Test-Driven Development will show you how to write automated tests that work to actually shorten your development time and clarify your code.
And best of all, this book uses actual projects as an example. This is not an exercise in theory, this is a practical tome full of practical advice. You’re already familiar with the concepts, reading this book will dive into the details that will make your coding more effective.
I don’t care if you know how to write a great architecture document. But I will fail you if you can’t write a good code review. The same goes for everything else. I will not demand that you do it a specific way, but you will prove that you have state-of-the-art knowledge of how to ensure that software is solid and does what you expect it to do, no more, no less.
And you know what? After you get your piece of paper, you’ll need to work for at least a year under the supervision of a certified leader to get your upgraded “practitioner” certification. I want you to practise continuous education.
Does this mean that I am going to certify all of the QA Analysts in the industry while barring the programmers from work? Well, let me ask you: do most, a few, or any of the QA Analysts you know have a deep knowledge of software quality and methodologies? Can they write an essay describing the cost of bug fixing comparing early vs. late detection? Can they talk about various unit testing tools? Can they measure code coverage? Can they look at a piece of code with 93% coverage and tell whether the missing 7% has one or more crucial cases missing?
If so, I want to certify them. If not, they need to hit the books with me.
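Here is a small sketch of the kind of judgment that coverage question calls for. The `withdraw` function and its test suite are hypothetical, invented for illustration. The suite runs every line of the function except one, and the line it skips is the guard that matters most.

```python
def withdraw(balance, amount):
    """Return the new balance after withdrawing `amount`."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")  # the one line the suite never runs
    return balance - amount


def run_suite():
    """A suite that executes every line of withdraw() except the overdraft guard."""
    assert withdraw(100, 30) == 70
    try:
        withdraw(100, -5)
    except ValueError:
        pass
    else:
        raise AssertionError("negative amounts must be rejected")
    return "suite passed"


def overdraft_is_rejected():
    """The crucial case the suite above leaves out."""
    try:
        withdraw(100, 500)
    except ValueError:
        return True
    return False


print(run_suite())
print(overdraft_is_rejected())
```

A coverage report would show one uncovered line in an otherwise green suite. Someone has to read that line to see that the suite would keep passing even if the overdraft guard were deleted.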
where did I get this crazy idea?
In our industry we have wasted millions of person-years debating the relationship between software development and architecture/engineering. Last weekend I went to see Pixar’s Ratatouille with my son. And it struck me: we should be like chefs.
Do I demand that the chef in a restaurant use a certain kind of stove? Cook a certain kind of food? Manage her kitchen a certain way?
Non! The marketplace decides all these things. And the marketplace works for this. What do we demand? What do we require of restaurants?
Safety. We demand that if they serve the public, they have certain fire safety standards. That they have certain food cleanliness standards. That they know enough about food not to poison us by accident. Of course, a cook learns to cook when getting their designation. But the thing we really demand of them is that they keep us safe!
If we order Chicken, we do not want the Fish to come out and put us into Anaphylactic Shock. Cooks know that mixing up Chicken and Fish is fatal, while mixing up Basil and Oregano is not. So they have a different protocol for handling the two kinds of food. They keep us safe.
And if I am placed in charge of certification one day, that’s what I will demand.
Keep us safe. Don’t leave back doors and XSS vulnerabilities in your code. Don’t store our passwords in the database. Don’t deliver code that is full of undiscovered bugs.
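As one example of what "safe" might mean in practice, here is a minimal sketch that stores a salted, deliberately slow hash instead of the password itself. It reads "don't store our passwords" as "don't store them in a recoverable form", which is an interpretation. The function names and the iteration count are illustrative assumptions, and a real system should follow current guidance on algorithms and work factors.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor, an assumption; tune for your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store these two values, never the password."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("Tr0ub4dor&3", salt, digest))                   # False
```

If the database leaks, the attacker gets salts and digests rather than passwords, and each guess costs them a full key derivation.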
If someone can be relied upon to write software that is safe, I will not dictate how they do it, or how long it takes to do it. The marketplace can decide whether they are employable, much as a restaurant can decide whether to hire someone whose food is bland and unappetizing.
I am not telling businesses that they can’t ship software full of bugs. You have Product Managers, they can decide whether it is more important to build new features or fix the old ones. Microsoft, do your thing! But business can’t make those decisions unless they have an accurate view of what the software actually does, of which parts are solid and which are brittle.
And I am not telling managers how to run projects. But I do expect that a certified manager understands the trade-offs when she chooses BDUF over Agile, Theory D over Theory P. She can do as she pleases, provided her eyes are wide open to the consequences.
Well, that’s my thought about certification. I’m all for it, don’t let anyone say otherwise. And like everyone else, I want it to reflect what my experience tells me is important about software development in the commercial environment, namely safety and a clear view of what works and what doesn’t.
This is obviously a pipe dream, the product of Dark Horse Café’s “Ruby’s Pride” French Press coffee. But what do you think? Would such a certification be useful?