Ockham's razor as it applies to the big rewrite
Oh dear, the latest nine days’ wonder seems shocking on the face of it. To quote the protagonist in the tale:
Back in January 2005, I announced on the O’Reilly blog that I was going to completely scrap over 100,000 lines of messy PHP code in my existing CD Baby (cdbaby.com) website, and rewrite the entire thing in Rails, from scratch. I hired one of the best Rails programmers in the world (Jeremy Kemper aka bitsweat), and we set off on this huge task with intensity…
Two years (!) later, after various setbacks, we were less than halfway done… I said fuckit, and we abandoned the Rails rewrite… Then in a mere TWO MONTHS, by myself, not even telling anyone I was doing this, using nothing but vi, and no frameworks, I rewrote CD Baby from scratch in PHP. Done! Launched! And it works amazingly well.
There has been a running flamestorm over this, mostly centered on whether Rails is or isn’t the perfect framework for web applications. In other words, there is a prevailing sentiment that this rewrite failed for technical reasons.
I have my own experience with project failure, and I can’t blame any of it on technical problems. This is because, in my experience, the technical problems in projects are not root causes; they are symptoms of people and management problems.
Amazingly, Chad Fowler identified six reasons that big rewrites of production systems fail. Given that opinions are like noses (everybody has one), what’s so amazing about Chad’s list? What’s amazing is that Chad wrote his list a year ago, and yet he thoroughly nailed Derek’s experience.
Expecting a plug for a book about the latest miracle cure for project failure blues? Nope, sometimes the technical whizzies blind us to basic, fundamental wisdom that is as true today as it was twenty or even forty years ago. Don’t believe the hype.
For a solid grounding on how to successfully develop software, start with The Mythical Man-Month: Essays on Software Engineering. It is one of the most important books ever written about developing software, from the small to the large. Read the book that spawned the expression, “There is no silver bullet.”
I’ll bring up Chad’s first reason, Software as Spec. Re-read the condensed story above. Note, with horror, that a Rails expert took two years to fail at what Derek accomplished in just two months. Well, if Jeremy couldn’t do it, obviously Rails is seriously broken, a complete toy. But this is like a sleight-of-hand trick. The paragraphs tell two stories, and we are so focused on one, “Two years for Rails, two months for PHP,” that we ignore the prestidigitator’s other story:
Two years for Jeremy, two months for Derek.
This is a complex production system. When writing any complex production system, what is the single biggest risk? No, not technical risk.
Requirements risk. Jeremy tried to create a copy of an existing, messy production system. Sure, he had Derek pulling the strings and calling the shots. But this is no guarantee of success, especially in an environment with heavy integration with other messy, chaotic systems.
Communicating requirements is hard enough in a green field situation. It’s nearly impossible when dealing with a real, production piece of code. Jeremy had almost no chance if his goal was to duplicate the original system’s functionality based on Derek’s direction and even assistance.
Derek, on the other hand, wrote the original system. He knows it intimately. If there is some weird, bogus code in one corner, Derek remembers why it’s there, what obscure bug it fixes, or perhaps what bug in some other piece of software it works around. If something doesn’t need to be there any more, Derek knows it can be dropped and he knows why it ought to go.
Of course Derek succeeded where Jeremy failed. Having Jeremy performing the rewrite with Derek’s assistance created an insurmountable barrier for understanding. Even with Derek trying to assist, the very fact that they were rewriting in Rails created a double whammy: Derek knew the system but had to try to explain its subtleties to Jeremy. Jeremy knew Rails but had to try to explain its subtleties to Derek.
Derek’s two-month rewrite with the hindsight of implementing the system in the first place and then living through the first rewrite failure is easy to understand: one person knew both the system and the implementation technology intimately. No communication barrier. No requirements risk.
If there is one lesson to draw here it is that “There is always more than one way to look at any issue.”
So, was the problem choosing Rails? Or choosing to bring in an outsider to implement a technology the insider didn’t know intimately? This is a false dichotomy, of course. Failed projects may have many problems.
But when trying to simplify the situation and derive useful conclusions, my heuristic is to look for the simplest explanation, the one with the fewest moving parts. The technology explanation is very specific and comes with many dangling caveats. It only applies to projects involving rewrites from PHP to Rails, and even then only to those projects that are messy and have legacy data models that aren’t particularly CRUD-dy.
The people explanation is much simpler. It explains what we observed here as well as what we observed in many other projects, and it does so with very few caveats and conditions.
My personal heuristic is to go with the simpler explanation that applies to the widest possible set of observations in the wild.
As several people have pointed out here and elsewhere, we are working with anecdotal evidence. And the plural of anecdote is not data. But we must soldier on in our industry. If we waited until we could rigorously “prove” anything about software development before forming conclusions, we wouldn’t have budged from toggling switches on front panels.