raganwald
Tuesday, September 21, 2004
  Writing Solid Corruption-Free Code
I recently heard of some programmers struggling with mysterious data corruption issues with their application. I was struck by their similarity to something I'd seen almost a decade ago. Back then, I was building database-backed CRM applications.

Plan "A" for such applications is simple: always write good data. Plan "A" lasts about a month into any project. Once change requests a/k/a maintenance and upgrades appear, it's easy to introduce new code that writes something the old code doesn't expect, and you have corruption.

Plan "B" usually kicks in at that point. You introduce constraints at the row level, using as many tricks as your database vendor provides (triggers, &tc.) to validate data before it's written. Plan "B" lasts for two more months before the sheer complexity of your object relations weighs in. Then you find out that saving records now takes forever, and it's hard to maintain your application when a lot of the validity logic is hiding in your database.

Plan "C" now takes over. You remove most of the slowest and most complex validity checks and move them into a batch process. You add some support for rollbacks, such as a "valid" flag. You run the validity checking on a nightly basis and as part of major data imports and updates. The validity checking is now all in one place, so it serves as documentation.
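To make Plan "C" concrete, here is a minimal sketch of a batch validator, assuming a SQLite database and a made-up two-table schema. The tables, columns, and rules are all hypothetical and invented for illustration; they are not taken from any of the applications described here. Each rule is a query that selects offending rows, and the batch run clears the "valid" flag on whatever it finds.

```python
import sqlite3

# Hypothetical schema, invented purely for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT,
                        valid INTEGER DEFAULT 1);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL,
                     valid INTEGER DEFAULT 1);
"""

# Every rule lives in this one list: (table, description, query selecting bad ids).
# Reading the list tells you what a valid database looks like.
RULES = [
    ("orders", "order refers to a missing customer",
     "SELECT o.id FROM orders o "
     "LEFT JOIN customers c ON c.id = o.customer_id "
     "WHERE c.id IS NULL"),
    ("orders", "order total is negative",
     "SELECT id FROM orders WHERE total < 0"),
    ("customers", "customer has no email address",
     "SELECT id FROM customers WHERE email IS NULL OR email = ''"),
]


def validate(conn):
    """Run every rule, clear the valid flag on offending rows,
    and return a list of (table, row id, description) problems."""
    problems = []
    for table, description, query in RULES:
        for (row_id,) in conn.execute(query).fetchall():
            problems.append((table, row_id, description))
            # Table names come from the RULES constant above, never from user input.
            conn.execute(f"UPDATE {table} SET valid = 0 WHERE id = ?", (row_id,))
    conn.commit()
    return problems
```

A nightly job or an import script would open the real database, call `validate(conn)`, and report whatever comes back. Adding a new rule means adding one entry to `RULES`.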

A funny thing now happens. As you develop new code, you find that you can consult the validity routines to understand what constitutes a valid state for the database. If you need to add new modalities, you can add new validity routines. It's better than a UML diagram because it's executable code.

Debugging is now a cinch. Validate the database. Run some tests. Validate again. If the second validation fails, you have bugs in your code.
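The validate, run, validate loop can be sketched in a few lines. This is only an illustration under invented assumptions: the data is a plain dictionary of account balances, and `validate`, `problems_introduced_by`, and `buggy_withdraw` are hypothetical names made up for the example.

```python
def validate(accounts):
    """Return a list of problems; an empty list means the data is valid.
    The (hypothetical) rules: names are non-empty, balances are never negative."""
    problems = []
    for name, balance in accounts.items():
        if not name:
            problems.append("account with an empty name")
        if balance < 0:
            problems.append(f"account {name!r} has negative balance {balance}")
    return problems


def problems_introduced_by(data, action):
    """Validate, run the code under test, then validate again."""
    baseline = validate(data)
    if baseline:
        # Without a clean baseline, a failure afterwards proves nothing.
        raise ValueError(f"data was already invalid before the test: {baseline}")
    action(data)
    return validate(data)


def buggy_withdraw(accounts):
    """Code under test: deliberately forgets to check the available funds."""
    accounts["alice"] -= 150
```

Calling `problems_introduced_by({"alice": 100, "bob": 50}, buggy_withdraw)` returns a non-empty list, and since the data was valid going in, the fault has to be in the code that ran in between.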

Two years ago I introduced this idea to a project. I asked the project's database analyst to write a validation routine for a complex application. He replied that he couldn't: the application was too complex for that. His idea was that we'd follow his meticulous plan and everything would just happen to work.

Red flag! If the DBA can't write the routine, then there's zero chance the final application will work. As it happened, it was impossible to write new code for that application. There was a complex, table-driven workflow infrastructure and nobody could ever construct a baseline set of rules and states to use for testing code. I heard that they eventually junked it in favour of an off-the-shelf solution.

So now I'm going back to the well when advising my friends.

Their application serializes a dense object graph out into binary files. From time to time it seems to do unexpected and unwanted things. Debugging seems to be 50% Voodoo and 50% thought experiments.

I'm urging them to stop thrashing and start writing some external validation routines. It hurts to do that, but I'm certain that they'll reap serious dividends both in solving their current issues and when writing new code in the future. If they have a validation routine, one of the great things they can do is set up a debug mode that validates every single write.
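A debug mode of that kind might look something like the sketch below. Everything in it is an assumption made for illustration: I don't know my friends' file format, so `pickle` stands in for their binary serialization, a dictionary of node ids stands in for their object graph, and `DEBUG_VALIDATE`, `validate_graph`, `validate_file`, and `save` are hypothetical names.

```python
import pickle

# Hypothetical switch; in practice it might come from a command-line flag
# or an environment variable.
DEBUG_VALIDATE = True


def validate_graph(graph):
    """graph maps a node id to the list of node ids it refers to.
    Return a list of problems; an empty list means the graph is valid."""
    problems = []
    for node, children in graph.items():
        for child in children:
            if child not in graph:
                problems.append(f"node {node!r} refers to missing node {child!r}")
    return problems


def validate_file(path):
    """External validator: works from the bytes on disk, not from memory."""
    with open(path, "rb") as f:
        graph = pickle.load(f)
    return validate_graph(graph)


def save(graph, path):
    """Write the graph, and in debug mode read it back and validate it."""
    with open(path, "wb") as f:
        pickle.dump(graph, f)
    if DEBUG_VALIDATE:
        problems = validate_file(path)
        if problems:
            raise AssertionError(
                f"write to {path} produced invalid data: {problems}")
```

The point of reading the file back, rather than checking the in-memory objects, is that the validator then exercises the same bytes the application will load later. `validate_file` can also be run on its own against any file that looks suspicious.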

If nothing else, I'll wager anyone dinner for two with a bottle of wine (I'm serious about this) that the very act of wading through their data format and deciding how to write a utility that can validate their data will cause them to kick themselves at least once and say "Aha! I see a problem!"
 

Comments on “Writing Solid Corruption-Free Code”:
Good idea. In a similar vein, I've been toying with creating some validation routines to ensure that our users create data correctly. We're customizing a (big) proprietary program and we've toyed with the idea of restricting users' input to valid values, but we're afraid of over-constraining, and it would be a lot of work (given the software's architecture). Instead, I've proposed that we have a daily script that searches through the database for errors we've seen in the past and emails us (and perhaps them) with any problems. We can add search criteria as we run into new user blunders. It will also allow us to give quick feedback when they do something wrong, which will help to break some of their bad habits.
 




Reg Braithwaite

