Thus Spake Joel

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Monday, December 03, 2007

What you’ll see is that the hard-core geeks tend to give up on all kinds of useful measures of quality, and basically they get left with the only one they can prove mechanically, which is, does the program behave according to specification. And so we get a very narrow, geeky definition of quality: how closely does the program correspond to the spec. Does it produce the defined outputs given the defined inputs.

The problem, here, is very fundamental. In order to mechanically prove that a program corresponds to some spec, the spec itself needs to be extremely detailed. In fact the spec has to define everything about the program, otherwise, nothing can be proven automatically and mechanically. Now, if the spec does define everything about how the program is going to behave, then, lo and behold, it contains all the information necessary to generate the program! And now certain geeks go off to a very dark place where they start thinking about automatically compiling specs into programs, and they start to think that they’ve just invented a way to program computers without programming.

Now, this is the software engineering equivalent of a perpetual motion machine. It’s one of those things that crackpots keep trying to do, no matter how much you tell them it could never work. If the spec defines precisely what a program will do, with enough detail that it can be used to generate the program itself, this just begs the question: how do you write the spec? Such a complete spec is just as hard to write as the underlying computer program, because just as many details have to be answered by spec writer as the programmer. To use terminology from information theory: the spec needs just as many bits of Shannon entropy as the computer program itself would have. Each bit of entropy is a decision taken by the spec-writer or the programmer.

So, the bottom line is that if there really were a mechanical way to prove things about the correctness of a program, all you’d be able to prove is whether that program is identical to some other program that must contain the same amount of entropy as the first program, otherwise some of the behaviors are going to be undefined, and thus unproven. So now the spec writing is just as hard as writing a program, and all you’ve done is moved one problem from over here to over there, and accomplished nothing whatsoever.

—Joel Spolsky, Talk at Yale: Part 1

¶ 3:25 PM

Comments on “Thus Spake Joel”:

And yet most development requires a fairly detailed documentation. Specifically upping the detail of the spec documentation for the sake of generating the program is most likely a productivity black hole, but using the spec you must create anyway to generate a basic framework is most definitely not.

# posted by

huh? : 9:32 PM

Yes, huh?, it’s important to keep things in perspective. Proving a program satisfies an incredibly rigorous specification doesn’t mean you’ve done less work (you may have doen twice as much work as just writing a program).

But less detailed specs still provide value. As do automated tests. As does static typing.

# posted by

Reginald Braithwaite : 12:37 AM

Definitely in agreement there. Especially when many (especially government customers) want that incredibly detailed spec upfront, and it all changes into some unrecognizable form during the course of development as most projects are wont to do.

Why no! I'm not frustrated at all... why do you ask? ;)

# posted by

huh? : 4:03 AM

I agree with the general point, but he goes too far - it's easier to rigorously specify a fast 3-SAT solver than to implement one, and you can't automatically derive one from the other (in either direction, cf halting problem).

Of course in usual business practice the algorithms are simple compared to the behaviour. The 'hard-core geeks' aren't always most interested in the usual business practice!

# posted by

Sam McCall : 4:38 AM

Two points: one, Joel is describing formal methods and one of their main accepted criticisms, i.e. that the formal specification starts to resemble programming and all you've achieved is moving the problem from the domain of a programming language to the domain of the specification language.

Two, in many cases the specification language can be tailored for the problem domain and this is exactly what DSLs do. So by moving the problem from the domain of the general purpose programming language to the domain of a problem-specific DSL it is easier to write the code. Using a DSL will not magically avoid all bugs, but it will make you more productive and it will hopefully prevent you hanging yourself on all the rope that a general purpose programming language would give you.

-- Liwp

# posted by

Liwp : 8:14 AM

Joel is completely wrong. His fundamental error is in his "fundamental problem": he says that the spec needs to be extremely detailed and "define everything about the program". He has the wrong idea of what specification means. It is in fact perfectly possible to prove useful and interesting things about concrete implementations on the basis of abstract specifications.

I think the main notion he's missing is that of refinement: a spec P refines another spec Q if everything true of Q is true of P. Then Q is your abstract spec, and P is a "more concrete" spec. If you've proven that Q is, say, deadlock free, and you've proven that P is a refinement of Q, then you know that P is deadlock free also.

The idea is to produce a chain of specs, starting with a very loose and abstract (explicitly _not_ detailed) specification of what is required, which is then refined in stages until eventually you have an implementation. At some time you move from the domain of things you can't implement (eg a spec involving "the real numbers") to the domain of things you _can_ implement (eg a spec involving some computable approximation of the real numbers).

This is, admittedly, still work in progress - our ability to produce and work with formal specifications is underdeveloped compared to our ability to produce and work with executable programs (and we all know how hard this is!), but it's coming along.

If you want real world examples of specification languages which include these notions of refinement and are used in industrial context, look no further than the process algebra CSP, and the algebraic specification language CASL. CSP has a long and well documented history of use in industry (at least those corners that care enough, and can pay enough, to do this). CASL is less widely used than CSP (and much younger), but is more completely formally specified, and is pushing the boundaries of tool support for formal specification languages via its toolset Hets, which allows specification at various levels of abstraction, using heterogeneous combinations of specification languages (ie mix and match), with solid integration with automatic and assisted theorem provers.

Disclaimer: I'm the maintainer of part of hets - the CSP (or more properly, CSP-CASL) part. :-)

I suspect that Joe misses all this because he's a pragmatic guy so he doesn't care what we academics are up to and so (like most of the software world) his idea of specification is UML and natural language. There's more to specification than that, however - the trick is to formalise your specification tool, ie to use a formal specification language. Then interesting things become possible.

# posted by

gimbo : 8:32 AM

Having been interested in formal methods for a long time, and being a developer of commercial software, I'd like to address the fans of formal methods who take exception to Joel's point.

I would suggest you have to understand it in the context of his business: commercial software development for business customers. In this environment, the focus is on solving customers' problems as quickly and cheaply as possible, and "correctness" is a subjective, not an objective standard.

If he were a contractor for, say, government or large institutions, where accountability and process are valued more highly than turnaround or "warm fuzzies", formal specifications with refinement might be more appealing.

I hope that research continues into formal methods, but I think it would be a mistake to assume that this approach will ever appeal to commercial developers in a business marketplace.

# posted by

Marc Hamann : 9:03 AM

this solution tends to infinity, if the spec has to be very detailed as the program, so where is the spec to prove that you spec does what is suppose to do, and this will go, and go, and go on....

# posted by

mikerod : 2:48 PM

The problem here is one of a mathematician talking about business.

A business spec is simple: "Do what the business wants you to do." That specification has nothing to do with algorithms or code, and only even meets business processes in that dark world of "business requirements documents" and other unsavory monographs.

For every application I've ever worked on, things have fallen into two camps: either 1) no formal specification actually existed, or 2) the formal specification that existed was actually less useful as it became more specific.

#1 is the tact taken by Agile development, who defines their specifications in terms of "customer sign-off": "Does it meet spec?" becomes "Does the customer say it's done?"

The #2 case occurs when business people try to code in English, usually because they've been verbally abused by developers for not giving them clear enough specifications. Unfortunately, English is a horrible programming language, and the pidgin, bullet-point-ridden dialect of English that makes it into those requirement documents is even worse.

Ultimately, the real requirements for a program look nothing like formal specs for the code. The requirements have to do with budget, with enabling the business to do more work for less money, and with company politics. Algorithms are at least two or three steps removed from any kind of real program requirement: the important part is delivering a solution, not the code.

For more on this, also see my blog post on the second derivative of functionality and the passing discussion of functionality entropy that I have in there.

# posted by

Robert Fischer : 12:32 AM

I do not agree that detailed specs are a part of a "perpetual motion machine" (Joel), nor part of a "solution tend[ing] to infinity" (mikerod). Detailed specs, whether on paper, as test suite, or as an independent program, are a necessary dual of the main program to verify correctness. This double-development mimics (or is a result of) pair programming found in XP. There is the added work to compare the program to the spec for sure. The total cost is certainly over double that of simple development, but that is well short of infinity.

Of course, acquiring the proper spec is another matter entirely.

# posted by

Kyle Lahnakoski : 1:07 PM

Joel has a point but goes much too far.

Quick example: It is easier to specify that a list in sequence alphabetically than to write a good sorting program.

Another quickie: How do you have somebody come in and work on testing if there is no spec? They can verify it doesn't crash. But anything more is just guesswork. You can ask them to look at the code and see if the program does what it is coded to do, but that's just testing the compiler.

You don't always need to write a big detailed document to write an app, but when the team gets large and the system itself is complex, you need written specs.

When everybody on the team was involved in the conception of the product, they're more likely to know what to do.

When you hire an outside team to write software, how do you tell them what to do? Is it all verbal?

Yes, it's hard to write a spec, that's why in-house software teams exist. They eventually get to understand implicitly what is needed. Coordination and communications is easier.

The agility folks argue for working with the customers during design. The customers are often too busy to spend time reviewing half-built software.

Writing documents can be a black hole for productivity. It can become an obsession with "Software Process". We all notice that the CMM from SEI has not caught on like wildfire. But Joel goes too far.

# posted by

InternetFred : 6:00 PM

> Another quickie: How do you have
> somebody come in and work on
> testing if there is no spec?
>
Give them exemplars of the work to do which exercises the features that was just built. We call these "stories".

> You don't always need to write a
> big detailed document to write an
> app, but when the team gets large
> and the system itself is complex,
> you need written specs.
>
If your spec is too big to fit on a note card, it's too big.

I think there's a difference here between "specs" (which are functionality-based) and "tests" (which are code-based). A test verifies that a piece of code works as expected. The "specs" that I'm talking about define the work that the application is supposed to do -- the problem that it is supposed to be solving. If the developers aren't working to solve a problem -- if they're working just to satisfy a set of check-box requirements -- then you're heading into trouble in a hurry.

> When everybody on the team was
> involved in the conception of
> the product, they're more likely
> to know what to do.
>
> When you hire an outside team
> to write software, how do you
> tell them what to do? Is it
> all verbal?
>
Mostly -- it's also learning-through-example. First, pair them up with someone who groks the project until they get a strong handle. When they move out on their own, if they don't understand why some current piece of software works the way it does, or if they aren't sure how to do a piece of software, then. And if they code it wrong one time, then the customer will catch it, the team will fix it, and they'll know not to do it again.

> Yes, it's hard to write a spec,
> that's why in-house software
> teams exist. They eventually
> get to understand implicitly
> what is needed. Coordination
> and communications is easier.
>
And I'll take co-ordination and communication over specs any day.

# posted by

Robert Fischer : 8:48 AM

Let's rewrite this...

"So now the spec writing is just as hard as writing a program, and all you’ve done is moved one problem from over here to over there, and accomplished nothing whatsoever."

...as this, and pretend it were written a few decades ago:

"So now writing Fortran code is just as hard as writing assembly code, and all you've done is moved one problem from over here to over there, and accomplished nothing whatsoever."

...or take Python vs. C as your examples.

Yes, some formal methods are pie in the sky, and you'll never have a "one size fits all" specification language that eliminates programming. But you can invent new abstractions to be used as more compact forms of source code that are more closely related to the real-world problem at hand.

The key thing is to consider more and more "development artifacts" as source code. That is, instead of using a white board and then throwing away your drawings, or having an e-mail exchange and then forgetting your conclusions, you associate real pieces of "source code" with each of those. You bridge the gap between code and ideas.

At the end of the day, you're still only attacking the "accidental" problem of programming; as Brooks famously pointed out years ago the "essential" problem remains.

For some practical ideas in this space, see:

1. EMF for a practical model-driven development technology, specifically for data modeling;

2. Macro programming in Lisp, embedded DSLs with Ruby, and even "Language-oriented programming" (each about raising abstraction levels in source code);

3. Test-driven development using xUnit or TestNG, automated acceptance testing with FitNesse, etc. Test cases often serve as very compact representations of a requirement, and have the great side-benefit that they can run each time your code builds to make sure you're not off track.

There are others...

It's getting better every day. Developers are incorporating more and more things as "source code", and the cumulative effect is higher-quality software. But there is no magic bullet!

Sometimes, it pays not just to start hacking away and building up a piece of software in the language of your day. It may pay, in many situations, to start by building the tools you need to build your software effectively. This might be a DSL, a rigorous way of collecting test cases, etc. With that little up-front investment, you can end up with a vastly better (and more maintainable) product later on. See e.g. Paul Graham's comments about his use of Lisp Macros in Viaweb in "Beating the Averages", or his book "On Lisp." In the latter, he suggests that Lisp programmers don't build their programs "in Lisp", but they build them "On Lisp", in other words, they create a DSL for the problem they are solving using Lisp. Ideas like this are not limited to esoteric languages, and personally I wish I saw more "mainstream" developers taking them seriously.

# posted by

Andrew Montalenti : 3:49 PM

Andrew:

I like your argument about abstractions. You are preaching to the Choirmaster.

However, I am scratching my head about something. I understand Joel to be talking about a "specification" as something that unambiguously defines the behaviour of the program but is not the program.

Therefore, in his world, you write a specification and you also write a program, and you use some means of verifying that the program and specification match.

With what you are describing—DSLs, abstractions, FORTRAN—it sounds like you write the specification and stop right there. You don’t also write a program.

In which case, you do not do twice as much work, agreed.

But if I understand you correctly, you also do not end up with a program that meet some specification, you end up with a program that in some way is also a specification.

Is that what you are saying? If so, I am in your corner, however it does not really invalidate Joel’s point, rather it is a different strategy for producing software.

Remember, the comparison to FORTRAN vs. Assembler ought to be "If FORTRAN is powerful enough to unambiguously specify the behaviour of an Assembler program, then we ought to throw the Assembler away and just write FORTRAN, not try to write both side by side."

# posted by

Reginald Braithwaite : 3:59 PM

To validate the design you needn't go to the same depth of detail. Before they spend time building a house, architects usually draw up 'blueprints' that a) take way less time and b) reduce the complexity (from 3D to 2D), but still provide enough information to validate it.

We generally express the solution at least three times.

Paul.

# posted by

Paul W. Homer : 5:07 PM

Spolsky wrote, "Now, if the spec does define everything about how the program is going to behave, then, lo and behold, it contains all the information necessary to generate the program!"

Other pointy-headed people here have commented but, frankly, I do not think it has been stated clearly enough: formal specifications are not executable in general. Example: Halting Problem. The point of refinement vis-a-vis software development is to get from an arbitrary specification to an executable specification, a.k.a. an algorithm.

Some specifications don't define algorithms; some do. Some specifications define a collection of algorithms and need to be refined to get down to a single algorithm which can be extracted from the description. (A program is not a collection of algorithms; it is a single algorithm.)

Any programmer who doesn't understand all this needs to go back to college, or at least crack open a textbook or two again to take a refresher in basic computer science... and that goes twice for programming pundits who carry so much weight with the community.

I feel this sort of glibness is really, REALLY inexcusable. Sometimes when I read programming blogs I feel like I'm watching Hardcopy or Geraldo Rivera. It's that bad.

# posted by

Frank Atanassow : 7:32 PM

Maybe this presentation (http://www.cs.swan.ac.uk/~csteme/SEFM-Presentation.pdf) can clarify why is important to study refinement in Process Algebra (CSP) and algebraic specification (CASL) and relate them to the notion of Specification based Testing.
If someone is interested in the paper , google "Specification based testing for refinement"

# posted by

Teme : 3:53 PM

<< Home