A "fair and balanced" look at the static vs. dynamic typing schism

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Tuesday, March 14, 2006

A long time ago in an industry far, far away, everyone programmed in C.

Like Joel says, this is a broad stroke observation. I happened to use Pascal an awful lot at the time, so I realize that "everyone" doesn't mean "every last one of us." But it does mean "so many people that no other language mattered to the industry as a whole."

C has many delightful advantages. It also has two, umm, features that programmers routinely screwed up. The first, as everyone knows, is the lack of memory management. Programmers had to do everything themselves and do it properly. If you made a mistake, your program would suffer from dangling pointers, or have a memory leak.

The second was the almost complete lack of type safety. That's a big hand wave. I realize that if you're very careful you can make a C compiler, especially an ANSI C compiler, do a little static type checking for you. That's the theory. But in practice, it had this one little thing that made type safety moot: the unconditional type cast.

How many readers know what I'm talking about? Put your hand up... Good! Talk amongst yourselves for the next paragraph while I explain this to the readers who are under the age of forty.

Why C casts are dangerous
With an unconditional type cast, the C compiler would let you treat an address of something in memory as whatever you want. You could treat a byte as the start of a string. Or you could treat a long word as the address of a function. Or you could treat it as an array of pointers to functions. So if you had a pointer to the start of a string and you mistakenly cast it to the address of an array of pointers to functions, and then you tried to call one of those functions, bad things would happen.
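
For readers who would rather see it than imagine it, here is a minimal sketch in Python, using the standard ctypes module to borrow C's cast. The variable names are mine, and the example is deliberately tame: both types are four bytes wide, so nothing crashes. It also assumes the usual IEEE 754 representation for a C float. The point is only that the cast itself checks nothing.

```python
import ctypes

# A four-byte floating point value sitting somewhere in memory.
value = ctypes.c_float(1.0)
float_ptr = ctypes.pointer(value)

# The unconditional cast: "trust me, those bytes are an unsigned integer."
# Nothing verifies the claim, at compile time or at run time.
int_ptr = ctypes.cast(float_ptr, ctypes.POINTER(ctypes.c_uint32))

# Reading through the new pointer reinterprets the very same bytes.
reinterpreted = int_ptr.contents.value
print(hex(reinterpreted))  # the bit pattern of 1.0 (0x3f800000), not the number 1
```

Make the target type a pointer to a function instead of a pointer to an integer, call through it, and you have the kind of bug described above.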

Okay, everyone back? This was phenomenally bad, especially because we do this Von Neumann thing and have bytes of memory that are sometimes code to be executed and sometimes data to be shuffled around. If you had a pointer that was pointing to the wrong thing, the program would sometimes crash right away, and sometimes chug along for a while, silently corrupting memory, until everything failed. There's an entire industry of malicious people taking advantage of this possibility to craft buffer overflow attacks.

This problem was so bad that it was one of the things C++ tried to fix with its casting constructs. One of them, the dynamic cast, specifically checks the situation at run time to make sure that the cast is safe. The only trouble was, C++ let you keep the dangerous C cast and then threw in a variation called static cast that was nicer looking but still unsafe some of the time.

I want to underline the consequences of bad casts here. When discussing risks, you always have to think along two axes: the likelihood of disaster and the magnitude of the consequences. C programs with bad casts have extremely bad consequences. They might crash. They might corrupt all of their data. They could cause other things to crash. These are all terrible.

When we discuss "type safety" we have to think along the same two axes of risk: there is safety that protects us from something going wrong, and there is safety that limits the consequences when something does go wrong.

Well, one day Java came along. You may deride it as "Cobol Lite," but it did two things an awful lot better than C++, and if you look back at history it succeeded by converting people away from C++. The first thing it did better was to make automatic memory management mandatory.

The second was to throw out all of the dangerous casts and replace them with a single casting operation that always checks types at run time and throws a nice exception right away if you get it wrong. Java, in effect, provides two kinds of type safety: its static type checking reduces the likelihood of a typing error, and its runtime cast checking sharply limits the consequences of a bad cast.

Before we argue about static vs. dynamic type checking, let's remember where this fanaticism for so-called strong typing came from: it came from an entire industry that had been burned by having a trap door that led to having no typing. Back when it was C/C++ vs. Java, the debate was between having no type checking at run time vs. type checking at run time.

(Attention pedants: I'm very aware of how much static type checking ANSI C and C++ can do. I'm judging these languages by the bad news edge case bugs, not by the billions of perfectly fine lines of code that eschewed unsafe casts.)

For the most part, today we see people arguing about static vs. dynamic typing. They argue about the effort involved in telling the compiler what to check, and whether having the compiler find some errors for them does or doesn't make up for the extra code.

The static folks say "hell, yeah, you're headed for big trouble if a type checking bug slips through the compiler into production." Okay, let's look at the history. Back in the day when Reg was young and user interface design consisted of colouring the punch cards, a type error slipping into a C program represented a real risk of a catastrophic problem. True.

But let's say you want to use one of these "new-fangled" dynamic languages. Are you exposed to the same risk? The answer is no. The reason is that these dynamic languages have types and check them for you, just like Java's checked type casts. The consequences of a type error are trivial compared to C/C++ errors.
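
To make that concrete, here is a small sketch in Python (the function name is made up for illustration). The language knows that a string is not a number and refuses the operation on the spot, instead of reading the wrong bytes and carrying on.

```python
def try_add_one(value):
    """Hypothetical helper: report what happens when we compute value + 1."""
    try:
        return "ok", value + 1
    except TypeError as err:
        # The runtime checked the operand types and stopped right here.
        return "type error", str(err)


print(try_add_one(41))    # a number: the addition succeeds
print(try_add_one("41"))  # a string: rejected with a TypeError, nothing corrupted
```

The mistake still exists, and it still has to be fixed. But it announces itself at the exact line where it happens.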

How trivial can a type error be?
Consider a web application. If a type error takes place, users will get a server error 500 response. The server will not crash, the database will not be infested with corrupt data, and users will not find themselves charging their purchases to someone else's credit cards. The consequences of a type error are relatively mild when the language checks types at run time.
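
A rough sketch of why, in Python. The names checkout and serve are hypothetical, and a real web framework does a great deal more, but the shape is the same: the type error surfaces as an exception, a top-level handler turns it into a 500, and the next request is served normally.

```python
def checkout(params):
    # Business logic with a latent bug: it assumes both values are numbers.
    total = params["subtotal"] + params["shipping"]
    return f"Total: {total}"


def serve(handler, params):
    # Stand-in for a framework's top-level request dispatcher.
    try:
        return 200, handler(params)
    except Exception:
        # The failed request gets a 500; the server itself carries on.
        return 500, "Internal Server Error"


responses = [
    serve(checkout, {"subtotal": 10, "shipping": 5}),
    serve(checkout, {"subtotal": "10", "shipping": 5}),  # a string slipped in
    serve(checkout, {"subtotal": 20, "shipping": 5}),
]
print(responses)
```

The second request fails with a 500, and the third one succeeds as if nothing had happened.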

Let's consider sending a message to an object (not all dynamic languages have objects, but whatever). How would this happen in C? Well, you'd have a pointer or a handle to an object, and it would have a pointer to an array of pointers to functions representing its methods. (It could be more complex if there's inheritance or aggregation).

So you might try dereferencing the handle twice, looking up the offset of the pointer to the array, then looking up the offset of the function you want, dereferencing that pointer, and calling the function. If any of this is wrong, boom. Actually, it's worse than boom. You might not find out it was wrong for a very long time. Silent but deadly!

In a dynamic language, you try sending a message to the object and if the object doesn't handle that message, there's a very explicit behaviour for managing the error. It calls a special "missing method" method. Or it throws a specific exception. Dynamic languages address the issue just like C++ did with dynamic casts and Java does all the time with its casts.
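
Here is a minimal Python sketch of both behaviours. The class names are invented for illustration, and Python's __getattr__ hook is only a rough cousin of the missing-method hooks in other languages, but it shows the idea.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Catcher:
    # __getattr__ runs only when normal attribute lookup fails, so it can
    # act as a catch-all for messages the object does not understand.
    def __getattr__(self, name):
        def handler(*args):
            return f"no method named {name!r}"
        return handler


# Behaviour one: a specific exception, raised at the point of the call.
try:
    Duck().fly()
    outcome = "called"
except AttributeError:
    outcome = "AttributeError"

# Behaviour two: the catch-all hook decides what to do.
fallback = Catcher().fly()

print(outcome)
print(fallback)
```

Either way, the failure is explicit and immediate. Nothing jumps to an arbitrary address and hopes for the best.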

Dynamic language advocates argue for more automated unit tests instead of compiler checking. Well, I'm in favour of more testing whenever you can get it. But to my eye, that isn't the point. The point to me is that the potential for catastrophe with both Java and dynamic languages is so much smaller than the potential for catastrophe with C/C++ that the debate about type safety is almost moot.

I really don't want to get into a shrill "do so! do not!!" debate. But if you're reading this and you're still planted 100% in the strong, static type checking camp, let me ask you one question:

Have you actually worked on a project where casting errors caused the failure of the project? I mean the product failed in the marketplace, or you spent so much time trying to find and squash critical bugs that the project was cancelled?

I have a funny feeling that most or all of the people who answer "yes" were working with C/C++ and unconditional casts. I have a feeling that as an industry we're so scarred with those problems that we don't realize that moving to run time type checking solves 75% of our problems and makes the errors 99% less dangerous.

¶ 4:18 PM

Comments on “A "fair and balanced" look at the static vs. dynamic typing schism”:

"fair and balanced" just like Fox news is "fair and balanced" I guess.

# posted by Anonymous : 6:38 PM

"fair and balanced": well, yes, I assumed you'd get the joke without explicitly winking :-)

# posted by Reginald Braithwaite : 11:28 PM

Just read what I thought was a very interesting blog post; and then I saw the 'Fox News' comment. So - I'm pitching in and saying: great read - it's given me good food for thought...

# posted by Anonymous : 11:31 PM

I am still 50/50 in this whole static versus dynamic typing debate. For me the issue is not one of safety, but more about loss of information about variables at edit time.

Without static typing you lose:
1) IDE autocomplete assistance
2) Most auto refactorings. (The refactoring engine cannot, at edit time, figure out what types reference other types, and is therefore unable to do its job.)
3) Information that can be inferred from the code. Dynamically typed code is harder to reason about. (What are the capabilities of this object here?... hmmm, now I have to go and look for all of the places that pass a value to this method so I can figure out what type it is.)

And many of the arguments against static typing seem to argue that static typing means more typing on the keyboard (no pun intended).

I never buy that argument, as the speed I type is not my bottleneck: most of the time I am thinking.

Paul Graham reckons that static typing means you can't have true macros (as in Lisp), however without understanding that statement fully I can't comment.

Much of this static versus dynamic typing debate could go away when there is no difference between edit time and run time. You would be able to ask a 'live' object what it is capable of before you edit the code.

Anyway, all I am saying is that 'type safety' is only part of the story in a static versus dynamic debate. There is a lot more to it!

Are there any modern languages that offer the option of statically typing where you want to and dynamically typing in other places?

Just my two cents!

# posted by James Sadler : 9:23 PM

james,

With a dynamic model, you actually have more accurate autocomplete assistance in a real dynamic IDE, because the IDE itself has _more_ information about the types of the objects, not less.

Refactoring in a dynamic IDE is streamlined and simplified, for the same reasons, but you'll find that you don't need to refactor much, if at all, if using a good dynamic language like Ruby.

In a dynamic world, you just ask the object what its capabilities are. You don't have to look up call sites in the source code. What dynamic language have you been using that does not have this capability?

Also, Objective-C offers dynamic typing and optional static typing. The static typing allows the compiler to detect many type errors at compile time. In practice, you use static typing in most places in Objective-C, because it's still C at the core, and it's still a pain to go through the debug-edit-test cycle to find out you've gotten the type of something wrong. In a fully dynamic language, it really isn't an issue.

Again, Ruby is the exemplar. Smalltalk (from whence large bits of Objective-C and Ruby came) is both the original object-oriented language and a darn fine 'dynamic language' in the sense we know it today.

# posted by Anonymous : 10:22 PM

James, the Groovy language allows you to use static typing where you want to, and stay dynamic where it doesn't matter. It's definitely worth checking out.

# posted by Tiago Silveira : 3:40 AM

I can't agree that you'll want to refactor less. I want to refactor dynamic code just as much as my Java code, but find myself doing it less because I don't have the tools.

Smalltalk and Bicycle Repair Man (for Python) aside, dynamic languages still lack good refactoring support, largely (but not only) because the nature of dynamic typing makes it harder to create a domain model of the code structure in the same way Eclipse or IntelliJ does for Java. I say harder, because I don't know of anyone who has done it for either Python or Ruby - Bicycle Repair Man (BRM) actually solves 99% of refactoring cases using recursive grepping, and simply asks the user for help the rest of the time - which is to my mind good enough.

Taking BRM as an example however I'm hopeful someone will port it to Ruby using the same techniques!

Does anyone know how Smalltalk accomplished its refactoring support? Was it factored (pardon the pun) into the design of the language up front?

James, autocompletion is possible with a 'Find Definition' ability - something BRM can do. As for less information being available, well I suppose it depends on how you code - if you create clearly named objects, and assuming you have 'find definition'/autocompletion, I'm not sure what else you lack. The forthcoming Protest Python testing tool will certainly show that if you write good tests this is even easier (and can actually improve on those tools currently available to Java).

# posted by Sam : 4:12 AM

An IDE can reason better about variable types when you work in a statically typed language.

But you don't need manifest typing. C++/Java/C# use manifest typing: you always need to retype the type of each variable. Languages like Haskell support type inference, so you can write the type only when needed and let the compiler work out the rest. Type inference will be available in the next C++ standard and C# 3.0, so maybe that's how you will get the clean look of a dynamic language while still keeping compile-time type checking.

Most people who hate dynamic typing blame the wrong thing. When they say they were bitten, they were not actually bitten by the dynamic typing feature of the language, but by the weak typing feature of the language. Dynamic and strong typing is good. Static and weak typing is bad. Dynamic and weak typing, and all hell breaks loose.

# posted by Anonymous : 6:29 AM

I'm fairly new to the debate. So what classification would JavaScript fall under: dynamic + strong type, static + weak type, dynamic + weak type, or something else? I ask because it always elicits plenty of groans and grumbles when we have to revise our JavaScript on the web sites, partly due to cross-browser compatibility issues, but also because of how "loose" it is compared to Java. I think this question will become more relevant with the rising popularity of the AJAX approach to web applications.

Cheers,

Ted

# posted by Anonymous : 9:13 AM

Why is that discussion so often an XOR discussion? Why oh why?
http://yozzeff.blogspot.com/

# posted by yozzeff : 9:35 AM

I think there's a balance to be struck. On the one hand, using dynamic languages requires a higher degree of skill. On the other hand, it allows skilled practitioners to be much more productive, and allows them to produce executables that can be far more efficient in terms of CPU and memory usage. In my experience, a smaller, more experienced team making use of dynamic language facilities will run rings around development teams that are not similarly constituted.

Sadly, IT managers are often penny wise and pound foolish, thinking to contain costs by paying lower labor rates. This strategy tends to backfire more often than not, especially for larger and/or more technically challenging projects.

# posted by Jonathan Lehr : 12:48 PM

'When we discuss "type safety"...'
We can just use a textbook definition: "Type safety is the property that no primitive operation ever applies to values of the wrong type." (p. 263, Programming Languages: Application and Interpretation)

That says nothing about how type safety is achieved (static type checks, runtime type checks, commonly both).

"a trap door that led to having no typing"
A trap door could have been okay (Ada, Modula, ML); a 5 lane super-highway - not okay.

'..."new-fangled" dynamic languages. Are you exposed to the same risk? The answer is no. The reason is...'
The reason is that these languages are type safe.

"Consider a web application... The consequences of a type error are relatively mild when the language checks types at run time."
The consequences are utterly dependent on the context of use.
- Failure to quickly complete a web application transaction could have multi million dollar opportunity costs, or it could just mean the user clicks reload.
- Runtime error could crash an airliner, or it could just pause the program until the on-call guy returns from the bar to fix and resume the program.
- Runtime error might mean answering bug reports and complaints from 3 million users, or it could just mean walking into the next cube to see what's going on.

"spent so much time trying to find and squash critical bugs that the project was cancelled?"
Yes, some third-party Smalltalk software never reached acceptable quality (in some other language the third-party software may not even have compiled).

"solves 75% of our problems and makes the errors 99% less dangerous"
As an industry, we already maintain a pretence that product errors are something users should put up with.

As an industry, rather than pretending that errors don't matter, we need better ways to detect errors before we deliver the product.

Better ways like static analysis of byte code produced by a dynamic type language.

# posted by Isaac Gouy : 2:10 PM

The static vs dynamic debate seems like a moot debate. First of all most of the debate involves making normative instead of positive statements, which seems like an absolutely ridiculous thing to do as a group regarding a technology (especially since one consequence of making positive statements regarding type models is that research can test these statements and hence progress in improving program efficiency and correctness in a scientific manner is possible, I'm not sure how Voodoo Mysticism came to dominate the programming culture so effectively but it's disgusting). Furthermore, it is possible to construct a language which has the properties of both a static and a dynamic language in that either static or dynamic type checking can be declared for code blocks with a keyword. The resulting language would be "dynamically" typed by definition, but would have all the properties of "statically typed languages" with the inclusion of a keyword (or static could be the default), so I'm not sure what all the Voodoo Mystics are babbling about when a superset language can be constructed that will satisfy both "sides." I understand that normative statements and taking "sides" can be appealing because programming is an art, but I don't think decisions in entire industries should be driven by Voodoo magic and personal aesthetics -- instead powerful programming languages should be constructed which allow people to adopt their preferred aesthetics.

# posted by Connelly : 10:53 PM

Excellent posting!

Having managed a team working on a large (and bad) Perl codebase which has MANY problems that would have been avoided by compile-time type checking, I was wondering whether this is just a matter of particularly bad programming or it is the unavoidable fate of a long-lived large product written in a language with runtime-checked types.

You're right: my fears come from the C and Perl worlds - in C because you can turn whatever into anything, in Perl because you can turn strings into numbers and vice versa, and 0, undef, and '' are the same thing. Most of the above-mentioned problems with that Perl crap fall into this category.

Hmmm... I guess you've just saved my next team from being forced to move to Java.

Thanks for the enlightenment,

Jordi.

# posted by Anonymous : 4:39 PM

You should just say that it's your opinion, and quite uninformed at that.

The fact is that both serve their purpose. Get off your high (really low) horse, and pick up the tool that solves your problem.

After you've done that, let everybody else do the same.

# posted by Anonymous : 9:37 PM

Anonymous:

You should just say that it's your opinion

This is a weblog, not a research paper. Do you really need to be told that it is an opinion? Should there be fine print to prevent you from hurting yourself?

let everybody else do...

I didn't realize you needed my say-so before doing whatever you want. Well, here it is: do as you please. Thank you, carry on.

# posted by Reginald Braithwaite : 11:04 PM

Hmmm...the idea that casting errors are somehow insidious and terrible seems laughable to me.
I notice that *no one*(!) has answered "yes" to the question that casting errors caused a project to fail! I suspect this is just some theoretical thought experiment.

As an actual matter of fact, we use the C++-style casts. But they don't really give any major advantages over the C-style casts (when you gotta use reinterpret_cast you gotta use it).

Typically the result of a casting failure is an immediate crash. And that makes it pretty damn easy to trace and fix. As I mentioned, we use casts and I have never even seen a casting error reach past the developer's initial test never mind it getting checked-in and shared with the team.

The idea that a casting error would reach production and then, somehow, do something *bad* sounds like a Halloween scary story you make up sitting around a fire sipping chocolate.

My 2 cents ;-)

# posted by brown-dragon : 5:37 AM

The idea that a casting error would reach production and do something bad is a Halloween story? I don't agree with this. However, you also said that typically the result of a casting failure is an immediate crash. Here's an immediate crash:

The Ariane 5, in 1996, went off course and exploded 40 seconds into launch, when a 64-bit floating point number was cast to a 16-bit signed integer and overflowed. The loss was valued at half a billion dollars. See this.

On the main topic, well, I'm a Lisp guy at heart (originally Lisp machines at MIT and Symbolics, now an airline reservation system at ITA Software), but I've also used Java extensively (at Object Design and BEA), and in a nutshell, both approaches have their pros and cons. I'd like to get experience writing large software in a language with powerful type inference, and see to what extent that can give you the best of both worlds.

# posted by dlweinreb : 10:18 AM

I have read all the comments here and must say a VERY SIMPLE THING.
When I compile with a static language most of *my* errors are caught. This might be lazy... But most *good* programmers *are* lazy; it's a matter of pragmatism. Management wants *so* much from us. So when the compiler can catch 80% of my errors with a static language I am thinking GAWD, what is with all the buzz about dynamic languages? I know you can't tell me with a straight face that it's easier to catch my stupid errors in dynamic languages than in static languages??? I mean, common errors are indeed easier to make since the compiler is so much more forgiving, no? Well, keep in mind that I *have* checked out both kinds of languages and found what I have to say mostly true. Perhaps the exception is ORM. Gawd, OK, I get it, dynamic languages rule here... but that's about it!

# posted by mail@joshsprabary.com : 9:42 PM
