A "fair and balanced" look at the static vs. dynamic typing schism
A long time ago in an industry far, far away, everyone programmed in C.
Like Joel says, this is a broad-strokes observation. I happened to use Pascal an awful lot at the time, so I realize that "everyone" doesn't mean "every last one of us." But it does mean "so many people that no other language mattered to the industry as a whole."
C has many delightful advantages. It also has two, umm, features that programmers routinely screwed up. The first, as everyone knows, is the lack of automatic memory management. Programmers had to do everything themselves and do it properly. If you made a mistake, your program would suffer from dangling pointers or memory leaks.
The second is the almost complete lack of type safety. That's a big hand wave. I realize that if you're very careful you can make a C compiler, especially an ANSI C compiler, do a little static type checking for you. That's the theory. But in practice, C has this one little thing that makes type safety moot: the unconditional type cast.
How many readers know what I'm talking about? Put your hand up... Good! Talk amongst yourselves for the next paragraph while I explain this to the readers who are under the age of forty.
Why C casts are dangerous
With an unconditional type cast, the C compiler would let you treat the address of something in memory as pointing to whatever you wanted. You could treat a byte as the start of a string. Or you could treat a long word as the address of a function. Or you could treat it as an array of pointers to functions. So if you had a pointer to an array of pointers to strings and you mistakenly cast it to a pointer to an array of pointers to functions, and then you tried to call one of those "functions," bad things would happen.
Okay, everyone back? This was phenomenally bad, especially because we do this von Neumann thing and have bytes of memory that are sometimes code to be executed and sometimes data to be shuffled around. If you had a pointer to the wrong thing, the program might crash right away, or it might chug along for a while, silently corrupting memory until everything failed. There's an entire industry of malicious people who take advantage of this possibility to craft buffer overflow attacks.
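Python's standard ctypes module offers the same kind of unchecked cast, which makes it a reasonably safe place to watch one in action. The sketch below (the function name reinterpret_int_as_float is hypothetical) points a float pointer at the four bytes of a 32-bit integer. Nothing checks whether those bytes really hold a float; you simply get whatever the bits happen to mean. It deliberately stops short of casting to a function-pointer type and calling the result, since that would likely crash the interpreter in exactly the way described above.

```python
import ctypes


def reinterpret_int_as_float(value: int) -> float:
    """Hypothetical helper: reread a 32-bit int's bytes as a 32-bit float."""
    n = ctypes.c_int32(value)
    # ctypes.cast, like a C cast, trusts us completely: it does not check
    # that these four bytes actually hold a float.
    as_float = ctypes.cast(ctypes.pointer(n), ctypes.POINTER(ctypes.c_float))
    return as_float.contents.value


print(reinterpret_int_as_float(1065353216))  # 1.0 (0x3F800000 is the bit pattern of 1.0f)
print(reinterpret_int_as_float(1))           # a tiny subnormal number, not 1.0
```

No exception and no warning: the wrong interpretation silently becomes a value the program goes on to use.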
This problem was so bad that it was one of the things C++ tried to fix with its casting constructs. One of them, dynamic_cast, checks at run time that the cast is safe. The only trouble was, C++ let you keep the dangerous C cast and then threw in a variation called static_cast that was nicer looking but still unsafe some of the time. It performs no run-time check, so a downcast to the wrong type is undefined behaviour, just like the C cast.
I want to underline the consequences of bad casts here. When discussing risks, you always have to think along two axes: the likelihood of disaster and the magnitude of the consequences. C programs with bad casts have extremely bad consequences. They might crash. They might corrupt all of their data. They could cause other programs to crash. These are all terrible.
When we discuss "type safety" we have to think along the same two axes of risk: there is safety that protects us from something going wrong, and there is safety that limits the consequences when something does go wrong.
Well, one day Java came along. You may deride it as "COBOL Lite," but it did two things an awful lot better than C++, and if you look back at history, it succeeded by converting people away from C++. The first thing it did better was to make automatic memory management mandatory.
The second was to throw out all of the dangerous casts and replace them with a single casting operation that always checks types at run time and throws a ClassCastException right away if you get it wrong. Java, in effect, provides two kinds of type safety: its static type checking reduces the likelihood of a typing error, and its runtime cast checking sharply limits the consequences of a bad cast.
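Python has no cast operator, so as a rough analogue of Java's checking cast, here is a hypothetical helper, checked_cast, which is not part of any library. It verifies the type at run time and fails immediately with an exception, which is the behaviour described above. (The standard library's typing.cast, by contrast, performs no check at all at run time; it exists only for static type checkers.) The Animal, Dog, and Cat classes are illustrative.

```python
from typing import Type, TypeVar

T = TypeVar("T")


def checked_cast(value: object, cls: Type[T]) -> T:
    """Hypothetical helper: return value only if it really is a cls.

    Like a Java cast, the check happens at run time, and a wrong guess
    fails immediately with an exception instead of corrupting memory.
    """
    if not isinstance(value, cls):
        raise TypeError(f"cannot cast {type(value).__name__} to {cls.__name__}")
    return value


class Animal: ...


class Dog(Animal):
    def bark(self) -> str:
        return "Woof"


class Cat(Animal): ...


pets = [Dog(), Cat()]
print(checked_cast(pets[0], Dog).bark())  # Woof
try:
    checked_cast(pets[1], Dog).bark()
except TypeError as err:
    print("caught:", err)  # caught: cannot cast Cat to Dog
```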
Before we argue about static vs. dynamic type checking, let's remember where this fanaticism for so-called strong typing came from: it came from an entire industry that had been burned by a trap door that led to having no typing at all. Back when it was C/C++ vs. Java, the debate was between having no type checking at run time and having type checking at run time.
(Attention pedants: I'm very aware of how much static type checking ANSI C and C++ can do. I'm judging these languages by the bad news edge case bugs, not by the billions of perfectly fine lines of code that eschewed unsafe casts.)
For the most part, today we see people arguing about static vs. dynamic typing. They argue about the effort involved in telling the compiler what to check, and whether having the compiler find some errors for them does or doesn't make up for the extra code.
The static folks say "hell, yeah, you're headed for big trouble if a type error slips through the compiler into production." Okay, let's look at the history. Back in the day when Reg was young and user interface design consisted of colouring the punch cards, a type error slipping into a C program represented a real risk of a catastrophic problem. True.
But let's say you want to use one of these "new-fangled" dynamic languages. Are you exposed to the same risk? The answer is no. The reason is that these dynamic languages have types and check them for you at run time, just like Java's checked casts. The consequences of a type error are trivial compared to those of C/C++ errors.
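A small sketch of that run-time checking, with Python standing in for dynamic languages generally; the ledger and the transfer function are hypothetical. Passing a string where a number belongs raises a TypeError at the exact moment the bad operation is attempted, and the ledger is left exactly as it was.

```python
def transfer(ledger, src, dst, amount):
    """Hypothetical example: move amount from one account to another."""
    ledger[src] -= amount  # int - str raises TypeError here, before anything is written
    ledger[dst] += amount


ledger = {"alice": 100, "bob": 50}
try:
    transfer(ledger, "alice", "bob", "10")  # bug: the amount is a string
except TypeError as err:
    print("rejected:", err)
print(ledger)  # {'alice': 100, 'bob': 50}
```

Run-time checking contains the damage rather than making code transactional: had the error surfaced on the second line, the first update would already have happened. But nothing gets scribbled over at random, and the failure is reported where it occurred.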
How trivial can a type error be?
Consider a web application. If a type error takes place, the user will get an HTTP 500 (Internal Server Error) response. The server will not crash, the database will not be infested with corrupt data, and users will not find themselves charging their purchases to someone else's credit card. The consequences of a type error are relatively mild when the language checks types at run time.
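Here is a minimal sketch of that scenario as a plain WSGI callable (Python's standard web server interface, PEP 3333). The handler, the price, and the quantity parameter are all hypothetical. In practice your framework or WSGI server typically catches uncaught exceptions and sends the 500 for you; the explicit try/except only makes that behaviour visible.

```python
import logging
from urllib.parse import parse_qs


def price_order(query):
    """Hypothetical handler with a type bug."""
    # parse_qs returns lists of strings; we forgot to convert with int().
    quantity = query["quantity"][0]
    return quantity * 9.99  # str * float raises TypeError


def app(environ, start_response):
    """Minimal WSGI application: a type error becomes a 500, nothing more."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        total = price_order(query)
    except Exception:
        # Log the traceback for the developers; the user sees only a 500.
        logging.exception("error while pricing order")
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Internal Server Error"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Total: {total:.2f}".encode("utf-8")]
```

To try it, serve it with wsgiref.simple_server.make_server("", 8000, app).serve_forever() and request /?quantity=3. That request gets a 500, and the process goes right on serving the next one.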
Let's consider sending a message to an object (not all dynamic languages have objects, but whatever). How would this happen in C? Well, you'd have a pointer or a handle to an object, and it would have a pointer to an array of pointers to functions representing its methods. (It could be more complex if there's inheritance or aggregation.)
So you would dereference the handle twice, look up the offset of the pointer to the array, look up the offset of the function you want, dereference that pointer, and call the function. If any of this is wrong, boom. Actually, it's worse than boom. You might not find out it was wrong for a very long time. Silent but deadly!
In a dynamic language, you send a message to the object, and if the object doesn't handle that message, there's a very explicit behaviour for managing the error. The runtime calls a special "missing method" method (Ruby's method_missing, Smalltalk's doesNotUnderstand:), or it throws a specific exception. Dynamic languages address the issue just as C++ does with dynamic_cast and Java does with every cast: they check at run time.
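Python's closest counterpart to a "missing method" method is __getattr__, which the interpreter calls only after normal attribute lookup has failed; without it, the failure is a plain AttributeError. The sketch shows both behaviours. The Duck and ForgivingDuck classes are hypothetical, and Python is standing in for dynamic languages generally.

```python
class Duck:
    def quack(self):
        return "Quack!"


class ForgivingDuck(Duck):
    def __getattr__(self, name):
        # Called only when normal lookup fails, so quack() is unaffected.
        # This is Python's version of a "missing method" method.
        def missing(*args, **kwargs):
            return f"{type(self).__name__} does not understand {name!r}"
        return missing


# No hook: a specific, catchable exception, raised at the point of the call.
try:
    Duck().bark()
except AttributeError as err:
    print("caught:", err)

# With a hook: the object itself decides how to respond.
print(ForgivingDuck().bark())  # ForgivingDuck does not understand 'bark'
```

Either way, the unknown message is detected by name before anything runs, instead of jumping to whatever happens to sit at a wrong offset.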
Dynamic language advocates argue for more automated unit tests instead of compiler checking. Well, I'm in favour of more testing whenever you can get it. But to my eye, that isn't the point. The point to me is that the potential for catastrophe with both Java and dynamic languages is so much smaller than the potential for catastrophe with C/C++ that the debate about type safety is almost moot.
I really don't want to get into a shrill "do so! do not!!" debate. But if you're reading this and you're still planted 100% in the strong, static type checking camp, let me ask you one question:
Have you actually worked on a project where casting errors caused the failure of the project? I mean the product failed in the marketplace, or you spent so much time trying to find and squash critical bugs that the project was cancelled?
I have a funny feeling that most or all of the people who answer "yes" were working with C/C++ and unconditional casts. I have a feeling that as an industry we're so scarred by those problems that we don't realize that moving to run time type checking solves 75% of our problems and makes the errors 99% less dangerous.