I'll take Static Typing for $800, Alex.
There’s an argument that static typing prevents errors by detecting them at compile and/or edit time. In a trivial sense, this is absolutely true. You can, for example, write a Java program with such an error and watch Eclipse highlight the offence.
The interesting thing I’ve noticed is that many of the people in favour of static typing are arguing from a position of “Do as I say, not as I do.”
I don’t mean that they program in Python. I mean that when I ask them whether they would have typing troubles in a dynamic language, their answer is often “well, I wouldn’t, but every business needs hordes of monkey/offshore/intern/new graduate programmers who do make these errors.”
“People shouldn’t be able to open my classes.” “They need compile time type checking so the app doesn’t blow up.” “Extending system libraries is bad.”
Concerns like these all seem to boil down to one major theme: The people I work with are stupid.
It’s not really stupid people, it’s people who’ve accepted a stupid idea. Java’s design is based on the idea that a language can prevent misuse by making bad things hard to do. It’s defensive thinking.
Let’s stop worrying about errors we don’t actually make. Let’s stop worrying about errors some hypothetical junior, error-prone programmer might make.
Here’s an interesting question: what sorts of typing errors do experienced, intelligent programmers actually make? And what sorts of typing errors sneak through unit tests and even QA and into production? And most especially, what sorts of typing errors have catastrophic consequences in production?
Now those are interesting errors. Those are worth worrying about. I’ll go further: those are worth static typing.
Here’s one from my actual, hands-on experience: distinguishing escaped from unescaped strings. I don’t know if I’m using the right words here: I’m thinking of a typical XML or XHTML application where some of the time a string is just a string, but some of the time it has a bunch of its characters replaced or escaped with special entities.
Another case of escaped and unescaped strings concerns safely composing SQL queries and updates (another solved problem in other languages). The argument is always in favour of using library functions that do the conversion for you, like PreparedStatement. If you think about it, that’s no damn good. What that does is treat everything like an unsafe String and only convert it at the very last second.
If you’re going to do any fancy SQL composition, you can only do it with stuff that isn’t a user value, so you have to keep track of escaped and unescaped strings anyway. And finally, your libraries still have all the unsafe APIs that don’t perform the conversion for you, so you are relying on your iron will and self-discipline to prevent errors, rather than having the compiler perform what is really a rather trivial check.
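That “rather trivial check” can be approximated in today’s Java with wrapper types. This is a minimal sketch, assuming two hypothetical classes invented for illustration (UnescapedString for raw user input, SqlFragment for composable SQL); neither is part of JDBC. User input can only become SQL through quote(), which uses standard SQL literal syntax (an embedded single quote is doubled). Some databases also treat backslashes specially, so real queries should still bind parameters; the point is only that the compiler, rather than iron will, refuses to splice an UnescapedString into a query.

```java
import java.util.Objects;

// Hypothetical types for illustration only; not part of JDBC or any real library.

/** Raw text that came from a user, e.g. a form field. */
record UnescapedString(String raw) {
    UnescapedString { Objects.requireNonNull(raw); }
}

/** SQL text that is safe to compose: either written by the programmer or quoted. */
final class SqlFragment {
    private final String sql;

    private SqlFragment(String sql) { this.sql = sql; }

    /** SQL written by the programmer. It takes a plain String, so an
     *  UnescapedString cannot be passed here without explicitly unwrapping it. */
    static SqlFragment trusted(String sql) {
        return new SqlFragment(Objects.requireNonNull(sql));
    }

    /** The only route from user input to SQL: render it as a string literal.
     *  Standard SQL escapes an embedded single quote by doubling it. */
    static SqlFragment quote(UnescapedString value) {
        return new SqlFragment("'" + value.raw().replace("'", "''") + "'");
    }

    /** Composition accepts only fragments, never raw user input. */
    SqlFragment append(SqlFragment other) {
        return new SqlFragment(sql + other.sql);
    }

    String sql() { return sql; }
}
```

Composing SqlFragment.trusted("SELECT * FROM users WHERE name = ") with SqlFragment.quote(new UnescapedString("O'Brien")) yields SELECT * FROM users WHERE name = 'O''Brien', while passing the UnescapedString straight to trusted() does not compile.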
I have no problem with relying on iron will and self-discipline to eliminate errors. But if your argument is that iron will is appropriate for preventing SQL injection attacks, why isn’t iron will appropriate for preventing trivial type errors that would result in a
I’m not alone in considering this a problem. Web applications that screw this up are vulnerable to cross-site scripting (XSS) attacks. This is very bad, and if static typing could help I’d eagerly embrace it.
How could static typing help? Well, imagine if you designate some strings as escaped and some as unescaped. So our type hierarchy has String at the top, with two subtypes: UnescapedString extends String, and EscapedString extends String. (Actually, I’d prefer interfaces if I were designing a Java-like language, but that’s by the by.)
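One wrinkle: java.lang.String is final, so in actual Java these subtypes cannot extend it. A rough approximation, assuming Java 17 or later, uses a sealed interface (standing in for the preferred interfaces) with two record types that each wrap a String. TaggedString is a name made up for this sketch.

```java
import java.util.Objects;

// java.lang.String is final, so these "subtypes" wrap a String instead of
// extending it. TaggedString is a hypothetical name for the common supertype.

/** Common supertype: anything that carries some text. */
sealed interface TaggedString permits UnescapedString, EscapedString {
    String text();
}

/** Text exactly as received, special characters and all. */
record UnescapedString(String text) implements TaggedString {
    UnescapedString { Objects.requireNonNull(text); }
}

/** Text whose XML/XHTML special characters have been replaced by entities. */
record EscapedString(String text) implements TaggedString {
    EscapedString { Objects.requireNonNull(text); }
}
```

As written, any code in the same package can construct an EscapedString directly; a stricter design would hide that constructor so only the escaping routine can produce one.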
Now there are certain critical places where we would need to harden our application. The first is everywhere we get strings from users. These strings, just like user-supplied variables in scripting languages like PHP, need to be treated as unescaped. We would type our methods accordingly (in Java, this could be accomplished with annotations). For example, anything snarfed from the incoming request is an unescaped string.
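Annotations can express the same distinction without new classes. The sketch below assumes two hypothetical type-use annotations, @Unescaped and @Escaped, and uses a Map as a stand-in for wherever user input arrives. Plain javac only records such annotations; it does not check them. Rejecting an @Unescaped value where @Escaped is expected requires a pluggable type checker, such as the Tainting Checker that ships with the Checker Framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;

// Hypothetical type-use annotations. javac accepts and records them but does
// not enforce them; a pluggable type checker would have to do that.

/** Text that came from a user and has not been escaped. */
@Target(ElementType.TYPE_USE)
@Retention(RetentionPolicy.CLASS)
@interface Unescaped {}

/** Text that is safe to write into XML/XHTML output. */
@Target(ElementType.TYPE_USE)
@Retention(RetentionPolicy.CLASS)
@interface Escaped {}

final class FormInput {
    /** Every value read from the submitted form is typed as unescaped.
     *  A missing parameter comes back as the empty string. */
    static @Unescaped String param(Map<String, String> form, String name) {
        return form.getOrDefault(name, "");
    }

    /** Output side: under a checker, passing param(...) here would be an error. */
    static String paragraph(@Escaped String html) {
        return "<p>" + html + "</p>";
    }
}
```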
Then when we present strings, we type the methods as taking escaped strings only. If we try to pass a POS (Plain Old String) or an unescaped string to a method parameterized by an escaped string, we get a compile-time error.
To get around the errors, we need to escape our strings. We do that by writing a conversion method somewhere that, you guessed it, takes an unescaped string as a parameter and returns an escaped string.
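Putting the pieces together, here is a minimal sketch of the escaped/unescaped scheme in current Java, assuming Java 16+ and wrapper types named after the text (they wrap a String because java.lang.String is final). EscapedString.escape() and Page.paragraph() are hypothetical names. The five replacements cover the characters XML reserves: &, <, >, and the two quote characters.

```java
import java.util.Objects;

// Hypothetical wrapper types following the text; they wrap a String because
// java.lang.String is final and cannot be extended.

/** Text as received from a user. */
record UnescapedString(String raw) {
    UnescapedString { Objects.requireNonNull(raw); }
}

/** Text that is safe to emit as XML/XHTML character data or attribute values. */
final class EscapedString {
    private final String html;

    // Private: the only way to obtain an EscapedString is escape() below.
    private EscapedString(String html) { this.html = html; }

    /** The conversion method: takes an UnescapedString, returns an EscapedString. */
    static EscapedString escape(UnescapedString s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.raw().toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return new EscapedString(out.toString());
    }

    String html() { return html; }
}

final class Page {
    /** Presentation accepts EscapedString only; passing a plain String or an
     *  UnescapedString here is a compile-time error. */
    static String paragraph(EscapedString body) {
        return "<p>" + body.html() + "</p>";
    }
}
```

Calling Page.paragraph with a plain String or an UnescapedString is rejected by javac, and because the EscapedString constructor is private, escape() is the only way to produce an argument it will accept.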
Naturally our application would be full of bookkeeping annotations as we keep track of which strings are escaped and which aren’t. But my gut feeling is that catching this kind of error at compile time would be worth it.
Here’s another error that I think is worth the effort of static typing. The bane of my existence when maintaining legacy Java code is the null pointer exception.
This is actually a solved problem in languages like Haskell. Static typing can easily distinguish between methods that might return a null (like getting a column from a database row) and variables that must not contain a null. The compiler can and should force you to write code that handles the null case.
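Java has no built-in equivalent of Haskell’s Maybe, but java.util.Optional (Java 8+) lets a method at least declare that its result may be absent. A minimal sketch, assuming a database row modelled as a Map from column name to value (a hypothetical stand-in for a real ResultSet): column() returns Optional&lt;String&gt;, so a caller has to decide what an absent value means before it can use the String. Unlike Haskell, nothing stops an ordinary String reference from being null, so the compiler enforces only part of the discipline.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A Map stands in for a database row; real JDBC code would read a ResultSet.
final class Row {
    private final Map<String, String> columns;

    // HashMap (not Map.copyOf) so that SQL NULL can be stored as a null value.
    Row(Map<String, String> columns) { this.columns = new HashMap<>(columns); }

    /** A missing column and a NULL value both come back as Optional.empty(). */
    Optional<String> column(String name) {
        return Optional.ofNullable(columns.get(name));
    }
}

final class Greeting {
    /** The caller cannot reach the String without handling the empty case. */
    static String greet(Row row) {
        return row.column("nickname")
                  .map(n -> "Hi, " + n)
                  .orElse("Hi, stranger");
    }
}
```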
Here’s my question to my fellow Java programmers: why do we tolerate a compiler that forces us to type some things as one class and other things as another, but don’t insist that the compiler catch places where we aren’t checking for null?
These are just two places where static typing could help experienced programmers solve problems that plague real, production code. I’m all for static typing, if it can help me with the errors I actually encounter.
That being said, there is a lot of work being done in this area, although obviously not by Sun or Microsoft (to be specific, not by their C# team). As mentioned, Haskell and several other languages provide static typing that is sophisticated enough to prevent errors like this.
I’m not even close to being the first person to notice the problem:
- Joel Spolsky described using naming conventions to highlight errors like this. It’s interesting that Hungarian Notation was invented for exactly this kind of thing, yet a Cargo Cult has arisen around Systems Hungarian: programmers use it to mark things like integers, a fact the compiler and IDE already know, but not for domain-specific things like whether a string is safe.
- Tim Sweeney of Epic Games wrote an incredibly lucid wish list for The Next Mainstream Programming Language, where he discusses in detail the exact issues his team has faced building and maintaining Unreal. His presentation is available in PowerPoint and PDF formats. He has given a lot of thought to how static typing could help build and maintain huge, complex, commercial applications.
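For comparison, here is roughly what the naming-convention approach looks like. This sketch assumes the prefixes Spolsky uses (us for an unsafe, unescaped string and s for a safe one); HungarianDemo, sFromUs(), and render() are hypothetical names. The compiler sees only String everywhere, so the convention catches a mistake only when a human reads the code.

```java
import java.util.Map;

// Apps Hungarian: "us" marks an unsafe (unescaped) string, "s" a safe one.
// The compiler sees only String; the convention works by making wrong code look wrong.
final class HungarianDemo {
    /** Conversion functions are named xFromY, so prefixes on both sides must match. */
    static String sFromUs(String us) {
        // Replace '&' first so the entities added afterwards are not re-escaped.
        return us.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                 .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String render(Map<String, String> form) {
        String usName = form.getOrDefault("name", "");
        String sName = sFromUs(usName);   // "s" = sFromUs(us...): looks right
        // String sOops = usName;         // "s" = "us": looks wrong, yet compiles
        return "<p>Hello, " + sName + "</p>";
    }
}
```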
Okay, back to Earth. What can we do about this?
Here are my specific suggestions:
- Stop worrying about the theoretical errors we don’t actually make. We unit test, we review our code. We’re not concerned with obvious, superficial problems.
- Agitate for language features that can help us solve these important problems. If the next version of javac can do data-flow analysis, it can identify potential null pointer exceptions, possibly through inference so we don’t even have to type more code.
- Educate ourselves about the bleeding edge of language development. No, that isn’t C# 3.0, Common Lisp, or Ruby 2.0. It’s ML, Haskell, Erlang, and a bunch of other things I need to learn. We may not be able to use Haskell to build Yet Another Boring Web Commerce Application, but we might learn enough to adopt a new naming convention, or possibly to write a string container that enforces escaping safety.