Raganwald: If Sneetches with Stars use Java, and Sneetches without Stars use Ruby, who uses ML?

Sunday, October 08, 2006

If Sneetches with Stars use Java, and Sneetches without Stars use Ruby, who uses ML?

ML is a programming language featuring type inference: you don’t have to encumber your code with type declarations, the compiler can figure them out for you. So… are type inference languages like ML for Sneetches with or without stars? Or another kind of Sneetch entirely?

Update: More than a few people have written that Steve Yegge's association of static typing with neatness and dynamic typing with slovenliness runs opposite to their impressions of the kinds of people who strongly prefer one or the other. I used Steve's terms in the original post, partly because I thought people would get the same joke I thought Steve was making. It look slike they don't, nobody wrote to say "LOL." I have changed the terms to something that represents what I think of the cultural divide between programmers who like Java and programmers who like Ruby.

Let’s review. Sneetches with stars like to use a colour-coded label maker to label the drawers, boxes, and files in their office. Once glance at everything and you know what it holds. Sneetches with stars add extra labels even when you don’t need them. For example, if a box is labeled ‘tax receipts’, each piece of paper inside has a post-it note saying tax receipt’, even if it’s obviously a tax receipt and lives inside the tax receipts box.

What is Stariness?

Sneetches with stars like these languages we say are statically typed. What do we mean by the word static? We mean it can be resolved at compile time. Other words for this idea are invariant or constant. Sneetches with stars like languages where the type of each entity can be resolved at compile time.

Some people are always critical of vague statements. I tend rather to be critical of precise statements; they are the only ones which can correctly be labelled "wrong."

Raymond Smullyan

Let’s dive into this a little deeper. (My apologies to my readers who were actually paying attention to the stuff in first year computer science that isn’t a requirement for getting a job at BigCo.) What does it mean when we say “something can be resolved at compile time”? That expression is laden with implementation details like assuming we’re using a compiler. But it’s a convenient short-hand for saying something about the program that is true every time you run the program.

Consider the final declaration in Java. If you write:

final String snafu = "situation normal...";

We know that the variable snafu always holds a reference to the constant string "situation normal...". No matter what data you feed to your program and how you mangle it, snafu will always be "situation normal...". Do you agree? (Joe Campbell, put your hand down. Yes, there is a back door way you can change the contents of a String in Java.)

Java can take advantage of this to perform constant propagation. Everywhere you write snafu, Java can substitute "situation normal..." and throw away the variable lookup. To get away from arguing about back doors in the String class, let’s consider one of the primitive types, a boolean. If you write:

final boolean foo = true;
// code without assignments to foo
if (foo) {
// do something
}
else {
// do something else
}

Wouldn’t you agree that the compiler can get rid of the variable lookup and the if statement? The path through the code is always through the // do something path every time you run the program.

Now back to the word stariness. We really mean the amount of stuff about the program that can be resolved at compile time, or if you prefer, the amount of stuff that is true every time you run the program.

Stuff that is always true is useful. For most programs, we have an idea in our head about “correctness.” What we mean when we talk about a program being correct is that it produces desirable results every time you run the program.

A formalist is one who cannot understand a theory unless it is meaningless.

Stariness is thus similar to correctness. And that’s why a lot of people, the Sneetches with stars, are obsessed with it. Being able to “prove” something about their program (“the method call foo.bar(5) never throws a MethodNotImplemented exception”) feels a lot like being able to prove that their program is correct.

It feels a lot like it, but it isn’t the same thing. The reason it isn’t the same thing is that while its true that a program throwing MethodNotImplemented exceptions is probably not correct, it’s not true that a program that doesn’t throw such exceptions is correct. It just feels, somehow, more likely to be correct because we’ve thrown out one of the infinite ways it can be incorrect.

Now that we’ve dispatched that logically, let’s be clear about something: just because stariness does not enforce correctness, it doesn’t mean that stariness isn’t useful. Stariness is useful. Period, no debate.

Back to inferences

Type inference is also for Sneetches with stars. A language with type inference resolves the type of each entity at compile time by inspecting the program and figuring the types out through inspection. It’s a lot like the way a compiler can look at the Java code above and figure out that you always // do something and you never // do something else. The code looks sorta like you could go either way, but the compiler knows better.

Languages with type inference look like variables can have any type, but the compiler knows better. Remember the labels that the verbose declaration Sneetches with stars love? Type inference languages still have labels, but the labels are hidden inside of the files and boxes where you can’t see them.

Remember when manufacturers used to put their labels inside clothes instead of right across the front? Same thing. The rules for what goes where are strictly enforced, it’s just that if you can figure out what goes where with a bit of common sense, you don’t need a label or a post-it note.

Compare these two snippets of Java:

final String[] words = { "foo", "bar", "blitz" };
final int word_length = words.length;
final String[] anagrams = new String[word_length];

…and…

final words = { "foo", "bar", "blitz" };
final word_length = words.length;
final anagrams = new String[word_length];

Hey, if a variable is final, we can almost always figure out its type in Java. Making that work in the compiler is something an intern ought to be able to do over a Summer work term!

So if we take a valid Java program and simply erased type declarations whenever we could logically deduce the type of the variables, but left them in whenever we were not sure of the final type of the variables, we would have exactly the same program. Nothing about it has changed except it has fewer symbols. It’s just as neat, it is just as static, it is no more or less correct than it was before we erased some symbols.

And you over there itching to say something about IDE refactorings and auto-completions: None of those go away either. You can rename things and move things and press command-tab to get an object’s methods whenever you like. So… would you agree that type inference of this sort doesn’t change a neat program into a starless program? This isn’t about stariness versus starlessness, it’s about the obsessive-compulsive desire to label everything.

The bottom line: type inference does not change a statically typed language into a dynamically typed language. It’s still starry.

So why can’t the Sneetches without stars use type inference?

Think of types as being like values and objects like variables. A statically typed language is one where there are no type re-assignments. Some languages enforce this. But if you write a program in a static way, you can still reason about it. This is why lots of people think that we can “neaten up” languages like Ruby by adding type inference to the compiler: they're thinking about programs that are neat to begin with, but we happen to have written them in a language for Sneetches without stars.

And whenever someone talks about a refactoring IDE or an auto-completing IDE for a dynamic language, they’re talking about performing some type inference on Ruby programs that are written in a static way. So… what’s the holdup? We said we could add type inference to Java in a Summer. Where’s the intern to add it to Ruby?

Programmed. In me somewhere, he thought, there is a matrix fitted in place, a grid screen that cuts me off from certain thoughts, certain actions. And forces me into others. I am not free. I never was, but now I know that; that makes it different.

Philip K. Dick, "The Electric Ant"

The problem is that the set of all programs that are "starry" is a subset of the set of all programs that parse correctly. So either not all starless programs are neat, or not all portions of a starless program are neat, or both.

Let’s compare back to our Java snippet. Remember:

final words = { "foo", "bar", "blitz" };
final word_length = words.length;
final anagrams = new String[word_length];

The compiler could perform inference because it knows that the variables are not reassigned. They’re immutable. What happens if we erase the final keyword as well:

words = { "foo", "bar", "blitz" };
word_length = words.length;
anagrams = new String[word_length];

Now the job is much harder. We have to first infer which variables are final and then we can infer their types. Inferring finality can be done for a very restricted set of variables, such as some local variables. For a very large class of programs, we cannot infer the contents of a variable with less runtime complexity than running the program for every possible input. This is why compilers have limitations on the optimizations they can perform, and humans still need to do some thinking about writing fast programs.

In starless languges, there is no final keyword on the types of objects.

The type inference problem is exactly the same as the inferring the possible contents of a variable problem. The inferring the contents of a variable problem is doable for a restricted set of programs. And the way we tell the compiler that a variable is a member of this restricted set is with the final keyword.

Likewise, the way we tell a compiler that the type of a variable is also restricted is that we use a language where the type of every variable is final. It’s the same thing: we don’t reassign final variables and we don’t change types on the fly.

Starlessness is not about writing programs without labels. Starlessness is when you write dynamic programs. Dynamic doesn’t mean ‘unlabeled’. As I showed above, if the final keyword is there, the label is mostly optional. But if you don’t have final, you’re writing dynamic programs.

Truly starless programs have dynamic types: types that change at run time. they are not always one thing or another. For example, what if you write an Object-Relational Mapper (“ORM”) that reflects on the database structure at run time. That is, you can change columns in a database table and you get new getters and setter methods in your program. Without recompiling.

In a fully static language (with or without type inference), you can’t do that. Think of Java’s JDBC: you have to fool around with methods that get values and pass a column name as a parameter. Or maybe you create a hash. And C# is getting this capability, but of you look closely you still have to define the “type” of a query through the LINQ syntax.

Are Sneetches with stars ever starless?

A dynamically typed language lets us define an object holding a database row with methods for each column. But we can’t know at compile time whether our program will throw a MethodNotImplemented exception because we don’t know whether someone will monkey with the database structure. That sounds bad.

But what happens if you write the same thing in a neat program? Aha! a SQLException! it seems that there are dynamic things that must be dynamic no matter what you do.

This is a specific case of Greenspunning. There are some facilities of dynamic languages that you are going to need. If you don’t have them built into your static language, you will build them yourself or use a framework that has already built them for you. Other examples I have seen in Java programs include:

Spring and Hibernate;
Any use of Class.forName(...);
Any use of dynamic proxies;

In essence, you’re being a Sneetch without a star but twisting your starry language to permit starlessness. And for those portions of the program that are no longer nice, starry bundles that can be examined at compile time for invariant behaviour, you are indeed in dynamic territory and have to live with the risks.

In my experience, all non-trivial starry programs contain this kind of starlessness. To my admittedly inexperienced eyes, starlessness is the hallmark of expert programming in starry languages ("expert" does not necessarily mean "more desirable," especially in the minds of those who believe that programs should be written and maintained by mediocre developers).

Eating cake

So… can we say that since you can write starless programs in neat languages, you can have the useful benefits of stariness when you need it and the flexibility of starlessness when you need that too? Isn’t that better?

Yes, you can say that. And you may be right, for you. The Boo people believe that: their language has a duck keyword for when you feel like a Sneetch without a star. Be aware that at this moment in history, languages designed for Sneetches without stars seem to have much better features for writing starless programs than languages for Sneetches with stars. So my observation is this:

If you dislike the verbosity of starry languages like Java but like the feeling of safety, try a type inference language. Don’t go to a starless language if you don’t intend to actually write dynamically typed programs.

My experience is that if you are frustrated by the amount of work you have to do to express the algorithms in your head, you should look at a language that removes your frustration. If you're using Java and don't like the verbosity, find a language that celebrates brevity while preserving static typing. But if you're using Java and find yourself pushing more and more logic out of java because its type system is too static or too inflexible, you should consider a language with a different approach to typing.

Computer languages differ not so much in what they make possible, but in what they make easy.

Larry Wall

Why would the Sneetches without stars use starless languages?

Writing starless programs on top of neat languages is exactly the same thing as writing automatic memory management routines on top of a manually managed programming language or writing functional programs on top of a noun-centric object-oriented language.

You can take that statement as an argument in favour of specialized languages for Sneetches without stars or as an argument against them. My guess is that the above statement is true and a Rorschach Inkblot: You will interpret it as confirmation of your existing prejudices.

¶ 3:34 PM

Comments:

There are big "yes but" to using inference typing.

first, mixing inference with side effects is complicated O'caml weak types are a sample of this complexity. For example 2 programs with the same value denotation might not denote the same type.

second, inference typing leads to complicated (meaningful) error reporting. If I remeber well, Xavier Leroy himself told that useful error reporting is a matter of research by itself.

Remember : there is no silver bullets, only compromises and trends.

# posted by nraynaud : 6:51 PM

If Neat Freaks use Java, and Slobs use Ruby...
A fallacy - so we need read no further.

# posted by Isaac Gouy : 7:04 PM

I don’t know how something that fails to make the proper Foleyesque slobering noises about dynamic typing and Ruby on Rails managed to get voted up on reddit, but I am glad this did. It is a rather excellent article that manages to nail the issue without really taking sides. My only complaint is this: it gets the recommendation exactly wrong.

If you use a slobby language like python you will have to become extremely neat. If you work on a project with more than one programmer you will have to start documenting the types of every function in comments or you you will find yourself having discussions like “fred, does this method expect a list of hashes that map ints to strings, or a list of hashes that map ints to user objects?”. You will also have to be very serious about your testing strategy to make sure that you maintain the ability to refactor when you have a lot of code, otherwise it is impossible to know when you have broken something. If you have only 60% unit test coverage in some area, you will need to do a lot of manual testing to ensure you haven’t broken something with your refactoring.

Likewise if you work in a language where the compiler checks your types for you, you can be a lot sloppier. You can make interface changes and depend on the compiler (or a refactoring tool) to find all the various places you have just broken.

One should pick tools that correct ones own tendencies (or those of your team), not tools that exacerbate problems. It may be that dynamic types appeal to slobs, but that does not mean that dynamic types are good for slobs.

So my recommendation would be this: if you are very precise about types, so much so that you can maintain them all in your head, then dynamic languages are the way to go–they will offer you flexibility and less writing out of types. If you find yourself slipping a bit in your precision, then you want a compiler which checks static types for you. I know that I am a slob, so I think that type inference is in my future.

# posted by jay : 10:21 PM

Jay:

I appreciate your arguments, and I agree with you rrecommendations.

That being said, I think they are orthogonal to my own recommendation, which is to use a statically typed language if your types never change.

'Slovenly languages' aren't really for people who are careless, but rather they are for people who sometimes do one thing with a variable, and sometimes another.

In my own case, one of the things I regularly do with Ruby is extend pre-existing types with new functionality.

This is a big no-no with the neatness crowd. This is why Java really permits you to mark a class final: so that other programmers cannot extend it.

# posted by Reginald Braithwaite : 11:34 PM

nreynaud:

There are big "yes but" to using inference typing.

Your point about error reporting is particularly salient.

As to silver bullets, Wasabi... oh never mind, I said that already.

# posted by Reginald Braithwaite : 11:36 PM

Isaac:

Thanks, and I would say that is a static property of this blog.

# posted by Reginald Braithwaite : 11:38 PM

Me thinks me would go insane trying to write monadic code in Haskell without type inference.

# posted by Johan Tibell : 4:45 AM

Correctness is the killer app, it saves me from errors. But we're not there yet.

The problem with correctness is that it doesn't scale well. Languages that come from the academic world (things ending with calculus) have correctness, but are very hard to work with for sizeable problems. Java attacked this for small, single purpose applications for handheld devices, but the language didn't scale to handle the problems we're solving today. Ruby and other dynamic languages are worse at correctness, but they have less code to prove.

Meta-programming, like frameworks, APIs and constructs before, simply move the correctness to a place where it's easier to prove. Easier just because you have more people looking at it (the brute force approach). So if I have to judge for correctness, I think they're all equally bad, or equally good, depending on the half of the glass.

I open up a lot of code for access through remote APIs, the moment I start worrying about operations and types that are not enforced by the language, is the moment I stop caring about Java-like correctness. I just want to write less code, so it's easier to prove.

Neatness is separate from correctness, it's our need to keep the cognitive load minimal. I don't think Java is more neat than Ruby (or the other way around), they're just neat in different ways. Verbosity is neatness for some, a big mess for others.

And then, there's status quo ...

# posted by assaf : 12:07 PM

<< Home