raganwald
(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Wednesday, December 19, 2007
  A growing sense of doom washed over me as I read Steve's latest post


As a few people have noticed, this went out over RSS but was not on my site. The reason is that I “pulled” it. I started out thinking I would include some choice quotes from Steve’s essay and leave it at that, but… At some point I got enthusiastic and decided I wanted to say something about the culture of moving dirt (I think I have run out of things to say about Java the language, thank goodness). However, reality in the form of interesting work at work popped up, so I decided to put the essay off.

Bad luck, the post was already out on RSS. So here it is. I would like to rewrite it fully, just not right now. Sorry for letting it out onto RSS in such a half-baked form. I realize your time is valuable, and I think you deserve better from me.


Update: Golf is a good program spoiled.


I happen to hold a hard-won minority opinion about code bases. In particular I believe, quite staunchly I might add, that the worst thing that can happen to a code base is size…

People in the industry are very excited about various ideas that nominally help you deal with large code bases, such as IDEs that can manipulate code as “algebraic structures”, and search indexes, and so on. These people tend to view code bases much the way construction workers view dirt: they want great big machines that can move the dirt this way and that. There’s conservation of dirt at work: you can’t compress dirt, not much, so their solution set consists of various ways of shovelling the dirt around…

It’s just a mountain of dirt, and you just need big tools to move it around. The tools are exciting but the dirt is not…

The problem with Refactoring as applied to languages like Java, and this is really quite central to my thesis today, is that Refactoring makes the code base larger. I’d estimate that fewer than 5% of the standard refactorings supported by IDEs today make the code smaller. Refactoring is like cleaning your closet without being allowed to throw anything away. If you get a bigger closet, and put everything into nice labelled boxes, then your closet will unquestionably be more organized. But programmers tend to overlook the fact that spring cleaning works best when you’re willing to throw away stuff you don’t need

Design Patterns was a mid-1990s book that provided twenty-three fancy new boxes for organizing your closet, plus an extensibility mechanism for defining new types of boxes. It was really great for those of us who were trying to organize jam-packed closets with almost no boxes, bags, shelves or drawers. All we had to do was remodel our houses to make the closets four times bigger, and suddenly we could make them as clean as a Nordstrom merchandise rack.

A design pattern isn’t a feature. A Factory isn’t a feature, nor is a Delegate nor a Proxy nor a Bridge. They “enable” features in a very loose sense, by providing nice boxes to hold the features in. But boxes and bags and shelves take space. And design patterns – at least most of the patterns in the “Gang of Four” book – make code bases get bigger. Tragically, the only GoF pattern that can help code get smaller (Interpreter) is utterly ignored by programmers who otherwise have the names of Design Patterns tattooed on their various body parts

If you begin with the assumption that you need to shrink your code base, you will eventually be forced to conclude that you cannot continue to use Java. Conversely, if you begin with the assumption that you must use Java, then you will eventually be forced to conclude that you will have millions of lines of code…

You should take anything a “Java programmer” tells you with a hefty grain of salt, because an “X programmer”, for any value of X, is a weak player. You have to cross-train to be a decent athlete these days. Programmers need to be fluent in multiple languages with fundamentally different “character” before they can make truly informed design decisions

Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.

Imagine that you have a tool that lets you manage huge Tetris screens that are hundreds of stories high. In this scenario, stacking the pieces isn’t a problem, so there’s no need to be able to eliminate pieces. This is the cultural problem: they don’t realize they’re not actually playing the right game anymore…

Java-style IDEs intrinsically create a circular problem. The circularity stems from the nature of programming languages: the “game piece” shapes are determined by the language’s static type system. Java’s game pieces don’t permit code elimination because Java’s static type system doesn’t have any compression facilities – no macros, no lambdas, no declarative data structures, no templates, nothing that would permit the removal of the copy-and-paste duplication patterns that Java programmers think of as “inevitable boilerplate”, but which are in fact easily factored out in dynamic languages.

Dynamic features make it more difficult for IDEs to work their static code-base-management magic. IDEs don’t work as well with dynamic code features, so IDEs are responsible for encouraging the use of languages that require… IDEs. Ouch…
—All quotes from Code's Worst Enemy by Steve Yegge
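
As a small illustration of the "compression facilities" the quotes above mention (lambdas and declarative data structures), here is a minimal Python sketch. It is not taken from Steve's essay: the field names, the rules, and the `validate` helper are all invented for illustration. The idea is that a table of rules plus one loop stands in for what would otherwise be one near-identical check method per field.

```python
# Hypothetical example: validation rules expressed as data plus lambdas,
# instead of one hand-written, near-identical method per field.
RULES = {
    "name": (lambda v: isinstance(v, str) and v.strip() != "",
             "must be a non-empty string"),
    "age": (lambda v: isinstance(v, int) and 0 <= v < 150,
            "must be an integer between 0 and 149"),
    "email": (lambda v: isinstance(v, str) and "@" in v,
              "must contain '@'"),
}


def validate(record):
    """Return a list of error messages for a dict-like record."""
    errors = []
    for field, (is_valid, message) in RULES.items():
        if field not in record:
            errors.append(f"{field} is missing")
        elif not is_valid(record[field]):
            errors.append(f"{field} {message}")
    return errors
```

Adding a new field is then one more entry in `RULES` rather than another method. Whether that is clearer or merely shorter is exactly the judgment call the comments below argue about.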

I have read many essays taking the minority position and explaining what is wrong with the “Java Programmer” mindset (as opposed to the “Programmer who happens to use Java” mindset), but Steve’s essay laid the cultural problems bare for me. He explains the technical issues well, but where this essay really shines is in explaining the cultural issues and how they work with the technical issues to create a vicious cycle.

Of course I recommend reading the original. But may I add, please do not get sucked into arguing whether Design Patterns are good, or whether IDE refactorings really work, or any of the other technical points that are so much fun to rehash for the millionth time.

Instead, consider the cultural forces at work. Cultural problems cannot be solved with technology. If you are an advocate for change, ask yourself what sort of cultural change is needed, not what sort of technical problems need to be solved.

And oh yes… What do you think of his central thesis?
 

Comments on “A growing sense of doom washed over me as I read Steve's latest post”:
The complexity of today's Java IDEs is somewhat of a self-fulfilling prophecy; I'm not sure how I'd do any programming in Java without Eclipse, frankly. And that scares me.
 
Reg,

Steve's reasoning is long, convoluted and simplistic. I'd love to see somebody explain in English how lines of code are actually the most important software metric. And "Don't use Java or IDEs!", besides being the kind of backwards reasoning that should've died out with assembly, isn't a serious argument for cultural change.

But I do wonder why people praise the simplicity of something like Rails and then argue for complicated tools like macros and Ruby magic. Perhaps the goal isn't simplicity at all. Maybe there's something else that helps a person convince themselves that they don't need an IDE.
 
I'll have more to say when I have had a chance to write up my own thoughts.

But since you mentioned Assembly, is it possible that the argument in favour of moving upwards from Java is exactly the same argument as moving upwards from Assembly to Fortran?

I think LOC are easy to dispute in the small, but also easy to understand in the large. We can debate all day long whether this method or that is more or less clear with ten lines or fifteen.

But my experience is that on average, when looking at many examples in the wild (not hand-picked inventions to support my or your argument), code that is briefer by an order of magnitude is better.

Is a ten-liner better than a fifteen-liner? Depends, let's see them. But is 100,000 LOC code better than 500,000 LOC? I say yes, yes, yes.

"complicated tools like macros and Ruby magic"

I must say, I'm confused. That must be because I don't find them complicated. Do you?

Or are you arguing from the elitist point of view that you personally don't find them complicated, but those other programmers over there find them complicated?
 
I've always figured it comes down to how much artificial complexity is riding around within your code. That is, the complexity that is there beyond the required inherent complexity of the problem you are trying to solve.

An obvious source comes from copying the same bits of code everywhere. Duplicate or triplicate code is more likely to cause problems. Not so obvious, is the complexity that comes from implementing a 'specific' solution multiple times, when a single 'general' one would do. That is complexity that is derived from not having applied a level of abstraction to the problem.

In general, less code is less complexity, but it is an arc. You can fall off the other side by being clever or tricky or such. Also, although on a character-by-character basis a language like APL will massively compress the size of the solution, it does so by essentially increasing the complexity of the 'expression' of the solution itself; that's why it is often called a write-once-read-never language. Thus there is some amount of language-based complexity that can get piled on top of the solution as well.

Paul.
http://theprogrammersparadox.blogspot.com
 
Hmm, I think features like macros are complex because they are complex. I don't think I've ever met a lisp programmer who didn't think they were complex and "hairy." It's another case where, strangely enough, I've often found that the more experienced the programmer, the less likely they are to be tempted by such complicated features. It'd be interesting to look further into this issue.

Of course, I agree that 100K LOC is better than 500K LOC. I don't think anybody would argue that we need more code. Such strawman LOCs don't contribute any kind of deeper understanding. There are many different traditions in play here, but what interests me are these different notions of simplicity and complexity. Seems to me that there are a lot of programmers out there who imagine complexity is largely a function of language features, and so they advocate for more powerful languages. It's an interesting hypothesis, I just don't think it has much basis in practical reality. In fact, I like to show something like Ruby on Rails to my colleagues as the perfect example of why language features are never enough. Anyways, I look forward to hearing your thoughts. It helps a lot to see where others are coming from.
 
I think that macros and templates (and probably many other language features) are complicated because they "look like" code - that is, they are part of it - but are actually meta code, i.e. code about source code.

That means that to read them, you actually have to do some parsing/processing yourself - that is, think like a computer.

While to write (non-meta) code, you want to think like a person trying to get the computer to do something.
 
I agreed with Stevey all the way through, although I have personally never created codebases of that size. From reading some comments there and here, it occurs to me that I AM probably in the minority. The most frequent and forceful argument from the "majority" I've heard is that code reduction isn't the only issue because you have to balance code size with readability. I agree with that, but I take readability as a given in a forum like this because it's clear that people in either camp value it very much - no one is going to stand up and say "I think it's better to write code in binary because I can make the code smaller", in this forum anyway. Therefore, I think Stevey's statement is dead on.
And the way in which you balance writing code that's easy to read vs code that does a lot of magic to make other code small is to use Raganwald's yellow red and green guideline: http://weblog.raganwald.com/2006/12/economizing-can-be-penny-wise-and.html
 
The key thing I take away from that is that "Refactoring" IDEs are nothing of the kind - or rather, that the kinds of automated "refactoring" they support are nearly useless in relation to the kinds of refactoring a large and messy code base needs.

Probably the most important kind of refactoring you can do, to improve a large code base, is to find large chunks of duplicated code and to move them into a common routine; if possible to make them part of a shared base class, but if not, then to just toss it into a utility function which you can call everywhere it's needed. Result: reduction in LOC, and a single place where bugs may need to be fixed instead of 5. IDEs do little to help with this, except as glorified editors.

Divorce the concept of refactoring from what the IDE will do for you, and both become more useful. (I'm currently maintaining and enhancing a C# code base, not Java, but AFAICT C# is pretty much Java with a superficially different syntax painted on.)
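
A minimal sketch of the duplicate-extraction refactoring described in this comment, written in Python rather than C# or Java. The function names and the "description,amount" line format are hypothetical, chosen only to show the shape of the change: a chunk that had been pasted into each report function now lives in one helper.

```python
# Hypothetical example. Before the refactoring, the parse-and-sum loop below
# was assumed to be copied into every report function. After it, the loop
# lives in one helper, so a bug in it is fixed in one place.

def total_cents(lines):
    """Sum 'description,amount' lines, skipping blanks; amounts are dollars."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so descriptions may contain commas.
        _, amount = line.rsplit(",", 1)
        total += round(float(amount) * 100)
    return total


def invoice_summary(lines):
    return f"Invoice total: ${total_cents(lines) / 100:.2f}"


def refund_summary(lines):
    return f"Refund total: -${total_cents(lines) / 100:.2f}"
```

Each caller shrinks to a single line, and the total line count drops as soon as the helper has more than one caller.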
 
After spending six months trying to add a fairly complex feature to an already-huge app, we canned the project. A year in, it wasn't feature-complete, and the light at the end of the tunnel was probably an oncoming train. I was the primary developer, assisted by two others in our beginning rush to get "v0.5" out.

I have since come to two conclusions.

#1 is that size is evil. I 'knew' that, but didn't have any experience of why. It is because the bigger a project gets, the harder it is to hold in your mind. You spend more time acquiring state, and if the state is too big to hold in working memory, you keep losing pieces of it. To continue Yegge's analogy, you start pushing dirt from here to there, and then re-discover there was a pond along the way. And then you have to get your dirt back out of the pond, and find some other way to get it over there, and the problem has become even more complicated than when it was "too complex" to remember it all. There is much pain there.

#2 is that the UI is harder than the backend/engine code. In the backend, each module is fairly decoupled. The UI side necessarily has to deal with Session, User, and Content objects all at once, along with providing the connection between the actual interface and the logic that makes it all work.

"How hard can it be?" is your greatest strength in the beginning, because it pushes you to start the project. But it becomes your greatest enemy when it drives you far into the realm of unmanageable complexity.
 
"I've always figured it comes down to how much artificial complexity is riding around within your code. That is, the complexity that is there beyond the required inherent complexity of the problem you are trying to solve."

I agree with the above. And I think that this is what Steve Yegge is really talking about here. There's no reason to get nitpicky about his decision to use generic LOC as the vehicle, and there's no reason to believe that Yegge is somehow championing cryptic obfuscation as a solution to verbosity (speaking of straw man arguments).

One problem with verbose languages is that they'll pressure you into magnifying the inherent complexity of the problem. Sure, a lot of artificial complexity can be avoided with good design and good libraries, but the programming language also has a big impact. Can we really deny this? The LOC-reducing effect that a good language has cannot always be blindly chalked up to "magic" or "tricky clever hacking".
 
Design patterns provide "increased flexibility and reusability". Oftentimes, however, there is no need for a solution to be very flexible, and by implementing a design pattern, the developer needlessly increases the complexity of the solution. Joshua Kerievsky in his "Refactoring to Patterns" book suggests "discovering" the need to use a design pattern (which is similar to Bob Martin's thesis that re-use should be discovered, not anticipated). Also, the book contains a discussion of refactoring *from* patterns to a simpler solution.

A lot of refactorings that I see in Eclipse indeed increase the code size, but more importantly, they decrease its complexity, which is what I'm really after. In the end, when we talk about a large codebase, we talk about our ability to maintain and extend such a codebase.

In my experience, the #1 problem in that regard is developers' inability to design and maintain abstractions for a given domain/problem. Code duplication is one of the smells of an anemic domain model, a codebase where the object model is little more than a bag of glorified data structures. Such code bases are brittle, hard to extend and hard to test.

The #2 problem is large monolithic applications. A lot of times it's feature bloat - Mary Poppendieck's statistic (if I remember correctly) is that 60% of the features in an application are not being used. The problem here oftentimes lies with the whole product development team and not just the developers. Agile teams have the customer (or customer representative) on the team to resolve this very problem.

I think that individual developers' abilities and the software development process have a much larger impact on codebase complexity than the choice of programming language.
 
that way of looking at Tetris awed me
 




Reg Braithwaite

