Block-Structured Javascript

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Monday, August 20, 2007

Javascript provides closures with first-class access to variables declared in the enclosing environment. Besides being a handy piece of trivia if you are ever playing Programming Jeopardy, what use is this to the actual working programmer?

There are a lot of ways to take advantage of Javascript’s closures. I am going to describe just one: replicating Algol’s block structure (or Lisp’s let and begin macros, if you prefer). When we’re all done you’ll have a handy tool for making your code more readable, separating its concerns, and generally making life easier for programmers who have to read what you’ve written.

From Bricks to Blocks

A block is a chunk of code inside a function. Blocks have well-defined entry and exit points. Blocks have their own local variables and functions, and they also have first-class access to variables and functions defined around them. Blocks may nest.

Structuring code into blocks makes large functions more readable and easier to refactor. All of the variables and logic needed for one thing are encapsulated together in the blocks where they are needed, not scattered about in functions everywhere.

when a better design—if unfamiliar—is shown to developers or experienced users, they tend to reject it. Often, it takes careful explanation and having them gain experience with it before the improvement is understood.

—Jef Raskin

While the idiom may be slightly unfamiliar to some, I think you’ll agree that it is highly accessible and not some sort of über-construct of interest only to the self-professed hacker. And when a programmer encounters it in your code, she will have no trouble figuring out what it does and how it does it.

block structured code

I think everyone can agree that structured programming is a good thing. Functions should be composed of blocks, with the blocks linked together by constructs such as if statements:


if (some_condition) {
    // a block
}
else {
    // another block
}

Back before structured programming was introduced, there were gotos everywhere. Looking at a piece of code with labels, it was hard to know what the flow of control might be at run time. In short, you never had the confidence that everything you needed to know about that code was right there next to that code.

Like almost all modern languages, Javascript’s blocks do structure the flow of control so that you have nice clean entries and exits from each block. And also like most other modern languages, Javascript does nothing to structure access to variables inside blocks. For example:


var j = 0;
if (some_condition) {
    // a block with some code elided
}
alert(j);

What will be shown? You can’t guarantee that zero will pop up, because the block might modify j, like this:


var j = 0;
if (true) {
    var j = 42;
}
alert(j);

This looks like a programmer error. After all, the var keyword should be used to declare a new variable. But in Javascript, var is scoped to the enclosing function, not to the nearest pair of braces, so the code between the braces isn’t really a new block of code with its own variables: the second var j refers to the same j as the first, and the alert shows 42. This is hardly a problem with trivial examples. But if you start building some larger functions, the possibility of accidentally overwriting a variable looms larger, especially once you start refactoring and moving blocks of code around.

What can we do? How about this: the “block” idiom (you can also call it let or begin if you want to sound like a Schemer). Here’s the code:


var j = 0;
if (true) 
    (function () {
        var j = 42;
    })();
alert(j);

There’s a lot of syntactic noise. What does it mean? In short, we said: “Create a new function. In the body of the new function define a variable called j and assign 42 to it. Then call that function without any parameters.” Because our new instance of j is inside a function, it is not the same variable as the j outside of the function, so the alert shows 0. That can be handy.
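To compare the two versions side by side, here is a minimal sketch that wraps each one in a function and returns j instead of alerting it. The names without_block and with_block are made up for this illustration, and console.log stands in for alert so the code also runs outside a browser.

```javascript
// Hypothetical helper names, for illustration only.

// Plain braces: var is scoped to the enclosing function, so the inner
// declaration refers to the same j as the outer one.
var without_block = function () {
    var j = 0;
    if (true) {
        var j = 42;
    }
    return j;
};

// Block idiom: the inner j lives in the scope of its own function.
var with_block = function () {
    var j = 0;
    if (true)
        (function () {
            var j = 42;
        })();
    return j;
};

console.log(without_block()); // 42
console.log(with_block());    // 0
```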

Are there any other benefits of this idiom? Yes indeed. Sometimes you have an assignment and you need some logic on the right hand side. How do you write:


var proven = {
    var n = Math.round(100*Math.random());
    var total = 0;
    for (var i = 0; i <= n; ++i) {
        total = total + (2*i) + 1
    }
    return total == ((n + 1) * (n+1));
};
alert(proven);

You can’t, of course. There are two problems with trying to use braces in this case. First, where Javascript expects an expression, such as the right hand side of an assignment, an opening brace starts an object literal, not a code block. Second, Javascript code blocks are not expressions: they do not produce values, and return is only legal inside a function. This is why languages like Javascript need both an if statement and a ternary operator: if blocks produced values, you would only need if expressions.

So in traditional Javascript style, you have to define a function somewhere else and call it… You’ll notice our block idiom includes defining and calling a function. What if our function returns a value? In that case, we can use a block anywhere we want a value, for example:


var proven = (function () {
    var n = Math.round(100*Math.random());
    var total = 0;
    for (var i = 0; i <= n; ++i) {
        total = total + (2*i) + 1
    }
    return total == ((n + 1) * (n+1));
})();
alert(proven);

This new idiom allows us to make first-class blocks anywhere we like. Our blocks are expressions, and we can use them anywhere we need a value. And as above, our variables are fully encapsulated: they do not overwrite variables defined elsewhere.
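As a small sketch of "anywhere we need a value", here is a block used as one operand of a larger expression. It lets an ordinary if statement produce a value, which braces alone cannot do. The variable names and the value of n are arbitrary choices for the example.

```javascript
var n = 7; // example input, chosen arbitrarily

// The block sits in the middle of a string concatenation and hands back
// whatever its if statement decides on.
var message = "n is " + (function () {
    if (n % 2 == 0) {
        return "even";
    }
    else {
        return "odd";
    }
})();

console.log(message); // "n is odd"
```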

blocks vs. named functions

You may be wondering, "Why can’t we use a named function?" This is the style in languages like Python, where the Benevolent Dictator does not permit constructions like this. Here is the above code using a named function:


var proven_helper = function () {
    var n = Math.round(100*Math.random());
    var n_plus_1_squared = function (n) {
        return (n + 1) * (n+1); 
    };
    var sum = function (n) {
        var total = 0;
        for (var i = 0; i <= n; ++i) {
            total = total + (2*i) + 1
        }
        return total;
    };
    return sum(n) == n_plus_1_squared(n);
};
var proven = proven_helper();
alert(proven);

I find this almost as good as the block. Since you only use it in one place, it is defined where you use it. That is good. And the name might be helpful documentation, just like a one or two-word comment. Balanced against this is the fact that you have added a new function to the outer scope. Reading it later, you might have to scan the rest of the code to see if it is used elsewhere.

There's also a very small advantage of the block over the named function: since you need two statements (one to name a function, another to use it), you can only use a named function in normal code blocks. You cannot use a named function when you need an expression, unless you resign yourself to naming the function in one place and using it somewhere else.

For example, when constructing array or hash literals, you can use expressions. A block is an expression, while two statements (one to create a named function and one to call it) are not an expression. So a named function would need to be defined outside of an array or hash literal, while a block can be used inside it, placing the code closer to where it is used.
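Here is a sketch of a block inside a hash literal. The object and its property names are hypothetical. The point is only that the code computing squares sits exactly where the value is needed, and that its working variables stay inside the block.

```javascript
// Hypothetical settings object, for illustration only.
var settings = {
    name: "example",
    // A block is an expression, so it can appear as a property value.
    squares: (function () {
        var result = [];
        for (var i = 1; i <= 5; ++i) {
            result.push(i * i);
        }
        return result;
    })()
};

console.log(settings.squares.join(", ")); // "1, 4, 9, 16, 25"
```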

block structure and cleaner code

JavaScript: The Definitive Guide takes the time to actually discuss the language, to explain what Javascript can do and how to do it. And of course, the book also provides an in-depth reference of every function and object you are likely to encounter in most implementations. Recommended.

Structuring code into blocks makes large functions more readable and easier to refactor. All of the variables and logic needed for one thing are encapsulated together in the blocks where they are needed, not scattered about in functions everywhere. If you see a variable declared inside a block, you know it is only used inside the block. If you see a variable with the same name outside the block—a regrettable occurrence—you know that moving or changing the block will not affect the code working with variables outside of the block.
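A short sketch of both halves of that claim, using made-up names: the block below reads and updates variables from the function around it, while its own i is invisible outside, even though the outer function has a variable with the same name.

```javascript
// Hypothetical example function, for illustration only.
var order_total = function (prices) {
    var total = 0;
    var i = "outer i"; // deliberately reuses the name used inside the block

    (function () {
        // This i is local to the block. total and prices come from the
        // enclosing function.
        for (var i = 0; i < prices.length; ++i) {
            total = total + prices[i];
        }
    })();

    // The outer i is untouched by the loop inside the block.
    return [total, i];
};

console.log(order_total([5, 10, 15])); // [ 30, 'outer i' ]
```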

You probably know that you can put a function inside of a function in Javascript:


var factorial = function (n) {
    var factorial_acc = function (acc, n) {
        if (0 == n) {
            return acc;
        } else {
            return factorial_acc(n * acc, n - 1);
        }
    };
    return factorial_acc(1, n);
}
alert(factorial(6));

And this is a good thing: it keeps the function factorial_acc inside of factorial. Since that’s the only place you need it, why declare it anywhere else? The fact that you can put a function inside of a function implies that you can put a function inside of a block as well:


var proven = (function () {
    var n = Math.round(100*Math.random());
    var n_plus_1_squared = function (n) {
        return (n + 1) * (n+1); 
    };
    var sum = function (n) {
        var total = 0;
        for (var i = 0; i <= n; ++i) {
            total = total + (2*i) + 1
        }
        return total;
    };
    return sum(n) == n_plus_1_squared(n);
})();
alert(proven);

If you only need the functions n_plus_1_squared and sum to do this one job, in this one place, why should they be defined at top level cluttering up your code? Why force other programmers to search through your code figuring out where they are used before making changes?

Block structure may seem unfamiliar at first, but give blocks a try and see whether you start finding the code even easier to read and refactor with blocks. Like me, you will find that structuring your code with blocks puts the things you use right where you use them.

update: Ruminations about the performance of anonymous functions in naive Javascript implementations.

Labels: lispy, popular

¶ 10:08 PM

Comments on “Block-Structured Javascript”:

Why not move the code into a named function instead of putting it into an anonymous one? Apart from making it still easier to read and refactor, the function name would act as a kind of documentation.

# posted by Siddharta : 11:35 PM

Siddharta:

I shall add an example using a named function, you can compare and contrast for yourself.

As you can tell, I place a very high value on putting code and variables that I only use once as close as possible to where I use them.

# posted by Reginald Braithwaite : 7:04 AM

You ask some rhetorical questions about why one would define functions outside of where they are used. Sometimes, at least, the answer is to avoid creating garbage. Each time the code that defines a function gets executed it creates an object (of type Function). Each time it goes out of scope, it becomes garbage that has to be collected.

In the context of the browser, this is rarely a problem, though a periodic timer or a mouse move event handler could run many times in a short period of time. In the context of other environments running JS code (e.g. Rhino on the JVM, ActionScript in Flash), the chance that you'll wind up doing something like that in a tight loop accidentally goes up.

Granted, I'm talking about an optimization, and we all know that premature optimization is nothing but trouble. That said, it's important to be aware of the performance pitfall inherent in the idiom.
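A small sketch of the cost being described, with hypothetical names: every run of the code that defines an inner function produces a fresh function object, while a function defined once outside is shared.

```javascript
// Hypothetical names, for illustration only.

// The inner function is created anew on every call.
var make_helper = function () {
    var helper = function (x) {
        return x + 1;
    };
    return helper;
};

// Defined once, outside, and reused on every call.
var shared_helper = function (x) {
    return x + 1;
};
var use_shared_helper = function () {
    return shared_helper;
};

console.log(make_helper() === make_helper());             // false
console.log(use_shared_helper() === use_shared_helper()); // true
```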

# posted by Gregory : 9:22 AM

Gregory:

Obviously, good comment. And to be honest, if someone proudly demonstrated this code to me in an interview I would ask if they understood the costs.

That being said...

This is easily one of the oldest debates in programming. My experience to date has been this: If an unperformant idiom becomes popular, compilers will adjust to optimize it.

You saw this with OOP... are you old enough to remember people complaining about the overhead of virtual functions in C++? You saw this with garbage collection (and it is now faster than manual memory management in most cases).

So, good code that is unperformant will get faster, but bad code is forever. So... I generally ignore performance until I find my own teeth gnashing as I wait for the computer :-)

p.s. Of course, unperformant bad code is really bad. I may be guilty of same.

# posted by Reginald Braithwaite : 9:28 AM

Two thoughts: the first is that this page is relevant; the second is that it's probably pretty straightforward to graft block syntax onto JavaScript using Prototype. Since these things can be patterned as vars holding functions, it's probably pretty straightforward to encapsulate that behavior. Also, that reminds me, this page is relevant too.

# posted by Giles Bowkett : 2:36 PM

The point about the code being near the place where it is used is a good one.

From a documentation and readability point of view I find that the named function is easier because it's possible to read the block by the function name without having to read the implementation.

I would agree that when the block is small or straightforward then it makes good sense to use an anonymous function.

# posted by Siddharta : 3:40 PM

If you use 'function' to define your inner functions (instead of 'var'), you can use an "inverted pyramid" style of writing, where the abstract or executive summary goes first, followed by the details.

var proven = function() {
    // high-level view:
    var n = ...;
    var n_plus_1_squared = ...;
    return sum(n) == n_plus_1_squared(n);
    // details:
    function sum(n) {...}
}

This works because ECMAScript specifies that function bodies are processed for function definitions first (ECMA-262 pp. 39-40 and pp. 71-72). I wouldn't know it if I hadn't had to work around some nasty scoping issues while implementing an ECMAScript compiler, but now that I get to write ECMAScript instead of compile it, I get to use that hard-won knowledge... :-)
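A runnable sketch of this style, for readers who want to try it. The function bodies are copied from the article's example and the immediate call is added here. Neither is part of the outline above, so treat them as assumptions. hoisting_demo is a hypothetical name used to show the contrast with var.

```javascript
var proven = (function () {
    // high-level view:
    var n = Math.round(100 * Math.random());
    return sum(n) == n_plus_1_squared(n);

    // details: function declarations are set up before any statement in
    // the body runs, so they can be called above the point where they appear.
    function sum(n) {
        var total = 0;
        for (var i = 0; i <= n; ++i) {
            total = total + (2 * i) + 1;
        }
        return total;
    }
    function n_plus_1_squared(n) {
        return (n + 1) * (n + 1);
    }
})();

// Hypothetical demo: a var holding a function is declared early too, but
// it has no value until its assignment runs.
var hoisting_demo = (function () {
    return typeof declared + "/" + typeof assigned;
    function declared() {}
    var assigned = function () {};
})();

console.log(proven);        // true
console.log(hoisting_demo); // "function/undefined"
```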

# posted by Oliver Steele : 8:13 PM

This was the motivation for 'let' in JavaScript 1.7. See this document for more information. The idea was to provide syntactic sugar for the (function(){})() idiom (but you can return from the middle of a block with 'let' in it).

# posted by Blake : 5:31 PM
