Ruminations about the performance of anonymous functions in naive Javascript implementations

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Sunday, August 26, 2007

Block-Structured Javascript (better known as the Module Idiom) looks like this:


 (function () {
    var something_or_other;
    // code elided
    return something_or_other;
 })()

This creates a new, anonymous function with its own local scope. Whenever this code is execututed, the interpreter creates a function record in its memory. The exact same thing happens if you create a function and bind it to a variable with var foo = function (...) { ... };.

The Seasoned Schemer is devoted to the myriad uses of first class functions. Luckily for us, the ideas in this provocative book map directly to Javascript (see the plug for Lisp in Small Pieces below).

When you close the back cover you will be able to compose programs from functions in powerful new ways, and you can use these new techniques in Scheme, Ruby, and Javascript immediately.

Now let’s consider another common pattern, the Inner Function: we have a function, and the function needs a helper function. We define the helper function inside our function to make our code more encapsulated:


 var factorial = function (n) {
     var factorial_acc = function (acc, m) {
         if (0 == m) {
             return acc;
         } else {
             return factorial_acc(m * acc, m - 1);
         }
     };
     return factorial_acc(1, n);
 };

What happens when we invoke factorial it six times?

When the interpreter first encounters the code defining factorial, it creates a function and assigns it to the variable factorial. Then each time we invoke the factorial function, the interpreter creates a new function record for factorial_acc. So in total, the interpreter creates seven functions in memory, not two.

hand-rolling

If this code needed hand optimization, you might want to consider ‘lifting’ the definition of factorial_acc outside of factorial, so it doesn’t get recreated with every invocation:


 var factorial_acc = function (acc, m) {
     if (0 == m) {
         return acc;
     } else {
         return factorial_acc(m * acc, m - 1);
     }
 };
 var factorial = function (n) {
     return factorial_acc(1, n);
 };

This produces exactly the same result as our Inner Function version. factorial_acc doesn’t use any of factorial’s parameters or variables, so it does not really need to be inside its scope to produce the correct result.

Now you only need two function records, not seven. Two is cheaper than seven. The problem with this approach is that you are proliferating names. If you are binding functions to names in the global environment, it quickly becomes crowded. And you also have a readability issue. Does anything else need to use factorial_acc? The original code made it very obvious that factorial_acc is only ever used by factorial.

A block can help. Yes, the cause of our performance consideration—dynamically creating functions—can actually be part of the solution:


 var factorial = (function () {
     var factorial_acc = function (acc, m) {
         if (0 == m) {
             return acc;
         } else {
             return factorial_acc(m * acc, m - 1);
         }
     };
     return function (n) {
         return factorial_acc(1, n);
     }
 })();

Now what happens? Well, we create an anonymous function for our block. One function record. Within that block’s execution, we create two more functions, one assigned to the variable factorial_acc, and one returned from the block (and then assigned to the variable factorial). This code creates three function records, which is still much better than seven.

As a correspondent summarized in email, “we’ve shown how to replace a simple function containing an inner function with a block call that returns a closure referencing the inner function so as to avoid re-defining it on each call. That’s all there is to it.”

(By the way, Douglas Crockford has done a very good job of explaining this idiom in Javascript, and named it the Module Pattern. Here’s a discussion with particular emphasis on OO-style programming. And here’s a really detailed examination from the YUI team.)

So should you always rewrite inner functions to use a block like this?

I don’t personally fool around with this kind of hand optimization willy-nilly (Of course, you may find the block version more readable than the inner function version. If you do, it’s a win to write it that way). It has a cost: in a more complex function, defining helpers outside of the function may be moving them further away from where they are used, which is a loss for readibility. If you prefer the inner function version, you should be very sure you have a performance problem before you leap to the conclusion that you should rewrite it.

a heuristic for automatic optimization of inner functions and blocks

Lisp implementations have been optimizing this kind of code, automatically, for decades. That’s because Lisp programmers have been writing programs in this style for decades, either directly or using macros like let. Here’s the basic heuristic:

Lisp in Small Pieces is one of the most important books about Javascript ever written. WTF!? may be your first thought. Hold on. Javascript at its heart, a very Lisp-like language with C syntax. So understanding Lisp helps you understand Javascript.

What makes Lisp in Small Pieces special for Javascript programmers is that it illustrates the principles underlying Lisp (and therefore Javascript) by creating a series of implementations, each of which illustrates the basic mechanisms in the language.

These deep ideas are exactly the things that make Javascript different from other C-syntax languages like Java or Visual Basic. This book, more than any other, will take your understanding from knowing what works on the surface to understanding why and how it works.

Find every function invocation that is nested inside another function.

Analyze all of its variable references lexically. If there are no references to a parameter or local variable of its immediate parent,¹ promote the function by defining it in its grand-parent and assigning the definition to a new variable in the grand-parent.

Replace the original definition in the parent with a reference to the variable you just created in the grand-parent.

Repeat until all functions are either global or refer to variables in their immediate parent functions.

There’s also a well-know optimization for making blocks themselves free or nearly free: lambda lifting. So before optimizing things prematurely, test your implementation and see if it is already fast enough for your purposes.

You may discover that you don’t save anything by rewriting things yourself. (You may make your code slower: some optimizations rely on knowing the exact scope of the code being optimized. If you proliferate names by lifting things yourself, the optimizer may not be able to use all of its tricks.)

These techniques have been known for twenty-five years. If a Javascript implementation that you are forced to target doesn’t include it, why not demand that the implementers get with the program and, you know, use some of the stuff we’ve known about programming for almost as long as they’ve been alive? Especially if they brag about their prowess at creating programming languages?

conclusion: nice-to-know, but not essential

My personal conclusion is that the behaviour of a naïve implementation is a “nice-to-know.” I don’t personally worry about optimizing it until I have a known performance issue, at which point it is essential to test to see whether some of the hand-optimizations will actually help.

YMMV.

1. Full closures make things tricky:

Functions that refer to variables in their immediate parent scope are much trickier to optimize away. Sometimes, such a function is supposed to be created anew for each invocation of its parent. For example, if you want to construct a bank balance thingy without using objects, you might write:


    function (balance) {
      return function (amount) {
          balance = balance + amount;
          return balance;
        };
    };

You pass in an initial balance, and it gives you a single function that you can use to deposit (pass a positive number), withdraw (pass a negative number), or check the balance (pass zero). It returns the updated balance in each case.

The inner anonymous function cannot be lifted or optimized away because of its reference to balance in its parent and because the function can be shown to “escape” its parent.

Whereas in this contrived example:


    function (balance, owner) {
      return function (amount) {
          return {
              new_balance: (function () {
                  if (balance + amount >= 0)
                      return balance + amount;
                  else return balance;
              })(),
              account_owner: owner
          }
        };
    };

Although the inner function still cannot be optimized away, the block within it can be lifted into the inner function and removed, producing:


    function (balance, owner) {
      return function (amount) {
          var __temp;
          if (balance + amount >= 0)
              __temp = balance + amount;
          else __temp = balance; 
          return {
              new_balance: __temp,
              account_owner: owner
          }
        };
    };

Labels: lispy, popular