Anything "Ridiculously Easy" is going to attract some Ridicule
Recently, Lucas Carlson announced Starfish, an ultra-lightweight distributed processing framework, and map_reduce, his own implementation of distributed mapping and reducing based loosely on Google's famous MapReduce.
Almost immediately, people pointed out that what he had created looked to them like a toy. Some people pointed out that although its map is fully distributed, its reduce is centralized in the supervisor process. Others pointed out that fault tolerance was not built in. Some even pointed out that it looked like a thin wrapper around other services (as if free software is sold by the pound).
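To make the "distributed map, centralized reduce" complaint concrete, here is a minimal sketch of that shape in plain Ruby. It is an assumption-laden illustration, not Starfish's or map_reduce's actual API: threads stand in for remote worker processes, and the names distributed_map, centralized_reduce, and word_total are hypothetical.

```ruby
# Hypothetical sketch: this is NOT the Starfish or map_reduce API.
# Threads stand in for remote worker processes.

# Split the input into roughly one slice per worker, map each slice in its
# own thread, then gather the mapped values back in the calling
# ("supervisor") thread, preserving input order.
def distributed_map(items, workers:, &mapper)
  return [] if items.empty?

  slice_size = (items.size / workers.to_f).ceil
  threads = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.map(&mapper) }
  end
  threads.flat_map(&:value) # Thread#value waits for the thread to finish
end

# The reduce step runs entirely in the supervisor: it folds the mapped
# values one at a time, with no help from the workers.
def centralized_reduce(mapped, initial, &reducer)
  mapped.inject(initial, &reducer)
end

# Example job: count the words across a list of lines.
def word_total(lines, workers: 4)
  counts = distributed_map(lines, workers: workers) { |line| line.split.size }
  centralized_reduce(counts, 0) { |sum, n| sum + n }
end
```

Calling word_total(["a b c", "d e", "f"], workers: 2) maps the lines in two slices and then sums the per-line counts in the caller. The critics' point is that the second step does not get faster as workers are added; whether that matters depends on how expensive your reduce is compared with your map.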
I agree that map_reduce is not MapReduce. That's a good thing. After all, the world already has MapReduce. If you want to use MapReduce just the way it is, go work for Google. Don't wait.
You know, all the comparison to MapReduce has a strangely familiar ring. What do the following systems have in common?
Minicomputers, microcomputers, personal computers, laptop computers, 5 1/4" disc drives, 3 1/2" disc drives, 1.8" disc drives, client-server computing, PC databases, Unix, C, Ruby, Java, television, colour television, automatic transmissions, iPods, ...
Must I go on? History is replete with inventions that are simplified, scaled-down versions of things that have come before (I know, Java-haters will find it hard to remember a time when Java was the new kid on the block that represented a simplified, scaled-down language compared to C++). It seems that every time some such thing comes out, somebody points out that it is a toy, not suitable for serious use.
My personal favourite example of dismissing advancement is the television. When it came out, old time radio people dismissed it as a toy. Nice call. But wait, we aren't done. When colour television came out, the black and white television people dismissed it as unneeded. Somebody, as they say, wasn't dancing with the innovation they brought to the ball.
And what happens? Come on, you know where this is going, it's practically a cliché: first the new, simpler, less powerful thing lives in a weird niche where people have a special need that overrides the bountiful impracticalities of the new thing. Then whole new markets are discovered where the new thing offers the perfect balance of features and before long, the new thing takes over the old markets.
It's pretty obvious to me that when a lot of people dismiss something as being too simple, too underpowered, and lacking the wide variety of features and options of its predecessors, the right thing to do is to take a closer look and suspend final judgment. Right now there's a world market for maybe five full-text web search engines. If you are one of those five people trying to index the entire web, you can dismiss map_reduce immediately.
Everyone else might want to look at map_reduce (and everything else considered too wimpy for serious work) and, instead of listing all the ways it falls short of the status quo, ask in what ways the status quo falls short of mass-market appeal.
At first glance, map_reduce looks like it makes it really easy to distribute analysis, especially of things living in your database. Hmmm. Thousands of Rails users put things in one database. Will it scale to 2,000 systems? How many of you have 2,000 systems? Next question.
Now how many Rails programmers just ordered a shiny new Mac Pro with four cores? Nice to see a sea of hands. Guess what? You are all people who could benefit from map_reduce right now. Do you have a few spare Macs or PCs in your office? All the better, put them to work while you're at it.
I'm not in a position to recommend using map_reduce until I've tried it myself. But I can say without hesitation that there is a need for ridiculously easy distributed processing on Ruby, and it doesn't need to scale to 2,000 machines to be useful.
Update: Lucas posted a working example from a production system: How I sent emails 10x faster than before. Updated again to link to his explanation for how reduce works in map_reduce.
Hot news! "How many of you have 2,000 systems?" The answer is: all of you. Amazon's Elastic Compute Cloud lets you run applications on thousands of machines and pay only for compute time and bandwidth outside of the cloud. Note to my sharp-witted readers (again, all of you): this is not a license to write and say "because I might run an application on 2,000 servers, I'm dismissing Starfish without another thought." The correct thing to write is "I have written an application that runs on 2,000 servers, and..."