The Show Must Go On: Query Languages
This is my first really technical post since coming to grips with the end of my side project. It isn’t my best writing about programming by a long shot, but I wanted to remind myself (and you, if you need reminding) that no matter what happens with the business of software, the programming of software is still damn interesting.
Here is some JavaScript flavoured with
Prototype goodness:
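$$('a').select(
  function (anchor) {
    // Keep anchors that have a link, have no click handler yet, and point off-site.
    return anchor.href && !anchor.onclick &&
      anchor.href.match(/^https?:\/\/(?!(.*bigco\.com.*)|(localhost.*)$).*$/i);
  }
).each(
  function (anchor) {
    // var keeps the URL local to this anchor; without it, every handler shares one global.
    var external_url = anchor.href;
    anchor.onclick = function () {
      doSomethingWhenUserClicksOffSiteLink(external_url);
    };
    anchor.href = '#';
  }
);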
This code finds all of the anchors (
<a href="...">...</a>
) that lead off the BigCo site and changes them to have an
onclick
handler. You might use something like this to log outgoing clicks so you can judge the popularity of links, for example.
The first thing you might notice is the
Smalltalk-style methods—
select
and
each
—for handling collections. If you like Smalltalk (or Ruby) style collection methods, Prototype gives them to you. If you haven’t seen this kind of JavaScript before, the thing that will probably stand out is the use of anonymous functions as parameters for
select
and
each
.
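For readers who have not met these methods, a rough analogue in plain JavaScript may help. The sketch below uses the built-in array methods filter and forEach, which behave much like Prototype's select and each. The links array and its contents are made-up stand-ins for real anchor elements.

```javascript
// Hypothetical stand-ins for anchor elements; only the href property matters here.
const links = [
  { href: 'http://www.bigco.com/about' },
  { href: '' },
  { href: 'http://example.org/' }
];

// filter plays the role of select: keep the items for which the function returns a truthy value.
const withHref = links.filter(function (link) {
  return link.href;
});

// forEach plays the role of each: call the function once per item, for its side effects.
const seen = [];
withHref.forEach(function (link) {
  seen.push(link.href);
});

console.log(seen); // [ 'http://www.bigco.com/about', 'http://example.org/' ]
```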
The thing about this code that grabbed my attention is the use of two different embedded languages. If you aren’t familiar with Prototype,
$$
is a function that takes a
CSS Selector as an argument and returns an array of elements matching the selector. In this case, we’re simply asking for every anchor (“a”). But we could ask for every anchor inside of a specific
div
, or every anchor of a specific class, or almost anything else you might want to use.
CSS selectors are a kind of
query language specifically optimized for applying styles to DOM elements in web pages. But Prototype (and several other JavaScript libraries like
MooTools) makes CSS Selectors a general-purpose tool. The
Groovy Language provides a similar feature called
GPath, a means of querying object networks using a language very similar to
XPath.
(I personally think GPath is the most interesting thing about Groovy—having an object query language “baked in” changes the way you think about writing programs, much as SQL changes the way you think about using databases. Or it would if you were brought up on non-relational databases, or if you have been eating of the
ORM fruit that leads to
madness.)
Anyways,
$$('a')
really changes everything. It’s not such a big whup in this particular snippet of code, but think about the separation of concerns here: if you do want to change the elements you are manipulating you change one place. You don’t have to restructure a pile of loops and conditionals.
Which leads to the
other embedded language, a rather popular one known as a
Regular Expression:
/^https?:\/\/(?!(.*bigco\.com.*)|(localhost.*)$).*$/i
. Regular expressions are inscrutable to most programmers: if you don’t use them on a daily basis, you are in danger of losing the knack of writing them. And I’m not sure if anyone has the knack of reading them. What does
/^1?$|^(11+?)\1+$/
match, and why is it famous?
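One way to explore that second pattern is to feed it strings made of nothing but the digit 1 and see which lengths it rejects. The sketch below does that; the helper name matchesOnes is made up for illustration.

```javascript
const pattern = /^1?$|^(11+?)\1+$/;

// Hypothetical helper: does the pattern match a string of n ones?
function matchesOnes(n) {
  return pattern.test('1'.repeat(n));
}

// Collect the lengths from 0 to 12 that the pattern rejects.
const rejected = [];
for (let n = 0; n <= 12; n++) {
  if (!matchesOnes(n)) {
    rejected.push(n);
  }
}

console.log(rejected); // [ 2, 3, 5, 7, 11 ]
```

The rejected lengths are exactly the primes in that range. The first alternative catches lengths zero and one, and the second catches any length that can be split into two or more equal groups of at least two ones.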
That being said, as inscrutable as they may be, there is real power in wrapping all your string matching up into a single blob that lives in one place. Having loops and scanners and parsers breaking URLs up and looking at individual pieces obscures the intent of your code.
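To see what that single blob actually decides, here is a small sketch that runs the pattern from the snippet against a handful of URLs. The sample URLs are invented for illustration; only the regular expression comes from the code above.

```javascript
const offSite = /^https?:\/\/(?!(.*bigco\.com.*)|(localhost.*)$).*$/i;

// Hypothetical sample URLs, chosen to exercise each part of the pattern.
const samples = [
  'https://example.org/page',          // ordinary external link
  'http://www.bigco.com/about',        // on the BigCo site
  'http://localhost:3000/',            // local development server
  'mailto:someone@example.org',        // not http or https at all
  'http://example.org/?ref=bigco.com'  // external, but mentions bigco.com
];

const results = samples.map(function (url) {
  return offSite.test(url);
});

console.log(results); // [ true, false, false, false, false ]
```

The last sample shows a quirk worth knowing about: because the lookahead rejects bigco.com anywhere after the scheme, an external URL that merely mentions it in its query string is treated as internal.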
He had a hat! (
John Gruber)
Having had a taste of embedded query languages, I’m left with a hunger for more. Quite honestly, I may be thinking of solutions in want of problems, or perhaps it is effete aesthetics, but the more I think about it, the more the following questions bug me:
Why aren’t patterns
first class values? Obviously, you can assign them to variables and return them from functions. But can I take them apart? Can I compose them? Is there a query language for extracting pieces of a query? Queries (be they regular expressions, XPath, or CSS Selectors) ought to be structures that can be manipulated just like the DOM or like object networks.
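In JavaScript, at least, a limited form of composition is possible today: a RegExp exposes its text through the source property, so patterns can be pasted together as strings. The sketch below is a crude approximation, not a real query language for queries. The helper name concat and the sample patterns are assumptions made for illustration.

```javascript
// Hypothetical helper: build one pattern from several by joining their source text.
// Each part is wrapped in a non-capturing group so alternations stay contained.
function concat(flags, ...parts) {
  const source = parts.map(function (part) {
    return '(?:' + part.source + ')';
  }).join('');
  return new RegExp(source, flags);
}

const scheme = /https?:\/\//;
const host = /[a-z0-9.-]+/;
const anchored = concat('i', /^/, scheme, host, /$/);

console.log(anchored.test('HTTP://Example.org')); // true
console.log(anchored.test('ftp://example.org'));  // false
```

Taking a pattern apart is much harder. The source property is just a string, so pulling the host piece back out of the combined pattern means parsing regular expression syntax by hand, which is the gap the question points at.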
Why don’t we have better support for transforming structures with embedded languages? Regular expressions lead the way here: there’s a powerful feature for using a regular expression to take a string apart and put it back together in a different order, or with new bits added to it. And SQL is fully integrated: it has a natural syntax for updates as well as for queries.
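For anyone who hasn't used that regular expression feature, here is a minimal sketch with JavaScript's String replace method, where $1 and $2 in the replacement refer to the captured groups. The sample strings are made up.

```javascript
// Take a "Last, First" name apart and put it back together in the other order.
const reordered = 'Lovelace, Ada'.replace(/^(\w+),\s*(\w+)$/, '$2 $1');
console.log(reordered); // Ada Lovelace

// Add new bits around each captured piece: wrap every word in brackets.
const wrapped = 'select each'.replace(/(\w+)/g, '[$1]');
console.log(wrapped); // [select] [each]
```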
Where’s the same facility for object networks? Right now I can use CSS selectors from Prototype to find elements and builders from
Scriptaculous to modify the DOM. Or I can use
jQuery (thanks, David) to do all my DOM manipulation in one go. That’s terrific; it’s
even better than the snippet above. But...
Why can’t I use the same power for transforming JSON or my own objects? It’s like regular expressions: once you taste the power with strings, you want to use them with arrays. Once you use XPath with XML, you want to use GPath with graphs of objects.
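To make the wish concrete, here is a toy sketch of what a query over plain JSON-style data might look like. It is nowhere near XPath or GPath: the query function and its dot-and-star path syntax are invented for this example, and the data is made up.

```javascript
// Hypothetical helper: follow a dot-separated path through nested data.
// A '*' segment fans out over every element of an array or value of an object.
function query(root, path) {
  return path.split('.').reduce(function (nodes, segment) {
    const next = [];
    nodes.forEach(function (node) {
      if (node === null || typeof node !== 'object') {
        return; // primitives have nothing further to select
      }
      if (segment === '*') {
        Object.keys(node).forEach(function (key) {
          next.push(node[key]);
        });
      } else if (segment in node) {
        next.push(node[segment]);
      }
    });
    return next;
  }, [root]);
}

const data = {
  posts: [
    { title: 'Query Languages', tags: ['javascript', 'regex'] },
    { title: 'Closures', tags: ['javascript'] }
  ]
};

console.log(query(data, 'posts.*.title'));  // [ 'Query Languages', 'Closures' ]
console.log(query(data, 'posts.*.tags.*')); // [ 'javascript', 'regex', 'javascript' ]
```

As with $$('a'), the appeal is that changing which pieces you want means changing one string rather than restructuring nested loops.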
Once I get going, my mind immediately jumps a level from things that would be useful to things that would be cool for the sake of being cool (and nothing else): With OO, we’re hung up on
messages. But those messages are ridiculously primitive! A verb and a bunch of parameters that are usually nouns. Imagine a meta-language where each receiver could interpret
its own language. So strings would interpret regular expressions, and DOMs would interpret XPath and/or CSS Selectors. What we call a type or an interface today—a set of verbs with rules for the accompanying parameters—would be replaced by a set of languages receivers understand.
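A very rough sketch of that daydream, just to make it tangible: a single ask function in which the kind of receiver decides which language the message is read in. Everything here is hypothetical, including the name ask and the tiny "property=value" filter language for arrays.

```javascript
// Hypothetical dispatcher: the receiver's kind decides how the message is interpreted.
function ask(receiver, message) {
  if (typeof receiver === 'string') {
    // Strings read the message as a regular expression and return every match.
    return receiver.match(new RegExp(message, 'g')) || [];
  }
  if (Array.isArray(receiver)) {
    // Arrays read the message as a tiny "property=value" filter language.
    const parts = message.split('=');
    const key = parts[0];
    const value = parts[1];
    return receiver.filter(function (item) {
      return String(item[key]) === value;
    });
  }
  throw new TypeError('this receiver does not understand any language');
}

const numbers = ask('a1b22c333', '\\d+');
const anchors = ask([{ tag: 'a' }, { tag: 'div' }, { tag: 'a' }], 'tag=a');

console.log(numbers);        // [ '1', '22', '333' ]
console.log(anchors.length); // 2
```

In a real version of the idea the dispatch would live in the receivers themselves rather than in one central function, but the sketch shows the shape: the "interface" of a receiver is the language it understands.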
I have some vacation time coming in a few months. Now I have something to think about!
Now, I have asked a lot of “why can’t I…” questions. I hope I get a lot of comments saying “You can, check out the KulTulz library for JavaScript, or the Mazenblitz Macro Package for Common Lisp, or even RTFM about Ruby…” What libraries, languages, and packages provide the kind of features I’m daydreaming about?