raganwald
Tuesday, June 19, 2007
  The Show Must Go On: Query Languages
This is my first really technical post since coming to grips with the end of my side project. It isn’t my best writing about programming by a long shot, but I wanted to remind myself—and you if you need reminding—that no matter what happens with the business of software, the programming of software is still damn interesting.

Here is some JavaScript flavoured with Prototype goodness:

$$('a').select(
function (anchor) {
return anchor.href && !anchor.onclick &&
anchor.href.match(/^https?:\/\/(?!(.*bigco\.com.*)|(localhost.*)$).*$/i);
}
).each (
function (anchor) {
external_url = anchor.href;
anchor.onclick = function () {
doSomethingWhenUserClicksOffSiteLink(external_url);
};
anchor.href='#';
}
);

This code finds all of the anchors (<a href="...">...</a>) that lead off the BigCo site and changes them to have an onclick handler. You might use something like this to log outgoing clicks so you can judge the popularity of links, for example.

The first thing you might notice is the Smalltalk-style methods—select and each—for handling collections. If you like Smalltalk (or Ruby) style collection methods, Prototype gives them to you. If you haven’t seen this kind of JavaScript before, the thing that will probably stand out is the use of anonymous functions as parameters for select and each.

The thing about this code that grabbed my attention is the use of two different embedded languages. If you aren’t familiar with Prototype, $$ is a function that takes a CSS Selector as an argument and returns an array of elements matching the selector. In this case, we’re simply asking for every anchor (“a”). But we could ask for every anchor inside of a specific div, or every anchor of a specific class, or almost anything else you might want to use.

CSS selectors are a kind of query language specifically optimized for applying styles to DOM elements in web pages. But Prototype (and several other JavaScript libraries like Mootools) make CSS Selectors a general purpose tool. The Groovy Language provides a similar feature called GPath, a means of querying object networks using a language very similar to XPath.

(I personally think GPath is the most interesting thing about Groovy—having an object query language “baked in” changes the way you think about writing programs, much as SQL changes the way you think about using databases. Or it would if you were brought up on non-relational databases, or if you have been eating of the ORM fruit that leads to madness.)



Regular expressions allow you to code complex and subtle text processing that you never imagined could be automated. They can be used to craft elegant solutions to a wide range of problems. Once you've mastered regular expressions, they'll become an invaluable part of your toolkit. You will wonder how you ever got by without them. Mastering Regular Expressions is the book to get you there.

Any ways, $$('a') really changes everything. It’s not such a big whup in this particular snippet of code, but think about the separation of concerns here: if you do want to change the elements you are manipulating you change one place. You don’t have to restructure a pile of loops and conditionals.

Which leads to the other embedded language, a rather popular one known as a Regular Expression: /^https?:\/\/(?!(.*bigco\.com.*)|(localhost.*)$).*$/i. Regular expressions are inscrutable to most programmers: if you don’t use them on a daily basis, you are in danger of losing the knack of writing them. And I’m not sure if anyone has the knack of reading them. What does /^1?$|^(11+?)\1+$/' match, and why is it famous?

That being said, as inscrutable as they may be, it is very powerful to wrap all your string matching up into a single blob where you can put it in one place. Having loops and scanners and parsers breaking URLs up and looking at individual pieces obscures the intent of your code.

He had a hat! (John Gruber)

Having had a taste of embedded query languages, I’m left with a hunger for more. Quite honestly, I may be thinking of solutions in want of problems, or perhaps it is effete aesthetics, but the more I think about it, the more the following questions bug me:

Why aren’t patterns first class values? Obviously, you can assign them to variables and return them from functions. But can I take them apart? Can I compose them? Is there a query language for extracting pieces of a query? Queries (be they regular expressions, XPath, or CSS Selectors) ought to be structures that can be manipulated just like the DOM or like object networks.

Why don’t we have better support for transforming structures with embedded languages? Regular expressions lead the way here: there’s a powerful feature for using a regular expression to take a string apart and put it back together in a different order, or with new bits added to it. And SQL is fully integrated, there’s a natural syntax for updates integrated with queries.

Where’s the same facility for object networks? Right now I can use CSS selectors from Prototype to find elements and builders from Scriptaculous to modify the DOM. Or I can use JQuery (thanks, David) to do all my DOM manipulation in one go. That’s terrific, it’s even better than the snippet above. But...

Why can’t I use the same power for transforming JSON or my own objects? It’s like regular expressions: once you taste the power with strings, you want to use them with arrays. Once you use XPath with XML, you want to use GPath with graphs of objects.

Once I get going, my mind immediately jumps a level from things that would be useful to things that would be cool for the sake of being cool (and nothing else): With OO, we’re hung up on messages. But those messages are ridiculously primitive! A verb and a bunch of parameters that are usually nouns. Imagine a meta-language where each receiver could interpret its own language. So strings would interpret regular expressions, and DOMs would interpret XPath and/or CSS Selectors. What we call a type or an interface today—a set of verbs with rules for the accompanying parameters—would be replaced by a set of languages receivers understand.

I have some vacation time coming in a few months. Now I have something to think about!



Now, I have asked a lot of “why can’t I…” questions. I hope I get a lot of comments saying “You can, check out the KulTulz library for JavaScript, or the Mazenblitz Macro Package for Common Lisp, or even RTFM about Ruby…” What libraries, languages, and packages provide the kind of features I’m daydreaming about?
 

Comments on “The Show Must Go On: Query Languages:
Isn't JQuery very close to what you want for javascript?
 
Good call!
 
Looking at GPath, I'm somewhat skeptical. I don't think I like that the same notation is used for accessing a property and mapping that property access over a collection. I still prefer comprehensions. :-)

Maybe I'm missing something though.
 
I'm looking to build into CouchDb a regular expression syntax for JSON.

I hate XML and CouchDb is going to switch to JSON documents instead of XML. I want the built-in query language to support JSON syntax natively and also have a regular expression syntax as well.

I've played with coming up with my own syntax, but it's hard to find anything that looks readable. I could use some help.
 
"So strings would interpret regular expressions, and DOMs would interpret XPath and/or CSS Selectors."

XPath support is baked into DOM level 3 (most DOMs, though, implement level 2). REXML has some XPath support built in, though I don't know how complete it is. The Rails 1.2 HTML DOM has support for CSS selectors, which proves pretty useful for testings (search for "assert_select").
 
What does that reg-exp match and why is it famous? The \1 inside of the matching part is throwing me off. I've tried to google / ask friends and have come up with nothing.

Any hints would be lovely.
 
The regular expression? It's a classic. Here's a hint: search for abigail, regular, and expression.
 
Apache Commons has the JXPath library , which enabled you to apply XPath queries to Java object graphs.
 
Perl hase Class-XPath as well.
 
What about LINQ? SQL, XQUERY-like and object graph query...
 
Only loosely related, but Hpricot might be worth a look for some of the same goodness for Ruby.
 
that regular expression matches numbers that are not prime.

basically instead of a decimal number like '7', you feed it that many 1's.

so 7 becomes '1111111'

then the regex uses groups and lookaheads to try and get a group of 1's to match at the beginning, and then have that same group match multiple times all the way to the end. if it matches evenly, essentially you have just found a factor of the number.

ie for 1111111

first it checks if it's just 1, since 1 isn't prime. there's not just 1 "1" so it's not prime. then it tries to match groups of "11".
so it matches [11][11][11] but then there's still a 1 left on the end so no match. then it tries to match groups of "111" and gets [111][111]1, so no match. then it tries to match groups of 4, gets [1111]111 no match, and so on.

it never finds a group that matches and then matches again cleanly all the way to the end, hence the number has no factors other than 1 and itself (it is prime)

so that explains what it does but i don't know why it's famous. maybe just for being so damn clever.
 
Also consider tuple-spaces (Linda).

Erlangs pattern matching.

Prolog as a pattern matching language.
 
FWIW, take a look at Perl 6 grammars.

(That article is 5 years old; thank goodness we’re finally closing in on an actual implementation. Considering that the entire project has only a handful of core members and none of them have really been able to make it their full time concern…)
 
Messages of verb, object are primitive, (text adventures, anyone?) but since the verb can be 'parse', and the object something that represents a {set,list,bag,tree} of instructions.... That's the interpreter pattern, I think.

I DO wish for more support for writing parsers, however. I seem to be very weak at that, and apart from Bison's parallelism, we don't seem to have got much beyond YACC in the last 20-odd years. That shows how good YACC is, of course, but getting the grammars correct, and enabling them to give useful error messages, still feels like hard work.
 
I completely understand your desire for manipulating queries. I want exactly that for TSQL, so that I can analyze my DB code, determine how the data flow, etc. At some point I'll figure out the TSQL grammar (MS won't provide it).

My company's main product actually does a bit of what you're talking about with the queries it allows the user to generate, in particular, when generating editable pivot-table-style grids. It's pretty sweet, albeit a bit confusing at times.
 
In terms of generating parsers, I think parser combinators like Parsec are a significant improvement over tools like yacc. In particular, they let parsers be first class objects of the language which you can fiddle with at your whim. Dynamic grammar combination is a really nice feature.
 
@David R. MacIver: That sounds interesting, but I'm barely even a beginner in Haskell, and my grasp of monads is similar to my grasp of a wet bar of soap :-) (just when I think I've got hold of the idea it slips...). Have you any pointers to introductory material on Parser Combinators for languages with a less functional style? Hopefully they'll be of use to other readers...
 
Short version: Err, no, not really. :-) I'm far from an expert myself.

There are some reasonably good examples in the parsec source directories. There's JParsec which is a port of Parsec to Java (I've not used it, because I fear how awful such a concept could be. For all I know it's alright, but I suspect it isn't...)

The problem with parser combinators in a less functional style is that combinators are an inherently functional concept. As the great Dr Seuss once said: "A parser for things, is a function from strings, to lists, of pairs, of strings and things". :-) Parser combinators provide primitive parser functions as a starting point and higher order functions for combining simple parsers into more complicated ones.
 
So instead of thinking of a grammar as a graph to be searched you think of it as a set of (mutually recursive) functions yielding lists, and this all get monadic to wrap what most languages would keep as "state", shift/reduce etc?
When I saw "combinator" I was thinking of the Y combinator, which I still don't grok.

Thank you
 
hgs:

I don't really grok combinators in all of their glory either. I feel like a Parrot repeating what it has heard without understanding.

Here's my attempt:

But Y would I want to do a thing like this?
 
hgs: That's essentially right, yes.

Really combinators aren't so scary. Even the Y combinator is positively friendly when you get used to it.

All a combinator is is a higher order function which is useful for building other higher order functions. Looked at the right way, this is nothing more than control flow. if statements, while or for loops, etc. can all be regarded as forms of combinators (blurring the line between 'blocks' and 'functions' here, but if you're a Ruby or Smalltalk fan I'm sure you won't mind that. :-) ).

Once actions and control flow become first class, thinking about this sort of things becomes more prevalent. Hence you can build powerful abstractions where if statements get replaced with Bayes rule, or parser oriented control flow statements. Monads crop up a lot in this because control flow is frequently sequencing oriented (do this thing and feed the results into that thing), which is one of the big abstractions which they're useful at dealing with.
 




<< Home
Reg Braithwaite


Recent Writing
Homoiconic Technical Writing / raganwald.posterous.com

Books
What I‘ve Learned From Failure / Kestrels, Quirky Birds, and Hopeless Egocentricity

Share
rewrite_rails / andand / unfold.rb / string_to_proc.rb / dsl_and_let.rb / comprehension.rb / lazy_lists.rb

Beauty
IS-STRICTLY-EQUIVALENT-TO-A / Spaghetti-Western Coding / Golf is a good program spoiled / Programming conventions as signals / Not all functions should be object methods

The Not So Big Software Design / Writing programs for people to read / Why Why Functional Programming Matters Matters / But Y would I want to do a thing like this?

Work
The single most important thing you must do to improve your programming career / The Naïve Approach to Hiring People / No Disrespect / Take control of your interview / Three tips for getting a job through a recruiter / My favourite interview question

Management
Exception Handling in Software Development / What if powerful languages and idioms only work for small teams? / Bricks / Which theory fits the evidence? / Still failing, still learning / What I’ve learned from failure

Notation
The unary ampersand in Ruby / (1..100).inject(&:+) / The challenge of teaching yourself a programming language / The significance of the meta-circular interpreter / Block-Structured Javascript / Haskell, Ruby and Infinity / Closures and Higher-Order Functions

Opinion
Why Apple is more expensive than Amazon / Why we are the biggest obstacles to our own growth / Is software the documentation of business process mistakes? / We have lost control of the apparatus / What I’ve Learned From Sales I, II, III

Whimsey
The Narcissism of Small Code Differences / Billy Martin’s Technique for Managing his Manager / Three stories about The Tao / Programming Language Stories / Why You Need a Degree to Work For BigCo

History
06/04 / 07/04 / 08/04 / 09/04 / 10/04 / 11/04 / 12/04 / 01/05 / 02/05 / 03/05 / 04/05 / 06/05 / 07/05 / 08/05 / 09/05 / 10/05 / 11/05 / 01/06 / 02/06 / 03/06 / 04/06 / 05/06 / 06/06 / 07/06 / 08/06 / 09/06 / 10/06 / 11/06 / 12/06 / 01/07 / 02/07 / 03/07 / 04/07 / 05/07 / 06/07 / 07/07 / 08/07 / 09/07 / 10/07 / 11/07 / 12/07 / 01/08 / 02/08 / 03/08 / 04/08 / 05/08 / 06/08 / 07/08 /