Spaghetti-Western Coding
ATGAGAAAGAGGATGCGAGACACCCCGAACCGCGATCGCCCGCGCGAGAG
ACTGGCAGCACGAGGACCGGAGGCTCTCACCGATGCCGAACTGCTCGCCC
TTCTCCTCGGGCGCGGCACGAAGGGGCGGGACGTCTGGCAGGTCGCGGGC
GACGTCGAACGCTGCCTGAAACGTGCCGAAGGTTGCCCTTCTTATGACGA
TCTCCTCGGAATAGACGGGGTCGGGTCGGCGAAAGCCTGCGAGATCATGG
CCTGCTTCGAACTCGGCCGGAGATACTTCGGGGACGACGGGGTCTCCGGG
CACCGGATCGCCCGCCCCGAGGACGTGCTCCCCCTGGTCACGGAATGGCG
GGACAAGAAGCAGGAGTACTTCTTCTGCATCACCTTAAACGGTGCCGGCG
CTGTGATCGAGCGGCGGATCGTCACCGTCGGAATCCTGAACCAGAGCCTC
GTCCACCCCCGGGAGGTCTTTTCCGATGCGATCACCGACCGTGCGGCCTC
GGTCATCCTGGTCCATAACCATCCTTCGGGCACGCTCGAGCCGTCCGCCC
AGGATCTCGCCATCACCCGGCAGCTCGTCGAGGCCGGATCGATCCTCGGC
ATCCGGGTGCTCGACCACATCATCGTCACGAAGAACGGCTGCGCGAGCTT
AAAAGAACTCGGGCACCTGTAA
In case you didn’t recognise it at a glance, this is a metaprogramming gene: it
repairs DNA in
Methanoculleus Marisnigri, an anaerobic organism found in decaying sediment at the bottom of the Black Sea. Metaprogramming genes‽ Yes indeed, the
Genetic code embraces metaprogramming with enthusiasm: DNA splices itself, moves code around, and rewrites code. Sections of code are turned on and off. Genes manufacture proteins that express or suppress other genes. And I am grossly oversimplifying things!
The frustrating thing about genetic code is that it is really, really easy to understand what it does in the small.
Amino Nucleic acids come in triplets called codons. In the gene above, the first codon and last codon are beginning and end markers, like curly braces in Algol-derived programming languages. The other codons code for amino acids. The entire template manufactures a small piece of RNA. This mechanism is very well understood at this scale of behaviour. However, when you build from the DNA up to the whole organism, the overall behaviour is a mystery, just as
the behaviour of snow is very difficult to predict from the molecular properties of water and air.
Software programs exhibit the same frustrating property: although we can easily determine what individual expressions do, the behaviour of programs in the aggregate resists simple analysis. Consider:
p[++i] = 0;
What does this do? Well, in C++ anything is possible, but a simple interpretation—incrementing an integer and then using the result to assign zero to an element of an indexed collection—is well understood and the probable meaning of this program in many languages.
The program that includes such a line has state, and this line of code makes a very small alteration to that state. What effect does this line of code have on a program’s behaviour? Well, now we have a conundrum. You often find this code inside of a loop. But if we make some error in our loop boundary checking, we can start overwriting some other thing—like our program’s code—with zeroes.
This is the basic problem with state machines: a small change of state can have unpredictable consequences on the behaviour of our machine.
We have a blueprint for a state machine that writes a blueprint for another state machine. So we are looking at a blueprint of a state machine that makes blueprints and we are trying to predict the behaviour of the state machines it builds.
This is roughly like looking at Bach’s DNA and hearing the Little Fugue in your head.
There is a much more familiar expression we use to describe programming machines with mutable state: We call it “Imperative Programming.” And we have discovered that it matches the way our brains think about many algorithms really well, that’s why we (statistically speaking) resist purely functional approaches. However, we have known for a very long time that imperative programs have this awful property: a small change of code can result in unpredictable changes to the behaviour of the running program.
So here is my analogy: an imperative program is a blueprint for a state machine. We put the blueprint inside of another machine we understand really well, and it manufactures the state machine for us, then starts the state machine. Our problem is that we are supposed to understand the behaviour of the constructed machine as it changes state just by looking at the blueprint.
And if you think that is hard…Wait! Let’s make the problem
harder. What if we have a blueprint for a state machine. What does our state machine do? It
writes a blueprint for a state machine and feeds that to a machine-making machine, which then gives us back another state machine. So we are looking at a blueprint of a state machine that makes blueprints and we are trying to predict not only its behaviour but the behaviour of the state machines it builds.
This is roughly like looking at Bach’s DNA and hearing the Little Fugue in your head.
Does that sound easy? No? Well what do you think we are doing when we write code what writes code? Here is a standard Ruby template for writing modules:
module Example
module ClassMethods
#...
end
module InstanceMethods
#...
end
def self.included(receiver)
receiver.extend ClassMethods
receiver.send :include, InstanceMethods
end
end
You are looking at a program that write programs, a blueprint for a machine that makes blueprints. Metaprogramming is imperatively programming an imperative program.
1 Now, a program is a kind of amplifier: a short program produces a big result. So when we write a program that writes programs, we are amplifying the amplification.
The spark of a match. A long, lingering draw on a cheroot. And a narrowing of eyes.Consequently, metaprogramming amplifies the
good, the
bad, and the
ugly of imperative programming.
In and of itself, metaprogramming is ‘just’ an amplifier. If you have issues with imperative programming, you are going to have issues with metaprogramming, only magnified. I don’t know about you, but
I certainly have issues with imperative programming. And I am not alone: the history of programming languages has been dominated by attempts to tame the basic complexity of mutable state in programs.
Structured programming attempted to tame the arbitrarily complicated program flows—known to this day as “spaghetti code”—resulting from the indiscriminate use of
if and
goto statements. Object-oriented programming attempted to tame the arbitrarily complicated data states resulting from the indiscriminate combination of procedures and data structures. Procedure calls, continuations, methods, … the idioms and language mechanisms we’ve devised are all attempts to make simpler machines that are easier to understand.
The Art of the Metaobject Protocol is one of the most important books ever written on the subject of building abstractions with objects. The Metaobject Protocol is a system for defining your own object semantics, and you can implement the parts you choose in Ruby or any other reasonably expressive language. And while you’re learning how to make your programs better in your language, you just might pick up a little Common Lisp. Highly recommended.
Of course, there is no simple answer. Banning metaprogramming makes exactly as much sense as banning abstractions in imperative programming, and for exactly the same reasons. If you ban abstractions from imperative programs, you assert that longer programs made out of simpler pieces are more tractable than short programs made out of abstractions. Sometimes they are, sometimes the abstractions are gratuitous, and
sometimes the mechanisms used to make a program short are incidental to its purpose, almost as if you compressed the source code and insisted that it must be easier to read because it has fewer redundant bits.
I’m personally optimistic that we will find reasonable idioms and mechanisms for managing the complexity of metaprogramming. Some of the idioms we’ve already found are promising, and as long as we don’t succumb to Blub and decide that there is no further improvement possible, we can only get better.
We seem to be doing a decent job with the style where we place our metaprogramming in a separate place from our program’s core “business logic.” Rails accomplishes this with plug-ins: not only can you use lots of other people’s plug-ins, you can deliberately place your own metaprogramming code in a plug-in to keep it separate from the application code that calls it. You call
acts_as_versioned in your class, and the plug-in does the rest. Although you are
using metaprogramming, your program fakes a declarative style quite effectively.
Blueprint TransformersWhen I said that
IS-A IS-A HAS-A, a few people pointed out that IS-A isn’t HAS-A: In actuality, ISA-A
WAS-A HAS-A. Isn’t that the point of true
abstractions? To provide the illusion of a new set of semantics implemented using existing semantics?
proof("Proof of Relationship").where {
document("Affidavit of Same-Gender Domestic Partnership").where {
Both_names = item("Participant's Name") &
item("Partner's Name")
}
}
proof("Proof of Joint Ownership").where {
document("Property Statement").of_kinds(
"Rental/Lease Agreement",
"Deed",
"Property Tax Statement"
)
document("Financial Statements").of_kinds(
"Mortgage Statement",
"Credit Card Statements",
"Bank Statements",
"Utility Bill Statements"
)
}
Syntactically this is Ruby code, but semantically it is something else altogether (If it matters to you, it is a small snippet of a system for defining a decision tree used to determine loan eligibility). This is possible with metaprogramming by its very nature: You are working with a machine that takes a blueprint as its input and produces a blueprint as its output. In effect, you are working with a blueprint transformer. Mediating between different semantics is an obvious application for this technique.
This is great, and I am a big booster of the power of metaprogramming. But just as we are loathe to “throw out the baby with the bath water” by banning techniques just because they can be misused, we should likewise be loathe to embrace techniques as golden hammers just because they have some extremely beneficial applications.
For all of its wonderful utility, metaprogramming is, at its heart, the act of writing programs that write programs. Which means that understanding our programs requires predicting what a program will do given a blueprint for a machine that will write a blueprint for our program.
This is not a trivial exercise. And yes, you can shoot yourself in the foot if you aren’t careful. But in the end, there’s two kinds of programmers in the world: those with dangerous techniques and those who dig.
This post was conceived during the Q&A after Andrew Hessel’s talk at SciBarCamp.
1. (A few people missed this post script, so I have made it an end note):
Yes, I realize that “metaprogramming” is a vague term with may possible interpretations right down to the weak assertion that writing named procedures is metaprogramming. And I also realize that there is much, much more to programming than imperative programming.
But please don’t allow the presence of this disclaimer to discourage you from comments about other forms of metaprogramming that do not fit into the neat box of state machines building blueprints for state machines. I am especially interested in hearing from functional programmers: what is the functional equivalent of the metaprogramming described here and how does it differ in character?
[back]