An Approach to Composing Domain-Specific Languages in Ruby

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Friday, March 16, 2007

Whoa! This looks like a long post with a lot of code snippets. Am I going to have to do a lot of hard thinking, or can I just relax and enjoy a good rambling essay?

This is a bit long, probably (like all my posts) 200% longer than necessary. If you just want to see a neat DSL that implements Haskell and Python’s List Comprehensions written in Ruby, just scroll to the bottom.

If I do bother to read it all, will I learn some neat hacks?

Yes, but you could learn them just as well by reading the source code directly.

So the benefit of reading the whole thing is...?

The List Comprehensions DSL is the what. The source code is the how. But the essay is the why.

Reading the whole thing will take you through some of the pitfalls of writing DSLs and explain why I chose my particular workarounds.

Furthermore, there are a lot of corners in Ruby where you can easily assume that things work one way, but really they don’t. If you actually try the snippets on your computer, you’ll have a much better chance of remembering where the pitfalls are. That’s why I tried to give a working example for every point, rather than just explaining things in words.

Of course, if you have no interest in writing your own Domain Specific Languages in Ruby just yet... this isn’t meant as a popular essay, rather it’s meant as an experience report for fellow practitioners. And honestly, there’s a world market for maybe five tools for writing DSLs in Ruby.

But since you’re here, the essay starts below!

An Approach to Composing Domain-Specific Languages in Ruby

Ruby is often touted as a good language for writing Domain-Specific Languages (“DSLs”). There are a few arguments in favour of writing a DSL as part of an application.

The first argument that comes to mind is that if the application’s domain experts have a specific natural language or jargon of their own, writing a DSL makes it easy for programmers and domain experts to collaborate. While it is rare to find substantial applications entirely written by non-programmers at this time in any language, it is quite feasible for non-programmers to write or validate portions of an application representing its “business rules” or domain logic, while programmers maintain its infrastructure.

    include StarbucksDSL
    order = latte venti, half_caf, non_fat, no_foam, no_whip
    print order.prepare

—Building Domain Specific Languages in Ruby

Another argument in favour of a DSL is that even when non-programmers are not involved directly in coding an application, the programmers themselves often have a jargon of their own to describe entities, algorithms and data structures in the application. Having portions of the application written in a language closely resembling the programmer’s own jargon makes it easy for them to read each other’s work and understand its intent.

Successful examples of DSLs embedded within existing languages and frameworks include Ruby on Rails’ ActiveRecord, where statements such as:

    has_and_belongs_to_many :Bar
    validates_presence_of :blitz
    some_bars = Bar.find_by_tavern_license(license_number)

are self-documenting to anyone familiar with relational models.

The final argument I’ll repeat here is that a DSL is a very effective way to separate the what from the how of an algorithm. Separation of concerns is a desirable property of good programs, and DSLs provide this separation very clearly. In the ActiveRecord examples above, the exact mechanisms of relating tables, validating records, and performing searches are “abstracted away” from the code where the programmer declares how she would like the results used.

Freedom is Slavery

DSLs can be hacked together quickly in Ruby (making them sufficiently robust for your production needs may require considerably more care). Hacking a DSL together with little effort is a benefit, especially when prototyping: sometimes the best way to design a DSL is to try to use it, so you can discover what you need to express.

The Ruby Way is the perfect second Ruby book for serious programmers. The Ruby Way contains more than four hundred examples explaining how to do everything from distribute Ruby with Rinda to dynamic programming techniques just like these.

Some developers have raised the concern that extensive use of “magic” features leads to code that cannot be understood or maintained.¹ My own feeling is that DSLs lead to code that is easier to understand, not more difficult to understand. This leaves an argument about maintenance. Some techniques for meta-programming, such as extending core classes like Array, have what you might call “non-local effects.”

For example, two different pieces of code might try to extend the same core class, interfering with each other. Each works in isolation and passes all of its unit tests. But when plugged into a larger application that uses them together, they break.
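To make the interference concrete, here is a minimal sketch. The two “libraries” and the method name average are hypothetical, invented purely for illustration. Each definition is reasonable on its own, but whichever one loads last silently wins.

```ruby
# Hypothetical "library A": reopens Array and adds a floating-point average.
class Array
  def average
    inject(0) { |total, n| total + n }.to_f / size
  end
end

# Hypothetical "library B", loaded later: same method name, integer arithmetic.
class Array
  def average
    inject(0) { |total, n| total + n } / size
  end
end

# Code written against library A now silently gets library B's behaviour.
p [1, 2].average # prints 1, not the 1.5 that library A's callers expect
```

Neither library's own unit tests would catch this, because each passes when loaded alone.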

Lispers are among the best grads of the Sweep-It-Under-Someone-Else’s-Carpet School of Simulated Simplicity.

—Larry Wall

Another problem occurs with extending the Kernel module or creating “top level” methods to be used as verbs in a DSL. You end up with namespace crowding: you must be very careful that you do not redefine an existing method.

To fix this problem, the code that implements the DSL needs to be contained so that it does not interfere with other code. We can still implement verbs as methods, but we must implement those methods in separate objects, classes, or modules.

Zen in the Art of Program Maintenance

An established technique for implementing methods in objects is to define the methods and then execute a block of code using instance_eval so that it has access to the object’s methods.

I’m trying to get the Zen of building DSLs using Ruby. After reading a dozen or so pieces referenced by my favourite search engine, I have a feeling I’m still not quite getting it.

—Don Box

You know, code expresses an idea better than words express an idea… when the idea is about coding. Please try this example in irb. Don’t just skim the text and nod: there’s a powerful learning mechanism at work when you physically do things as you’re learning, even if it’s just copying, pasting, comparing the result in one window to the text in another, and so on:

def bjarne
    'Barney'
end

dsl = Object.new
def dsl.phred
    'Fred'
end

plus = ' plus '

print dsl.instance_eval {
    phred + plus + bjarne
}
##### "Fred plus Barney"

What does this show? Well, we have created a way to use a method defined in our dsl object, a local variable plus, and a top-level method bjarne. We can imagine scaling this up to defining a rich DSL in our DSL object and being able to mix verbs from the DSL with local variables and other methods as we please.

Touching back on the subject of containment, we have defined bjarne at the top level, which makes it a private method on Object that every other object inherits. Now bjarne is essentially global. If we already defined bjarne somewhere else, we just clobbered it. And if we later run a piece of code that defines bjarne, we’ll clobber our own version. phred is different. It’s defined inside of an object, and it doesn’t conflict with any other phred we define elsewhere.
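You can check the difference in reach for yourself. This is a small sketch reusing the names from the example above; the object called other is my own addition. Passing true as the second argument to respond_to? asks it to include private methods, which is where top-level definitions end up.

```ruby
# A top-level method: reachable from every object in the program.
def bjarne
  'Barney'
end

# A method defined on one particular object only.
dsl = Object.new
def dsl.phred
  'Fred'
end

other = Object.new # an unrelated object (hypothetical, for illustration)

p other.respond_to?(:bjarne, true) # true: bjarne leaked into every object
p other.respond_to?(:phred, true)  # false: phred stays inside dsl
p dsl.phred                        # "Fred"
```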

Great! So… Can we cite a few examples of this technique in action (such as Jamis’ post where he calls phred and bjarne examples of Sandboxing and Top-level methods) and end the post here?

No. The code above looks fine. But there is a hidden problem with this sandboxing technique:

MyDsl = Object.new

def MyDsl.phred
    'Fred'
end

class ClientCode

    def bjarne
        'Barney'
    end

    def friends
        plus = ' plus '
        MyDsl.instance_eval { phred + plus + bjarne }
    end

end

ClientCode.new.friends
##### -:15:in `friends': undefined local variable or method `bjarne' for # (NameError) from -:15:in `friends' from -:20

WTF?! This looks just like our top-level example, but we’ve placed our code inside of a ClientCode method. And bjarne is a method in ClientCode: this way we can continue to separate concerns, keeping phred inside our DSL and bjarne inside of the class where we are using the DSL. But it doesn’t work.

Why instance_eval breaks (in tedious detail)

As you know, everything in Ruby is either a variable or a method (how it figures out the difference is a major irritation). When you invoke a method, you are actually sending a message to a receiver.² Sometimes you name the receiver (some_object.a_method), and there is no ambiguity.

But when you just name the method (like bjarne), Ruby tries to find the method for itself. It does so by looking to see whether it is an instance method, in which case it behaves like self.bjarne. If not, it looks to see whether bjarne is a top-level method, in which case it calls that. See for yourself:

def foo
  'top level foo'
end

def bar
  'top level bar'
end

class Test
  def bar
    'instance method bar'
  end
  def test
    p foo
    p bar
  end
end

Test.new.test
##### "top level foo" "instance method bar"

See? It looks for instance methods and then for top-level methods if it can’t find anything. (Again, we are hand-waving over the pesky problem of telling methods from local variables in the case where we don’t use ().) What’s the problem? Well, I actually mis-described what happens. Here it is again, with more precision:

It looks for methods defined in the object self, and then for top-level methods if it can’t find anything. Of course, self is the current object. Unless it isn’t, and that is exactly what instance_eval does: it evaluates a block, but it changes self to point to its receiver instead of the object where the code is executing. Everything else stays the same. One more example to show the mechanism:

def foo
  'top level foo'
end

def bar
  'top level bar'
end

class Test
  def bar
    'instance method bar'
  end
  def blitz
    'current object blitz'
  end
  def test
    p foo
    p bar
    o = Object.new
    def o.blitz
        'redefined self blitz'
    end
    p o.instance_eval { blitz }
    p o.instance_eval { 'bar within o gives: ' + bar }
  end
end

Test.new.test
##### "top level foo" "instance method bar" "redefined self blitz" "bar within o gives: top level bar"

Now we see: when we use instance_eval, we route around our current object, and all of its methods are ignored within the block. Ruby really only has two levels of scope for method calls: whatever belongs to self and whatever is defined at the top level.

This state of affairs is unsatisfactory: we would like to introduce a DSL in such a way that we retain access to all of our methods without kludges (like storing the current object in an instance variable).
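For reference, here is what one form of that kludge looks like, reusing MyDsl and ClientCode from the failing example. The local variable client is my own name, and this sketch stashes the caller in a local rather than an instance variable. It works, but every call back into the current object has to be spelled out explicitly.

```ruby
MyDsl = Object.new

def MyDsl.phred
  'Fred'
end

class ClientCode
  def bjarne
    'Barney'
  end

  def friends
    plus = ' plus '
    client = self # the kludge: remember the calling object in a local
    # Inside the block self is MyDsl, so bjarne must be reached via client.
    MyDsl.instance_eval { phred + plus + client.bjarne }
  end
end

p ClientCode.new.friends # "Fred plus Barney"
```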

Nesting Scopes

The Seasoned Schemer is devoted to the myriad uses of first class functions. This book is approachable and a delight to read, but the ideas are provocative and when you close the back cover you will be able to compose programs from functions in powerful new ways.

You can think of the current scope as being nested inside of the top-level scope. instance_eval doesn’t change the scope for things like local variables, it just points self elsewhere.
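A short sketch of that distinction, using made-up names (Outer, greeting, target): the local variable is still visible inside the block, while a bare method call now goes to the receiver of instance_eval.

```ruby
class Outer
  def name
    'outer object'
  end

  def demo
    greeting = 'local variable survives' # a local in the enclosing scope
    target = Object.new
    def target.name
      'instance_eval receiver'
    end
    # Inside the block, locals are unchanged but self is now target.
    target.instance_eval { [greeting, name, equal?(target)] }
  end
end

p Outer.new.demo
# ["local variable survives", "instance_eval receiver", true]
```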

What we want is a new scope for our DSL nested inside of the current scope. So when we search for a method, we should check the DSL. If we don’t find it there, check the current object’s scope. If we don’t find it there, check the top-level.

Those who do not learn from the History of Lisp are doomed to repeat it.

Oops. John McCarthy called from 1960. He wants Lisp’s dynamic scoping back. Yes, our new feature is almost fifty years old. This is why either a thorough grounding in CS theory or a hobbyist’s interest in the history of programming is important for programming: much of what we want to do has already been done before, and sometimes in unexpected contexts. Who would have thought that a technique for helping programmers collaborate with Bond Traders has roots in Lisp 1.5?

Here’s an implementation of a nested scope construct that does exactly what we want. You declare a new class that extends DomainSpecificLanguage, and then you can use methods from your DSL, from your current object, and from the top-level (if you so choose). For example:

require 'dsl'

class MyDSL < DomainSpecificLanguage

  def bjarne
    'Barney'
  end

end

class TheGreat

  def phred
    'Fredrick'
  end

  def test
    plus = ' plus '
    MyDSL.eval { p phred + plus + bjarne }
  end

end

TheGreat.new.test
##### "Fredrick plus Barney"

This does exactly what we want with methods.
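If you are curious how such a lookup chain can be wired up, here is a minimal sketch. It is not the code in dsl.rb: NestedScopeSketch, FlintstonesDSL, and Bedrock are hypothetical names, and the approach (capture the block's original self, then forward unknown methods to it) is just one way to get the DSL, current object, top-level ordering. It assumes a Ruby recent enough to have Binding#receiver.

```ruby
# Hypothetical stand-in for DomainSpecificLanguage; not the code in dsl.rb.
# Inheriting from BasicObject keeps Object's methods (including top-level
# ones) off the DSL object, so they are found through the caller instead.
class NestedScopeSketch < BasicObject
  # Run the block with a fresh DSL instance as self.
  def self.eval(&block)
    new(block.binding).instance_eval(&block)
  end

  def initialize(given_binding)
    # The object that was self at the place where the block was written.
    @outer_self = given_binding.receiver
  end

  # Anything the DSL does not define is forwarded to the caller's object.
  # __send__ also reaches private methods, which covers top-level methods.
  def method_missing(name, *args, &block)
    @outer_self.__send__(name, *args, &block)
  end
end

class FlintstonesDSL < NestedScopeSketch
  def bjarne
    'Barney'
  end
end

class Bedrock
  def phred
    'Fredrick'
  end

  def test
    plus = ' plus '
    FlintstonesDSL.eval { phred + plus + bjarne }
  end
end

p Bedrock.new.test # "Fredrick plus Barney"
```

The sketch ignores keyword arguments and error reporting; a real implementation would want to handle both.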

There's also a single extension to Kernel: the method with. You can use with in place of the eval method, so you can also say:


    with MyDSL do
      p phred + plus + bjarne
    end

The eval method creates a new instance of your DSL class, so you can track state within an evaluation. For example:

class Censor < DomainSpecificLanguage
  attr_reader :ok_on_tv

  def initialize(given_binding)
    super(given_binding)
    @ok_on_tv = true
  end

  def say(something)
    something.split.each do |word|
      @ok_on_tv = false if ['feces', 'urine', 'love', 'pudendum', 'fellator', 'oedipus', 'mammaries'].include?(word)
    end
  end

end

class GeorgeCarlin
  def test
    Censor.eval {
      say "People much wiser than I have said, I'd rather have my son watch a film with two people making love than two people trying to kill one another."
      say "And I of course agree. I wish I knew who said it first, and I agree with that."
      ok_on_tv
    }
  end
end

p GeorgeCarlin.new.test
##### false

let

The first obvious drawback of this approach is that the blocks we pass to eval cannot take parameters. For this reason, rumour has it that a method called instance_exec will be added to Ruby in 1.9. (There are some implementations available that work in Ruby 1.8 if you would like to experiment.)
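instance_exec has since become part of core Ruby. Here is a small sketch of what it buys you; the Barista class is hypothetical, borrowed in spirit from the Starbucks example earlier. Arguments given to instance_exec are handed to the block, while self inside the block becomes the receiver.

```ruby
# Hypothetical DSL object with a single verb.
class Barista
  def latte(size)
    "#{size} latte"
  end
end

# The block takes a parameter AND runs with the Barista instance as self.
order = Barista.new.instance_exec('venti') { |size| latte(size) }
p order # "venti latte"
```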

The second is that you don’t get anything like nested local variables, à la Pascal, Scheme, or any other language with block structure. Block structure is very powerful: You can use a variable within a particular scope and nowhere else. Here’s a trivial example:

with Let do
  let :x => 0, :y => 1 do
    assert_equal(1, x + y)
    let :x => 2 do
      assert_equal(3, x + y) 
    end
    assert_equal(0, x)
  end 
end

We're using the with syntax. In the Let DSL, there’s a new method called let. let creates a new DSL within Let. You can see that re-declaring x does not clobber the value in the outer scope. That is because when let wrote a new DSL, it added x and y as methods.

So really, that block of code says “Write a new DSL where x and y are methods returning zero and one. Execute some code in that new DSL. That code will create another DSL where x is a method returning two.”

Because let defines methods and not local variables, bad things happen when you try to override real local variables. It’s best to use Let for some things and local variables for others, but not mix the two.
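Here is a rough sketch of both points: names as methods on a freshly built scope, and the trouble with real locals. LetSketch is a hypothetical class of my own, not the Let in dsl.rb, and it only handles nesting (it does not reach back into the calling object). One known weakness of this style: any method already defined on Object or Kernel with the same name as a binding in an outer scope will be found before the lookup reaches that scope.

```ruby
# Hypothetical sketch of the idea behind let; not the implementation in dsl.rb.
class LetSketch
  def initialize(parent = nil)
    @parent = parent
  end

  # Build a child scope whose methods return the given values,
  # then evaluate the block with that child as self.
  def let(bindings, &block)
    child = LetSketch.new(self)
    bindings.each do |name, value|
      child.define_singleton_method(name) { value }
    end
    child.instance_eval(&block)
  end

  # Names not defined in this scope are looked up in the enclosing one.
  def method_missing(name, *args, &block)
    @parent ? @parent.__send__(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    (!@parent.nil? && @parent.respond_to?(name, true)) || super
  end
end

def nested_demo
  LetSketch.new.let(:x => 0, :y => 1) do
    inner = let(:x => 2) { x + y } # x is 2 here, y comes from the outer scope
    [x + y, inner, x]              # back outside, x is 0 again
  end
end

def shadowing_demo
  x = 10                           # a real local variable named x
  LetSketch.new.let(:x => 2) { x } # the local wins over the method
end

p nested_demo    # [1, 3, 0]
p shadowing_demo # 10
```

The second result is the “bad thing”: Ruby resolves a bare x to the local variable before it ever considers calling a method, so the let binding is silently ignored.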

Like what, you ask?

List Comprehensions in Ruby

A List Comprehension is syntactic sugar that lets you build collections using set-like notation. For example, s = [ x | x <- [0..], x^2 > 3 ] is a list comprehension in Haskell.

Here is a List Comprehensions DSL in Ruby. Let’s say we’re building up a multiplication table. We want tuples of the form [x, y, x * y] given x is in the range 1..12 and y is in the range 1..12. Let’s write that:


require 'comprehension'

class MultiplicationTable
  def twelve_by_twelve
    with Comprehension::DSL do
      list { [x, y, x * y] }.given(:x => 1..12, :y => 1..12)
    end
  end
end
p MultiplicationTable.new.twelve_by_twelve
##### [[1, 1, 1], [1, 2, 2], [2, 1, 2], [1, 3, 3], [2, 2, 4] ...

(In everyday use, you don’t need a class and a method for each comprehension: the important bit is list { [x, y, x * y] }.given(:x => 1..12, :y => 1..12). I just wrote it this way so you could see that comprehensions work fine inside of methods. You can also use more than one comprehension inside of a single with Comprehension::DSL do... end block: see the unit tests for examples.)

The expression in the block doesn’t have to be a tuple:


class MultiplicationTable
  def twelve_by_twelve
    with Comprehension::DSL do
      list { "#{x} times #{y} is #{x * y}" }.given(:x => 1..12, :y => 1..12)
    end
  end
end
p MultiplicationTable.new.twelve_by_twelve
##### ["1 times 1 is 1", "1 times 2 is 2", "2 times 1 is 2", "1 times 3 is 3", "2 times 2 is 4", ...

And you can stick a “where” block on the end:


class MultiplicationTable
  def twelve_by_twelve_odds
    with Comprehension::DSL do
      list { "#{x} times #{y} is #{x * y}" }.given(:x => 1..12, :y => 1..12) { (x % 2 == 1) && (y % 2 == 1) }
    end
  end
end
p MultiplicationTable.new.twelve_by_twelve_odds
##### ... 3 times 5 is 15", "5 times 3 is 15", "7 times 1 is 7", "1 times 9 is 9", ...
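For comparison, here is roughly what those comprehensions desugar to using nothing but Array methods. The method names plain_table and plain_odds are mine. Note that Array#product enumerates pairs in a different order from the output shown above, so only the contents match, not the sequence.

```ruby
# Plain-Ruby counterpart of:
#   list { [x, y, x * y] }.given(:x => 1..12, :y => 1..12)
def plain_table
  range = (1..12).to_a
  range.product(range).map { |x, y| [x, y, x * y] }
end

# Plain-Ruby counterpart of the version with a "where" block:
# select plays the role of where, map the role of the list block.
def plain_odds
  range = (1..12).to_a
  range.product(range)
       .select { |x, y| (x % 2 == 1) && (y % 2 == 1) }
       .map { |x, y| "#{x} times #{y} is #{x * y}" }
end

p plain_table.first(3) # [[1, 1, 1], [1, 2, 2], [1, 3, 3]]
p plain_odds.first(3)  # ["1 times 1 is 1", "1 times 3 is 3", "1 times 5 is 5"]
```

The comprehension reads closer to the mathematical statement; the plain version makes the generate, filter, transform pipeline explicit.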

Would you like to nest them? Your expression is the interpreter’s command:


class MultiplicationTable
  def odds_times_evens
    with Comprehension::DSL do
      list { "#{x} times #{y} is #{x * y}" }.given(
          :x => list { x }.given(:x => 1..12) { x % 2 == 0 } , 
          :y => list { x }.given(:x => 1..12) { x % 2 == 1 } )
    end
  end
end
p MultiplicationTable.new.odds_times_evens
##### ... "2 times 11 is 22", "4 times 9 is 36", "6 times 7 is 42", ...

List Comprehensions and Let

What is the relationship to Let? Well, Let builds the scopes needed for evaluating the where clause and the block defining the elements of the list. Yes, we’ve built a DSL on top of a DSL on top of a DSL. Does this seem like weird trickery? I don’t know why. Do you have any idea how many levels of abstraction are responsible for you reading this essay right now?

This is what we humans do: we build tools on top of tools. Your browser runs on an OS, possibly in a VM, perhaps in a hypervisor, on top of a BIOS, and on and on. This is the normal state of affairs, not an exception.
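To give a feel for the layering, here is a compact sketch of a comprehension built on top of a scope-per-assignment idea. It is not the API of comprehension.rb: comprehend is a hypothetical helper, and it swaps Let for a plain Struct whose readers play the part of x and y. Unlike the real thing, the blocks here cannot see the calling object's methods.

```ruby
# Hypothetical helper, not the API of comprehension.rb.
# terms:   a hash of name => enumerable of candidate values
# where:   an optional proc that filters assignments
# mapping: the block that builds each element of the result
def comprehend(terms, where = nil, &mapping)
  scope_class = Struct.new(*terms.keys) # one reader method per term
  first, *rest = terms.values.map(&:to_a)
  # Every combination of values becomes one scope object.
  scopes = first.product(*rest).map { |values| scope_class.new(*values) }
  scopes = scopes.select { |scope| scope.instance_eval(&where) } if where
  scopes.map { |scope| scope.instance_eval(&mapping) }
end

odd_products = comprehend({ :x => 1..4, :y => 1..4 }, proc { x.odd? && y.odd? }) do
  [x, y, x * y]
end
p odd_products # [[1, 1, 1], [1, 3, 3], [3, 1, 3], [3, 3, 9]]
```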

Closing Remarks

It is possible to build DSLs in Ruby to facilitate cross-functional teamwork and separation of concerns. Care must be taken to avoid polluting the top-level name space, but it is possible to work within sandboxes and still have access to the current object’s context.

Oh yes, and programming is fun, as always.

Source Code

Update: The copy of dsl.rb has been updated to the latest version. I had committed a rather typical manual synchronization error: I copied the latest file to the wrong directory when I first posted this. Thanks, Justin!

DomainSpecificLanguage and Let

Comprehension

How to try it for yourself: Open DomainSpecificLanguage and Let. Save the text only (not the HTML) as dsl.rb. Open Comprehension. Save the text only as anything you like, as long as it is in the same directory as dsl.rb: I use comprehension.rb. Run comprehension.rb.

¹ I generally call “Bullshit!” on any line of reasoning that sets up a straw man argument just to knock it down. So read on with skepticism!

² Alan Kay has said that he regrets popularizing the notion of “Object-Oriented” programming, and that he should have called it “Message-Oriented” programming.

Labels: lispy, popular, ruby

¶ 12:19 PM

Comments on “An Approach to Composing Domain-Specific Languages in Ruby”:

There's one important trick to know if you're building block-based DSLs in Ruby: You should almost always use "environment" objects:

x = "blah"
my_dsl do |e|
  e.use(x)
end

Here, the block 'do ... end' is just an ordinary closure, run in the caller's environment. It's easy to access 'x' from within the block.

The DSL-related names, such as 'use', are accessed through 'e'. This allows you to keep the two different scopes straight.

This technique is used in Rake, XML::Builder, Rails routing, and many other DSLs. There are alternatives, but most of them are subtly broken in ways that you'll discover _after_ writing tons of code. :-(

# posted by emk : 2:49 PM

404 for http://raganwald.com/source/dsls_and_let.html

# posted by Wilkes Joiner : 9:25 AM

looks like a small typo--should be
http://raganwald.com/source/dsl_and_let.html

# posted by justin : 2:25 PM

This is quite a post, Reg. Despite the amount of DSL buzz in the Ruby community of late, the signal to noise ratio has been rather low. Instead of so much hand-waving, it's very refreshing to read a concrete discussion about some of the pitfalls everyone encounters along with some useful code to help alleviate them.

Two minor corrections:
The comprehension code doesn't run because of the Let.eval calls in given. I'm not sure whether this was intended as
Let.eval do
  let assignments, &mapping
end
or something else.

Anyhow, you can either use the eval syntax, or the equivalent with syntax, i.e.
with Let do
  let assignments, &mapping
end
and things will be fine.

Finally, the &@where in the first Let block inside given should instead be &where so that the accessor is used. Otherwise, @where will be nil in the context of the Let instance the block is evaluated within.

Making these changes, the code looks like this:

def given assignments = {}, &block
  @where = block if block_given?
  assignments.each { |term, value| @terms[term] = value }
  names = terms.keys.sort { |a, b| a.to_s <=> b.to_s }
  values = names.map { |term| terms[term].map { |value| { term => value } } }
  product_of_assignments = self.class.cartesian_product(*values).map do |list_of_assignments|
    list_of_assignments.inject { |acc, ass| acc.merge(ass) }
  end
  if @where
    product_of_assignments = product_of_assignments.select do |assignments|
      with Let do
        let assignments, &where
      end
    end
  end
  product_of_assignments.map do |assignments|
    with Let do
      let assignments, &mapping
    end
  end
end

Thanks again for the post. Its arrival is particularly timely as we've a few new projects at work which are prime candidates for some DSL love.

# posted by Justin : 6:05 PM

I am lost. In the first example using the "DomainSpecificLanguage" object, what is the "p" in line

MyDSL.eval { p phred + plus + bjarne }

Thanks

# posted by Kyle Lahnakoski : 11:55 AM

Kyle, p is just a display method in the standard library. ri gives:
"For each object, directly writes _obj_.+inspect+ followed by the current output record separator to the program's standard output."

# posted by Justin : 7:24 PM

Justin:

Very interesting. I pasted my code directly from TextMate after running each example.

When I get back next week, I would like to investigate why we are getting different results.

# posted by Reginald Braithwaite : 9:17 AM

The link has been fixed, thanks!

# posted by Reginald Braithwaite : 9:20 AM

Reg@9:17AM:

Sounds good--I'll watch this space.

# posted by Justin : 5:51 PM

Trying real hard to follow this. But running the two major source files, dsl.rb and comprehension.rb I get the following errors persistently... Has there been a Ruby version change or am I just losing it?

------------------

1) Error:
test_assignment(TestLet):
ArgumentError: wrong number of arguments (0 for 1)
method y in dsl.rb at line 340
method test_assignment in dsl.rb at line 340
method let in dsl.rb at line 142
method method_missing in dsl.rb at line 72
method test_assignment in dsl.rb at line 339
method let in dsl.rb at line 142
method test_assignment in dsl.rb at line 337
method eval in dsl.rb at line 81
method with in dsl.rb at line 95
method test_assignment in dsl.rb at line 336

2) Error:
test_nesting(TestLet):
ArgumentError: wrong number of arguments (0 for 1)
method y in dsl.rb at line 328
method test_nesting in dsl.rb at line 328
method let in dsl.rb at line 142
method method_missing in dsl.rb at line 72
method test_nesting in dsl.rb at line 327
method let in dsl.rb at line 142
method test_nesting in dsl.rb at line 325
method eval in dsl.rb at line 81
method with in dsl.rb at line 95
method test_nesting in dsl.rb at line 324

# posted by Pito : 1:36 PM
