So, you think you know Regex-fu?

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Tuesday, February 26, 2008

As I’ve mentioned:


  /^1?$|^(11+?)\1+$/

Is a regular expression that matches a non-prime number of ones (which means it can be used to recognize a prime number of ones, obviously). It’s obfuscated and golf-like at the same time. But here’s a challenge that takes Regular Expressions to the next level (possibly the next lower level of Hell, but these are the risks we adventurers must face):

If you can provide a regular expression such that it actually matches the string representation of only prime (or only non-prime) integers, that would be pretty sweet. A proof that such a thing could not be created would be equally impressive.

—Sam, Overthinking and Stupid Programmer Tricks

Yes, it would be sweet indeed. Anyone up for the challenge? How hot is your Regex-fu?
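
For anyone who wants to poke at the unary expression before attempting the decimal one, here is a minimal sketch in Python. The helper name is_prime_unary is an assumption made for illustration; only the pattern itself comes from the post. A string of ones counts as prime exactly when the pattern fails to match it.

```python
import re

# The expression from the post: it matches a run of 1s whose length is NOT prime.
NON_PRIME_UNARY = re.compile(r"^1?$|^(11+?)\1+$")


def is_prime_unary(ones: str) -> bool:
    """Return True when `ones` is a run of '1' characters of prime length.

    Hypothetical helper, not part of the original post.
    """
    if set(ones) - {"1"}:
        raise ValueError("expected a string made only of '1' characters")
    # No match means the length is neither 0, 1, nor a product n * m with n, m >= 2.
    return NON_PRIME_UNARY.match(ones) is None


primes = [n for n in range(20) if is_prime_unary("1" * n)]
print(primes)
```

The first alternative rejects lengths 0 and 1; the second captures a block of at least two ones and then demands at least one more copy of it through the backreference, so it succeeds only on composite lengths.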

¶ 11:23 PM

Comments on “So, you think you know Regex-fu?”:

this page claims, and I'm inclined to agree but not excited enough to remember how to use the pumping lemma, that the language of "strings of 1s of prime length" is not a regular language.

# posted by

Bill Mill : 12:01 AM

This is impossible because of the pumping lemma, I believe: http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages

# posted by

bobwhoops : 12:02 AM

s/language of "strings of 1s of prime length"/language recognized by this regular expression/

# posted by

Bill Mill : 12:03 AM
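
For readers who, like the commenter above, would rather not reconstruct the pumping-lemma argument from memory, here is a sketch of the usual textbook version for strings of 1s of prime length. The symbols n, p, k, x, y, z are the standard ones from the lemma, not anything taken from the post. Because regular languages are closed under complement, the same conclusion carries over to the non-prime lengths that the expression in the post matches.

```latex
Suppose $L = \{\, 1^p : p \text{ prime} \,\}$ is regular, with pumping length $n$.
Pick a prime $p \ge n$ and let $w = 1^p \in L$.
The lemma gives a split $w = xyz$ with $|xy| \le n$, $|y| = k \ge 1$,
and $x y^i z \in L$ for every $i \ge 0$.
Take $i = p + 1$:
\[
  |x y^{p+1} z| = |xyz| + p\,|y| = p + pk = p\,(1 + k).
\]
Since $p \ge 2$ and $1 + k \ge 2$, the length $p(1+k)$ is composite,
so $x y^{p+1} z \notin L$, a contradiction. Hence $L$ is not regular.
```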

The pumping lemma says the language is not regular. Regular expressions these days can match any context-free language through the use of sub-expressions (basically recursion). See this post for an example of a regexp that matches balanced braces.
So even though the language of all prime numbers might not be regular, it might still be possible to build a regexp that matches it (assuming it's context free).

# posted by

Oscar Bonilla : 12:44 AM

Since several people have already beat me to the obvious (that regular expressions can't do that), I'll be half-way constructive and ask, then, what system is this "Regex-fu" challenge concerned with? Ruby's "regexes"? POSIX's? Any arbitrary esoteric parser at all?

If it's the last one, transforming a Parsec-based Haskell parser for primes into a pointfree form a la Lambdabot's @pl would be contested only by Perl.

# posted by

Tac-Tics : 12:59 AM

Whoops, you're right.

# posted by

bobwhoops : 1:04 AM

Um, looking at that regular expression, it doesn't match strings of ones of prime length, it matches strings of ones of non-prime length. Breaking it down, it matches:

^1$ - 1 one. 1 is not prime
^(11+)\1+$ - n * m strings of ones. Not prime either.

It's not all that obfuscated as regexes go, just surprising.

# posted by

bofh.org.uk : 1:08 AM

You say that the regex matches a prime number of 1s. It actually matches a non-prime number of 1s.

# posted by

Joel Hockey : 1:10 AM

@bobwhoops: The pumping lemma applies to basic regular expressions and regular languages. Using extended features like backreferences, you can accept languages that go beyond regular languages. For example, you can match strings of properly matched nested parentheses, which is not a regular language.

# posted by

Brad : 7:57 AM

It's been proven that Perl pattern matching is NP-hard (http://perl.plover.com/NPC/). The AKS primality test is in P, which is at least a subset of NP.

So yes, it is possible to provide a regex to match only primes, provided the correct reduction.

Ps. Technically patterns with backreferences are not "regular" expressions - Perl is careful to call them "regex"es.

# posted by

Chris : 8:08 AM

To be clear, we're being asked to find (or prove the non-existence of) a regex that would match things like "7" and "13" right?

# posted by

Sammy Larbi : 10:50 AM

Sammy:

Absolutely! If you can write a RegEx that also recognizes that 2**32582657-1 is a prime number, please come to the front of the class to accept your prize.

# posted by

Reginald Braithwaite : 10:53 AM

What a strange request. Might as well have asked for a regular expression that can match blue objects.

# posted by

Sean : 2:48 PM

How about a two liner for Linux:

#!/bin/sh
factor $1 | awk '/^[0-9]*: [0-9]*$/ { print "true"; next } /.*/ { print "false" }'

It contains regular expressions (and a few other things) that match prime numbers (coming out of factor :-). What's wrong with a minor amount of cheating to get around some irritating theory issues...

Next Week: The Halting Problem.

Paul.

# posted by

Paul W. Homer : 5:11 PM
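
The same cheat can be sketched without a shell, assuming Python. The function factor_line below is a hypothetical stand-in that only mimics the "n: p1 p2 ..." shape of the factor output using trial division; it is not the real utility. The pattern is the one from the awk script, so all the number theory still happens outside the regex.

```python
import re

# Same pattern as the awk script: a number, a colon, a space, and a single factor.
SINGLE_FACTOR_LINE = re.compile(r"^[0-9]*: [0-9]*$")


def factor_line(n: int) -> str:
    """Hypothetical stand-in for `factor n`, e.g. 12 -> '12: 2 2 3'."""
    factors = []
    remaining = n
    divisor = 2
    # Plain trial division; this is where the real work is done.
    while divisor * divisor <= remaining:
        while remaining % divisor == 0:
            factors.append(divisor)
            remaining //= divisor
        divisor += 1
    if remaining > 1:
        factors.append(remaining)
    return f"{n}:" + "".join(f" {f}" for f in factors)


def looks_prime(n: int) -> bool:
    """True when the emulated factor line lists exactly one factor."""
    return SINGLE_FACTOR_LINE.match(factor_line(n)) is not None


print(factor_line(12), looks_prime(12))
print(factor_line(13), looks_prime(13))
```

The regex only checks the shape of a line that something else has already computed, which is the point of the joke.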

Trivial:

/^1?$|^(11+?)\1+$/

Oh. You want it in a base other than 1?

# posted by

The Quux : 2:11 AM
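
Taking the base-1 joke literally, a decimal string can be expanded into ones before the expression is applied. This is a sketch assuming Python; is_prime_decimal is a made-up name, and the conversion is ordinary arithmetic done outside the regex, so it does not answer the actual challenge.

```python
import re

# The unary expression from the post: matches a non-prime number of 1s.
NON_PRIME_UNARY = re.compile(r"^1?$|^(11+?)\1+$")


def is_prime_decimal(text: str) -> bool:
    """Hypothetical helper: test a decimal string such as '13' via base 1."""
    n = int(text)  # the base conversion happens here, not inside the regex
    if n < 0:
        raise ValueError("expected a non-negative integer")
    # Build a run of n ones and ask the regex whether its length is non-prime.
    return NON_PRIME_UNARY.match("1" * n) is None


for candidate in ["7", "13", "91"]:
    print(candidate, is_prime_decimal(candidate))
```

The unary string is as long as the number is large, and the backtracking grows with it, so this approach is hopeless for anything on the scale of 2**32582657-1.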

Using Oniguruma (Ruby 1.9's regex engine)'s recursion abilities, here's a regular expression that matches all fizzbuzz-able non-negative integers in a given string: http://www.pastie.org/158799

Now, do you know if the fizzbuzz gem accepts Ruby 1.9 entries? :)

# posted by

mernen : 11:30 AM

Now, do you know if the fizzbuzz gem accepts Ruby 1.9 entries? :)

It does! I need to get the project set up so people can contribute more freely. If you will e-mail your name to me (David Brady) I'll add it to the credits. (My e-mail address is in my profile at rubyforge.)

# posted by

Chalain : 10:53 PM

As Sean mentioned, this is a silly request. You could create a "regex" engine capable of some type of math, then use that to solve this problem, but current, popular regular expression flavors cannot do it. However, you might be able to cheat using embedded Perl code or PCRE callouts.

# posted by

Steve : 11:25 PM

But Steve, the interesting part is the question "can you prove it can't be done?"

Just because we aren't able to come up with an expression to do it doesn't mean that one doesn't exist.

# posted by

Sammy Larbi : 7:06 AM

As lots of people have noted, the request is a malformed specification. It is clear that a regular expression cannot match it, so then the request is about a regexp, but what features a regexp supports is not well defined either.

# posted by

Ola Bini : 8:35 AM
