(

Tuesday, February 26, 2008

So, you think you know Regex-fu?

As I’ve mentioned:*non-prime* number of ones (which means it can be used to recognize a prime number of ones, obviously). It’s obfuscated and golf-like at the same time. But here’s a challenge that takes Regular Expressions to the next level (possible the next lower level of Hell, but these are the risks we adventurers must face):

Yes, it would be sweet indeed. Anyone up for the challenge? How hot is your Regex-fu?

As I’ve mentioned:

/^1?$|^(11+?)\1+$/

Is a regular expression that matches a If you can provide a regular expression such that it actually matches the string representation of only prime (or only non-prime) integers, that would be pretty sweet. A proof that such a thing could not be created would be equally impressive.

Yes, it would be sweet indeed. Anyone up for the challenge? How hot is your Regex-fu?

this page claims, and I'm inclined to agree but not excited enough to remember how to use the pumping lemma, that the language of "strings of 1s of prime length" is not a regular language.

This is impossible because of the pumping lemma, I believe: http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages

The pumping lemma says the language is not regular. Regular expressions these days can match any context -free language through the use of sub-expressions (basically recursion). See this post for an example of a regexp that matches balanced braces.

So even though the language of all prime numbers might not be regular, it might still be possible to build a regexp that matches it (assuming it's context free).

So even though the language of all prime numbers might not be regular, it might still be possible to build a regexp that matches it (assuming it's context free).

Since several people have already beat me to the obvious (that regular expressions can't do that), I'll be half-way constructive and ask, then, what system is this "Regex-fu" challenge concerned with? Ruby's "regexes"? POSIX's? Any arbitrary esoteric parser at all?

If it's the last one, transforming a Parsec-based Haskell parser for primes into a pointfree form a la Lambdabot's @pl would be contested only by Perl.

If it's the last one, transforming a Parsec-based Haskell parser for primes into a pointfree form a la Lambdabot's @pl would be contested only by Perl.

Um, looking at that regular expression, it doesn't match strings of ones of prime length, it matches strings of ones of non prime length. Breaking it down, it matches:

^1$ - 1 one. 1 is not prime

^(11+)\1+$ - n * m strings of ones. Not prime either.

It's not all that obfuscated as regexes go, just surprising.

^1$ - 1 one. 1 is not prime

^(11+)\1+$ - n * m strings of ones. Not prime either.

It's not all that obfuscated as regexes go, just surprising.

@bobwhoops: The pumping lemma applies to basic regular expressions and regular languages. Using extended features like backreferences, you can accept languages that go beyond regular languages. For example, you can match strings of properly matched nested parenthesis, which is not a regular language.

It's been proven than Perl pattern matching is NP-hard - (http://perl.plover.com/NPC/) . The AKS primality test is in P, which is at least a subset of NP.

So yes, it is possible to provide a regex to match only primes, provided the correct reduction.

Ps. Technically patterns with backreferences are not "regular" expressions - Perl is careful to call them "regex"es.

So yes, it is possible to provide a regex to match only primes, provided the correct reduction.

Ps. Technically patterns with backreferences are not "regular" expressions - Perl is careful to call them "regex"es.

To be clear, we're being asked to find (or prove the non-existence of) a regex that would match things like "7" and "13" right?

Sammy:

Absolutely! If you can write a RegEx that also recognizes that**2**32582657-1** is a prime number, please come to the front of the class to accept your prize.

Absolutely! If you can write a RegEx that also recognizes that

What a strange request. Might as well have asked for a regular expression that can match blue objects.

How about a two liner for Linux:

#!/bin/sh

factor $1 | awk '/^[0-9]*: [0-9]*$/ { print "true"; next } /.*/ { print "false" }'

It contains regular expressions (and a few other things) that match prime numbers (coming out of factor :-). What's wrong with a minor amount of cheating to get around some irritating theory issues...

Next Week: The Halting Problem.

Paul.

#!/bin/sh

factor $1 | awk '/^[0-9]*: [0-9]*$/ { print "true"; next } /.*/ { print "false" }'

It contains regular expressions (and a few other things) that match prime numbers (coming out of factor :-). What's wrong with a minor amount of cheating to get around some irritating theory issues...

Next Week: The Halting Problem.

Paul.

Using Oniguruma (Ruby 1.9's regex engine)'s recursion abilities, here's a regular expression that matches all fizzbuzz-able non-negative integers in a given string: http://www.pastie.org/158799

Now, do you know if the fizzbuzz gem accepts Ruby 1.9 entries? :)

Now, do you know if the fizzbuzz gem accepts Ruby 1.9 entries? :)

It does! I need to get the project set up so people can contribute more freely. If you will e-mail your name to me (David Brady) I'll add it to the credits. (My e-mail address is in my profile at rubyforge.)

As Sean mentioned, this is a silly request. You could create a "regex" engine capable of some type of math, then use that to solve this problem, but current, popular regular expression flavors cannot do it. However, you might be able to cheat using embedded Perl code or PCRE callouts.

But Steve, the interesting part is, the question "can you prove it can't be done?"

Just because we aren't able to come up with an expression to do it doesn't mean that one doesn't exist.

Just because we aren't able to come up with an expression to do it doesn't mean that one doesn't exist.

As lots of people have noted, the request is a malformed specification. It is clear that a regular expression cannot match it, so then the request is about a regexp, but what features a regexp supports is not well defined either.

<< Home

Recent Writing

Homoiconic Technical Writing / raganwald.posterous.com
Books

What I‘ve Learned From Failure / Kestrels, Quirky Birds, and Hopeless Egocentricity
Share

rewrite_rails /
andand /
unfold.rb /
string_to_proc.rb /
dsl_and_let.rb /
comprehension.rb /
lazy_lists.rb
Beauty

IS-STRICTLY-EQUIVALENT-TO-A /
Spaghetti-Western Coding /
Golf is a good program spoiled /
Programming conventions as signals /
Not all functions should be object methods
The Not So Big Software Design / Writing programs for people to read / Why Why Functional Programming Matters Matters / But Y would I want to do a thing like this?

Work

The single most important thing you must do to improve your programming career /
The Naïve Approach to Hiring People /
No Disrespect /
Take control of your interview /
Three tips for getting a job through a recruiter /
My favourite interview question
Management

Exception Handling in Software Development /
What if powerful languages and idioms only work for small teams? /
Bricks /
Which theory fits the evidence? /
Still failing, still learning /
What I’ve learned from failure
Notation

The unary ampersand in Ruby /
(1..100).inject(&:+) /
The challenge of teaching yourself a programming language /
The significance of the meta-circular interpreter /
Block-Structured Javascript /
Haskell, Ruby and Infinity /
Closures and Higher-Order Functions
Opinion

Why Apple is more expensive than Amazon /
Why we are the biggest obstacles to our own growth /
Is software the documentation of business process mistakes? /
We have lost control of the apparatus /
What I’ve Learned From Sales
I,
II,
III
Whimsey

The Narcissism of Small Code Differences /
Billy Martin’s Technique for Managing his Manager /
Three stories about The Tao /
Programming Language Stories /
Why You Need a Degree to Work For BigCo
History

06/04 /
07/04 /
08/04 /
09/04 /
10/04 /
11/04 /
12/04 /
01/05 /
02/05 /
03/05 /
04/05 /
06/05 /
07/05 /
08/05 /
09/05 /
10/05 /
11/05 /
01/06 /
02/06 /
03/06 /
04/06 /
05/06 /
06/06 /
07/06 /
08/06 /
09/06 /
10/06 /
11/06 /
12/06 /
01/07 /
02/07 /
03/07 /
04/07 /
05/07 /
06/07 /
07/07 /
08/07 /
09/07 /
10/07 /
11/07 /
12/07 /
01/08 /
02/08 /
03/08 /
04/08 /
05/08 /
06/08 /
07/08 /