in reply to regexp golf - homework

Interesting problem, and ++ for admitting it was homework!

I'm a little stumped by this one myself. If the problem were inverted ("Write a regexp that matches strings containing matching digits"), it would be a lot easier:

/(0.*0)|(1.*1)|(2.*2)|(3.*3) ...  (you get the idea)
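For anyone who wants to play with it, here is a minimal sketch in Python rather than Perl. It assumes the alternation simply continues through all ten digits, and the helper names has_repeated_digit and all_digits_distinct are made up for illustration. Note that the inversion happens in the surrounding code, not in the regex itself, so it sidesteps the actual homework question.

```python
import re

# Build the "contains a repeated digit" pattern: (0.*0)|(1.*1)|...|(9.*9)
# Assumption: the elided alternation covers every digit from 0 to 9.
REPEAT = re.compile("|".join(f"({d}.*{d})" for d in "0123456789"))

def has_repeated_digit(s: str) -> bool:
    """True if some digit occurs at least twice in s."""
    return REPEAT.search(s) is not None

def all_digits_distinct(s: str) -> bool:
    """Inverted test, done by negating the match result in code."""
    return not has_repeated_digit(s)
```

Calling all_digits_distinct("1234") returns True and all_digits_distinct("12341") returns False.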

I'm curious: do regular expressions have an equivalent of De Morgan's laws from Boolean algebra that would let you invert an expression?

Best of luck to you, and please let us know what you find out.

Re: Re: regexp golf - homework
by no_slogan (Deacon) on Feb 02, 2002 at 22:37 UTC
    Yes, it is always possible to invert a (mathematical) regular expression. I don't know of a simple method for doing it, though. You have to convert the regex to an NFA (nondeterministic finite automaton), remove the nondeterminism to produce a DFA (deterministic finite automaton) that has a transition for every state and input symbol, switch the accepting and nonaccepting states, and then convert the DFA back to a regex. It's nasty, and it's not guaranteed to give you the "best" possible regex. Also, the DFA for this problem has 2**N + 1 states, where N is the number of distinct digits: one state for each subset of digits seen so far, plus one state meaning "some digit has repeated".
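    To make the state count and the accept/nonaccept swap concrete, here is a rough sketch in Python. It skips the regex-to-NFA-to-DFA steps and builds the DFA for "some digit occurs twice" by hand, so it only illustrates the complement step. The names build_dfa, complement and accepts are invented for this example, and the input is assumed to contain only characters from the given digit alphabet.

```python
from itertools import combinations

def build_dfa(digits):
    """DFA for 'some digit occurs twice' over the given digit alphabet.

    States are every subset of digits seen so far, plus one 'REPEAT' state.
    """
    subsets = [frozenset(c)
               for r in range(len(digits) + 1)
               for c in combinations(digits, r)]
    states = subsets + ["REPEAT"]
    delta = {}
    for s in subsets:
        for d in digits:
            # Seeing a digit again moves to REPEAT; otherwise record the digit.
            delta[(s, d)] = "REPEAT" if d in s else s | {d}
    for d in digits:
        delta[("REPEAT", d)] = "REPEAT"  # REPEAT is a trap state
    return states, delta, frozenset(), {"REPEAT"}

def complement(states, delta, start, accepting):
    """Swap accepting and nonaccepting states; the DFA must be complete."""
    return states, delta, start, set(states) - set(accepting)

def accepts(dfa, s):
    """Run the DFA on s (s must use only the DFA's alphabet)."""
    _, delta, state, accepting = dfa
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting

dfa = build_dfa("0123")       # N = 4, so 2**4 + 1 = 17 states
no_repeats = complement(*dfa)
```

    With N = 4 the state list has 17 entries, and no_repeats accepts "0123" but rejects "0120". Turning no_repeats back into a regex is the painful part and is not attempted here.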

    In the final step, converting the DFA to a regex, you have a choice as to which states you're going to prune off first. Boots111's first solution corresponds to pruning from left to right, while the second solution corresponds to working from the outside in. Extending the second solution to 10 digits would result in 252 branches in the top-level parenthesis group, and each of those branches would contain the solutions to two different 5-digit problems. Ick.
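    As a quick check on the 252: it equals the number of ways to choose 5 digits out of 10. The snippet below assumes each top-level branch corresponds to one such choice, which is my reading of the count rather than something spelled out above.

```python
from math import comb

# Assumption: one top-level branch per way of picking 5 of the 10 digits.
branches = comb(10, 5)
print(branches)  # prints 252
```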

    particle has a good idea of using backreferences. Those don't exist in "mathematical" regular expressions. The person who posed the problem might think of them as extended operators, in which case they wouldn't be allowed here. Not for me to say.
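    For comparison, here is one way a backreference version might look, sketched in Python. This is not necessarily what particle wrote; the pattern names are made up, and it assumes the input should consist of digits only. It also leans on a negative lookahead, which is another extended operator.

```python
import re

# (\d).*\1 finds a digit that shows up again later in the string.
HAS_REPEAT = re.compile(r"(\d).*\1")

# Negative lookahead rejects any string in which such a repeat exists.
# Assumption: valid input is a (possibly empty) string of digits.
NO_REPEAT = re.compile(r"(?!.*(\d).*\1)\d*")

def no_repeated_digit(s: str) -> bool:
    """True if s is all digits and no digit occurs twice."""
    return NO_REPEAT.fullmatch(s) is not None
```

    Here no_repeated_digit("1234") is True and no_repeated_digit("12341") is False.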