Rabscuttle has asked for the wisdom of the Perl Monks concerning the following question:

Hello!

Something is going wrong in my regular expression. It works properly when I include the (?{print;}) diagnostic parts, but when I remove them that part of the regex always returns true.

It's meant to be part of a Scrabble program. Given a space on the board (in this example <space><space>W) and a set of tiles (ORW), it matches words that can be played in that spot.

for (qw(row bow sow how wow crw stw wrw)) {
    print "$_\n";
    if (/ (?(?=(.*o){1})
              (?{print "there's an o!\n";})
              (?(?!(.*r){1})
                  (?{print "there's no r!\n";})
                  (?=(.*w){2})
                  (?{print "there's two ws!\n";})
              |
                  (?{print "there's an r!\n";})
              )
          |
              (?{print "there's no o!\n";})
              (?=(.*w){2})
              (?{print "there's 2 ws!\n";})
              (?=(.*r){1})
              (?{print "there's an r!\n";})
          )
          ^..W$
        /xi) {
        print "YES\n\n";
    }
    else {
        print "NO\n\n";
    }
}

I'm pretty sure this was working some time ago when I had an earlier version of perl, but I've since updated to 5.10.0 and now it's not.

Thanks for any help

Re: Oddity with Conditional Regex
by JavaFan (Canon) on Nov 12, 2008 at 13:34 UTC
    If all the (?{ }) parts are removed, one of the clauses is empty, and since an empty pattern always matches, your answer is always "YES".
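A minimal illustration, using a made-up pattern rather than the question's: a conditional whose else branch is empty succeeds, without consuming anything, whenever its condition fails.

```perl
use strict;
use warnings;

# If the string contains an o, also require a q; otherwise take the empty
# "else" branch, which matches trivially.
my $re = qr/^(?(?=.*o)(?=.*q)|)/;

my @results = map { /$re/ ? "YES" : "NO" } qw(dog quo cat);
print "@results\n";    # prints "NO YES YES": "cat" passes only via the empty branch
```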

      I suppose the idea was to remove the empty else clause together with the debugging prints. Anyhow, when doing so, i.e. with this test:

      if(/ (?(?=(.*o){1}) (?(?!(.*r){1}) (?=(.*w){2}) ) | (?=(.*w){2}) (?=(.*r){1}) ) ^..W$ /xi)

      I can confirm that 5.8.8 and 5.10.0 are in fact producing different results: 5.8.8 gives YES/NO/NO/NO/YES/NO/NO/YES, while with 5.10.0 everything matches.

        It sure seems like a bug to me.

        The regexp engine was massively changed between 5.8 and 5.10. Some bugs slipped through; some have already been found and fixed. I don't know about this one. 5.10.1 is expected to be released before the end of the year.

        On the plus side, both 5.8 and 5.10 compiled the regexp identically according to use re 'debug'. It's the matching that differs. In both versions, the engine finds that (.*w) matched only once and backtracks. Then, in 5.10, it seems to forget that the entire (?(...)...) failed.
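For anyone wanting to repeat that comparison, a minimal sketch. In recent perls (5.10 and later) use re 'debug' is lexically scoped, so wrapping a single match in a block limits the dump (the compiled program plus the match trace, both written to STDERR) to that one pattern. The cut-down pattern and the word "bow" are illustrative choices, not taken from the original test; running the same script under each perl and diffing STDERR shows where the two diverge.

```perl
use strict;
use warnings;

my $matched;
{
    # Only regexes compiled inside this block are traced; the compiled
    # program and the match trace both go to STDERR.
    use re 'debug';
    $matched = "bow" =~ /^(?=(?:.*w){2})..w$/i;
}
print $matched ? "YES\n" : "NO\n";
```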

        As a workaround, replacing
        (?=(.*w){2})
        with
        (?(?=(.*w){2})|(?!))
        results in the desired behaviour.
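The replacement works because (?(?=X)|(?!)) asserts the same thing as (?=X): when the lookahead succeeds, the empty yes branch matches, and when it fails, the no branch is (?!), a negative lookahead on the empty pattern, which can never succeed. A small sketch, using a few of the question's words, that checks the two spellings agree when used on their own, outside any larger conditional:

```perl
use strict;
use warnings;

my $plain   = qr/^(?=(?:.*w){2})/i;           # the lookahead as written
my $wrapped = qr/^(?(?=(?:.*w){2})|(?!))/i;   # the workaround spelling

my @words = qw(wow wrw bow crw);

# Words on which the two forms give different answers (expected: none).
my @disagree = grep { (/$plain/ ? 1 : 0) != (/$wrapped/ ? 1 : 0) } @words;

# Words with at least two w's, according to the wrapped form.
my @hits = grep { /$wrapped/ } @words;

print "disagree: @disagree\n";
print "two w's:  @hits\n";
```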

        By the way, I'd place the ^ before the (?()), and I'd use grouping parens ((?:...)) instead of capturing parens ((...)).
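Putting the workaround and these two suggestions together, a sketch of the test regex with ^ moved in front of the conditional, non-capturing groups, and the redundant {1} quantifiers and the empty else branch dropped. The rules are unchanged from the question (board spot ..W, rack ORW). On a perl whose engine behaves, it should reproduce the YES/NO/NO/NO/YES/NO/NO/YES sequence that 5.8.8 gives above.

```perl
use strict;
use warnings;

# Sketch: the question's test with the (?(?=...)|(?!)) workaround applied.
my $playable = qr/
    ^
    (?(?=.*o)                      # the word has an o ...
        (?(?!.*r)                  # ... but no r: it needs two w's
            (?(?=(?:.*w){2})|(?!))
        )
    |                              # no o: it needs two w's and an r
        (?(?=(?:.*w){2})|(?!))
        (?=.*r)
    )
    ..W$
/xi;

my @words   = qw(row bow sow how wow crw stw wrw);
my @results = map { $_ =~ $playable ? "YES" : "NO" } @words;
print join("/", @results), "\n";
```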

Re: Oddity with Conditional Regex
by JadeNB (Chaplain) on Nov 12, 2008 at 19:55 UTC
    I don't know why you should be experiencing the regex oddity, but it does seem to me that this is likely to be extremely slow on long strings, because of the huge amount of backtracking that could be required. Would
    @count{ qw/o r w/ } = ( tr/oO/oO/, tr/rR/rR/, tr/wW/wW/ );
    (preceded by my %count, of course, and followed by appropriate testing) do what you need?
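A sketch of that counting approach, with the "appropriate testing" filled in as an assumption: the checks below simply mirror the rules the question's regex encodes for board spot ..W and rack ORW, and playable is a hypothetical helper name. Each tr/// returns how many characters of its search list it saw; because the replacement list is identical, the word itself is left unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: same rules as the question's regex, decided from
# letter counts instead of lookaheads.
sub playable {
    my ($word) = @_;
    return 0 unless $word =~ /^..w$/i;    # must fit the board spot ..W

    my %count;
    @count{ qw/o r w/ } = ( $word =~ tr/oO/oO/, $word =~ tr/rR/rR/, $word =~ tr/wW/wW/ );

    # Mirrors the regex: with an o, either an r or two w's ...
    return $count{r} || $count{w} >= 2 ? 1 : 0 if $count{o};
    # ... without an o, an r and two w's.
    return $count{r} && $count{w} >= 2 ? 1 : 0;
}

my @results = map { playable($_) ? "YES" : "NO" } qw(row bow sow how wow crw stw wrw);
print join("/", @results), "\n";
```

Each count is a single linear pass over the word, so there is no backtracking to blow up on long strings.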