Ovid has asked for the wisdom of the Perl Monks concerning the following question:

About 10 months ago, as of this writing, a bug for Regexp::Common was entered regarding a lone decimal point matching the real number regex. Here's an example:

perl -MRegexp::Common -le 'print "." =~ $RE{num}{real}'

And that dutifully prints "1" because of the bug. I need to get that regular expression to fail, but there's a problem: I pass the regular expression to another piece of code, so the failure has to happen inside the regular expression itself. In other words, I can't do this:

if ('.' ne $_ && /$RE{num}{real}/) { ... }

Since this bug was reported 10 months ago, I'm not sanguine about it being fixed any time soon and I need to solve this problem now. Either I can decide not to use the Regexp::Common regex for this one match or I can force the regex to fail. The former option is the one I'm going with, but I was curious as to why my code to force the regex to fail didn't work (note that what follows has some extra stuff, but it mirrors exactly what I need).

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Regexp::Common;

use constant SUCCEED => qr{(?=)};
use constant FAIL    => qr{(?!)};

my $QUOTED = $RE{quoted};
my $NUM    = $RE{num}{real};
my $VALUE  = do {
    use re 'eval';
    qr/(?:$QUOTED|$NUM)(??{'.' eq $+ ? FAIL : SUCCEED})/;
};

my $text = 'name => "foo", fav.num => 3';
my @text = split /($VALUE)/ => $text;
print Dumper \@text;

That prints:

$VAR1 = [
          'name => ',
          '"foo"',
          ', fav',
          '.',
          'num => ',
          '3'
        ];

What I want it to print is:

$VAR1 = [
          'name => ',
          '"foo"',
          ', fav.num => ',
          '3'
        ];

Because of the nature of the code (using the lexer from Higher Order Perl's chapter 8, if you're curious), I can't change how the split line operates. I pass regexes to the split and I need the regex to fail at that time. Can anyone help fix the first regex or tell me why my attempt at failing it is broken? (I've also tried the $^N variable, but no love.) I'd prefer to fix my code as I'd like to rely on the Regexp::Common module, but I also know that I'm using experimental features.

Or maybe there's something really simple that I haven't seen.

Cheers,
Ovid

New address of my CGI Course.

Re: Forcing a regex to fail
by tlm (Prior) on May 06, 2005 at 01:09 UTC

    I got it to work by replacing the regexp with

    qr/(?:$QUOTED|$NUM)(??{'' eq $& || '.' eq $& ? FAIL : SUCCEED})/
    __END__
    $VAR1 = [
              'name => ',
              '"foo"',
              ', fav.num => ',
              '3'
            ];

    I found that if I only changed $+ to $&, I did not get the correct output:

    $VAR1 = [
              'name => ',
              '"foo"',
              ', fav',
              '',
              '.num => ',
              '3'
            ];

    Update 2: Fixed bug pointed out by Roy Johnson. Thanks.

    the lowliest monk

      The !$& shortcut will fail to match zero. (I also note that the SUCCEED expression is redundant with just returning nothing.)
      my @regexen = do {
          use re 'eval';
          qr/(?:$QUOTED|$NUM)(??{FAIL if $& eq '.' or $& eq ''})/,
          qr/(?:$QUOTED|$NUM)(??{!$& || '.' eq $& ? FAIL : SUCCEED})/
      };
      my $teststr = 'xx0y-0z0.a..0b-0.c-.0d.d.0.0e-0.0';
      print Dumper [ $teststr =~ /$_/g ] for @regexen;
      __END__
      $VAR1 = [ '0', '-0', '0.', '.0', '-0.', '-.0', '.0', '.0e-0', '.0' ];
      $VAR1 = [ '-0', '0.', '.0', '-0.', '-.0', '.0', '.0e-0', '.0' ];
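The point about the !$& shortcut comes down to Perl's truthiness rules: the string '0' is false, so a matched "0" gets treated as if nothing had matched. A minimal sketch of just that logic, outside any regex (the %verdict hash and the 'FAIL'/'SUCCEED' labels are made up for illustration and only stand in for the constants above):

```perl
use strict;
use warnings;

# For each candidate match text, compare the truthiness shortcut
# with an explicit empty-string test.
my %verdict;
for my $match ('', '.', '0', '3') {
    my $shortcut = (!$match || $match eq '.')      ? 'FAIL' : 'SUCCEED';
    my $explicit = ($match eq '' || $match eq '.') ? 'FAIL' : 'SUCCEED';
    $verdict{$match} = [ $shortcut, $explicit ];
    printf "%-4s shortcut=%-8s explicit=%s\n", "'$match'", $shortcut, $explicit;
}
```

Only the '0' row differs: the shortcut rejects it, the explicit comparison keeps it. That corresponds to the leading '0' missing from the second Dumper output.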

      Caution: Contents may have been coded under pressure.
Re: Forcing a regex to fail
by davidrw (Prior) on May 06, 2005 at 01:04 UTC
    It works for me if you change $+ (last bracket) to $& (entire matched string). I assume $+ doesn't work because of the ?: clustering? My (simplified--on Windows at the moment w/o Regexp::Common) code/output is below.
Re: Forcing a regex to fail (look, a head!)
by tye (Sage) on May 06, 2005 at 02:22 UTC

    I'm not sure I understood the whole node, but wouldn't it be easier to skip experimental regex features and just prepend (?![.]$) or (?![.](?!\d)) to (that part of) the regex ?

    - tye        

      I don't follow. To what part of the regex?

      Cheers,
      Ovid


        The part that must not match ".".

        - tye        

Re: Forcing a regex to fail
by Roy Johnson (Monsignor) on May 06, 2005 at 03:35 UTC
    Without experimental features, even.
    my $VALUE = qr/(?!\.(?![0-9]))(?:$QUOTED|$NUM)/;
    Update: Oh, I see this is tye's solution, as well.
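To try the lookahead guard without Regexp::Common installed, here is a rough sketch. The $QUOTED and $NUM patterns are hand-written stand-ins, not the module's actual patterns. They are assumptions, chosen only so that $NUM accepts a lone "." the way the bug report describes.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $RE{quoted} and $RE{num}{real}.
# $NUM deliberately accepts a lone '.' to imitate the reported bug.
my $QUOTED = qr/"[^"\\]*(?:\\.[^"\\]*)*"/;
my $NUM    = qr/[+-]?(?=[0-9]|[.])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?/;

my $text = 'name => "foo", fav.num => 3';

# Without the guard, the lone '.' is split out as a token.
my $unguarded = qr/(?:$QUOTED|$NUM)/;
# With the guard, a '.' not followed by a digit cannot start a match.
my $guarded   = qr/(?!\.(?![0-9]))(?:$QUOTED|$NUM)/;

my @before = split /($unguarded)/, $text;
my @after  = split /($guarded)/,   $text;

print join('|', @before), "\n";   # name => |"foo"|, fav|.|num => |3
print join('|', @after),  "\n";   # name => |"foo"|, fav.num => |3
```

The split line has the same shape as in the original post; only the pattern handed to it changes, and no experimental regex features are involved.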

    Caution: Contents may have been coded under pressure.
Re: Forcing a regex to fail
by Roy Johnson (Monsignor) on May 06, 2005 at 03:32 UTC
    $+ matches what capturing parentheses matched. You have non-capturing parens. I presume, because it's part of an expression you're building, that you want to avoid capturing. But that's why it's not working.
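A small sketch of the difference, checked after an ordinary match rather than inside a (??{...}) block (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $text = 'fav.num';

# Non-capturing group: $& holds the match, but $+ has nothing to report.
my ($amp_nc, $plus_nc);
if ($text =~ /(?:\.)/) {
    ($amp_nc, $plus_nc) = ($&, $+);
}

# Capturing group: $+ now holds what the group matched.
my ($amp_c, $plus_c);
if ($text =~ /(\.)/) {
    ($amp_c, $plus_c) = ($&, $+);
}

printf "non-capturing: \$& = '%s', \$+ is %s\n",
    $amp_nc, defined $plus_nc ? "'$plus_nc'" : 'undef';
printf "capturing:     \$& = '%s', \$+ is %s\n",
    $amp_c, defined $plus_c ? "'$plus_c'" : 'undef';
```

With no capturing group, $+ is undefined, so a test like '.' eq $+ can never be true.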

    Caution: Contents may have been coded under pressure.
Re: Forcing a regex to fail
by Aristotle (Chancellor) on May 06, 2005 at 09:25 UTC

    The first problem is that you’re using (?:) to contain the alternation, and then using $+ to check the captured value. Which captured value? There’s no captured value. You need to move the capturing parens from the split to the qr//.

    Next, do you really need a deferred pattern there? I’d write it like so:

    qr/($QUOTED|$NUM)(?(?{ '.' eq $+ })$FAIL)/;

    (which of course implies variables rather than constants.)

    Now given that, you get a zero-length match:

    $VAR1 = [
              'name => ',
              '"foo"',
              ', fav',
              '',
              '.num => ',
              '3'
            ]; 
    

    So obviously a match against the lone dot was prevented, but $NUM succeeds in matching nothing. Easily fixed:

    qr/($QUOTED|$NUM)(?(?{ '.' eq $+ or not length( $+ ) })$FAIL)/;

    Result:

    $VAR1 = [
              'name => ',
              '"foo"',
              ', fav.num => ',
              '3'
            ]; 
    

    Q.E.D.

    But for practical use I’d prefer tye’s approach, which forces failure without squandering effort.

    Makeshifts last the longest.

      You need to move the capturing parens from the split to the qr//.

      I like your solution, but as noted in the original query, I can't change the split line. tye's solution is the way I've gone. It works quite nicely.

      Cheers,
      Ovid
