gjb has asked for the wisdom of the Perl Monks concerning the following question:

I stumbled across the following problem while writing code to characterize numbers. I want to capture those numbers that consist of runs of consecutive digits, such as 123 or 4567 (any length > 1 and < 10). Of course there are many ways to do this, but I tried the following and it works nicely:

    #!perl
    use strict;
    use warnings;
    while (<DATA>) {
        chomp($_);
        print "$_ okay\n" if /^(\d)((??{$+ + 1}))+$/;
    }
    __DATA__
    234
    213
    12345
with this output:
    234 okay
    12345 okay
as expected.

However, if I include use warnings; as I usually do, I get the following warning:
((??{$+ + 1}))+ matches null string many times before HERE mark in regex m/^(\d)((??{$+ + 1}))+ << HERE $/ at ./test.pl line 8.

Could anyone enlighten me as to (1) the source of the warning, and (2) how to reformulate the regexp to get rid of it?

I'm aware that the (??{...}) construct is experimental and won't use it in production code, but it triggers my curiosity.

Thanks in advance, -gjb-

Replies are listed 'Best First'.
Re: Code in regexp
by Abigail-II (Bishop) on Nov 28, 2002 at 17:43 UTC
    You might want to read my bug report, and the discussion it triggered at http://bugs6.perl.org/rt2/Ticket/Display.html?id=10040. However, I think Perl is mistaken to think that (??{ }) generates a zero-width assertion.

    Here's another way of doing it:

    /^(\d)(?{local ($x, $c) = ($^N, 0)}) ((\d)(?(?{$c ++; $^N == $x + $c})|\A))+$/x

    Abigail
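
For readers who find the pattern above dense, here is a minimal sketch of the same bookkeeping written as a plain loop: remember the first digit in $x, count the following digits in $c, and require each one to equal $x + $c. The sub name is_ascending_run is hypothetical and not from the thread, and this is only one reading of the regex, not Abigail's code.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for illustration).
# Returns 1 when $str is two or more digits, each one higher than the last.
sub is_ascending_run {
    my ($str) = @_;
    return 0 unless $str =~ /^\d{2,}$/;    # digits only, at least two of them

    my @digits = split //, $str;
    my $x = shift @digits;                 # first digit, like $x in the regex
    my $c = 0;                             # how many digits followed so far
    for my $digit (@digits) {
        $c++;
        return 0 unless $digit == $x + $c; # same test as $^N == $x + $c
    }
    return 1;
}

print "$_ okay\n" for grep { is_ascending_run($_) } qw(234 213 12345);
```

Because each step compares a single digit, an input such as 78910 is rejected: after 9 the loop expects 10, and a single digit can never equal that.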

      Is the local() needed, Abigail? I was playing with this yesterday and didn't seem to need it.

      --- demerphq
      my friends call me, usually because I'm late....

        No, it's not needed. Just like use strict and my aren't "needed". But if your code happens to have a $main::x, you'll be thankful you used local.

        Abigail

      Does perl think that (??{ }) generates a zero-width assertion?

      If that is the case, why should the + quantifier behave any differently from the {1,8} quantifier in pg's regex below?

      It's surely a bug that one complains about matching empty string repeatedly, while the other seems to behave as non-zero width.

      Jasper
        Further to this (not that this thread isn't long dead), in Terje Kristensen's latest minigolf competition he came up with this:
        -n $;[map{/./,/[^$&-T]/g}/(?=(.*))/g].=$_}{print@
        Which does funny non zero width things for (?=)
        perl -le 'print for "foo bar"=~/(?=(.*))/g'
        foo bar
        oo bar
        o bar
         bar
        bar
        ar
        r
        I had no idea this would happen, anyway.

        Jasper
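
A small sketch of what seems to be going on in that one-liner, offered as an explanation rather than as anything Jasper wrote: the lookahead itself stays zero-width, but under /g the engine moves one character forward after every empty match, and the capture inside the lookahead records the rest of the string at each position. The variable name @suffixes is made up for the example.

```perl
use strict;
use warnings;

# In list context with /g, the match returns $1 from every successful match.
# The lookahead consumes nothing, so the engine retries at each position in
# turn, including the very end of the string.
my @suffixes = "foo bar" =~ /(?=(.*))/g;

printf "%d: '%s'\n", $_, $suffixes[$_] for 0 .. $#suffixes;
```

The list has one entry per position, so the last entry is the empty string matched at the end of "foo bar"; with print and -l that shows up as a trailing blank line.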
Re: Code in regexp
by broquaint (Abbot) on Nov 28, 2002 at 17:25 UTC
    (1) the source of the warning
    According to perldiag
    %s matches null string many times
    (W regexp) The pattern you've specified would be an infinite loop if the regular expression engine didn't specifically check for that. See the perlre manpage.

    HTH

    _________
    broquaint

Re: Code in regexp
by pg (Canon) on Nov 28, 2002 at 18:14 UTC
    This is not a Perl bug. Simply change your regexp to this and the warning will go away (as you said, the length is between 2 and 9 inclusive):
    print "$_ okay\n" if /^(\d)((??{$+ + 1})){1,8}$/;
Re: Code in regexp
by pg (Canon) on Nov 29, 2002 at 05:54 UTC
    Let's go beyond the nine dots. If we can match your input with a pattern, why can't we match a pattern with your input? The following code does this, and the regexp is much simpler and easier to understand:
    $pattern = "123456789";
    $str_under_testing = "345678";
    # \Q...\E keeps any regex metacharacters in the input literal
    print "okay" if ($pattern =~ /^\d*?\Q$str_under_testing\E\d*$/);
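
The same idea can be sketched without interpolating the input into a regex at all, using index() for the substring test. The sub name is_digit_run is hypothetical, and the explicit 2 to 9 length check is an addition taken from the original question, not from pg's code. Following pg's "123456789", runs that start with 0 are rejected; that is an assumption about what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for illustration).
# Returns 1 when $str is 2 to 9 digits long and occurs inside "123456789".
sub is_digit_run {
    my ($str) = @_;
    return 0 unless $str =~ /^\d{2,9}$/;           # digits only, length 2..9
    return index('123456789', $str) >= 0 ? 1 : 0;  # plain substring search
}

print "$_ okay\n" for grep { is_digit_run($_) } qw(234 213 12345);
```

Without the length check, a single digit such as 3 would also count as a match, which the original question rules out.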