haukex has asked for the wisdom of the Perl Monks concerning the following question:

I am working on some code that boils down to:

    'g'=~/g/;
    my $regex = 'm{}g'; # user input
    my $text = 'x'; # user input
    eval qq{ use re 'debug';
        print "<\$&>\\n" while \$text =~ $regex; };

Which only produces this output:

    Compiling REx ""
    Final program:
       1: NOTHING (2)
       2: END (0)
    minlen 0
    Freeing REx: ""

As per perlop:

The empty pattern //

If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honored; the other flags are taken from the original pattern. If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match).

So in other words, it's silently (!) matching $text against /g/. Is there some way to reset the state of the regex engine so that it acts as if no other regex has been executed before $regex, i.e. so that the empty pattern acts as a genuine empty pattern? I think that would make more sense to the users than the current confusing behavior. (I think I'll display a message to the user about the empty pattern in any case.)
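A minimal sketch of the perlop rule quoted above, for readers who want to reproduce it in isolation. It assumes a script in which no other match has succeeded before the first `//`; the variable names are illustrative and not taken from the original code:

```perl
use strict;
use warnings;

my $text = 'x';

# Per perlop: if no match has succeeded yet, // is a genuine empty pattern.
my $fresh = ($text =~ //) ? 'match' : 'no match';

'g' =~ /g/;    # some unrelated successful match, as in the question

# Now // silently reuses /g/, and 'x' does not contain a "g".
my $reused = ($text =~ //) ? 'match' : 'no match';

# (?:) is not an empty pattern string, so the rule does not apply to it.
my $explicit = ($text =~ /(?:)/) ? 'match' : 'no match';

print "fresh: $fresh, reused: $reused, explicit: $explicit\n";
```

The middle case is the surprising one: the pattern looks empty, but the match fails because /g/ is what actually gets applied.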

Update: Updated explanation in final paragraph.

Replies are listed 'Best First'.
Re: Reset meaning of empty pattern?
by tybalt89 (Monsignor) on Sep 01, 2018 at 12:18 UTC

      Nice idea, thanks! I'm using /(?:)/ for now. It's still a bit of a workaround though, since it affects the debug output - you see a lot of "Matching REx "(?:)"" instead of "Matching REx """:

      use warnings;
      use strict;

      'g'=~/g/;
      my $regex = 'm{}g'; # user input
      my $text = 'x'; # user input
      eval qq{ use re 'debug';
          ''=~/(?:)/;
          print "<\$&>\\n" while \$text =~ $regex; };
      __END__
      Compiling REx "(?:)"
      Final program:
         1: NOTHING (2)
         2: END (0)
      minlen 0
      Compiling REx ""
      Final program:
         1: NOTHING (2)
         2: END (0)
      minlen 0
      Matching REx "(?:)" against ""
         0 <> <> | 0| 1:NOTHING(2)
         0 <> <> | 0| 2:END(0)
      Match successful!
      Matching REx "(?:)" against "x"
         0 <> <x> | 0| 1:NOTHING(2)
         0 <> <x> | 0| 2:END(0)
      Match successful!
      <>
      Matching REx "(?:)" against "x"
         0 <> <x> | 0| 1:NOTHING(2)
         0 <> <x> | 0| 2:END(0)
         Match possible, but length=0 is smaller than requested=1, failing!
         1 <x> <> | 0| 1:NOTHING(2)
         1 <x> <> | 0| 2:END(0)
      Match successful!
      <>
      Matching REx "(?:)" against ""
         1 <x> <> | 0| 1:NOTHING(2)
         1 <x> <> | 0| 2:END(0)
         Match possible, but length=0 is smaller than requested=1, failing!
      Match failed
      Freeing REx: ""
      Freeing REx: "(?:)"
      Or /(?#)/
Re: Reset meaning of empty pattern?
by LanX (Saint) on Sep 01, 2018 at 12:31 UTC
    This is one of the magic edge cases which confuse me in Perl. :(

    They are nice for short scripts but a source of errors in larger apps, because they break orthogonality.

    Anyway I'm confused about the result you are expecting... the empty pattern would always match for every position

    use strict;
    use warnings;
    use feature 'say';

    my $run=0;

    sub test {
        $run++;
        my $regex = 'm{}g'; # user input
        my $text = 'xx'; # user input
        print "--- run $run\n";
        pos($text)=0;
        eval <<"__CODE__";
            say pos(\$text),"<\$&>" while \$text =~ $regex;
    __CODE__
    }

    test();
    say "successful /g/" if 'g' =~ /g/;
    test();
    say "successful /x/" if 'x' =~ /x/;
    test();
    my $empty='';
    say "successful //" if ' ' =~ /$empty/;
    test();

    --- run 1
    0<>
    1<>
    2<>
    successful /g/
    --- run 2
    successful /x/
    --- run 3
    1<x>
    2<x>
    --- run 4
    1<x>
    2<x>

    so you are looking for a way to reproduce run 1?

    I wasn't able to reset the regex engine here

    update

    tybalt's suggestions seem to work well:

    ...
    for my $tybalt (qw/ () | (?:) g{0}/) {
        'g' =~ /g/;
        say "successful /$tybalt/" if ' ' =~ /$tybalt/;
        test();
    }

    successful /()/
    --- run 5
    0<>
    1<>
    2<>
    successful /|/
    --- run 6
    0<>
    1<>
    2<>
    successful /(?:)/
    --- run 7
    0<>
    1<>
    2<>
    successful /g{0}/
    --- run 8
    0<>
    1<>
    2<>

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      Thanks for testing!

      I'm confused about the result you are expecting... the empty pattern would always match for every position

      I want to allow users to enter any regex, I run it via eval, and it should behave "normally". The /g/ is just an example of something that my code was doing, which the user knows nothing about. So it seemed to me to be the most "normal" to have the regex behave as if it's the first regex in the program.

        I'm not sure if you should even allow the empty regex.

        The result surprised me and might not be intuitive for your user.

        Unless he has to learn Perl.

        Cheers Rolf

Re: Reset meaning of empty pattern?
by tybalt89 (Monsignor) on Sep 01, 2018 at 12:14 UTC
      Can't test now, but probably this one has side effects by setting $1.

      Cheers Rolf

        That's why I posted the other ones.

        On the other hand, does it really matter? If the user does not expect $1 to be set by //, whether it has a value or is undef is irrelevant.
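To make the $1 point concrete, here is a small sketch comparing the capturing and non-capturing workarounds. The variable names are illustrative only; the patterns are the ones discussed in this thread:

```perl
use strict;
use warnings;

# Capturing empty group: the match succeeds and $1 becomes the empty string.
'' =~ /()/;
my $after_capture = defined $1 ? 'defined' : 'undef';

# Non-capturing empty group: the match succeeds, but it has no group 1,
# so $1 is undef afterwards.
'' =~ /(?:)/;
my $after_noncapture = defined $1 ? 'defined' : 'undef';

print "after /()/: \$1 is $after_capture\n";
print "after /(?:)/: \$1 is $after_noncapture\n";
```

Either way the value is one the user never asked for, so the difference probably only matters if later code tests `defined $1`.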

Re: Reset meaning of empty pattern?
by ikegami (Patriarch) on Sep 02, 2018 at 06:34 UTC

    You should accept a regex pattern as a parameter (as the variable name implies) instead of a fragment of Perl source code! This would avoid this and many other issues.
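A rough sketch of what that could look like, combined with the (?:) workaround from earlier in the thread. The helper name `compile_user_pattern` is hypothetical, and the sketch assumes the user supplies only the pattern text, so flags such as g have to come from the surrounding code rather than from user input:

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern string (not Perl source).
sub compile_user_pattern {
    my ($pattern) = @_;
    # Wrapping in (?:...) means the compiled pattern is never the empty
    # string, so the "last successful pattern" rule cannot apply.
    my $re = eval { qr/(?:$pattern)/ };
    die "Invalid pattern: $@" unless defined $re;
    return $re;
}

'g' =~ /g/;    # unrelated earlier match, as in the question

my $re   = compile_user_pattern('');    # user entered an empty pattern
my $text = 'x';

my @matches;
while ( $text =~ /$re/g ) {
    push @matches, "<$&>";
}
print "@matches\n";
```

The wrapping is only a sketch: it can change the meaning of unusual input (for example a pattern with unbalanced parentheses), so validating the pattern before wrapping may still be worthwhile.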