yegg has asked for the wisdom of the Perl Monks concerning the following question:

    #!/usr/bin/perl

    my @strings = ('testing','','testing','testing');

    foreach my $string (@strings) {

        print "\nSTRING: $string\n";

        my $string2 = 'http';
        $string2 =~ /^h(.*)/o;

        my $q = $string;
        my $test = 'testing';
        warn $test =~ /\Q$q\E/;
    }

I finally got to the bottom of the problem I reported at http://www.perlmonks.org/?node_id=867059 (thx to everyone who commented), and I distilled it to the above test case.

Here's what I think is happening. If you have an /o regex with a capture clause and it captures something, and then you run a second regex that interpolates a variable which happens to be set to '', Perl uses the first compiled regex instead (and keeps using it thereafter), so the second regex doesn't match as you'd expect.

In this example, you'd expect the warn on line 14 to print 1, then nothing, then 1 twice. Instead, you get 1, then nothing the rest of the time.

I'm running v5.8.9 on FreeBSD 7.0. FWIW, I also filed a perlbug here: http://rt.perl.org/rt3/Ticket/Display.html?id=78564

Re: Perl bug or feature?
by Corion (Patriarch) on Oct 25, 2010 at 17:10 UTC

    It's not a bug in Perl but a bug in your expectations. The empty RE // always matches; see perlre and perlop. And the first match has nothing to do with it, as removing it still produces all matches:

    #!/usr/bin/perl
    my @strings = ('testing','','testing','testing');

    foreach my $string (@strings) {

        my $q = $string;
        my $test = 'testing';
        warn "$test =~ /$q/?";
        warn $test =~ /\Q$q\E/;
    }
    __END__
    testing =~ /testing/? at tmp.pl line 8.
    1 at tmp.pl line 9.
    testing =~ //? at tmp.pl line 8.
    1 at tmp.pl line 9.
    testing =~ /testing/? at tmp.pl line 8.
    1 at tmp.pl line 9.
    testing =~ /testing/? at tmp.pl line 8.
    1 at tmp.pl line 9.

    Update: Actually, while my reply is not untrue, the real cause is explained by Anonymous Monk below. The empty pattern repeats the last successful match.

      Actually, the empty regexp // is special. See perlre and perlop.
          say "foo" =~ /oo/ ? "MATCH" : "NO MATCH";
          say "bar" =~ // ? "MATCH" : "NO MATCH";

          MATCH
          NO MATCH
      An empty pattern repeats the last successful match.
Re: Perl bug or feature?
by Anonymous Monk on Oct 25, 2010 at 17:19 UTC
      Even though I knew about, and respect, perl's behaviour of repeating the last successful match for an empty regex //, I do feel that this should not apply when the empty regex is the result of interpolating an empty variable: $re = ''; /$re/

      Usually, the variable's data comes from user input, and that should not fall back to a shortcut which is intended for intra-source only.
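
One way to keep an interpolated empty string from triggering the repeat-last-match shortcut is to wrap the interpolation in a non-capturing group, so the pattern text is never empty. The following is a minimal sketch, not something proposed in this thread: the helper name compare_matches is made up for illustration, and the 'http' match merely stands in for whatever unrelated regex happened to succeed earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run an unrelated successful
# match first, then test $pattern against $text in two ways.
sub compare_matches {
    my ($text, $pattern) = @_;

    # Stand-in for "some earlier regex that matched".
    'http' =~ /^h(.*)/ or die "setup match should succeed";

    # Naive form: if $pattern is '', the pattern text is empty and Perl
    # reuses the last successfully matched regex instead.
    my $naive = $text =~ /\Q$pattern\E/ ? 1 : 0;

    # Guarded form: "(?:)" is not an empty pattern, so the regex is
    # compiled and matched as written.
    my $guarded = $text =~ /(?:\Q$pattern\E)/ ? 1 : 0;

    return ($naive, $guarded);
}

for my $p ('testing', '') {
    my ($naive, $guarded) = compare_matches('testing', $p);
    print "pattern='$p' naive=$naive guarded=$guarded\n";
}
```

For the empty pattern, the naive form should report 0 (the reused /^h(.*)/ cannot match 'testing'), while the guarded form reports 1. Checking length($pattern) before matching is another option when an empty search term should be rejected outright.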

Re: Perl bug or feature?
by JavaFan (Canon) on Oct 25, 2010 at 17:27 UTC
    I think this is indeed a bug. In fact, it's very easy to see it's a bug - the /o on the first match is redundant (as there's no variable in the pattern), yet removing the /o changes the outcome.

    I can confirm this bug is present in 5.12.2 as well.

    Running the code with re debugging clearly shows the bug:

    Compiling REx "^h(.*)"
    Final program:
       1: BOL (2)
       2: EXACT <h> (4)
       4: OPEN1 (6)
       6: STAR (8)
       7: REG_ANY (0)
       8: CLOSE1 (10)
      10: END (0)
    anchored "h" at 0 (checking anchored) anchored(BOL) minlen 1
    Guessing start of match in sv for REx "^h(.*)" against "http"
    Guessed: match at offset 0
    Matching REx "^h(.*)" against "http"
       0 <> <http> | 1:BOL(2)
       0 <> <http> | 2:EXACT <h>(4)
       1 <h> <ttp> | 4:OPEN1(6)
       1 <h> <ttp> | 6:STAR(8)
                     REG_ANY can match 3 times out of 2147483647...
       4 <http> <> | 8: CLOSE1(10)
       4 <http> <> | 10: END(0)
    Match successful!
    'testing' =~ /testing/ at w line 14.
    Compiling REx "testing"
    Final program:
       1: EXACT <testing> (4)
       4: END (0)
    anchored "testing" at 0 (checking anchored isall) minlen 7
    Guessing start of match in sv for REx "testing" against "testing"
    Found anchored substr "testing" at offset 0...
    Guessed: match at offset 0
    1 at w line 15.
    Guessing start of match in sv for REx "^h(.*)" against "http"
    Guessed: match at offset 0
    Matching REx "^h(.*)" against "http"
       0 <> <http> | 1:BOL(2)
       0 <> <http> | 2:EXACT <h>(4)
       1 <h> <ttp> | 4:OPEN1(6)
       1 <h> <ttp> | 6:STAR(8)
                     REG_ANY can match 3 times out of 2147483647...
       4 <http> <> | 8: CLOSE1(10)
       4 <http> <> | 10: END(0)
    Match successful!
    'testing' =~ // at w line 14.
    Freeing REx: "testing"
    Compiling REx ""
    Final program:
       1: NOTHING (2)
       2: END (0)
    minlen 0
    Guessing start of match in sv for REx "^h(.*)" against "testing"
    String not equal...
    Match rejected by optimizer
    Warning: something's wrong at w line 15.
    Guessing start of match in sv for REx "^h(.*)" against "http"
    Guessed: match at offset 0
    Matching REx "^h(.*)" against "http"
       0 <> <http> | 1:BOL(2)
       0 <> <http> | 2:EXACT <h>(4)
       1 <h> <ttp> | 4:OPEN1(6)
       1 <h> <ttp> | 6:STAR(8)
                     REG_ANY can match 3 times out of 2147483647...
       4 <http> <> | 8: CLOSE1(10)
       4 <http> <> | 10: END(0)
    Match successful!
    'testing' =~ /testing/ at w line 14.
    Guessing start of match in sv for REx "^h(.*)" against "testing"
    String not equal...
    Match rejected by optimizer
    Warning: something's wrong at w line 15.
    Guessing start of match in sv for REx "^h(.*)" against "http"
    Guessed: match at offset 0
    Matching REx "^h(.*)" against "http"
       0 <> <http> | 1:BOL(2)
       0 <> <http> | 2:EXACT <h>(4)
       1 <h> <ttp> | 4:OPEN1(6)
       1 <h> <ttp> | 6:STAR(8)
                     REG_ANY can match 3 times out of 2147483647...
       4 <http> <> | 8: CLOSE1(10)
       4 <http> <> | 10: END(0)
    Match successful!
    'testing' =~ /testing/ at w line 14.
    Guessing start of match in sv for REx "^h(.*)" against "testing"
    String not equal...
    Match rejected by optimizer
    Warning: something's wrong at w line 15.
    Freeing REx: "^h(.*)"
    Freeing REx: ""

    There's no way the last match should be /^h(.*)/.
Re: Perl bug or feature?
by AnomalousMonk (Archbishop) on Oct 25, 2010 at 19:33 UTC

    With either
        $string2 =~ /^h(.*)/o;
    or
        $string2 =~ /^h(.*)/;
    (i.e., either with or without the /o regex modifier), I get the same results for ActiveState 5.8.9 and Strawberries 5.10.1 and 5.12.0 (all running under Windoze 7), and they are what I would expect (after a little thought) given the documented behavior of the // empty regex:

    >perl -wMstrict -le
    "my @strings = ('testing','','testing','testing');
     foreach my $string (@strings) {
       print \"\nSTRING: $string\n\";
       my $string2 = 'http';
       $string2 =~ /^h(.*)/o;
       my $q = $string;
       my $test = 'testing';
       warn $test =~ /\Q$q\E/;
       }
    "
    STRING: testing
    1 at -e line 1.
    STRING:
    Warning: something's wrong at -e line 1.
    STRING: testing
    1 at -e line 1.
    STRING: testing
    1 at -e line 1.
Re: Perl bug or feature?
by ig (Vicar) on Oct 26, 2010 at 04:51 UTC

    While I initially found it surprising, after reviewing Quote and Quote-like Operators in perlop and running a few tests with use re qw(debug All), I came to the conclusion that the behaviour is merely not obvious, and I wouldn't be inclined to call it a bug. Some examples in perlop to clarify what "empty" means might be helpful, and sufficient from my point of view.