GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

In the process of trying to benchmark some regex code I found some odd behaviour. The code below should generate two lines of text, but only generates one. The second time through, the regex doesn't seem to generate any matches!

use warnings;
use strict;
gf ();
gf ();
sub gf {
    my $ins1 = '4 A -4 C -4 B 1 D';
    my @inserts;
    $ins1 =~ /[\d+-]+(?: \w ([\d+-]+)(?{push @inserts, $1}))*/g;
    print "\n" . join ' ', @inserts;
}

Prints

-4 -4 1

Is this a bug in the "use at own risk" code evaluation extensions to the regex engine, or something silly I've missed?

I'm using Active State Perl 5.8.7


Perl is Huffman encoded by design.

Re: Irregular expression evaluation
by pg (Canon) on Oct 24, 2005 at 03:17 UTC

    A hint of what is going on:

    use warnings;
    use strict;
    use Data::Dumper;
    gf();
    gf();
    sub gf {
        my $ins1 = '4 A -4 C -4 B 1 D';
        my @inserts;
        print "\nat the beginning: " . \@inserts . "\n";
        print $ins1, "\n";
        $ins1 =~ /[\d+-]+(?: \w ([\d+-]+)(?{print \@inserts, "\n"; push @inserts, $1}))*/g;
        print "\n" . join ' ', @inserts;
    }

    This prints:

    at the beginning: ARRAY(0x189126c)
    4 A -4 C -4 B 1 D
    ARRAY(0x189126c)
    ARRAY(0x189126c)
    ARRAY(0x189126c)
    -4 -4 1
    at the beginning: ARRAY(0x224fc8)
    4 A -4 C -4 B 1 D
    ARRAY(0x189126c)
    ARRAY(0x189126c)
    ARRAY(0x189126c)

    On the second pass, @inserts is a new array (ARRAY(0x224fc8)), but the push inside the regex still went to the first call's array (ARRAY(0x189126c)).

Re: Irregular expression evaluation
by chester (Hermit) on Oct 24, 2005 at 03:24 UTC
    I remember reading something in Mastering Regular Expressions warning against mixing lexicals with (?{}) (though I may be mistaken). Note that this appears to work:

    use warnings;
    use strict;
    gf ();
    gf ();
    sub gf {
        my $ins1 = '4 A -4 C -4 B 1 D';
        our @inserts = ();
        $ins1 =~ /[\d+-]+(?: \w ([\d+-]+)(?{push @inserts, $1}))*/g;
        print "\n" . join ' ', @inserts;
    }
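A minimal sketch of the same package-variable workaround, wrapped so that each call starts from an empty array and hands back its result. The name `inserts_for` is hypothetical, and the use of `local` is an addition that is not in the post above; it assumes you want any previous contents of `@inserts` restored when the sub returns.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the number that follows each letter.
sub inserts_for {
    my ($text) = @_;

    # Package variable: the (?{ }) block finds it by name at run time
    # instead of holding on to one particular lexical array.
    our @inserts;
    local @inserts = ();    # empty array for the duration of this call

    $text =~ /[\d+-]+(?: \w ([\d+-]+)(?{ push @inserts, $1 }))*/;

    return @inserts;        # elements are copied out before local is undone
}

print join(' ', inserts_for('4 A -4 C -4 B 1 D')), "\n";
print join(' ', inserts_for('4 A -4 C -4 B 1 D')), "\n";
```

Both calls should print the same line, `-4 -4 1`, because neither depends on an array left over from an earlier call.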

      You are right. The binding was resolved at compile time. The (?{}) block behaves like an anonymous subroutine with a deep binding to @inserts, and that binding lasts beyond the scope of @inserts.

      Note that you can also work around it by declaring a lexical outside the scope of the sub:
      {
          my @inserts;
          sub gf {
              @inserts = ();
              my $ins1 = '4 A -4 C -4 B 1 D';
              print "\nat the beginning: " . \@inserts . "\n";
              print $ins1, "\n";
              my $re = qr/[\d+-]+(?: \w ([\d+-]+)(?{print \@inserts, "\n"; push @inserts, $1}))*/;
              $ins1 =~ /$re/g;
              print "\n" . join ' ', @inserts;
          }
      }
      You might notice that I created a regex variable. I had hoped that would cause the regex to be bound at runtime, but no such luck. Creating an empty variable and interpolating it into the regex does force runtime evaluation (at least inasmuch as it demands use re 'eval'), but it still doesn't make the (?{}) section use the current incarnation of @inserts.

      Caution: Contents may have been coded under pressure.
Re: Irregular expression evaluation
by sauoq (Abbot) on Oct 24, 2005 at 04:14 UTC

    As pg demonstrates, you have created an inadvertent closure. This issue has come up before. I'm wondering if it might be appropriate for perl to issue a warning here. It can be a nasty bug.
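As an analogy only, the same kind of accidental capture can be shown without any regex. This sketch assumes nothing about how the regex engine implements (?{}), and the behaviour of (?{}) with lexicals may differ between perl releases; the names `outer` and `inner` are made up for illustration. A named sub nested inside another named sub is compiled once and keeps the first instance of the outer lexical, which is the situation perl already warns about.

```perl
use strict;
use warnings;

sub outer {
    my @items;

    # A named sub is compiled once, so it captures the @items that exists
    # during the first call to outer(). perl warns at compile time:
    #   Variable "@items" will not stay shared
    sub inner { push @items, @_ }

    inner(@_);
    return scalar @items;    # counts what THIS call's @items holds
}

print outer(1, 2, 3), "\n";  # first call: inner and outer share @items
print outer(1, 2, 3), "\n";  # later calls: inner still pushes to the old array
```

The first call reports 3 and the second reports 0, which mirrors the symptom in the original post: the second pass collects its values into an array the caller can no longer see.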

    -sauoq
    "My two cents aren't worth a dime.";
    

      I fully agree with you; there had better be a warning.

      Closures were meant to be a feature, so that people can use them when they need them (and they KNOW that they are using them). But people so often get trapped without knowing that there is a closure in their code, and that makes things very tricky for application programmers.

      A warning in this case will be very helpful.

        dave_the_m has been working on the closure code, so it's quite possible this warns in blead. I know he has made improvements in that area.

        ---
        $world=~s/war/peace/g

Re: Irregular expression evaluation
by neversaint (Deacon) on Oct 24, 2005 at 03:43 UTC
    Dear GrandFather,

    This is not an answer to your posting, just an observation, since your script above is one of the significant solutions to my earlier posting.

    Thanks so much for your answer to my posting the other day. In fact, I also encountered the same problem you mention above. At times that particular regex did not return the number-inserted strings.

    ---
    neversaint and everlastingly indebted.......
Re: Irregular expression evaluation
by sfink (Deacon) on Oct 25, 2005 at 00:49 UTC
    This isn't at all related to your actual problem, but what's the /g for? The * is already getting all of your matches.

    And you probably have this already if you're benchmarking, but I suspect /g probably is the right way to go:

    use warnings;
    use strict;
    gf ();
    gf ();
    sub gf {
        my $ins1 = '4 A -4 C -4 B 1 D';
        my @inserts;
        $ins1 =~ /^[\d+-]+/g;
        push @inserts, $ins1 =~ /\G(?: \w ([\d+-]+))/g;
        print join(' ', @inserts), "\n";
    }

    I work for Reactrix Systems, and am willing to admit it.

      Result of evolving code I suspect. Think of it as an appendix - most of the time it does no harm, but it has no obvious use. :)


      Perl is Huffman encoded by design.