in reply to Extracting the number of repetitions from a regex

Lexicals don't work (right) in regex code blocks. If you switch to using globals, you'll get the results you are after:

    my @strings = qw( aaabbbb ab abb aabb aaabb aabbb );

    for my $string ( @strings ) {
        our $a_counter = 0;
        our $b_counter = 0;
        print "In $string there were $a_counter 'a's and $b_counter 'b's.\n"
            if $string =~ /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/;
    }
    __END__
    C:\test>junk
    In aaabbbb there were 3 'a's and 4 'b's.
    In ab there were 1 'a's and 1 'b's.
    In abb there were 1 'a's and 2 'b's.
    In aabb there were 2 'a's and 2 'b's.
    In aaabb there were 3 'a's and 2 'b's.
    In aabbb there were 2 'a's and 3 'b's.

There is a warning about this in one of the regex pods.
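
If each repetition is a single character, as in the strings above, the counts can also be read off without code blocks: capture each run and take its length. This is a different technique from the code-block counters, sketched here under that single-character assumption; count_runs is a hypothetical helper name, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (number of 'a's, number of 'b's) for the
# first run of 'a's followed by 'b's, or an empty list if nothing matches.
sub count_runs {
    my ($string) = @_;
    return unless $string =~ /(a+)(b+)/;
    # Each repetition is one character, so the run length is the count.
    return ( length($1), length($2) );
}

for my $string ( qw( aaabbbb ab abb aabb aaabb aabbb ) ) {
    my ( $as, $bs ) = count_runs($string) or next;
    print "In $string there were $as 'a's and $bs 'b's.\n";
}
```

For repeated groups longer than one character this would need adjusting, for example by dividing the run length by the length of one repetition.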


Re^2: Extracting the number of repetitions from a regex
by pat_mc (Pilgrim) on Dec 18, 2008 at 20:40 UTC
    Cool stuff, BrowserUk!

    This really does the trick. Since I am not really clear on the difference between 'Lexicals' and 'Globals' - could you please briefly expand on how the change from declaring the variables with my to our affects this scenario? I expected the scope of a variable definition to be that of the block it was defined in ... hence my confusion as to why the use of globals would make any difference here.

    Thanks again - Pat

      could you please briefly expand on how the change from declaring the variables with my to our affects this scenario?

      Briefly? Code blocks in regexp patterns capture lexical variables when they are compiled, just like anonymous subs. Package variables aren't captured. In case that didn't do the trick, the longer answer follows.

      Code blocks in regexps are anonymous subs.

          sub f {
              my ($x) = @_;
              '' =~ /(?{ print "$x\n" })/;
          }
          f(4);  # 4
          f(5);  # 4!!

      effectively does

          sub f {
              my ($x) = @_;
              $block ||= sub { print "$x\n" };
              $block->();
          }
          f(4);  # 4
          f(5);  # 4!!

      The $x from the first pass is captured when the sub is compiled. It's a very powerful feature which allows the simplification of many problems. For example,

          BEGIN {
              package Prefixer;

              sub new {
                  my ($class, $prefix) = @_;
                  return bless({ prefix => $prefix }, $class);
              }

              sub prefix {
                  my $self = shift;    # remove the object so @_ holds only the arguments
                  return join '', $self->{prefix}, @_;
              }
          }

          my $a_prefixer = Prefixer->new('a');
          my $b_prefixer = Prefixer->new('b');

          print("$_\n") for
              $a_prefixer->prefix('1'),  # a1
              $a_prefixer->prefix('2'),  # a2
              $b_prefixer->prefix('3'),  # b3
              $b_prefixer->prefix('4');  # b4

      can be simplified to

          sub make_prefixer {
              my ($prefix) = @_;
              return sub { return join '', $prefix, @_ };
          }

          my $a_prefixer = make_prefixer('a');
          my $b_prefixer = make_prefixer('b');

          print("$_\n") for
              $a_prefixer->('1'),  # a1
              $a_prefixer->('2'),  # a2
              $b_prefixer->('3'),  # b3
              $b_prefixer->('4');  # b4

      However, subs only capture lexical variables, not package variables. By using package variables, the problem goes away.

          sub f {
              local our ($x) = @_;
              $block ||= sub { print "$x\n" };
              $block->();
          }
          f(4);  # 4
          f(5);  # 5

          sub f {
              local our ($x) = @_;
              '' =~ /(?{ print "$x\n" })/;
          }
          f(4);  # 4
          f(5);  # 5
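
      Applied to the original counting problem, the same idea might look like the sketch below. count_ab and its counter names are hypothetical, chosen for illustration. Like the code earlier in the thread, the counters are not rolled back when the regex engine backtracks, so this assumes input where the first match attempt succeeds. To my knowledge, Perl 5.18 reworked regex code blocks so that they close over lexicals properly, so the stale-lexical behaviour applies to older perls; the package-variable form should behave the same either way.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the 'a's and 'b's matched by the pattern
# using package variables, which are looked up by name at run time
# rather than captured when the code block is compiled.
sub count_ab {
    my ($string) = @_;
    # 'our' makes the package variables visible under strict;
    # 'local' resets them for this call and restores them on return.
    local our ( $a_count, $b_count ) = ( 0, 0 );
    return unless $string =~ /(?:a(?{ $a_count++ }))+(?:b(?{ $b_count++ }))+/;
    return ( $a_count, $b_count );
}

for my $string ( qw( aaabbbb ab aabbb ) ) {
    my ( $as, $bs ) = count_ab($string) or next;
    print "In $string there were $as 'a's and $bs 'b's.\n";
}
```

      Because the counters are localized inside the sub, a second call starts from zero again instead of seeing whatever the first call left behind.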
        Code blocks in regexps are anonymous subs.
        Wow - ikegami ...

        It's all making sense to me now. Thanks for the enlightenment! If calling a code block in a regex is effectively the same as calling a subroutine then I understand why the scope of the my variable in my initial code snippet is not wide enough!

        Your wisdom and taking the time to share it are much appreciated.

        Cheers - Pat