matthewshark has asked for the wisdom of the Perl Monks concerning the following question:

I (very hesitantly) say that I think I've found a bug in nested named capture buffers. After the regexp matches, "keys %+" does not return the names of inner-nested capture buffers, yet referencing a specific key, such as $+{inner}, returns the captured value. Any help with either (1) understanding what I'm doing wrong, or (2) how to work around this, would be much appreciated. The minimal sample below illustrates the problem; I've listed both the code and its output. I ran this on a MacBook using ActiveState Perl:
mhaines@patru ~/rtx/pshark/cgi-bin/Pshark/AddressParser $ perl -v
This is perl, v5.10.0 built for x86_64-linux-thread-multi-ld
The code:
#!/usr/bin/perl
use strict;

my $inner = qr/(?<inner>inner)/xi;
my $tube = qr/(?<tube>tube)/xi;

my $outer1 = qr/(?<donut>$inner)tube/xi; # $+{inner} set, 'keys %+' doesn't have 'inner'
my $outer2 = qr/(?<donut>$inner $tube)/xi; # same here
my $outer3 = qr/(?<donut>$inner) $tube/xi; # finally %+ has all three keys

test($outer1, "innertube");
test($outer2, "innertube");
test($outer3, "innertube");

sub test {
	my ($regexp, $string) = @_;
	$string =~ $regexp;
	print "\n";
	print "Regexp: $regexp\n";
	print "\tagainst $string\n";
	print "Each:\n";
	while (my ($k, $v) = each %+) {
		print "\t$k = '$v'\n";
	}
	print "Inner: $+{inner}\n";
	print "Tube:  $+{tube}\n";
	print "1: $1\n";
	print "2: $2\n";
	print "3: $3\n";
}
And the output:
Regexp: (?ix-sm:(?<donut>(?ix-sm:(?<inner>inner)))tube)
	against innertube
Each:
	donut = 'inner'
Inner: inner
Tube:  
1: inner
2: inner
3: 

Regexp: (?ix-sm:(?<donut>(?ix-sm:(?<inner>inner)) (?ix-sm:(?<tube>tube))))
	against innertube
Each:
	donut = 'innertube'  <-- Hey!  What about 'inner'???
Inner: inner
Tube:  tube
1: innertube
2: inner
3: tube

Regexp: (?ix-sm:(?<donut>(?ix-sm:(?<inner>inner))) (?ix-sm:(?<tube>tube)))
	against innertube
Each:
	inner = 'inner'
	tube = 'tube'
	donut = 'inner'
Inner: inner
Tube:  tube
1: inner
2: inner
3: tube

Re: Bug with nested named capture buffers
by ikegami (Patriarch) on Aug 30, 2008 at 09:26 UTC

    Sounds like a bug to me too. Why don't you report it?

    Update: Wait, don't. Perl bug #58082:

    The problem has been noticed before, and is currently fixed in 'bleadperl', and will be available in 5.10.1.

      Thanks for the help! The bug report contained a hint that led to a trivial workaround: %- functions correctly, while %+ has problems. From %- you can implement %+ with something like this:
      my %plus = ();
      while (my ($k, $v) = each %-) {
          $plus{$k} = $v->[0] if scalar(@$v)>=1;
      }
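For completeness, here is a self-contained sketch of the same workaround applied to the $outer2 case from the original post. One caveat worth hedging on: %- holds every capture with a given name, and the first entry can be undef if that group did not participate in the match, so this version copies the first defined entry rather than always taking $v->[0]. The name %plus is just a stand-in chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $inner = qr/(?<inner>inner)/xi;
my $tube  = qr/(?<tube>tube)/xi;
my $outer = qr/(?<donut>$inner $tube)/xi;   # same shape as $outer2 above

my %plus;   # stand-in for %+ (hypothetical name, not part of Perl)
if ("innertube" =~ $outer) {
    # %- maps each group name to an array ref holding every capture with
    # that name; %+ reports the leftmost defined one, so copy that.
    for my $name (keys %-) {
        my ($first) = grep { defined } @{ $-{$name} };
        $plus{$name} = $first if defined $first;
    }
}

# List whatever names were recovered from %-.
print "$_ = '$plus{$_}'\n" for sort keys %plus;
```

The copy has to be made right after the match, because %- is reset by the next successful match in the same scope.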
      

        Great! Two quick tips:

        if scalar(@$v)>=1;
        can be written as
        if @$v;

        Please avoid <pre>...</pre> on PerlMonks; <c>...</c> is preferred. It also saves you from encoding "&", "<" and ">".
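The first tip above relies on an array evaluating to its element count in boolean context. A small sketch (the sample array refs are made up for illustration) that compares the long and short forms side by side:

```perl
use strict;
use warnings;

# An array in boolean context yields its element count,
# so "if @$v" is true exactly when "scalar(@$v) >= 1" is.
my @results;
for my $v ([], [undef], ['inner'], ['a', 'b']) {
    my $long  = (scalar(@$v) >= 1) ? 1 : 0;
    my $short = @$v                ? 1 : 0;
    push @results, [$long, $short];
    print scalar(@$v), " element(s): long=$long short=$short\n";
}
```

Note that an array holding a single undef still counts as non-empty under both forms; the test is on length, not on the values.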