note
ig
<p>The note is interesting: it highlights exactly this difference between ECMAScript REs and Perl REs.</p>
<p>The RE <c>(x+)?</c> is very similar to <c>(x*)</c>, except that the latter always matches and therefore never retains the value from a previous match when it is inside an enclosing repeating group. This has much the same effect as the requirement in Note 3: "Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated." Because <c>(x*)</c> always matches, it always has a value from the last repeat of the outer repeating group, as if it had been reset for each repeat, except that the value is <c>''</c> instead of <c>undef</c> when <c>x</c> did not match. Converting <c>''</c> to <c>undef</c> afterwards is an easy transformation.</p>
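<p>As a minimal sketch of that difference, the snippet below matches the same string against both forms inside a repeating group and reports what group 1 holds afterwards. The helper name <c>last_capture</c>, the test string, and the two patterns are made up for illustration. With <c>(x+)?</c> Perl is expected to keep the capture from the earlier repeat, while with <c>(x*)</c> the last repeat sets the group to the empty string.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: value of group 1 after a successful match, else undef.
sub last_capture {
    my ($re, $string) = @_;
    return $string =~ $re ? $1 : undef;
}

my $string   = 'xyy';                 # second repeat has no 'x' before the 'y'
my $optional = qr/^(?:(x+)?y)*$/;     # group 1 can be skipped on a repeat
my $star     = qr/^(?:(x*)y)*$/;      # group 1 always takes part, possibly as ''

for my $case ([ '(x+)?', $optional ], [ '(x*)', $star ]) {
    my ($label, $re) = @$case;
    my $value = last_capture($re, $string);
    printf "%-6s => %s\n", $label, defined $value ? "'$value'" : 'undef';
}
```

<p>Mapping the empty string from the <c>(x*)</c> form to <c>undef</c> then gives the value that Note 3 describes.</p>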
<p>I appreciate that you don't want to change the RE, but you say you are parsing it, so perhaps you can make some systematic transformations.</p>
<p>Consider:</p>
<c>
use strict;
use warnings;
use Data::Dumper::Concise;

my $string = "aacbbbcac";
my $re = '((a+)?(b+)?(c))*';

# transform '(x+)?' to '(x*)' assuming 'x' is monolithic
$re =~ s/\Q+)?/*)/g;
print "re = $re\n";
my $re1 = qr/$re/;

if ($string =~ $re1) {
    my @something;
    foreach (0..$#-) {
        if (defined($-[$_])) {
            my $substring = substr($string, $-[$_], $+[$_] - $-[$_]);
            # ${$_} also works, except where $_ = 0
            no strict 'refs';
            print "\$substring = $substring = ${$_}\n";
            # transform '' to undef
            $substring = undef if ($substring eq '');
            # assert: $substring is now as specified by
            # Standard ECMA-262, 5.1 Edition / June 2011
            # Section 15.10.2.5 Note 3
            printf "Group %d: <%s>\n", $_, $substring // '';
            $something[$_] = $substring;
        }
    }
    print Dumper(\@something);
}</c>
<p>Produces</p>
<c>
re = ((a*)(b*)(c))*
$substring = aacbbbcac = test.pl
Group 0: <aacbbbcac>
$substring = ac = ac
Group 1: <ac>
$substring = a = a
Group 2: <a>
$substring = =
Group 3: <>
$substring = c = c
Group 4: <c>
[
  "aacbbbcac",
  "ac",
  "a",
  undef,
  "c"
]</c>
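<p>The "monolithic" assumption in the script's comment matters. The following sketch applies the same naive substitution to a group whose body is more than a single repeated atom and shows that the result no longer matches the same strings. The helper name <c>naive_transform</c> and the sample pattern are made up for illustration.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: the same textual substitution as in the script above.
sub naive_transform {
    my ($re) = @_;
    $re =~ s/\Q+)?\E/*)/g;
    return $re;
}

my $original    = '^(ab+)?c$';
my $transformed = naive_transform($original);    # becomes '^(ab*)c$'
print "original = $original, transformed = $transformed\n";

# 'c' matches only the original; 'ac' matches only the transformed RE.
for my $string ('abc', 'c', 'ac') {
    printf "%-4s original: %-3s transformed: %s\n", $string,
        ($string =~ /$original/    ? 'yes' : 'no'),
        ($string =~ /$transformed/ ? 'yes' : 'no');
}
```

<p>Since you are already parsing the RE, a safer variant would apply the rewrite only where the parser sees that the group body is a single atom followed by <c>+</c>.</p>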