... I found '\G' is usable, but ... it returns an extra pair of matches that contain undef ...
This has nothing to do with the \G assertion, but is a facet of the way unmatched capture groups behave in list context when allowed to match zero times. Consider:
    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     my $qr_string = q((?:\G(\w)\W{2}(\w))*);
     my $qr = qr[$qr_string];
     ;;
     for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
       my @caps = $s =~ /$qr/g;
       print qq{'$s' -> }, pp \@caps;
       }
    "
    '' -> [undef, undef]
    'a--b' -> ["a", "b", undef, undef]
    'c--dE--F' -> ["c", "d", "E", "F", undef, undef]
    'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l", undef, undef]

    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     my $qr_string = q((?:(\w)\W{2}(\w))*);
     my $qr = qr[$qr_string];
     ;;
     for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
       my @caps = $s =~ /$qr/g;
       print qq{'$s' -> }, pp \@caps;
       }
    "
    '' -> [undef, undef]
    'a--b' -> ["a", "b", undef, undef]
    'c--dE--F' -> ["E", "F", undef, undef]
    'g--hI--Jk--l' -> ["k", "l", undef, undef]

Both of the variations above, with and without the \G assertion, end with an extra pair of undef values. Compare the same two patterns with the trailing * quantifier removed:

    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     my $qr_string = q((?:\G(\w)\W{2}(\w)));
     my $qr = qr[$qr_string];
     ;;
     for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
       my @caps = $s =~ /$qr/g;
       print qq{'$s' -> }, pp \@caps;
       }
    "
    '' -> []
    'a--b' -> ["a", "b"]
    'c--dE--F' -> ["c", "d", "E", "F"]
    'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]

    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     my $qr_string = q((?:(\w)\W{2}(\w)));
     my $qr = qr[$qr_string];
     ;;
     for my $s ('', qw(a--b c--dE--F g--hI--Jk--l)) {
       my @caps = $s =~ /$qr/g;
       print qq{'$s' -> }, pp \@caps;
       }
    "
    '' -> []
    'a--b' -> ["a", "b"]
    'c--dE--F' -> ["c", "d", "E", "F"]
    'g--hI--Jk--l' -> ["g", "h", "I", "J", "k", "l"]

Don'cha just love regexes? Play with variations of these patterns (including qr[$qr_string*] and qr[$qr_string+]) for deeper confu... um, greater enlightenment.
So what's going on? Here's how I would describe it: If the (?:...(...)...(...)) group containing two capture groups is allowed to match zero times at some point, e.g., the end of the string, it will! However, the capture groups inside it don't actually capture anything, so they return undef.
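To make the zero-length match visible, here is a minimal sketch (not from the original post) that walks the matches one at a time in scalar context and records where each one starts and how long it is. The variable names and the @matches structure are my own, chosen for illustration only.

```perl
use strict;
use warnings;

my $qr = qr/(?:(\w)\W{2}(\w))*/;
my $s  = 'a--b';

my @matches;
while ($s =~ /$qr/g) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    push @matches, {
        start => $-[0],
        len   => $+[0] - $-[0],
        caps  => [ $1, $2 ],
    };
}

for my $m (@matches) {
    # Show undef explicitly instead of triggering an "uninitialized" warning.
    my @shown = map { defined $_ ? "'$_'" : 'undef' } @{ $m->{caps} };
    print "offset $m->{start}, length $m->{len}: (@shown)\n";
}
```

On 'a--b' this should report a match of length 4 at offset 0 with both captures set, followed by a match of length 0 at offset 4 (the end of the string) with both captures undef. That second, empty match is where the extra pair of undefs in the list-context results comes from.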
Compare that behavior to unmatched capture groups in an alternation:
    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(dd);
     ;;
     my $s = 'aBcDeFg';
     ;;
     my @captures = $s =~ m{ (B) | (D) | (F) }xmsg;
     dd \@captures;
    "
    ["B", undef, undef, undef, "D", undef, undef, undef, "F"]
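If the undef placeholders in the alternation case are unwanted, two common ways to avoid them are sketched below; neither appears in the original post. The first filters the list with grep, which would also discard any undef you actually wanted to keep. The second uses a branch-reset group, (?|...), which assumes Perl 5.10 or later. The array names are illustrative.

```perl
use strict;
use warnings;

my $s = 'aBcDeFg';

# Each match returns all three groups; the two that did not
# participate in that match come back as undef.
my @raw = $s =~ m{ (B) | (D) | (F) }xmsg;

# Option 1: drop the undef placeholders after the fact.
my @filtered = grep { defined } @raw;

# Option 2: branch reset (Perl 5.10+). Every alternative's capture
# group is numbered 1, so each match returns exactly one value.
my @reset = $s =~ m{ (?| (B) | (D) | (F) ) }xmsg;

print scalar(@raw), " raw elements\n";
print "filtered: @filtered\n";
print "reset:    @reset\n";
```

Both approaches should leave just B, D and F. Note that filtering with grep also removes the trailing undefs produced by the zero-length matches discussed above, but it cannot tell them apart from an optional group that legitimately failed to capture.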
Also consider:
    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(dd);
     ;;
     my $qr_string = q((?:(\w)\W{2}(\w))*);
     my $qr = qr[$qr_string+];
     ;;
     my $s = '%%%%';
     print 'MATCH!!!' if $s =~ /$qr/g;
     dd \@-;
     ;;
     my @captures = $s =~ /$qr/g;
     dd \@captures;
    "
    MATCH!!!
    [0]
    [undef, undef, undef, undef, undef, undef, undef, undef]
Update: In place of the last example, consider instead:
    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     my $qr_string = q((?:(\w)\W{2}(\w))*);
     my $qr = qr[$qr_string];
     ;;
     my $s = '%%%%';
     ;;
     print 'match @ offset ', $-[0], ' ($1, $2)==', pp $1, $2
         while $s =~ /$qr/g;
     ;;
     my @captures = $s =~ /$qr/g;
     pp \@captures;
    "
    match @ offset 0 ($1, $2)==(undef, undef)
    match @ offset 1 ($1, $2)==(undef, undef)
    match @ offset 2 ($1, $2)==(undef, undef)
    match @ offset 3 ($1, $2)==(undef, undef)
    match @ offset 4 ($1, $2)==(undef, undef)
    [undef, undef, undef, undef, undef, undef, undef, undef, undef, undef]

For discussion of $-[0], please see @- in perlvar. Also note that the definition
Give a man a fish: <%-{-{-{-<
In reply to Re^3: 'g' flag w/'qr'
by AnomalousMonk
in thread 'g' flag w/'qr'
by perl-diddler