pbeckingham has asked for the wisdom of the Perl Monks concerning the following question:

I have a curious piece of code, with an error in it, and while I have found the error, I can't explain the erroneous behavior. Could someone help me understand this?

I have a regex that I'm using to split apart a string, which I have simplified for this example. The regex contains capturing parentheses, and the error is that I also include capturing parentheses when I use the regex later.

#! /usr/bin/perl -w
use strict;

my $r = qr{(.)};

while ('abc' =~ /($r)/g) {
    print "loop $1\n";
}

foreach ('abc' =~ /($r)/g) {
    print "array $_\n";
}

The output is:

loop a
loop b
loop c
array a
array a
array b
array b
array c
array c

Removing either set of capturing parentheses fixes the problem, but I don't see why the while loop doesn't also produce duplicate results.

Replies are listed 'Best First'.
Re: Double Capturing Parentheses
by Enlil (Parson) on Mar 22, 2004 at 03:08 UTC
    The difference is that the while construct keeps looping for as long as its condition is true. Here the condition is the match in scalar context, which succeeds once per match, so the loop runs 3 times. The foreach loop, on the other hand, builds its whole list first and then iterates once for each item in it. Here the list is built from $1 and $2 of every match (because of the two sets of capturing parens), for a total of six items, hence the difference in the results.

    You can see exactly when the regex executes, and how many times in each loop, by adding something like use re 'debug'; at the top of your code.

    -enlil
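
As a minimal sketch of the point above (not part of Enlil's reply), the following counts how often the while body runs and how many elements the list-context match returns. The variable names $iterations and @captures are illustrative assumptions; the pattern and string are the ones from the question.

```perl
use strict;
use warnings;

my $r   = qr{(.)};
my $str = 'abc';

# Scalar context: each evaluation of the condition finds one match,
# so the loop body runs once per match.
my $iterations = 0;
while ($str =~ /($r)/g) {
    $iterations++;
}

# List context: the match runs to completion and returns every
# capture group of every match as one flat list.
my @captures = ($str =~ /($r)/g);

print "while iterations: $iterations\n";
print "list elements:    ", scalar(@captures), "\n";
```

With three matches and two capture groups per match, the loop should run three times while the list holds six elements.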

Re: Double Capturing Parentheses
by bart (Canon) on Mar 22, 2004 at 08:19 UTC
    Let me try to make it clear using another example...

    Take the regexp /(\w)(\w)/, and the string "same old show".

    $_ = "same old show";

    # scalar, one at a time:
    while(/(\w)(\w)/g) {
        print "while: $1 (+ $2)\n";
    }

    # list context, one go
    @list = /(\w)(\w)/g;
    print "List: @list\n";

    # list context, foreach loop
    foreach(/(\w)(\w)/g) {
        print "foreach: $_\n";
    }

    Result:
    while: s (+ a)
    while: m (+ e)
    while: o (+ l)
    while: s (+ h)
    while: o (+ w)
    List: s a m e o l s h o w
    foreach: s
    foreach: a
    foreach: m
    foreach: e
    foreach: o
    foreach: l
    foreach: s
    foreach: h
    foreach: o
    foreach: w
    
    As you can see, the foreach produces twice as many lines of output as the while. That is because each match returns a list of two items, $1 and $2, and the regexp in list context returns a (flattened) list of such lists, as you can see in the example in the middle. foreach loops over this grand, flattened list, so for each match you see one iteration for $1 and then one for $2.

    OTOH the while takes one pair of matched items each time, thus it'll loop half as many times.

    In your particular example, the regexp is /((.))/. You still have a $1 and a $2, even though they capture the same thing — but you ignore $2. That's why you don't see it appear. Modifying my code to match your regexp and string, I get:

    while: a (+ a)
    while: b (+ b)
    while: c (+ c)
    List: a a b b c c
    foreach: a
    foreach: a
    foreach: b
    foreach: b
    foreach: c
    foreach: c
    
    which matches your result.

    In summary: you simply neglected to print the contents of $2 in your first snippet, the while loop.
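
To illustrate that summary, here is a small sketch (an addition, not bart's code) in which the while loop collects both $1 and $2 on every pass. Under that assumption it should end up with the same flattened list that the list-context match returns. The array names @from_while and @from_list are made up for the example.

```perl
use strict;
use warnings;

my $str = 'abc';

# Scalar context, but this time keep both capture groups of each match.
my @from_while;
while ($str =~ /((.))/g) {
    push @from_while, $1, $2;
}

# List context returns the same captures, already flattened.
my @from_list = ($str =~ /((.))/g);

print "while: @from_while\n";
print "list:  @from_list\n";
```

Both lines should show the same six values, which suggests the while loop never lost $2; the original snippet just did not print it.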

Re: Double Capturing Parentheses
by eXile (Priest) on Mar 22, 2004 at 03:05 UTC
    I didn't understand either, and I'm lazy so:
    use YAPE::Regex::Explain;
    my $regex=YAPE::Regex::Explain->new(qr/((.))/);
    print $regex->explain;
    __END__
    The regular expression:

    (?-imsx:((.)))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          .                        any character except \n
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    So your foreach evaluates the match in list context, and both \1 and \2 are returned for every match.
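
If YAPE::Regex::Explain is not installed, a rough alternative is to ask Perl how many groups the last successful match had: the last index of the built-in @+ array, $#+, gives that number. The sketch below is an illustration rather than anything from the thread. The labels and the @variants and %group_count names are assumptions, and the non-capturing variant is just one way of dropping the outer set of parentheses.

```perl
use strict;
use warnings;

my $r = qr{(.)};

# Three ways of using the same qr// object in a larger pattern.
my @variants = (
    [ 'both sets',     qr/($r)/   ],   # outer parens plus the group inside $r
    [ 'inner only',    qr/$r/     ],   # only the group inside $r
    [ 'non-capturing', qr/(?:$r)/ ],   # outer grouping that does not capture
);

my %group_count;
for my $variant (@variants) {
    my ($label, $pattern) = @$variant;

    # After a successful match, $#+ is the number of capture groups.
    if ('abc' =~ $pattern) {
        $group_count{$label} = $#+;
    }

    # List context: one element per capture group per match.
    my @captures = ('abc' =~ /$pattern/g);
    print "$label: $group_count{$label} group(s), captures: @captures\n";
}
```

Only the first variant should report two groups and return each character twice; the other two leave a single group, which is consistent with the observation that removing either set of capturing parentheses fixes the duplicates.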