Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I found the following in the regexp tutorial at
"http://perldoc.perl.org/perlretut.html#Non-capturing-groupings", in the Non-capturing groupings section:

$x = '12aba34ba5';
@num = split /(a|b)+/, $x; # @num = ('12','a','34','b','5')
@num = split /(?:a|b)+/, $x; # @num = ('12','34','5')

Well...
I can't quite work out how the first "split" operation behaves.
Also, it appears that @num = ('12', 'a', '34', 'a', '5'), i.e., 'a' is the 4th element, and not 'b', as specified.
I'd really appreciate it if someone could explain how the above 'split' operation actually works.
Thank you!

Replies are listed 'Best First'.
Re: Regexp - groupings
by hexcoder (Curate) on Aug 02, 2008 at 21:18 UTC
    The first split uses /(a|b)+/ as the delimiter.

    The first matching substring in $x is 'aba'. The group (a|b) matches three times here ('a', 'b', 'a'), but a capture group holds only the text from its last iteration, so only the final 'a' is captured and goes into the result array. The earlier characters 'ab' are also part of the delimiter, but they are not retained, so they don't go into the result array. If you expected '+' to multiply the captures in one delimiter, that is not how it works: the number of capture groups is fixed when the regex is compiled.

    The second matching substring is 'ba'. Again only the last iteration is captured ('a'), and the preceding 'b' is dropped.

    The second split does not capture anything from the delimiter, so the captured delimiter characters from the result of the first split are not present here.
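
    A minimal sketch of the point about the number of captures being fixed: however many times the + repeats, a single capturing group contributes exactly one extra element per delimiter, and a non-capturing group contributes none. The input strings and variable names below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One capturing group, short delimiter ('ab')
my @short = split /(a|b)+/, '12ab34';

# Same group, much longer delimiter ('abababab')
my @long = split /(a|b)+/, '12abababab34';

# Non-capturing group: nothing from the delimiter is returned
my @none = split /(?:a|b)+/, '12abababab34';

print scalar(@short), "\n";   # prints 3 (two fields + one capture)
print scalar(@long),  "\n";   # prints 3 (still only one capture)
print scalar(@none),  "\n";   # prints 2 (fields only)
```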

Re: Regexp - groupings
by harleypig (Monk) on Aug 02, 2008 at 23:14 UTC
    I get the following:
    DB<1> x split /(a|b)+/, '12aba34ba5'
    0  12
    1  'a'
    2  34
    3  'a'
    4  5
    If I change that to:
    DB<8> x split /(a|b|c|d|e)+/, '12abc34de5'
    0  12
    1  'c'
    2  34
    3  'e'
    4  5
    So, it's only capturing the last character of each delimiter. If you do a plain regex match the same way, you get the same thing:
    DB<12> x '12abc34de5' =~ /(a|b|c|d|e)+/g
    0  'c'
    1  'e'
    Interesting. It looks like each repetition of the group overwrites the previous capture, so only the last one is kept. If you want to capture the whole delimiter you need to put the + inside the parens:
    DB<16> x split /([abcde]+)/, '12abc34de5'
    0  12
    1  'abc'
    2  34
    3  'de'
    4  5
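
    The same thing can be checked outside the debugger. This is only a sketch: the sample strings are the delimiters from the examples above, and the %last hash is a name invented here to hold what $1 contains after each match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %last;

# Match each delimiter on its own with a quantified capturing group
for my $delim ('aba', 'ba', 'abc', 'de') {
    if ($delim =~ /^([a-e])+$/) {
        # $1 holds only the final repetition of the group
        $last{$delim} = $1;
        print "$delim => $1\n";
    }
}
# prints:
# aba => a
# ba => a
# abc => c
# de => e
```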
    Harley J Pig
Re: Regexp - groupings
by eosbuddy (Scribe) on Aug 03, 2008 at 00:54 UTC
    When the split pattern contains parentheses, the text they capture is returned as well. You might be able to better appreciate the difference by looking at an alternate form of the same expression and operation, given below (please notice the presence and absence of () in the split commands):
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $x = '12aba34ba5';

    # Without parentheses: the separators are discarded
    my @num = split /[a-b,A-Z]/, $x;
    print "$num[$_],\n" foreach (0..$#num);
    print "-------\n";

    # With parentheses: each separator is returned as well
    @num = split /([a-b,A-Z])/, $x;
    print "$num[$_],\n" foreach (0..$#num);
    Update: the (?:) groups without capturing, so if you try out:
    @num = split /(?:[a-b,A-Z])/, $x;
    print "$num[$_],\n" foreach (0..$#num);
    it will produce the same output as:
    @num = split /[a-b,A-Z]/, $x;
    print "$num[$_],\n" foreach (0..$#num);
    (apologies for the length of this reply).
Re: Regexp - groupings
by ikegami (Patriarch) on Aug 03, 2008 at 05:26 UTC

    Using *, + and similar on a capture doesn't make much sense, and therefore neither does the result. If you wished to capture the separator, you should have used

    @num = split /((?:a|b)+)/, $x; # @num = ('12','aba','34','ba','5')
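
    As a rough sketch of how that result might be used: with a single capturing group around the whole separator, fields and separators alternate in the returned list, so they can be pulled apart by index. The variable names @parts, @fields and @separators are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = '12aba34ba5';

# Outer group captures the whole separator; inner group only groups
my @parts = split /((?:a|b)+)/, $x;

# Even indexes hold the fields, odd indexes hold the captured separators
my @fields     = @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
my @separators = @parts[ grep { $_ % 2 == 1 } 0 .. $#parts ];

print join(',', @parts), "\n";        # prints 12,aba,34,ba,5
print join(',', @fields), "\n";       # prints 12,34,5
print join(',', @separators), "\n";   # prints aba,ba
```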