binesh_28 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm confused by the result of calling split with a pattern that uses the '+' quantifier.

Here's my code:

@fields = split /(a|b)+/, "a12cdabab";
for my $val (@fields) {
    print "Value:$val\n";
}

This prints:

Value:
Value:a
Value:12cd
Value:b

Can someone please explain how Perl produces this output?

Re: Split operator in perl
by BrowserUk (Patriarch) on Dec 06, 2010 at 06:22 UTC

    Because your regex contains capturing parens, split will return both the separated bits and the captured separators:

    print split ',', 'a,b,c';;
    a b c
    print split '(,)', 'a,b,c';;
    a , b , c

    However, because you have applied a quantifier to the capturing parens, (a|b)+ matches a whole run of consecutive 'a' and 'b' characters, but the capture group retains only the last one it matched. Hence, although the regex matches and captures each of the four characters in 'abab' in turn, only the final 'b' is retained and returned.

    print 'abcd' =~ m[(.)+];;
    d

    Putting that all together, the delimiters used to split the string are the parts marked with '[]': "[a]12cd[abab]". Only the last single character of each delimiter is retained, and because the first delimiter sits at the very start of the string, the first field is empty. Hence: '', '(a)', '12cd', '(b)'.
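The two effects described above can be checked separately and then together. This is only a minimal sketch; the variable names are illustrative and not from the original post.

```perl
use strict;
use warnings;

# No capture group: only the fields between separators are returned.
my @plain = split /,/, 'a,b,c';              # ('a', 'b', 'c')

# Capture group: each matched separator is returned as well.
my @with_sep = split /(,)/, 'a,b,c';         # ('a', ',', 'b', ',', 'c')

# A quantified capture group keeps only its final iteration.
my ($last) = 'abab' =~ /(a|b)+/;             # 'b'

# The original example: the separators are "a" and "abab".
my @fields = split /(a|b)+/, 'a12cdabab';    # ('', 'a', '12cd', 'b')

print join('|', map { "'$_'" } @fields), "\n";
```

The last line prints `''|'a'|'12cd'|'b'`, which corresponds to the four "Value:" lines in the question.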


Re: Split operator in perl
by ikegami (Patriarch) on Dec 06, 2010 at 06:11 UTC

    Using a quantifier (+,*,?,{m,n}) on a capture rarely makes sense. If you didn't want to capture: /(?:a|b)+/ or /[ab]+/. If you wanted to capture the whole separator: /((?:a|b)+)/ or /([ab]+)/.

    Then there's the fact that your separator is probably not really a separator, since the pattern matches at the start of the string. This will cause you to have a leading empty field.
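A small sketch of the leading empty field, assuming the non-capturing form /[ab]+/ suggested above. The negative limit and the grep filter are additions for illustration, not something the original poster used.

```perl
use strict;
use warnings;

my $text = 'a12cdabab';

# The pattern matches at offset 0, so the first field is the empty string.
my @fields = split /[ab]+/, $text;                       # ('', '12cd')

# Trailing empty fields are dropped by default; a negative limit keeps them.
my @all = split /[ab]+/, $text, -1;                      # ('', '12cd', '')

# One way to discard empty fields regardless of position.
my @non_empty = grep { length } split /[ab]+/, $text;    # ('12cd')

for my $result (\@fields, \@all, \@non_empty) {
    print scalar(@$result), " field(s): ",
        join(', ', map { "'$_'" } @$result), "\n";
}
```

Filtering with grep is convenient here, but it also throws away empty fields in the middle of a record, so it is only appropriate when empty fields carry no meaning.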

    What are you expecting for output?

      I was pretty confused trying to understand the results from that operation. I wasn't trying to accomplish any other task. That was one of the examples quoted as part of perl docs, and as a beginner to perl, I was having a hard time trying to figure out the results. Thanks for your help.

        As a beginner to Perl, maybe you shouldn't worry about what weird results split will give if you give it weird inputs. I bet most Perl experts would have to guess at what 'ab'=~/(a|b)+/ returns (although they would likely guess correctly). More important would be to learn how to use split properly.

        • The regular expression should identify the separator. In your case, it doesn't appear to be used to match a separator, so split is probably not the best tool for the job.

        • You'll rarely want to use captures in a split pattern. When you do, what the captures match is returned in split's result.

        • Using a quantifier on a capture doesn't make sense. If you want to group things in patterns, use (?:...) instead of (...). The latter is much slower since it remembers what the parens matched, and has side effects in split.
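The three points above might look like this in practice. The input strings are made up for the example; they are not from the thread.

```perl
use strict;
use warnings;

# The pattern describes the separator: a comma or semicolon with optional
# surrounding whitespace. (?:...) groups the alternation without capturing.
my @names = split /\s*(?:,|;)\s*/, 'alice, bob;carol ,dave';
# @names is ('alice', 'bob', 'carol', 'dave')

# One of the rare cases where a capture is wanted: keeping the operators.
# The capture group is not quantified, so each separator is returned whole.
my @tokens = split /\s*([-+])\s*/, '1 + 2 - 3';
# @tokens is ('1', '+', '2', '-', '3')

print join('|', @names), "\n";
print join('|', @tokens), "\n";
```

In both calls the pattern matches only what sits between the fields, so no empty leading field appears.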

Re: Split operator in perl
by perl_lover (Chaplain) on Dec 06, 2010 at 06:20 UTC
    If you want to understand how the regex execution is happening, just add this line to your code
    use re 'debug';
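A minimal sketch of one way to apply this to the code from the question, assuming a reasonably recent Perl (5.10 or later), where the pragma is lexically scoped. The enclosing block is an addition that limits the trace to the one split pattern.

```perl
use strict;
use warnings;

my @fields;
{
    # Only regexes compiled inside this block are traced.
    # The compile and match trace is written to STDERR.
    use re 'debug';
    @fields = split /(a|b)+/, 'a12cdabab';
}

print "Value:$_\n" for @fields;
```

Redirecting STDERR to a file (for example `2> trace.txt`, a hypothetical file name) keeps the trace separate from the normal output.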


Re: Split operator in perl
by suhailck (Friar) on Dec 06, 2010 at 06:21 UTC
    Compare the examples below and note the differences:

    Case 1:
    perl -e 'print map {++$count," : ",$_,"\n"}split /(a|b)+/,"a12cdabab"'
    1 :
    2 : a
    3 : 12cd
    4 : b


    Case 2:
    perl -e 'print map {++$count," : ",$_,"\n"}split /((?:a|b)+)/,"a12cdabab"'
    1 :
    2 : a
    3 : 12cd
    4 : abab


    Case 3:
    perl -e 'print map {++$count," : ",$_,"\n"}split /(?:a|b)+/,"a12cdabab"'
    1 :
    2 : 12cd