gman has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I have a simple question regarding the behavior of split. Given a string following the pattern /\w{2}.{4}/, I am getting extra elements back from split. I'm sure this is the correct behavior; I just don't understand why, so I can't correct it. See the code:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $string = "df5434vg7856fg3472sd1234jh45r5";
my @array = split(/(\w{2}.{4})/, $string);
print Dumper(@array);

Output:

$VAR1 = '';
$VAR2 = 'df5434';
$VAR3 = '';
$VAR4 = 'vg7856';
$VAR5 = '';
$VAR6 = 'fg3472';
$VAR7 = '';
$VAR8 = 'sd1234';
$VAR9 = '';
$VAR10 = 'jh45r5';

Thanks in advance!

Thank you for the quick solution and explanation!

Re: split() behavior
by BrowserUk (Patriarch) on Jan 20, 2011 at 04:40 UTC

    Normally, split returns the bits on either side of the delimiter, but not the delimiter itself. You've overridden that behaviour by adding capturing parens around the delimiter. Hence, it treats your input string as a sequence of empty (null) strings separated by six-character delimiters, which you've also asked it to capture:

    (null)df5434(null)vg7856(null)fg3472(null)sd1234(null)jh45r5
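
To see that alternating pattern directly, here is a minimal sketch that runs split on the same string with and without the capturing parens. The variable names are illustrative and not from the original post, and the grep at the end is just one possible way to keep using split; it is not what the reply recommends.

```perl
use strict;
use warnings;

my $string = "df5434vg7856fg3472sd1234jh45r5";

# With capturing parens: empty fields alternate with the captured delimiters.
my @with_capture = split /(\w{2}.{4})/, $string;

# Without parens: only the empty fields between delimiters are left, and
# split drops trailing empty fields, so the result is an empty list.
my @without_capture = split /\w{2}.{4}/, $string;

# One possible workaround: throw away the empty fields afterwards.
my @fields = grep { length } @with_capture;

print scalar(@with_capture), " elements with capture\n";       # 10
print scalar(@without_capture), " elements without capture\n"; # 0
print "@fields\n";   # df5434 vg7856 fg3472 sd1234 jh45r5
```

The ten elements match the Dumper output in the question: five empty strings and five captured six-character delimiters.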

    Basically, you're using the wrong tool for the job. m// does what you want:

    print "$_\n" for "df5434vg7856fg3472sd1234jh45r5" =~ m[(\w{2}.{4})]g;

    df5434
    vg7856
    fg3472
    sd1234
    jh45r5

    But actually, the most efficient way of breaking a string into fixed length fields is unpack:

    print "$_\n" for unpack '(a6)*', "df5434vg7856fg3472sd1234jh45r5";

    df5434
    vg7856
    fg3472
    sd1234
    jh45r5
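
A small sketch of how the '(a6)*' template behaves, assuming the records really are fixed-width. The truncated second input is an invented example, used only to show that unpack counts characters and does not validate them against \w{2}.{4}.

```perl
use strict;
use warnings;

my $string = "df5434vg7856fg3472sd1234jh45r5";

# '(a6)*' repeats the group "six characters, taken as-is" until the
# input is exhausted.
my @fields = unpack '(a6)*', $string;
print "$_\n" for @fields;

# Hypothetical ragged input: when the length is not a multiple of six,
# the last field simply comes back shorter.
my @ragged = unpack '(a6)*', "df5434vg78";
print scalar(@ragged), " fields, last is '$ragged[-1]'\n";
```

If the input might contain malformed records, the m//g approach is the one that actually checks each chunk against the pattern.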

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: split() behavior
by PeterPeiGuo (Hermit) on Jan 20, 2011 at 04:34 UTC

    "If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter."

    You can find the above in the perldoc for split (perldoc -f split). It also includes an example showing this behavior, which is worth checking out.
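
The documentation's example splits "1-10,20" on a captured separator. A short sketch along those lines, with an uncaptured version added here for comparison:

```perl
use strict;
use warnings;

# Captured separators are returned in between the fields.
my @parts = split /([,-])/, "1-10,20", 3;
print join('|', @parts), "\n";   # 1|-|10|,|20

# Same pattern without the parens: only the fields come back.
my @plain = split /[,-]/, "1-10,20", 3;
print join('|', @plain), "\n";   # 1|10|20
```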

    Peter (Guo) Pei