bichonfrise74 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm working on splitting each row into some pre-defined columns, but the last column is made up of several strings: basically anything left over after the pre-defined columns have been identified.

Below is my script and it is working correctly, but I feel the code is clunky especially on lines 7 & 10 where I split the rows and I had to join the last 'column'.
Is there a way to optimize this further? Is there a way where I do not need to define each column when I'm splitting them up?

#!/usr/bin/perl

use strict;

while (<DATA>) {
    my @rows;
    my ($name, $tel, $col3, $col4,             # line 7
        $col5, $col6, @notes) = split;

    push (@rows, $name, $tel, $col3, $col4,    # line 10
        $col5, $col6, (join " ", @notes) );

    print join " ", @rows , "\n";
}

__DATA__
abc  322 2/3/09 aaa aadda dasdas a1 a2 a3
def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5

Replies are listed 'Best First'.
Re: Optimizing Splitting of Array
by lamprecht (Friar) on Jun 08, 2009 at 21:18 UTC
    Hi,

    there is a 'limit' argument to split (see perldoc -f split); this could do what you want:

    use warnings;
    use strict;
    use Data::Dumper;

    my @rows;
    while (<DATA>) {
        my @data = split(/ /,$_, 6);
        push (@rows, \@data);
    }
    print Dumper \@rows;

    __DATA__
    abc  322 2/3/09 aaa aadda dasdas a1 a2 a3
    def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5

    Cheers, Christoph
      The use of limit on the split is a great idea here!
      One suggestion concerns the use of split(/ /): splitting on a single space is usually not what is needed (it could be, but not often). Here, abc is followed by 2 spaces, and the 2nd space produces an extra empty token in the output list (@data). Usually split(/\s+/,$_,6) works out better (btw: the default split() splits on ' ', which behaves like \s+ except that leading whitespace is skipped).
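The difference between the three patterns is easiest to see side by side. The following is only a small sketch: the sample line (with a leading space added and two spaces after abc) and the helper show() are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical sample line: one leading space, two spaces after "abc".
my $line = " abc  322 2/3/09 aaa";

# Hypothetical helper: wraps each field in brackets so empty fields are visible.
sub show { return join '', map { "[$_]" } @_ }

print show(split / /,   $line), "\n";   # prints [][abc][][322][2/3/09][aaa]
print show(split /\s+/, $line), "\n";   # prints [][abc][322][2/3/09][aaa]
print show(split ' ',   $line), "\n";   # prints [abc][322][2/3/09][aaa]
```

Only the ' ' form drops the empty leading field, which is why it is usually the safest choice for whitespace-separated columns.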

      Another technique is the use of a list slice. There can be some good reasons to combine this with the split limit. The code below shows how to "get rid of a value" from the split, in this case the date token. You probably don't want to do that, but this is just an example.

      #!/usr/bin/perl -w
      use strict;
      use Data::Dumper;

      my @rows;
      while (<DATA>) {
          my @data = (split(/\s+/,$_, 6))[0,1,3..5];
          push (@rows, \@data);
      }
      foreach (@rows) {
          print "@$_";
      }
      #prints
      #abc 322 aaa aadda dasdas a1 a2 a3
      #def 433 dasd bdbdbd wings b1 b2 b3 b4 b5

      __DATA__
      abc  322 2/3/09 aaa aadda dasdas a1 a2 a3
      def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5

      Another point about "space splitting": if you split a complete line on \s+ with no limit, the ending \n is removed, because the \s set includes space, \t, \n, \r and \f, and trailing empty fields are dropped (so when splitting a complete line this way, you do not have to "chomp" it first). With a limit, as in the code above, the last field keeps its \n.
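A minimal sketch of how the trailing newline behaves with and without a limit; last_field() and the sample line are hypothetical names and data used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last field of $line split on whitespace
# runs, optionally with a LIMIT on the number of fields.
sub last_field {
    my ($line, $limit) = @_;
    my @fields = defined $limit
        ? split(/\s+/, $line, $limit)
        : split(/\s+/, $line);
    return $fields[-1];
}

my $line = "abc 322 2/3/09 a1 a2 a3\n";   # hypothetical input line

printf "no limit: [%s]\n", last_field($line);       # last field is "a3"
printf "limit 4:  [%s]\n", last_field($line, 4);    # last field is "a1 a2 a3\n"

# With a limit, chomp first if the newline is not wanted.
chomp(my $clean = $line);
printf "chomped:  [%s]\n", last_field($clean, 4);   # last field is "a1 a2 a3"
```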
      Thanks!
Re: Optimizing Splitting of Array
by Fletch (Bishop) on Jun 08, 2009 at 21:13 UTC

    Perhaps pass a limit to split explicitly telling it how many columns you expect?
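As a rough sketch of that idea, assuming the layout from the question (six fixed columns followed by free-form notes), a limit of 7 leaves everything after the sixth column in the last field. The sub name parse_row() is hypothetical, and the rows are given as an in-line list rather than read from DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: splits one line into six fixed columns plus a
# free-form notes column.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    # ' ' splits on runs of whitespace and skips leading whitespace;
    # the limit of 7 stops splitting after the sixth column.
    return split ' ', $line, 7;
}

my @lines = (
    "abc  322 2/3/09 aaa aadda dasdas a1 a2 a3\n",
    "def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5\n",
);

for my $line (@lines) {
    my ($name, $tel, $col3, $col4, $col5, $col6, $notes) = parse_row($line);
    print join('|', $name, $tel, $col3, $col4, $col5, $col6, $notes), "\n";
}
# prints:
# abc|322|2/3/09|aaa|aadda|dasdas|a1 a2 a3
# def|433|3/4/08|dasd|bdbdbd|wings|b1 b2 b3 b4 b5
```

Unlike splitting everything and joining the notes back together, this keeps any runs of whitespace inside the notes exactly as they were. A line with fewer than seven fields would leave the trailing variables undefined.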

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Optimizing Splitting of Array
by JavaFan (Canon) on Jun 08, 2009 at 21:22 UTC
    Considering that all your columns are treated equally, I don't understand why you put the first 6 columns in separate variables, and the rest into an array. Might as well put everything in an array (but see remark below).

    Since @rows is empty when you assign to it, no need to push onto it. You could write:

    my @rows = ($name, $tel, $col3, $col4, $col5, $col6, join " ", @notes);
    as well.

    But looking at the code, all it achieves is collapsing runs of whitespace into a single space. I'd write it as:

    while (<DATA>) {
        s/\s+$//;
        s/\s+/ /g;
        print $_, "\n";
    }

    which should do the same. No splitting and joining needed.
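For comparison, a common alternative (not from the post above) is to split on ' ' and join with a single space. It gives the same collapsing effect and also trims leading whitespace, which the two substitutions do not. The sub name squeeze() is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: collapses runs of whitespace to single spaces and
# trims both ends of the line.
sub squeeze {
    my ($line) = @_;
    return join ' ', split ' ', $line;
}

print squeeze("  abc  322 2/3/09   a1 a2 \n"), "\n";   # prints "abc 322 2/3/09 a1 a2"
```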
Re: Optimizing Splitting of Array
by afoken (Chancellor) on Jun 08, 2009 at 21:13 UTC

    Use the third parameter (LIMIT) for split -- see perlfunc

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Optimizing Splitting of Array
by kcSkeptic (Initiate) on Jun 12, 2009 at 18:21 UTC
    Here's a quick refactoring of the code:
    #!/usr/local/bin/perl
    use strict;

    while (<DATA>) {
        chomp();
        my @rows = split(/\s+/, $_, 7);
        print join ";", @rows , "\n";
    }

    __DATA__
    abc 322 2/3/09 aaa aadda dasdas a1 a2 a3
    def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5

    Output:
    abc;322;2/3/09;aaa;aadda;dasdas;a1 a2 a3;
    def;433;3/4/08;dasd;bdbdbd;wings;b1 b2 b3 b4 b5;
    
    I usually like to use a hash table to hold the column names, so here's an example of how I'd handle named columns:
    #!/usr/local/bin/perl
    use strict;
    use Data::Dumper;

    # Named column headings
    my @columnNames = qw{ name tel col3 col4 col5 col6 notes };

    while (<DATA>) {
        chomp();
        my @rows = split(/\s+/, $_, scalar(@columnNames));
        my %headings;
        @headings{@columnNames} = @rows;
        print "Dump headings hash: ".Dumper(\%headings)."\n";
    }

    __DATA__
    abc 322 2/3/09 aaa aadda dasdas a1 a2 a3
    def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5

    Output:
    Dump headings hash: $VAR1 = {
              'col5' => 'aadda',
              'col3' => '2/3/09',
              'tel' => '322',
              'notes' => 'a1 a2 a3',
              'name' => 'abc',
              'col4' => 'aaa',
              'col6' => 'dasdas'
            };
    
    Dump headings hash: $VAR1 = {
              'col5' => 'bdbdbd',
              'col3' => '3/4/08',
              'tel' => '433',
              'notes' => 'b1 b2 b3 b4 b5',
              'name' => 'def',
              'col4' => 'dasd',
              'col6' => 'wings'
            };
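Building on the hash-slice approach, one possible next step is to keep every row as a hash reference so that columns can be looked up by name later. This is only a sketch: row_to_hash() and @records are hypothetical names, and the rows are supplied as an in-line list instead of a DATA section.

```perl
use strict;
use warnings;

# Column names as in the example above.
my @column_names = qw(name tel col3 col4 col5 col6 notes);

# Hypothetical helper: turns one line into a hash reference keyed by column name.
sub row_to_hash {
    my ($line) = @_;
    chomp $line;
    my %row;
    # Hash slice: assign the split fields to the named keys in order.
    @row{@column_names} = split ' ', $line, scalar @column_names;
    return \%row;
}

my @records = map { row_to_hash($_) } (
    "abc 322 2/3/09 aaa aadda dasdas a1 a2 a3\n",
    "def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5\n",
);

print "$_->{name}: $_->{notes}\n" for @records;
# prints:
# abc: a1 a2 a3
# def: b1 b2 b3 b4 b5
```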