Optimizing Splitting of Array

bichonfrise74 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Optimizing Splitting of Array by lamprecht (Friar) on Jun 08, 2009 at 21:18 UTC
Hi, there is a 'limit' argument to split (see `perldoc -f split`); this could do what you want: `use warnings; use strict; use Data::Dumper; my @rows; while (<DATA>) { my @data = split(/ /,$_, 6); push (@rows, \@data); } print Dumper \@rows; __DATA__ abc 322 2/3/09 aaa aadda dasdas a1 a2 a3 def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5` Cheers, Christoph
Re^2: Optimizing Splitting of Array by Marshall (Canon) on Jun 09, 2009 at 05:31 UTC
The use of limit on the split is a great idea here! One suggestion concerns the use of `split / /` (a single literal space): usually splitting on a single space is not what is needed (it could be, but not often). What happens here is that abc is followed by 2 spaces in the posted data, and the 2nd space results in an extra empty token in the output list (@data). Usually split(/\s+/,$_,6) would work out better (btw: the default split() splits on ' ', which behaves like \s+ except that leading whitespace is skipped). Another technique is the use of a list slice. There can be some good reasons to combine this with the split limit. The code below shows how to "get rid of a value" from the split, in this case the date token. You probably don't want to do that, but this is just an example. `#!/usr/bin/perl -w use strict; use Data::Dumper; my @rows; while (<DATA>) { my @data = (split(/\s+/,$_, 6))[0,1,3..5]; push (@rows, \@data); } foreach (@rows) { print "@$_"; } #prints #abc 322 aaa aadda dasdas a1 a2 a3 #def 433 dasd bdbdbd wings b1 b2 b3 b4 b5 __DATA__ abc 322 2/3/09 aaa aadda dasdas a1 a2 a3 def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5` As another point about "space splitting": if you split a complete line with \s+ and no limit, the ending \n will be removed, because the \s set includes \n, \f, \r, space and \t (so you do not have to "chomp" the line first). With a limit that is reached, as in the code above, the last field keeps the rest of the line, including the \n.
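A minimal sketch of the points about single-space versus whitespace splitting and about the trailing newline. The sample line (two spaces after abc) and the variable names are assumptions made for illustration, not the original poster's data:

```perl
use strict;
use warnings;

# Hypothetical sample line: two spaces after "abc", newline at the end.
my $line = "abc  322 2/3/09 aaa aadda dasdas a1 a2 a3\n";

# Splitting on one literal space yields an empty field for the doubled space.
my @single = split / /, $line, 6;

# Splitting on runs of whitespace avoids the empty field.
my @runs = split /\s+/, $line, 6;

# Without a LIMIT, the trailing newline only produces an empty last field,
# which split drops, so no chomp is needed.
my @no_limit = split /\s+/, $line;

# The ' ' special case also skips leading whitespace; /\s+/ does not.
my @awk_mode = split ' ',   "  abc 322";
my @regex    = split /\s+/, "  abc 322";

printf "single space: %s\n", join '|', map { "[$_]" } @single[0 .. 2];
printf "whitespace:   %s\n", join '|', map { "[$_]" } @runs[0 .. 2];
printf "newline kept without limit: %s\n", ($no_limit[-1] =~ /\n\z/ ? 'yes' : 'no');
printf "newline kept with limit 6:  %s\n", ($runs[-1]     =~ /\n\z/ ? 'yes' : 'no');
printf "fields from ' ': %d, from /\\s+/: %d\n", scalar @awk_mode, scalar @regex;
```

The first two lines of output show `[abc]|[]|[322]` for the single-space split and `[abc]|[322]|[2/3/09]` for the whitespace split. The last field of the limited split still ends in a newline, so chomp the line first if that matters.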
Re^2: Optimizing Splitting of Array by bichonfrise74 (Vicar) on Jun 08, 2009 at 21:39 UTC
Thanks!
Re: Optimizing Splitting of Array by Fletch (Bishop) on Jun 08, 2009 at 21:13 UTC
Perhaps pass a limit to split explicitly telling it how many columns you expect? The cake is a lie. The cake is a lie. The cake is a lie.
Re: Optimizing Splitting of Array by JavaFan (Canon) on Jun 08, 2009 at 21:22 UTC
Considering that all your columns are treated equally, I don't understand why you put the first 7 columns in separate variables and the rest into an array. You might as well put everything in an array (but see the remark below). Since @rows is empty when you assign to it, there is no need to push onto it. You could just as well write: `my @rows = ($name, $tel, $col3, $col4, $col5, $col6, join " ", @notes);` But looking at the code, all it achieves is collapsing multiple whitespace into a single space. I'd write it as: `while (<DATA>) { s/\s+$//; s/\s+/ /g; print $_, "\n"; }` which should do the same. No splitting and joining needed.
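A rough sketch comparing the substitution approach with a split-and-join. The original poster's code is not shown in this thread, so `split_and_join` is only a guess at its shape, and both sub names and the sample lines are hypothetical:

```perl
use strict;
use warnings;

# Collapse whitespace in place, as in the substitution-based loop.
sub collapse {
    my ($line) = @_;
    $line =~ s/\s+$//;     # drop trailing whitespace, including the newline
    $line =~ s/\s+/ /g;    # squeeze every remaining run to one space
    return $line;
}

# Hypothetical stand-in for the split-and-join approach (an assumption,
# not the original poster's actual code).
sub split_and_join {
    my ($line) = @_;
    my ($name, $tel, $col3, $col4, $col5, $col6, @notes) = split ' ', $line;
    return join ' ', $name, $tel, $col3, $col4, $col5, $col6, @notes;
}

# Assumed sample input with irregular runs of spaces.
my @lines = (
    "abc  322 2/3/09   aaa aadda dasdas a1 a2  a3\n",
    "def 433 3/4/08 dasd   bdbdbd wings b1 b2 b3 b4 b5\n",
);

for my $line (@lines) {
    my $collapsed = collapse($line);
    my $rejoined  = split_and_join($line);
    print "$collapsed\n";
    print "same as split/join: ", ($collapsed eq $rejoined ? 'yes' : 'no'), "\n";
}
```

On these lines both approaches give the same string. They would differ on a line with leading whitespace: `collapse` leaves one leading space, while `split ' '` drops it.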
Re: Optimizing Splitting of Array by afoken (Chancellor) on Jun 08, 2009 at 21:13 UTC
Use the third parameter (LIMIT) for split -- see perlfunc Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Optimizing Splitting of Array by kcSkeptic (Initiate) on Jun 12, 2009 at 18:21 UTC
Here's a quick refactoring of the code: `#!/usr/local/bin/perl use strict; while (<DATA>) { chomp(); my @rows = split(/\s+/, $_, 7); print join ";", @rows , "\n"; } __DATA__ abc 322 2/3/09 aaa aadda dasdas a1 a2 a3 def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5` Output (each line ends in ';' because "\n" is part of the joined list): abc;322;2/3/09;aaa;aadda;dasdas;a1 a2 a3; def;433;3/4/08;dasd;bdbdbd;wings;b1 b2 b3 b4 b5; I usually like to use a hash keyed by column name to hold the values, so here's an example of how I'd handle named columns: `#!/usr/local/bin/perl use strict; use Data::Dumper; # Named column headings my @columnNames = qw{ name tel col3 col4 col5 col6 notes }; while (<DATA>) { chomp(); my @rows = split(/\s+/, $_, scalar(@columnNames)); my %headings; @headings{@columnNames} = @rows; print "Dump headings hash: ".Dumper(\%headings)."\n"; } __DATA__ abc 322 2/3/09 aaa aadda dasdas a1 a2 a3 def 433 3/4/08 dasd bdbdbd wings b1 b2 b3 b4 b5` Output: Dump headings hash: $VAR1 = { 'col5' => 'aadda', 'col3' => '2/3/09', 'tel' => '322', 'notes' => 'a1 a2 a3', 'name' => 'abc', 'col4' => 'aaa', 'col6' => 'dasdas' }; Dump headings hash: $VAR1 = { 'col5' => 'bdbdbd', 'col3' => '3/4/08', 'tel' => '433', 'notes' => 'b1 b2 b3 b4 b5', 'name' => 'def', 'col4' => 'dasd', 'col6' => 'wings' };