narashima has asked for the wisdom of the Perl Monks concerning the following question:

Revered Monks,

I often end up reading pipe-delimited files and building a hash of arrays from them. Below is what I typically do. What I am interested in knowing is whether this is really the optimal way of doing it.
while (<SRC>) {
    $Record = $_;
    @row = split /\|/, $Record;
    $prod{$row[0]} = [ $row[1], $row[2], $row[3] ];
}
Any advice is greatly appreciated.

thanks,
narashima.

Replies are listed 'Best First'.
Re: Optimal way to read in pipe delimited files
by Fletch (Bishop) on Nov 09, 2005 at 19:27 UTC

    Short of moving to an XS-based parser, stop making redundant copies of your data ($Record, and then @row into a fresh arrayref).

    while (<SRC>) {
        chomp;
        my( $key, @row ) = split /\|/, $_;
        $prod{ $key } = \@row;
    }
      you can take this two steps further by getting rid of $key and using a map:
      %prod=map{chomp;@a=split/\|/;shift@a,[@a]}<SRC>;
      or, expanded,
      %prod = map { chomp; @a = split /\|/; shift(@a) => [@a] } <SRC>
      You have to use [@a] instead of \@a if you do not declare the array with "my": without my, @a is the same package variable on every iteration, so \@a would leave every hash value pointing at the same (last) array, while [@a] takes a fresh copy each time.
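      A minimal self-contained sketch of that point (made-up inline data stands in for the SRC filehandle):

```perl
# Two sample lines instead of reading from SRC (hypothetical data).
my @lines = ("k1|a|b\n", "k2|c|d\n");

# @a is deliberately a package variable here, as in the map above;
# [@a] copies its contents, so each hash value gets its own array.
our @a;
my %prod = map { chomp; @a = split /\|/; shift(@a) => [@a] } @lines;

print "$prod{k1}[0]\n";   # a
print "$prod{k2}[1]\n";   # d
```

      With \@a instead of [@a], both values would reference the one shared @a and end up holding the last line's fields.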
        I think that's one step too far. Specifically, getting rid of a descriptive scalar in favor of lumping all the values together and using shift. I don't see any advantage in performance or clarity.

        Caution: Contents may have been coded under pressure.

        Could someone please explain the significance of shift @a, [@a] in the above map? I tried removing that expression and it did not work, and I really don't know what it is doing.
        Also, I did not understand why [@a] is to be used if my is not used.
        thanks
        narashima
      Thanks Fletch. But I thought there would be something more Perlish......
Re: Optimal way to read in pipe delimited files
by dragonchild (Archbishop) on Nov 09, 2005 at 19:51 UTC
    use Text::xSV;

    my $parser = Text::xSV->new(
        fh  => \*SRC,
        sep => '|',
    );

    my %prod;
    while ( my @row = $parser->get_row ) {
        $prod{ shift @row } = \@row;
    }

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Optimal way to read in pipe delimited files
by traveler (Parson) on Nov 09, 2005 at 19:43 UTC
    I like using Text::CSV_XS. It lets you set the separator to the pipe (or whatever else). It also handles quoted fields (a|b|"pipes use |"|c) and so forth. It may not be necessary for very simple files, but it is written in XS, so it may be faster. That depends on the data, so try it on yours.
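    A hedged sketch of what that looks like, assuming Text::CSV_XS is installed (demo data is fed through an in-memory filehandle rather than SRC):

```perl
use Text::CSV_XS;

# In-memory sample data; note the quoted field containing a literal pipe.
my $data = qq{k1|a|"pipes use |"\nk2|c|d\n};
open my $fh, '<', \$data or die $!;

my $csv = Text::CSV_XS->new({ sep_char => '|', binary => 1 });

my %prod;
while ( my $row = $csv->getline($fh) ) {
    my $key = shift @$row;
    $prod{$key} = $row;    # arrayref of the remaining fields
}

print $prod{k1}[1], "\n";   # pipes use |
```

    A plain split would have broken that quoted field at the embedded pipe; the parser keeps it intact.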

    --traveler

      I checked with my data. It did not make a big difference, as my data is mostly a huge matrix of decimals.
      Thanks,
      narashima
Re: Optimal way to read in pipe delimited files
by radiantmatrix (Parson) on Nov 09, 2005 at 20:36 UTC

    I don't know about 'optimal', really. The code you list implies that you can rely on having 4-element rows, so this might be some fun:

    while (<SRC>) {
        $prod{$1} = [ $2, $3, $4 ] if m{(.+?)\|(.+?)\|(.+?)\|(.+)};
    }

    Or, another more general approach with regex captures, which works with any data size:

    while (<SRC>) {
        @{ $prod{$1} } = split /\|/, $2 if m/^(.+?)\|(.+)/;
    }

    Of course, I doubt those are very fast, compared to, say:

    while (<SRC>) {
        chomp;   # without this the last field keeps its newline
        my @row = split /\|/, $_;
        $prod{ $row[0] } = [ @row[ 1 .. $#row ] ];   # arrayref of a slice
    }
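    Rather than guessing which is faster, the Benchmark module can compare the variants directly. A sketch with made-up sample rows (the ranking on real data may differ):

```perl
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 four-field pipe-delimited rows.
my @lines = map { "key$_|a|b|c\n" } 1 .. 1000;

cmpthese( 200, {
    regex => sub {
        my %prod;
        for (@lines) {
            # . does not match \n, so the last capture excludes the newline
            $prod{$1} = [ $2, $3, $4 ] if m{(.+?)\|(.+?)\|(.+?)\|(.+)};
        }
    },
    split => sub {
        my %prod;
        for (@lines) {
            chomp( my @row = split /\|/ );
            $prod{ shift @row } = \@row;
        }
    },
} );
```

    cmpthese prints a comparison table of rates, so each approach can be judged on the data it will actually see.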
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
Re: Optimal way to read in pipe delimited files
by Aristotle (Chancellor) on Nov 10, 2005 at 04:10 UTC

    Just to reinforce dragonchild’s response – use Text::xSV. Rolling your own is not recommended.

    Makeshifts last the longest.