narashima has asked for the wisdom of the Perl Monks concerning the following question:

Revered Monks,

I often end up reading pipe-delimited files and building a hash of arrays from them. Below is what I typically do. What I am interested in knowing is whether this is really the optimal way of doing it.
while (<SRC>) {
    $Record = $_;
    @row = split /\|/, $Record;
    $prod{$row[0]} = [ $row[1], $row[2], $row[3] ];
}
Any advice is greatly appreciated.

thanks,
narashima.

Replies are listed 'Best First'.
Re: Optimal way to read in pipe delimited files
by Fletch (Bishop) on Nov 09, 2005 at 19:27 UTC

    Short of moving to an XS-based parser, stop making redundant copies of your data ($Record, and then @row into a fresh arrayref).

    while (<SRC>) {
        chomp;
        my( $key, @row ) = split /\|/, $_;
        $prod{ $key } = \@row;
    }
      you can take this two steps further by getting rid of $key and using a map:
      %prod=map{chomp;@a=split/\|/;shift@a,[@a]}<SRC>;
      or, expanded,
      %prod = map { chomp; @a = split /\|/; shift(@a) => [@a] } <SRC>
      You have to use [@a] instead of \@a if you do not declare the array with "my": without my, @a is the same package variable on every iteration, so \@a would leave every hash value pointing at the same (last) array, while [@a] takes a fresh copy each time.
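      A minimal self-contained sketch of that point (made-up inline data stands in for the SRC filehandle):

```perl
# Two sample lines instead of reading from SRC (hypothetical data).
my @lines = ("k1|a|b\n", "k2|c|d\n");

# @a is deliberately a package variable here, as in the map above;
# [@a] copies its contents, so each hash value gets its own array.
our @a;
my %prod = map { chomp; @a = split /\|/; shift(@a) => [@a] } @lines;

print "$prod{k1}[0]\n";   # a
print "$prod{k2}[1]\n";   # d
```

      With \@a instead of [@a], both values would reference the one shared @a and end up holding the last line's fields.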
        I think that's one step too far. Specifically, getting rid of a descriptive scalar in favor of lumping all the values together and using shift. I don't see any advantage in performance or clarity.

        Caution: Contents may have been coded under pressure.

        Could someone please explain the significance of shift @a, [@a] in the above map? I tried removing that expression and it did not work, and I really don't know what it is doing.
        Also, I did not understand why [@a] is to be used if my is not used.
        thanks
        narashima
      Thanks Fletch. But I thought there would be something more Perlish......
Re: Optimal way to read in pipe delimited files
by dragonchild (Archbishop) on Nov 09, 2005 at 19:51 UTC
    use Text::xSV;

    my $parser = Text::xSV->new(
        fh  => \*SRC,
        sep => '|',
    );

    my %prod;
    while ( my @row = $parser->get_row ) {
        $prod{ shift @row } = \@row;
    }

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Optimal way to read in pipe delimited files
by traveler (Parson) on Nov 09, 2005 at 19:43 UTC
    I like using Text::CSV_XS. It lets you set the separator to the pipe (or whatever else). It also handles quoted fields (a|b|"pipes use |"|c) and so forth. It may not be necessary for very simple files, but it is written in XS, so it may be faster. That depends on the data, so try it on yours.
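    A hedged sketch of what that looks like, assuming Text::CSV_XS is installed (demo data is fed through an in-memory filehandle rather than SRC):

```perl
use Text::CSV_XS;

# In-memory sample data; note the quoted field containing a literal pipe.
my $data = qq{k1|a|"pipes use |"\nk2|c|d\n};
open my $fh, '<', \$data or die $!;

my $csv = Text::CSV_XS->new({ sep_char => '|', binary => 1 });

my %prod;
while ( my $row = $csv->getline($fh) ) {
    my $key = shift @$row;
    $prod{$key} = $row;    # arrayref of the remaining fields
}

print $prod{k1}[1], "\n";   # pipes use |
```

    A plain split would have broken that quoted field at the embedded pipe; the parser keeps it intact.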

    --traveler

      I checked with my data. It did not make a big difference, as my data is mostly a huge matrix of decimals.
      Thanks,
      narashima
Re: Optimal way to read in pipe delimited files
by radiantmatrix (Parson) on Nov 09, 2005 at 20:36 UTC

    I don't know about 'optimal', really. The code you list implies that you can rely on having 4-element rows, so this might be some fun:

    while (<SRC>) {
        $prod{$1} = [ $2, $3, $4 ] if m{(.+?)\|(.+?)\|(.+?)\|(.+)};
    }

    Or, another more general approach with regex captures, which works with any data size:

    while (<SRC>) {
        @{ $prod{$1} } = split /\|/, $2 if m/^(.+?)\|(.+)/;
    }

    Of course, I doubt those are very fast, compared to, say:

    while (<SRC>) {
        chomp;   # without this the last field keeps its newline
        my @row = split /\|/, $_;
        $prod{ $row[0] } = [ @row[ 1 .. $#row ] ];   # arrayref of a slice
    }
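    Rather than guessing which is faster, the Benchmark module can compare the variants directly. A sketch with made-up sample rows (the ranking on real data may differ):

```perl
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 four-field pipe-delimited rows.
my @lines = map { "key$_|a|b|c\n" } 1 .. 1000;

cmpthese( 200, {
    regex => sub {
        my %prod;
        for (@lines) {
            # . does not match \n, so the last capture excludes the newline
            $prod{$1} = [ $2, $3, $4 ] if m{(.+?)\|(.+?)\|(.+?)\|(.+)};
        }
    },
    split => sub {
        my %prod;
        for (@lines) {
            chomp( my @row = split /\|/ );
            $prod{ shift @row } = \@row;
        }
    },
} );
```

    cmpthese prints a comparison table of rates, so each approach can be judged on the data it will actually see.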
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
Re: Optimal way to read in pipe delimited files
by Aristotle (Chancellor) on Nov 10, 2005 at 04:10 UTC

    Just to reinforce dragonchild’s response – use Text::xSV. Rolling your own is not recommended.

    Makeshifts last the longest.