1. Localize *READ before opening a file on it.
  2. Do your own error processing like it says to in perlstyle. (eg ...or die "Cannot read $file: $!")
  3. Use the third -1 argument to split as documented in "perldoc -f split".
  4. I have been bitten moving between Linux and Windows by DOS endings. Instead of chomp I like to s/\r?\n$//. YMMV.
  5. Should two rows have the same first entry you are silently overwriting data. This can be a Bad Thing. Either put in an error check or use an array instead of a hash for the rows.
  6. If you are incrementing a variable, use a for() loop instead of while(). It may work the same, but the person maintaining it finds the looping construct more obvious.
  7. Switch the order in the hash. You said elsewhere you think about it one way. My experience says that that decision will come back to bite you. The only way that you will find yourself wanting to access that data structure is row by row. If that is not a good enough reason for you, then let me tell you that if you reverse it then you can later choose to extract out references to each row and access them directly. The double hash lookup is slower by a factor of 2 and forces you to write more code everywhere.
  8. Document how this function works. A short comment helps immensely.
  9. The idea behind this code will never support the full CSV spec or anything close to it. Document that limitation.

OK, let me put my wallet where my mouth is and show you a hasty rewrite that takes all of that into account:

=pod =item open_flatfile Takes the name of a pseudo-CSV flatfile and an optional delimiter as a +rgs. (The delimiter can be a regular expression created with qr//.) It ope +ns the file, uses the first line a a header, and returns the data as an array of hashes. This will not handle CSV files with escaped fields. =cut sub open_flatfile { my $file = shift; my $delim = shift || "\t"; local *FH; open (FH, "<$file") or die "Cannot read $file: $!"; my @contents = <FH>; close (FH); s/\r?\n$// foreach @contents; my @header = split ($delim, (shift @contents), -1); # Create an anonymous sub to do the work my $extract_row = sub { my @cols = split($delim, shift(@_), -1); my %row = map {($header[$_], $cols[$_])} 0..$#header; return \%row; }; return map {$extract_row->($_)} @contents; }
Cheers,
Ben

In reply to A few style suggestions by tilly
in thread Another flatfile thingy by BBQ

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.