PerlMonks  

A few style suggestions

by tilly (Archbishop)
on Aug 04, 2000 at 19:44 UTC


in reply to Another flatfile thingy

  1. Localize *READ before opening a file on it.
  2. Do your own error processing, as perlstyle recommends (e.g. ... or die "Cannot read $file: $!").
  3. Use the third -1 argument to split as documented in "perldoc -f split".
  4. I have been bitten by DOS line endings when moving between Linux and Windows. Instead of chomp I like to s/\r?\n$//. YMMV.
  5. Should two rows have the same first entry, you are silently overwriting data. This can be a Bad Thing. Either put in an error check or use an array instead of a hash for the rows.
  6. If you are incrementing a variable, use a for() loop instead of while(). It may work the same, but the person maintaining it will find the looping construct more obvious.
  7. Switch the order in the hash. You said elsewhere you think about it one way. My experience says that that decision will come back to bite you. The only way that you will find yourself wanting to access that data structure is row by row. If that is not a good enough reason for you, then consider that with the order reversed you can later extract references to each row and access them directly. The double hash lookup is slower by a factor of 2 and forces you to write more code everywhere.
  8. Document how this function works. A short comment helps immensely.
  9. The idea behind this code will never support the full CSV spec or anything close to it. Document that limitation.

OK, let me put my wallet where my mouth is and show you a hasty rewrite that takes all of that into account:

=pod

=item open_flatfile

Takes the name of a pseudo-CSV flatfile and an optional delimiter as
args. (The delimiter can be a regular expression created with qr//.) It
opens the file, uses the first line as a header, and returns the data
as an array of hashes.

This will not handle CSV files with escaped fields.

=cut

sub open_flatfile {
  my $file = shift;
  my $delim = shift || "\t";
  local *FH;
  open (FH, "<$file") or die "Cannot read $file: $!";
  my @contents = <FH>;
  close (FH);
  s/\r?\n$// foreach @contents;
  my @header = split ($delim, (shift @contents), -1);

  # Create an anonymous sub to do the work
  my $extract_row = sub {
    my @cols = split($delim, shift(@_), -1);
    my %row = map {($header[$_], $cols[$_])} 0..$#header;
    return \%row;
  };

  return map {$extract_row->($_)} @contents;
}

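
The rewrite above avoids the overwriting problem from point 5 by returning the rows in order. If lookup by the first column is still wanted, the error-check option might be sketched roughly as follows. The name index_rows and the sample lines are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: index rows by their first column and
# refuse to silently overwrite an earlier row.
sub index_rows {
    my ($delim, @lines) = @_;
    my %rows;
    for my $line (@lines) {
        my @cols = split $delim, $line, -1;
        my $key  = $cols[0];
        die "Duplicate key '$key'\n" if exists $rows{$key};
        $rows{$key} = \@cols;
    }
    return \%rows;
}

# Distinct first columns are indexed normally.
my $rows = index_rows(qr/\t/, "1\tapple", "2\tpear");
print join(",", sort keys %$rows), "\n";

# A repeated first column is reported instead of overwriting.
eval { index_rows(qr/\t/, "1\tapple", "1\tplum") };
print "Caught: $@" if $@;
```

Whether to die, warn, or collect duplicates into a list is a design choice that depends on the data; dying is simply the loudest option.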
Cheers,
Ben

Replies are listed 'Best First'.
RE: A few style suggestions
by chip (Curate) on Aug 11, 2000 at 05:53 UTC
    WRT s/\r?\n$// ...

    It may seem like a minor point, but I'd rather not depend on the values of \r and \n any more than I have to. And I really don't like using $ in patterns when I'm actually thinking about newlines (what with it matching before a trailing newline, and all). My version of the above, if I'm in full-blown portability mode:

    s/[\xD\xA]+\z//;
    (Macs use single CRs as line endings....)
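
A small sketch of how the two patterns differ, assuming nothing beyond the regexes quoted in this thread. The helper name strip_eol and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strip any run of trailing CR/LF characters.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\xD\xA]+\z//;
    return $line;
}

# Unix, DOS, and old Mac line endings all come out clean.
for my $raw ("unix\n", "dos\r\n", "oldmac\r") {
    my $clean = strip_eol($raw);
    print length($clean), ": $clean\n";
}

# With $, the match can end before a trailing newline,
# so one "\n" is left behind here.
(my $dollar = "two\n\n") =~ s/\r?\n$//;
print length($dollar), "\n";                  # 4
print length(strip_eol("two\n\n")), "\n";     # 3
```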

        -- Chip Salzenberg, Free-Floating Agent of Chaos

      You are right of course. In my defence I am rarely in full-blown portability mode. But I am frequently faced with writing code that will run under both Windows and Linux, on essentially the same files, shared under Samba. Back when I used chomp I got seriously bitten and thought I had lost a ton of data somewhere. (I just wasn't seeing it because of the \r.)

      I believe in anticipating future problems, but there is a limit to how much I worry about in my own narrow set of problems. :-)

(bbq) RE: A few style suggestions
by BBQ (Curate) on Aug 04, 2000 at 20:57 UTC
    Use the third -1 argument to split as documented in "perldoc -f split".

    Very well noted!! Just so that lazy people know what we are talking about, here's what perldoc has to say about the third argument of split (aka LIMIT):

    If LIMIT is specified and positive, splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.
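
To make the quoted behaviour concrete, here is a minimal sketch. The sample line is made up; it has two trailing empty fields.

```perl
use strict;
use warnings;

my $line = "a\tb\t\t";

# No LIMIT: trailing empty fields are stripped.
my @default = split /\t/, $line;

# Negative LIMIT: trailing empty fields are kept.
my @keep_all = split /\t/, $line, -1;

# Positive LIMIT: at most that many fields; the last one holds the rest.
my @two = split /\t/, $line, 2;

printf "default:  %d fields\n", scalar @default;    # 2
printf "LIMIT -1: %d fields\n", scalar @keep_all;   # 4
printf "LIMIT 2:  %d fields\n", scalar @two;        # 2
```

For a flatfile with a header row, the -1 form is what keeps every row the same width as the header even when the last columns are empty.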

    Switch the order in the hash. You said elsewhere you think about it one way. My experience says that that decision will come back to bite you. (...)

    Now I understand what brtrott was saying. I am looking at the table from a user's perspective as opposed to a programmer's perspective. I can see this is good advice now, and not just a matter of preference!

    The idea behind this code will never support the full CSV spec or anything close to it. Document that limitation.

    Humm.. You probably didn't read the description of Open Flat File. I haven't used CSVs for quite some time now. I just pulled this one up as an example for peer-teaching. No formal documentation required, since it will be tossed into a tar pit as soon as the brain-storm is over. :)

    Nevertheless, all of your recommendations make perfect sense. Thanks a bunch for the contribution.

    #!/home/bbq/bin/perl
    # Trust no1!
