mav3067 has asked for the wisdom of the Perl Monks concerning the following question:

Hey Everyone, I wrote a script which was originally a one-off thing, but now more people have expressed interest in using it and I would like to make it a little more robust. I would like to make my script handle all types of newline characters (LF, CR, and CRLF) so that the users can remain in their blissfully unaware state about such things ;) Here are the file loading subs:
sub load_parents {
    my ($parent_index) = @_;
    my %parents;
    open (FH, "< $parent_index") or die "could not open file $parent_index for reading:$!";
    while (my $line = <FH>) {
        chomp($line);
        my ($index, $parent) = split(/\t/, $line);
        $parents{$index} = $parent;
    }
    return \%parents;
}

sub load_primers {
    my ($primer_index) = @_;
    my %primers;
    open (FH, "< $primer_index") or die "could not open file $primer_index for reading:$!";
    while (my $line = <FH>) {
        chomp($line);
        my ($index, $primer) = split(/\t/, $line);
        $primers{$index} = $primer;
    }
    return \%primers;
}
I would appreciate any advice on what I could do to handle the different newline characters, or if you see a blatant flaw in my logic don't be afraid to point that out either.
Thanks, Mav

Replies are listed 'Best First'.
Re: handling different new line characters
by Velaki (Chaplain) on Aug 28, 2006 at 15:09 UTC

    Instead of chomp($line), you could use a regex:

    $line =~ s/[\n\r]//g;

    This way you're handling both the CR and the LF.
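
    One caveat worth keeping in mind: with the default $/ of "\n", a file that uses bare CR line endings comes back from <FH> as a single record, so stripping CR and LF from it would glue all the lines together. A minimal sketch of one way around that, assuming the files are small enough to slurp, is to read the whole file and split on any of the three endings. The names split_lines and read_lines_any_eol are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into lines on CRLF, LF, or CR.
sub split_lines {
    my ($text) = @_;
    # "\r\n" comes first in the alternation so a CRLF pair counts as one break.
    # split drops trailing empty fields, so the final line ending adds no empty line.
    return split /\r\n|\n|\r/, $text;
}

# Hypothetical helper: slurp a file without newline translation, then split it.
sub read_lines_any_eol {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "could not open file $path for reading: $!";
    my $text = do { local $/; <$fh> };   # undef $/ means "read everything"
    close($fh);
    return split_lines(defined $text ? $text : '');
}
```

    Each element returned can then go through the same split(/\t/, ...) step as in the original subs. Empty lines in the middle of the file are kept as empty strings, so the caller may want to skip them.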

    Hope that helped,
    -v.

    "Perl. There is no substitute."
Re: handling different new line characters
by swampyankee (Parson) on Aug 28, 2006 at 16:23 UTC

    If the program is to be used for files resident on a given platform – Windows, Macintosh, *ixen, etc. – chomp manages that transparently. If you need to deal with end-of-record markers over something like NFS, my technique would be to determine the EOR marker for the source machine and reset $/ appropriately for the machine on which the program is running.
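
    As a rough sketch of the "determine the EOR marker and reset $/" idea, one could peek at the start of the file, take the first line ending found there, and localize $/ to it while reading. This assumes a file uses a single convention throughout; detect_eol and read_records are hypothetical names, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the record separator from the first 4 KB of a file.
# Falls back to "\n" if the chunk is empty or contains no line ending.
# (A CRLF pair straddling the 4 KB boundary would be misread as CR; rare, but possible.)
sub detect_eol {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "could not open file $path for reading: $!";
    my $got = read($fh, my $chunk, 4096);
    close($fh);
    return "\n" unless $got;
    return $1 if $chunk =~ /(\r\n|\n|\r)/;   # leftmost match; CRLF preferred over bare CR
    return "\n";
}

# Hypothetical helper: read all records using the detected separator.
sub read_records {
    my ($path) = @_;
    local $/ = detect_eol($path);            # restored automatically when the sub returns
    open(my $fh, '<:raw', $path) or die "could not open file $path for reading: $!";
    my @records = <$fh>;
    close($fh);
    chomp(@records);                         # chomp removes whatever $/ currently holds
    return @records;
}
```

    Because $/ is localized, the rest of the program keeps its normal line-reading behaviour.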

    emc

    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.

    Albert Einstein
Re: handling different new line characters
by McDarren (Abbot) on Aug 29, 2006 at 02:28 UTC
    "..if you see a blatant flaw in my logic.."

    Not a flaw in your logic per se, but more a comment on the way you are opening your files.
    It is generally accepted that the three argument form of open is a better choice. That is, instead of doing:

    open (FH, "< $parent_index") or die "could not open file $parent_index for reading:$!";
    ..you do:
    open (FH, '<', $parent_index) or die "could not open file $parent_index for reading:$!";

    The main advantage of this is that it separates the mode from the filename, thereby avoiding any confusion between the two (i.e., what if the filename begins with "<"?). It also forces you to be explicit about which mode you want, again avoiding any possible confusion.

    And if you are using Perl 5.8.0+, you can even do:

    open (my $fh, '<', $parent_index) or die "could not open file $parent_index for reading:$!";

    See the documentation for open for further reading, and this topic is also covered quite well in Intermediate Perl, if you happen to have a copy.

    Hope this helps,
    Darren :)

Re: handling different new line characters
by planetscape (Chancellor) on Aug 29, 2006 at 08:39 UTC

    A bit different approach, using an existing utility:

    In the quest for a utility to convert between *nix and DOS newlines on a regular basis, I have found flip: Newline conversion between Unix, Macintosh and MS-DOS ASCII files to be useful. It not only converts between DOS and Unix file formats, but also includes a handy -t command-line switch to tell you which format a given file uses.

    HTH,

    planetscape
Re: handling different new line characters
by jaa (Friar) on Aug 29, 2006 at 08:05 UTC
    No comment on the NL problem really, except that reading is usually not a problem in Perl - you have to be more careful if writing files for distribution to *nix or WinDOS. If you are looking at making your code more robust, I would second the previous recommendations on file handling - remember FH style globs are global.

    You might even consider using IO::File which IMHO has an easy peasy and pretty interface, and automatically closes in a kernel resource friendly manner!

    use IO::File;
    my $fh = IO::File->new($filename, "r") or die "unable to open $filename: $!";
    for my $line ( $fh->getlines() ) {
        chomp $line;
        print "[$line]\n";
    }
    I would also add the comment that your two subs duplicate each other - why not just have one? Or if you really like having the two names then...
    sub load_primers { return _loadIndex( @_); }
    sub load_parents { return _loadIndex( @_); }
    and move all the code into _loadIndex();
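
    For what it's worth, a sketch of what that shared _loadIndex might look like, folding in the three-argument open and lexical filehandle suggested earlier in the thread. The substitution in place of chomp is my own assumption: it copes with LF and CRLF endings, but a bare-CR file would still arrive as one long record when read line by line like this.

```perl
use strict;
use warnings;

# Shared loader: reads a tab-separated "index<TAB>value" file into a hash ref.
sub _loadIndex {
    my ($path) = @_;
    my %map;
    open(my $fh, '<', $path) or die "could not open file $path for reading: $!";
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;          # strip a trailing LF or CRLF
        next unless length $line;        # skip blank lines
        my ($index, $value) = split /\t/, $line;
        $map{$index} = $value;
    }
    close($fh);
    return \%map;
}

# Thin wrappers keep the two original names available to callers.
sub load_primers { return _loadIndex(@_); }
sub load_parents { return _loadIndex(@_); }
```

    The lexical $fh is closed automatically when it goes out of scope, so the explicit close is there mainly for clarity.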

    $0.02

Re: handling different new line characters
by chromatic (Archbishop) on Aug 31, 2006 at 01:00 UTC