daemonchild has asked for the wisdom of the Perl Monks concerning the following question:

hey all

I have written a script that parses data from a delimited text file. The script works fine, except that it doesn't deal well with EOL characters from different OSes (I'm stuck in win32 for the purposes of this script).

Anybody have a code snippet that will detect what the EOL char of the file is, and set $/ accordingly? I've tried various regexps using things like

if ($_ =~ /\r\n/) { $os = "win"; $/ = "\r\n"; }
if ($_ =~ /[^\r]/ && $_ =~ /\n/) { $os = "unix"; $/ = "\n"; }
if ($_ =~ /[^\n]/ && $_ =~ /\r/) { $os = "mac"; $/ = "\r"; }

but they don't seem to work for me. Do those look right? Does my problem lie elsewhere in the code? Is there a better way, that doesn't involve checking the EOL every iteration of the loop? (the above code is below the while(<FILE>) { line). Any help would be appreciated.

--steve

Replies are listed 'Best First'.
Re: mac win and unix EOL chars
by nardo (Friar) on Jun 26, 2000 at 09:40 UTC
    if ($_ =~ /[^\r]/ && $_ =~ /\n/) { $os = "unix"; $/ = "\n"; }
    the /[^\r]/ will match any string which has a non \r character so the string "\r\n" will match because the \n is not a \r. what you want is:

    $_ !~ /\r/ (or, to save a few keystrokes: !/\r/)

    the same thing goes for the /[^\n]/ on the next line
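
    a minimal sketch of the three tests with that fix applied. the helper name detect_eol and the sample strings are made up for illustration and are not from the original script; it returns the string that would be assigned to $/ rather than setting it directly.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): classify a sample of raw
# bytes by the kind of line ending it contains.
sub detect_eol {
    my ($sample) = @_;
    return "\r\n" if $sample =~ /\r\n/;                    # win
    return "\n"   if $sample !~ /\r/ && $sample =~ /\n/;   # unix
    return "\r"   if $sample !~ /\n/ && $sample =~ /\r/;   # mac
    return;    # no line ending seen, or an unrecognised mix
}

my %name = ("\r\n" => "win", "\n" => "unix", "\r" => "mac");
for my $sample ("a\r\nb", "a\nb", "a\rb") {
    my $eol = detect_eol($sample);
    print $name{$eol}, "\n";    # prints win, unix, mac (one per line)
}
```

    note that the sample has to be read with something other than <> (for example read on a filehandle in binmode), since <> already depends on $/.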
Re: mac win and unix EOL chars
by mdillon (Priest) on Jun 26, 2000 at 12:55 UTC
    first of all, how can you be checking for the line ending of each line in a loop that relies on the value of that line ending to break input up line-by-line? if you want to do this, you can't use <> until you've determined what the value of $/ should be. my advice is: don't use it. just read the data with read and break up the lines manually with split.

    there is probably already a module that does this, but the following code reads in a file in chunks of a specified number of bytes and splits the input into an array containing lines and line endings in the order they are found in the file. so, passing the resulting array to print should print out the exact, original file contents.

    $CHUNK_SIZE = 4096;
    open FILE, $file or die "can't open $file: $!\n";
    binmode FILE;    # on win32, stop perl from translating \r\n to \n

    while (read FILE, $chunk, $CHUNK_SIZE) {
        # don't let a chunk end in the middle of a \r\n pair
        while ($chunk =~ /\r\z/ and read FILE, $next, 1) {
            $chunk .= $next;
        }

        # split the chunk into a list of parts, keeping
        # the line endings in the array
        @parts = split /(\r\n?|\n)/, $chunk;

        if (defined $partial) {
            $parts[0] = $partial . $parts[0];
            undef $partial;
        }

        # if the last part is not a line ending, then
        # the line could potentially be continued in
        # the following chunk
        if ($parts[-1] !~ /^(?:\r\n?|\n)\z/) {
            $partial = pop @parts;
        }

        push @lines_and_endings, @parts;
    }
    push @lines_and_endings, $partial if defined $partial;
    close FILE;

    print @lines_and_endings;

    # everything that is not itself a line ending is a line
    @just_lines = grep { ! /^(?:\r\n?|\n)\z/ } @lines_and_endings;
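
    if you would still rather use <>, one option is to work out $/ once, before the loop, by peeking at the start of the file with read. a rough sketch, assuming the whole file uses the same line ending as its first line; guess_eol, its $peek parameter and the fallback to "\n" are my own choices for illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's line ending from the first line
# ending found in its first $peek bytes.
sub guess_eol {
    my ($path, $peek) = @_;
    $peek ||= 4096;
    open my $fh, '<', $path or die "can't open $path: $!\n";
    binmode $fh;                   # see the raw \r and \n bytes on win32
    my $buf = '';
    read $fh, $buf, $peek;
    # if the sample stops on a \r, fetch one more byte so that a \r\n
    # pair is not mistaken for a bare \r
    if ($buf =~ /\r\z/ && read($fh, my $more, 1)) {
        $buf .= $more;
    }
    close $fh;
    return $1 if $buf =~ /(\r\n|\r|\n)/;
    return "\n";    # assumption: default to "\n" when nothing is found
}

# Usage sketch: the file name is taken from the command line here.
my $file = shift @ARGV;
if (defined $file) {
    local $/ = guess_eol($file);   # set once, outside the read loop
    open my $in, '<', $file or die "can't open $file: $!\n";
    binmode $in;
    while (my $line = <$in>) {
        chomp $line;               # chomp removes whatever $/ holds
        print "$line\n";
    }
    close $in;
}
```

    this avoids testing the line ending on every iteration, at the cost of trusting the first line ending seen.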
Re: mac win and unix EOL chars
by athomason (Curate) on Jun 26, 2000 at 12:24 UTC
    Do you really need to know what OS the file came from, or are you just looking to get the same input from a file saved on different platforms? I've never needed the former, but solutions for the latter problem are given in a number of places around PM; check out this Q&A, and this code snippet which may be relevant.
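
    for the second case (same input whatever platform the file was saved on), a minimal sketch is to slurp the file and split on any of the three endings. this is not the code from the linked nodes; read_lines_any_eol is a hypothetical name, and it assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path with any of the three
# line endings (\r\n, \r or \n) removed.
sub read_lines_any_eol {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!\n";
    binmode $fh;        # keep win32 from translating \r\n on input
    local $/;           # slurp mode: read the whole file in one go
    my $data = <$fh>;
    close $fh;
    return () unless defined $data;
    # \r\n is listed first so that a pair counts as one line ending;
    # note that split drops trailing empty lines
    return split /\r\n|\r|\n/, $data;
}

# Usage sketch: the file name is taken from the command line here.
if (@ARGV) {
    print "$_\n" for read_lines_any_eol($ARGV[0]);
}
```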