luckycat has asked for the wisdom of the Perl Monks concerning the following question:

I've written a Perl script, running on Linux, that copies an ASCII text file to a new file. Line terminators could be Windows style (\r\n) or Unix style (\n). Lines that match a string I'm looking for are processed before being written to the new file. For lines I don't process, a simple print OUTFILE $_; works great, since it replicates whatever line terminator the line uses. But for the lines I do process, I have to write out my processed string, so I need to add the line terminator manually. I'm doing this check right now:
my $endofline = ( /\r\n$/ ) ? "\r\n" : "\n";
Then here's the code for the processed line I'd write out:
print OUTFILE "$processed_string","$endofline";
My script works, but I'm wondering if there's a better, cleaner way to do this. Currently I do the end-of-line check inside the while loop that processes each line of the input file, so every single line is checked, which is probably not efficient. I wanted to guard against files that mix Windows and Unix line terminators. However, if that's extremely rare, I guess I could move the check out of the loop. If I do that, how would I get the type of line terminator the file uses, so I know what to use in the print statement later? Basically, is there a better way to do what I'm trying to do? Thanks for any tips.

Re: Copying an ascii text file, replicating end of line terminator (windows or unix)
by Eily (Monsignor) on Jul 24, 2015 at 08:52 UTC

    The output record separator may help you do what you want:

        $\ = $endofline;
        print OUTFILE $processed_string; # Appends the wanted end of line correctly
    This may be a problem if you also write to handles other than OUTFILE while $\ is changed. If you can limit the change to a defined scope (i.e., localize it), that would probably be better:
        {
            local $\ = $endofline;
            my $old_handle = select OUTFILE; # Write to OUTFILE only
            while (<>) {
                process($_);
                print $_; # write to selected handle (OUTFILE) and append $\
            }
            select $old_handle;
        } # $\ has its old value again
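
A minimal sketch of how a localized $\ could be combined with the per-line check from the question, so that files with mixed terminators are still copied faithfully. The helper name copy_with_ors, the upper-casing stand-in for "processing", and the in-memory filehandles are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, upper-casing lines that
# match $pattern and re-appending each line's own terminator through $\.
sub copy_with_ors {
    my ($in, $out, $pattern) = @_;
    while (my $line = <$in>) {
        # Strip and remember this line's terminator ("\r\n", "\n", or none).
        my $eol = $line =~ s/(\r?\n)\z// ? $1 : '';
        $line = uc $line if $line =~ $pattern;   # stand-in for real processing
        local $\ = $eol;                         # restored after each iteration
        print {$out} $line;
    }
    return;
}

# Demo on in-memory handles with mixed terminators and an unterminated last line.
my $input = "keep\r\nchange me\nlast";
open my $in, '<', \$input or die "open in: $!";
binmode $in;
my $output = '';
open my $out, '>', \$output or die "open out: $!";
binmode $out;
copy_with_ors($in, $out, qr/change/);
close $out;
```

Because the terminator is removed before processing and put back by $\, the processing step never has to know which style the line used.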

Re: Copying an ascii text file, replicating end of line terminator (windows or unix)
by Laurent_R (Canon) on Jul 24, 2015 at 08:43 UTC
    Hi luckycat,

    you could read the first line of the file outside of the main loop (before entering the loop). Something like this:

        my $first_line = (<$IN>);
        my $endofline = $first_line =~ /\r\n$/ ? "\r\n" : "\n";
        # process first line ...
        while (my $line = <$IN>) {
            # ...
        }
    It makes the program a bit more complicated, because you need to process the first line separately, but it might be worth it if your file is really large. If the file is of moderate size, I would probably not bother trying to gain efficiency by checking the end-of-line characters only once.
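
One way to avoid handling the first line separately is to peek at it and then rewind, which is a variation on the idea above rather than the code as posted. This is a sketch under two assumptions: the input is a seekable file (not a pipe or STDIN), and the whole file uses the terminator found on its first line. The helper name detect_eol is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read the first line, decide on a terminator, rewind.
sub detect_eol {
    my ($fh) = @_;
    my $first = <$fh>;
    seek($fh, 0, 0) or die "seek failed: $!";   # back to the start of the file
    return "\n" unless defined $first;          # empty file: default to Unix
    return $first =~ /\r\n\z/ ? "\r\n" : "\n";
}

# Demo on an in-memory handle standing in for the real input file.
my $data = "first\r\nsecond\r\n";
open my $IN, '<', \$data or die "open: $!";
binmode $IN;

my $eol   = detect_eol($IN);
my $count = 0;
while (my $line = <$IN>) {
    $count++;    # every line, including the first, goes through the same loop
}
```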
Re: Copying an ascii text file, replicating end of line terminator (windows or unix)
by poj (Abbot) on Jul 24, 2015 at 12:30 UTC

    Another option would be split with capture brackets, like this:

        #!perl
        use strict;

        # create test file
        open OUT,'>','line.txt' or die;
        binmode OUT;
        print OUT "Windows".chr(13).chr(10);
        print OUT "Unix".chr(10);
        print OUT "Windows".chr(13).chr(10);

        open OUT,'>','copy.txt' or die;
        binmode OUT;
        open IN,'<','line.txt' or die;
        binmode IN;
        while (<IN>){
          my ($line,$eol) = split /([\015\012]+)/,$_;
          # process $line
          print OUT $line.$eol;
        }
    poj
Re: Copying an ascii text file, replicating end of line terminator (windows or unix)
by flexvault (Monsignor) on Jul 24, 2015 at 12:09 UTC

    Welcome luckycat,

      " ... print OUTFILE $_; works great..."
    You have to be careful with the "$_", since in your processing you may modify it. Other monks have shown you a better way:
        while (my $line = <$IN>) {
            # ...
        }
    However if the line terminator is set to "\n" you can't use your test since you have only "\r" or something else at the end of line. Update: I don't know what I was thinking :-(

    I think if you look at 'chop' it might help you with modifying the output.

        while (my $line = <$IN>) {
            my $eof = chop( $line ); # This removes last char and saves
            if ( $eof ne "\r" ) { $line .= $eof; }
            if ( $NoWork == 0 ) { print OUTFILE "$line\n"; }
            DoYourWork:
            # ...
        }
    You could use 'substr' to test the end of line, but I don't know which would be faster in your environment. It may be a good time to try 'Benchmark'.
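
A minimal sketch of such a comparison, using cmpthese from the core Benchmark module. The helper names eol_regex and eol_substr and the two sample lines are assumptions made for illustration; the timings depend on your Perl build and data, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Two hypothetical ways to pick the terminator of a line.
sub eol_regex {
    return $_[0] =~ /\r\n\z/ ? "\r\n" : "\n";
}

sub eol_substr {
    # The length guard keeps substr from reaching before the start of the string.
    return length($_[0]) >= 2 && substr($_[0], -2) eq "\r\n" ? "\r\n" : "\n";
}

my @lines = ("windows line\r\n", "unix line\n");

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { eol_regex($_)  for @lines },
    substr => sub { eol_substr($_) for @lines },
});
```

For a file of moderate size either test is likely to be dwarfed by the I/O itself, so the measurement matters mostly for very large inputs.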

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin