Re: Dealing with files with differing line endings
by stevieb (Canon) on Nov 05, 2021 at 21:17 UTC
|
My File::Edit::Portable was written to deal with this exact situation.
Get a file handle of the file with the record separators changed to that of the local platform, make changes, and write it back to the same file with the original record separators:
use File::Edit::Portable;
my $rw = File::Edit::Portable->new;
my $fh = $rw->read('file.txt');
...
$rw->write(contents => $fh);
Get an array of a file's contents with the line endings stripped off (one line per element), make changes, and write the data back to the original file (the original line endings will be preserved and put back into place automagically):
my @contents = $rw->read('file.txt');
for (@contents) {
...
}
$rw->write(contents => \@contents);
There's a myriad of other magic you can do as well, like automatically making a backup copy of each file, changing line endings, using custom line endings, checking what endings a file is using, splicing stuff into the files, etc.
|
Was guessing I wasn't the first person to have something like this problem! Thanks for pointing out your module.
Re: Dealing with files with differing line endings
by ikegami (Patriarch) on Nov 05, 2021 at 20:07 UTC
|
All the systems you mentioned use CR LF or LF (unless you meant the ancient MacOS which used CR).
So just use LF as the line terminator as usual, but use something like s/\s+\z// instead of chomp.
while (<>) {
s/\s+\z//;
...
}
Alternatively, you could add a :crlf layer to the handle.
open(my $fh, '<:crlf', $qfn)
or die("Can't open \"$qfn\": $!\n");
while (<$fh>) {
chomp;
...
}
This already happens by default on Windows, which is why it can handle the listed file formats naturally.
|
Good point! I was jumping back to a more general question than I need to solve. As you say, I can just force LF for line boundaries. Parsing the contents can handle various line separators with \R, and I think it already does (or I could do my own chomp with a suitable regexp to kill all kinds of line terminators).
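A minimal sketch of such a home-made chomp, assuming Perl 5.10 or later (needed for \R). The name chomp_any is made up for illustration and is not part of any module mentioned in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing line terminator of any kind.
# \R matches CRLF as a unit, as well as a lone LF or CR (Perl 5.10+).
sub chomp_any {
    $_[0] =~ s/\R\z//;    # modifies the caller's variable in place, like chomp
    return;
}

for my $raw ("unix\n", "dos\r\n", "oldmac\r", "none") {
    my $line = $raw;
    chomp_any($line);
    print "[$line]\n";    # prints [unix], [dos], [oldmac], [none]
}
```

Unlike s/\s+\z//, this leaves trailing spaces and tabs alone, which may or may not be what you want.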
Re: Dealing with files with differing line endings
by LanX (Saint) on Nov 05, 2021 at 20:03 UTC
|
I always thought chomp handles that.
Could you provide us with an example which goes wrong?
Possible solutions (if needed):
- replace chomp with a regex in your code
- override chomp with your own version in legacy code.
edit
Could it be you are not using chomp at all, but setting $/ to get rid of the line-endings?
|
Here is an example to override chomp:
use strict;
use warnings;
package NewChomp;
use Data::Dump qw/pp dd/;
use subs qw/chomp/;
sub chomp {
$_[0] =~ s/\n$//; # adjust here
}
pp my $line ="abcd\n";
chomp $line;
pp $line;
Just export it from a new module into your scripts, and adjust the regex to your needs.
|
Chomp cleans off the end of a string based on the current value of $/. I need something to cause reading the next line of the file to terminate in the correct place. (And then I probably do also need to do something like chomp, but that's easy.)
Re: Dealing with files with differing line endings
by BillKSmith (Monsignor) on Nov 06, 2021 at 03:01 UTC
|
A general solution is impossible. Any file can contain normal text characters that another OS would interpret as line separators. You may be able to assume that this will never happen with your data. Your idea of slurping the entire file (in binmode) into a string is probably the safest. Use anything you know about the file (line length, number of lines, words that only occur at the start or end of a line, etc) to determine which kind of file it is. Open the string as a memory file with the appropriate IO layer. You could then use the <> operator exactly as you normally would.
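A rough sketch of the slurp-and-detect idea, under some loud assumptions: the guess is made purely by counting candidate terminators (a stand-in for "anything you know about the file"), guess_eol is a made-up name, and instead of choosing an IO layer the sketch sets $/ to the guessed terminator before reading from the in-memory handle.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line terminator by counting candidates.
sub guess_eol {
    my ($data) = @_;
    my $crlf = () = $data =~ /\r\n/g;
    my $lf   = () = $data =~ /(?<!\r)\n/g;    # LF not preceded by CR
    my $cr   = () = $data =~ /\r(?!\n)/g;     # CR not followed by LF
    return "\r\n" if $crlf > 0 && $crlf >= $lf && $crlf >= $cr;
    return "\r"   if $cr > $lf;
    return "\n";
}

# Stand-in for a file slurped in binmode; real code would read it from disk.
my $data = "first\r\nsecond\r\nthird\r\n";

my $eol = guess_eol($data);
open my $fh, '<:raw', \$data or die "Can't open in-memory file: $!";
{
    local $/ = $eol;              # readline now stops at the guessed terminator
    while (my $line = <$fh>) {
        chomp $line;              # chomp removes whatever $/ currently holds
        print "[$line]\n";
    }
}
close $fh;
```

A file that mixes terminators, or contains stray CR or LF bytes as data, can still fool the count, which is the reason a general solution is impossible.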
|
We may be overthinking this. ikegami's solution should be fine. The exception is that ancient Mac which uses <CR> instead of <CR><LF> or <LF> for line endings. One of my users was using an old Mac to edit one of my config files and reported that my config file "didn't work". I talked with this guy and told him to set his text editor to "write DOS compatible files" and that ended the problem. Modern Macs use <LF>. Unless there is a specific strange requirement, writing code to handle ancient Mac is not worth the effort.
|
As a practical matter, I am sure that you are right. However, it is important to know that there are corner cases. Consider the following contrived example.
use strict;
use warnings;
use Test::More tests=>1;
my $file = \do{
"This \n is not the end of a line on windows\r\n"
};
open my $fh1, '<:raw', $file;
my $chars_read = length(<$fh1>);
close $fh1;
my $chars_expected=47;
is( $chars_read, $chars_expected, 'record length' );
OUTPUT:
1..1
not ok 1 - record length
# Failed test 'record length'
# at nl.pl line 15.
# got: '6'
# expected: '47'
# Looks like you failed 1 test of 1.
Unfortunately, my solution (use :crlf instead of :raw) does not work either.
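One possible way around this particular corner case, sketched here on the assumption that the file is already known to use CRLF: keep the handle raw and set $/ to "\r\n", so that a lone LF is treated as ordinary data.

```perl
use strict;
use warnings;

my $text = "This \n is not the end of a line on windows\r\n";

open my $fh, '<:raw', \$text or die "Can't open in-memory file: $!";
my $record = do {
    local $/ = "\r\n";    # assumption: we already know the file is CRLF
    <$fh>;                # reads up to and including the first CRLF
};
close $fh;

# The lone LF no longer ends the record; the whole string comes back.
print length($record) == length($text)
    ? "whole record read\n"
    : "record split early\n";
```

This only moves the problem: it works because the terminator was supplied by hand, not detected.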
Re: Dealing with files with differing line endings
by Anonymous Monk on Nov 06, 2021 at 14:34 UTC
|
PerlIO::eol has not been updated in a while, but the last time I tried it still worked, and it installs successfully under Perl 5.34.0.