andreas1234567 has asked for the wisdom of the Perl Monks concerning the following question:

I primarily work in a linux server environment. I often encounter files that were originally written on another operating system. That can make life hard for scripts like the following:

$ echo -e "123\n456\n789" > foo.txt
$ hexdump -C foo.txt
00000000  31 32 33 0a 34 35 36 0a  37 38 39 0a              |123.456.789.|
0000000c
$ cat foo.txt | perl -wle 'use strict; while(my $line = <>) { chomp $line; ($line =~ m/^\d+$/) ? print qq{number ok:"$line"} : print qq{number error:"$line"} }'
number ok:"123"
number ok:"456"
number ok:"789"
$ unix2dos foo.txt
unix2dos: converting file foo.txt to DOS format ...
$ hexdump -C foo.txt
00000000  31 32 33 0d 0a 34 35 36  0d 0a 37 38 39 0d 0a     |123..456..789..|
0000000f
$ cat foo.txt | perl -wle 'use strict; while(my $line = <>) { chomp $line; ($line =~ m/^\d+$/) ? print qq{number ok:"$line"} : print qq{number error:"$line"} }'
"umber error:"123
"umber error:"456
"umber error:"789
$
I'd like my scripts to be platform-agnostic, that is, to process files regardless of whether they were created on a Unix, Windows or Mac platform.

I use unix2dos manually on occasion, but I'd rather have my perl scripts do this themselves when they encounter files created on a foreign operating system. Is there a reliable way to do this?

--
No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]

Replies are listed 'Best First'.
Re: Process file regardless of the platform it was created on
by moritz (Cardinal) on May 20, 2008 at 07:57 UTC
    If you know which kind of file it is you are processing you can apply the :crlf IO layer if necessary.

    If you don't know, I guess you'll have to resort to s/\s+$// instead of chomp. (If you don't want to remove all trailing whitespace, you'll have to tweak the regex, of course.)

    You could of course write your own IO layer which turns any CRLF or lone CR into $/.
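    Such a layer might be sketched with PerlIO::via along the following lines. This is only a sketch: the layer name AnyNewline and the demo filename are invented, and a CR-only (classic Mac) file arrives in FILL as one big chunk, because the layer below sees no LF to break on.

```perl
package PerlIO::via::AnyNewline;
use strict;
use warnings;

sub PUSHED { bless {}, shift }

# FILL is called whenever the layer above needs more data: read a
# chunk from the handle below and normalize CRLF and lone CR to LF.
sub FILL {
    my ($self, $fh) = @_;
    my $chunk = <$fh>;
    return undef unless defined $chunk;
    $chunk =~ s/\r\n?/\n/g;
    return $chunk;
}

1;

package main;
use strict;
use warnings;

# write a CRLF-terminated demo file (the filename is made up)
open my $out, ">:raw", "demo-crlf.txt" or die "failed to write demo: $!";
print $out "123\r\n456\r\n789\r\n";
close $out;

open my $fh, "<:via(AnyNewline)", "demo-crlf.txt"
    or die "failed to open demo: $!";
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh;
print "@lines\n";
```

    With the layer pushed at open time, the rest of the script can chomp and match as usual, no matter which platform wrote the file.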

Re: Process file regardless of the platform it was created on
by psini (Deacon) on May 20, 2008 at 08:02 UTC

    s/\r\n?/\n/g? (This preserves empty lines, too)

    Rule One: Do not act incautiously when confronting a little bald wrinkly smiling man.

      Yes, I guess a s/\r\n/\n/g followed by chomp would do it:
      $ cat unixordos.pl
      use strict;
      use warnings;

      my $fh = undef;
      open($fh, "<", "foo.txt") or die "failed to open 'foo.txt':$!";
      while (my $line = <$fh>) {
          $line =~ s/\r\n/\n/g;
          chomp $line;
          ($line =~ m/^\d+$/)
              ? print qq{number ok:"$line"}
              : print qq{number error:"$line"};
          print "\n";
      }
      close($fh) or die "failed to close 'foo.txt':$!";
      __END__
      $ echo -e "123\n456\n789" > foo.txt
      $ perl unixordos.pl
      number ok:"123"
      number ok:"456"
      number ok:"789"
      $ unix2dos foo.txt
      unix2dos: converting file foo.txt to DOS format ...
      $ perl unixordos.pl
      number ok:"123"
      number ok:"456"
      number ok:"789"
      $
      Will this work generally?
      --
      No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
        I guess a s/\r\n/\n/g followed by chomp would do it

        That substitution is exactly what the crlf PerlIO layer is doing (on input). The layer is by default in effect when you're on Windows (which is why you don't have the problem with Windows' files on Windows :)

        In other words, you could also do

        open($fh, "<:crlf", "foo.txt") or die "failed to open 'foo.txt':$!";

        It should work AFAIK.

        Does it also convert a single CR to LF (Mac format)?

        Rule One: Do not act incautiously when confronting a little bald wrinkly smiling man.

Re: Process file regardless of the platform it was created on
by graff (Chancellor) on May 21, 2008 at 01:49 UTC
    There are conditions on my current macosx where an app will save a "plain text" version of a file using just CR for line termination, even though the OS is really a flavor of unix, and perl's default $/ on macosx is LF. (I can only wonder how much longer this CR silliness will go on.) I know that CRLF comes from a variety of web apps as well as from ms-win systems. Alas, $/ has to be a literal string -- you can't use a regex as the input record separator.

    If you never encounter any really huge CR-format files, you might want something like this:

    while (<>) {
        my @lines;
        if ( /\n$/ ) {
            tr/\r\n//d;
            @lines = ( $_ );
        }
        else {
            @lines = split /\r/;
        }
        for my $line ( @lines ) {
            # do stuff with each line of text
        }
    }
    Either that, or just use slurp mode in all cases and split into lines (using /[\r\n]+/) if you really need to do that.
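    A slurp-and-split version might look like this (a sketch; the demo filename is made up). One caveat worth noting: /[\r\n]+/ collapses blank lines, so the alternation below is used instead when empty lines matter.

```perl
use strict;
use warnings;

# write a demo file with mixed endings: CRLF, lone CR, and LF
# (the filename is made up)
open my $out, ">:raw", "mixed.txt" or die "failed to write demo: $!";
print $out "aaa\r\nbbb\rccc\n";
close $out;

# slurp the whole file regardless of line endings
open my $fh, "<:raw", "mixed.txt" or die "failed to open demo: $!";
my $content = do { local $/; <$fh> };
close $fh;

# match CRLF first so it is consumed as a single separator; unlike
# /[\r\n]+/, this alternation preserves empty lines
my @lines = split /\r\n|\r|\n/, $content;
print join(",", @lines), "\n";
```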

    If you have to worry about getting really huge files in any of the three possible formats, you'll want to diagnose each file first -- at least, check the file size, and if it's really big, read just enough (e.g. read FH, $_, 2048;) to figure out what the line termination is, set $/ accordingly, then rewind (seek FH,0,0;) and read the whole thing as it was intended to be read.
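    That peek-then-rewind approach might be sketched as follows. Assumptions to note: the first 2048 bytes are taken to contain at least one line terminator, and the demo filename (and its CR-only content) is invented for illustration.

```perl
use strict;
use warnings;

# write a CR-only (classic Mac style) demo file
open my $out, ">:raw", "legacy.txt" or die "failed to write demo: $!";
print $out "one\rtwo\rthree\r";
close $out;

open my $fh, "<:raw", "legacy.txt" or die "failed to open demo: $!";

# peek at the start of the file to diagnose the line terminator
read $fh, my $sample, 2048;
my $sep = $sample =~ /\r\n/ ? "\r\n"    # DOS/Windows
        : $sample =~ /\r/   ? "\r"      # classic Mac
        :                     "\n";     # Unix
seek $fh, 0, 0 or die "failed to rewind: $!";

# now read the whole file with the separator it was written with
local $/ = $sep;
my @lines;
while (my $line = <$fh>) {
    chomp $line;    # chomp removes whatever $/ currently is
    push @lines, $line;
}
close $fh;
print "@lines\n";
```

    Because chomp strips the current value of $/, the loop body stays identical for all three formats once $/ has been diagnosed.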