finding different linebreaks with <>

seaver has asked for the wisdom of the Perl Monks concerning the following question:

Dear all

I've recently come across a file, I think it's from a Mac, which, on my RH9 box, shows its line breaks as:

'^M'.

As a result of which, when trying to read in the file, it does NOT read in any lines, or so I believe.

Has anyone come across this before?

Cheers
Sam

UPDATE:

The code below is basically what I'm using, and there's NO output.

#!/usr/bin/perl -w
use strict;

my $file = $ARGV[0];

while(<$file>){
print $_;
}

The file looks a little like this; remember the linebreaks are '^M':

ATOM      5  OE2 GLU A   3     116.164  67.506  12.993  1.00129.84   8^M
ATOM      6  C   GLU A   3     115.793  66.073  17.878  1.00101.37   6^M
ATOM      7  O   GLU A   3     116.641  65.434  18.500  1.00 95.58   8^M
ATOM      8  N   GLU A   3     117.231  68.120  18.071  1.00102.98   7^M
ATOM      9  CA  GLU A   3     115.864  67.588  17.783  1.00103.63   6^M
ATOM     10  N   GLU A   4     114.768  65.506  17.255  1.00 97.10   7^M
ATOM     11  CA  GLU A   4     114.577  64.067  17.260  1.00 91.36   6^M
ATOM     12  CB  GLU A   4     113.440  63.684  18.207  1.00 99.04   6^M
ATOM     13  CG  GLU A   4     113.897  63.263  19.592  1.00118.55   6^M
ATOM     14  CD  GLU A   4     112.731  62.936  20.505  1.00127.23   6^M
ATOM     15  OE1 GLU A   4     111.851  62.147  20.096  1.00129.74   8^M
ATOM     16  OE2 GLU A   4     112.699  63.467  21.635  1.00130.22   8^M
ATOM     17  C   GLU A   4     114.281  63.539  15.867  1.00 84.90   6^M


Replies are listed 'Best First'.
Re: finding different linebreaks with <> by Zaxo (Archbishop) on Apr 08, 2004 at 18:54 UTC
The problem is not the line breaks - you're not opening the file. `{ open my $fh, '<', $file or die $!; while (<$fh>) { print; } }` You may need to also set `$/ = "\r"` but that seems like an odd record separator to me. After Compline, Zaxo
Re: Re: finding different linebreaks with <> by hardburn (Abbot) on Apr 08, 2004 at 18:58 UTC
"You may need to also set $/ = "\r" but that seems like an odd record separator to me." Don't let Steve Jobs hear you. ---- `: () { :|:& };:` Note: All code is untested, unless otherwise stated
Re: finding different linebreaks with <> by hardburn (Abbot) on Apr 08, 2004 at 18:57 UTC
Unix (and any similar system) thinks lines end with \n. Apple thinks they end with \r. DOS couldn't decide, so it went with \r\n. Under Unix-like systems, the \r often shows up as '^M'. Perl will change what it thinks the line ending is based on what system you're running on. You can override this by setting $/ (the input record separator) to whatever you want the newline to be (even to something that isn't a newline at all, like 'FOOBAR'). Just set `local $/ = "\r";` before you read the data in and it should work. There are also conversion programs around that will change the line endings used on one system to another (look around for 'dos2unix', 'unix2mac', etc.).
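A minimal sketch of the two suggestions so far combined: open the file explicitly, then localize `$/` to a bare carriage return while reading. The helper name `read_cr_lines` is made up for illustration, and the sketch assumes the file uses `\r` as its only line terminator.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records of a file whose lines end in "\r".
sub read_cr_lines {
    my ($path) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;              # no newline translation on any platform

    local $/ = "\r";          # changed only until this sub returns

    my @lines;
    while (my $line = <$fh>) {
        chomp $line;          # chomp strips the current $/, i.e. "\r"
        push @lines, $line;
    }
    close $fh;
    return @lines;
}
```

It could be called as `print "$_\n" for read_cr_lines($ARGV[0]);`. Because `$/` is localized inside the sub, code elsewhere in the program still reads `\n`-terminated lines as usual.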
Re: Re: finding different linebreaks with <> by Anonymous Monk on Apr 09, 2004 at 04:13 UTC
"Under Unix-like systems, the \r often shows up as '^M'." Correction: under Unix-like text editors like vi, emacs, pico, nano ... although most can be configured to not care what the "line endings" are.
Re: Re: Re: finding different linebreaks with <> by hardburn (Abbot) on Apr 09, 2004 at 13:04 UTC
That's why I said "often shows up as", not "always shows up as".
Re: finding different linebreaks with <> by matija (Priest) on Apr 08, 2004 at 19:01 UTC
Yes, that's the Mac method of breaking lines - you get that when you transfer their text file in binary mode. If you just want to read the file, you can do: `local $/="\cM";` and read it like normal. Or you could slurp it in: `$wholefile=join("",<FILE>); $wholefile=~tr/\cM/\n/; print CONVERTED $wholefile;` which will give you the file transformed (but don't slurp large files like logs).
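The slurp-and-translate idea could be written out with lexical filehandles roughly as follows. This is only a sketch: `convert_cr_file` and its two path arguments are hypothetical names, and it assumes the input really is CR-only (on a DOS file, `tr/\r/\n/` would turn every `\r\n` into two newlines).

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, turning every CR into LF.
# Returns the number of bytes written.
sub convert_cr_file {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path or die "Cannot read $in_path: $!";
    binmode $in;
    my $whole = do { local $/; <$in> };   # undefined $/ reads the whole file
    close $in;
    $whole = '' unless defined $whole;

    $whole =~ tr/\r/\n/;                  # same as tr/\cM/\n/

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    binmode $out;
    print {$out} $whole;
    close $out or die "Cannot close $out_path: $!";

    return length $whole;
}
```

As noted above, the whole file is held in memory, so this only suits files of modest size.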
Re: Re: finding different linebreaks with <> by revdiablo (Prior) on Apr 08, 2004 at 20:00 UTC
I would think about either using the more efficient idiomatic slurp (i.e. `my $wholefile = do { local (@ARGV, $/) = $filename; <> };`) or using the excellent and even more efficient File::Slurp module.
Re: Re: finding different linebreaks with <> by seaver (Pilgrim) on Apr 08, 2004 at 20:02 UTC
OK, cool, thank you for your prompt replies. So, I finally got the file to output, line by line, with this code: `#!/usr/bin/perl -w use strict; local $/="\cM"; my $file = $ARGV[0]; open my $fh, '<', $file or die $!; while(<$fh>){ print $_."\n"; }` I had to add the "\n" to the end of each line so that it printed out correctly on the Unix system, for the reason you guys gave. However, what if my file could be EITHER *nix, Mac or Windows? What's the best way to tell? I'm hoping you can use a regular expression with '$/' and do something like this: `local $/ = "[\n\r\cM]";` so that it matches any one of them. Does that make sense? Cheers Sam
Re: Re: Re: finding different linebreaks with <> by hardburn (Abbot) on Apr 08, 2004 at 20:20 UTC
Sorry, no regexen are allowed in `$/`. The best solution is to convert the data file into a consistent form before the main program gets it.
Re: Re: Re: Re: finding different linebreaks with <> by Cody Pendant (Prior) on Apr 09, 2004 at 00:55 UTC
Re: Re: Re: Re: finding different linebreaks with <> by seaver (Pilgrim) on Apr 08, 2004 at 20:38 UTC
Re: Re: Re: finding different linebreaks with <> by tilly (Archbishop) on Apr 09, 2004 at 01:20 UTC
You can do exactly what you asked for with File::Stream. A moderately portable regular expression to use is `qr/\r\n?|\n\r?/`. That should handle Unix, DOS and MacOS line endings on any of those three platforms.
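For files small enough to hold in memory, the same regular expression can be tried without any module by splitting a slurped string. This sketch deliberately does not use File::Stream, so it says nothing about that module's interface; `split_any_eol` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into lines on Unix, DOS or classic Mac
# line endings, using the pattern suggested above.
sub split_any_eol {
    my ($text) = @_;

    # A limit of -1 keeps trailing empty fields, so blank lines survive.
    my @lines = split /\r\n?|\n\r?/, $text, -1;

    # The terminator after the last line leaves one empty field; drop it.
    pop @lines if @lines && $lines[-1] eq '';

    return @lines;
}
```

The alternation order matters: `\r\n?` is tried first, so a DOS `\r\n` pair is consumed as one terminator rather than two.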
Re: Re: Re: Re: finding different linebreaks with <> by seaver (Pilgrim) on Apr 09, 2004 at 16:39 UTC
Re: Re: Re: Re: Re: finding different linebreaks with <> by tilly (Archbishop) on Apr 09, 2004 at 18:15 UTC
Some notes below your chosen depth have not been shown here
Re: Re: Re: finding different linebreaks with <> by Anomynous Monk (Scribe) on Apr 08, 2004 at 23:07 UTC
Nope, no regexes for $/. If the file isn't too large or performance isn't too critical, you could do something like (untested, and requires 5.8.0): `open my $fh, "<", "filename" or die $!; while (my $line = <$fh>) { local $/ = "\r"; open my $fh, "<", \$line or die $!; while (my $line = <$fh>) { # process $line here } }` Otherwise, open the file and read a bunch of characters (e.g. by setting $/=\1024; see perldoc perlvar), check for \n or \r (e.g. with `$/ = $1 if $buffer=~/([\r\n])/;`) and set $/ appropriately; then seek to the beginning of the file and start the read loop.
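The second suggestion (peek at the start of the file, pick a separator, rewind) might look something like the sketch below. The names `detect_eol` and `read_lines_any_eol` are hypothetical, and the pattern is widened slightly from the one quoted above so that a DOS `\r\n` pair is detected as a unit. It assumes the file uses one line-ending style throughout.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line terminator from the first 1024 bytes
# of an open, seekable filehandle, then rewind it.
sub detect_eol {
    my ($fh) = @_;
    my $buffer = '';
    {
        local $/ = \1024;                 # record mode: read up to 1024 bytes
        my $chunk = <$fh>;
        $buffer = $chunk if defined $chunk;
    }
    seek $fh, 0, 0 or die "Cannot rewind: $!";

    # The leftmost terminator wins. A "\r" sitting on the very last byte of
    # the buffer could be half of a "\r\n" pair; this sketch ignores that.
    return $1 if $buffer =~ /(\r\n|\r|\n)/;
    return "\n";                          # no terminator seen: default
}

# Hypothetical helper: read a file line by line with the detected separator.
sub read_lines_any_eol {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;

    local $/ = detect_eol($fh);
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;                      # removes whatever $/ currently is
        push @lines, $line;
    }
    close $fh;
    return @lines;
}
```

Unlike the slurping approaches, this keeps only one line in memory at a time during the main read loop, at the cost of trusting the first chunk to be representative.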