seaver has asked for the wisdom of the Perl Monks concerning the following question:

Dear all

I've recently come across a file, I think it's from a Mac, which, on my RH9 box, shows its line breaks as:

'^M'.

As a result, when I try to read in the file, it does NOT read in any lines, or so I believe.

Has anyone come across this before?

Cheers
Sam

UPDATE:

The code below is basically what I'm using, and there's NO output.

#!/usr/bin/perl -w
use strict;
my $file = $ARGV[0];
while(<$file>){
    print $_;
}

the file looks a little like this, remember the linebreaks are '^M':

ATOM 5 OE2 GLU A 3 116.164 67.506 12.993 1.00129.84 8^MATOM 6 C GLU A 3 115.793 66.073 17.878 1.00101.37 6^MATOM 7 O GLU A 3 116.641 65.434 18.500 1.00 95.58 8^MATOM 8 N GLU A 3 117.231 68.120 18.071 1.00102.98 7^MATOM 9 CA GLU A 3 115.864 67.588 17.783 1.00103.63 6^MATOM 10 N GLU A 4 114.768 65.506 17.255 1.00 97.10 7^MATOM 11 CA GLU A 4 114.577 64.067 17.260 1.00 91.36 6^MATOM 12 CB GLU A 4 113.440 63.684 18.207 1.00 99.04 6^MATOM 13 CG GLU A 4 113.897 63.263 19.592 1.00118.55 6^MATOM 14 CD GLU A 4 112.731 62.936 20.505 1.00127.23 6^MATOM 15 OE1 GLU A 4 111.851 62.147 20.096 1.00129.74 8^MATOM 16 OE2 GLU A 4 112.699 63.467 21.635 1.00130.22 8^MATOM 17 C GLU A 4 114.281 63.539 15.867 1.00 84.90 6^M

Replies are listed 'Best First'.
Re: finding different linebreaks with <>
by Zaxo (Archbishop) on Apr 08, 2004 at 18:54 UTC

    The problem is not the line breaks - you're not opening the file.

    {
        open my $fh, '<', $file or die $!;
        while (<$fh>) {
            print;
        }
    }
    You may need to also set $/ = "\r" but that seems like an odd record separator to me.

    After Compline,
    Zaxo

      You may need to also set $/ = "\r" but that seems like an odd record separator to me.

      Don't let Steve Jobs hear you.

      ----
      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

Re: finding different linebreaks with <>
by hardburn (Abbot) on Apr 08, 2004 at 18:57 UTC

    Unix (and any similar system) thinks lines end with \n. Apple thinks they end with \r. DOS couldn't decide, so it went with \r\n. Under Unix-like systems, the \r often shows up as '^M'.

    Perl will change what it thinks the line ending is based on what system you're running on. You can override this by setting $/ (input record separator) to whatever you want the newline to be (even to something that isn't a newline at all, like 'FOOBAR'). Just set local $/ = "\r"; before you read the data in and it should work.

    There are also conversion programs around that will change the line endings used on one system to another (look around for 'dos2unix', 'unix2mac', etc.)
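
A minimal sketch of the local $/ = "\r" suggestion above. The sub name read_cr_lines and the sample data are made up for illustration, and an in-memory filehandle (Perl 5.8 or later) stands in for the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read CR-terminated ("classic Mac") records from a filehandle and
# return them with the trailing "\r" removed.
sub read_cr_lines {
    my ($fh) = @_;
    local $/ = "\r";          # input record separator, restored when the sub returns
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;          # chomp strips the current value of $/
        push @lines, $line;
    }
    return @lines;
}

# Illustrative data only: three records separated by "\r".
my $data = "ATOM 5 OE2\rATOM 6 C\rATOM 7 O\r";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for read_cr_lines($fh);
close $fh;
```

Using local inside the sub keeps the change to $/ from leaking into other code that reads files.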

      Under Unix-like systems, the \r often shows up as '^M'
      Correction, under Unix-like text editors like vi, emacs, pico, nano ... although most can be configured to not care what the "line endings" are

        That's why I said "often shows up as", not "always shows up as".

Re: finding different linebreaks with <>
by matija (Priest) on Apr 08, 2004 at 19:01 UTC
    Yes, that's the Mac method of breaking lines - you get that when you transfer their text file in binary mode. If you just want to read the file, you can do: local $/="\cM"; and read it like normal. Or you could slurp it in:
        $wholefile=join("",<FILE>);
        $wholefile=~tr/\cM/\n/;
        print CONVERTED $wholefile;
    which will give you the file transformed (but don't do slurp for large files like logs).

      I would think about either using the more efficient idiomatic slurp (i.e. my $wholefile = do { local (@ARGV, $/) = $filename; <> };) or using the excellent and even more efficient File::Slurp module.

      OK, cool, thank you for your prompt replies.
      So, I finally get the file to output, line by line, with this code:

      #!/usr/bin/perl -w
      use strict;
      local $/="\cM";
      my $file = $ARGV[0];
      open my $fh, '<', $file or die $!;
      while(<$fh>){
          print $_."\n";
      }

      I had to add the "\n" to the end of each line so that it printed out correctly on the Unix system, for the reason you guys gave.

      However, what if my file could be from EITHER *nix, Mac or Windows? What's the best way to tell?

      I'm hoping you can use a regular expression with the '$/' and do something like this:

      local $/ = "[\n\r\cM]";

      so that it matches any one of those characters. Does that make sense?

      Cheers
      Sam

        Sorry, no regexen are allowed in $/.

        The best solution is to convert the data file into a consistent form before the main program gets it.
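
One way to do that conversion in Perl itself, as a rough sketch: slurp the file and rewrite DOS and Mac endings as "\n". The sub name normalize_newlines is hypothetical, and slurping assumes the file fits comfortably in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert DOS ("\r\n") and classic Mac ("\r") line endings to Unix ("\n").
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # "\r\n" or a lone "\r" becomes a single "\n"
    return $text;
}

# Optional command-line use: print a normalized copy of the named file.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Cannot open $file: $!";
    binmode $in;              # avoid platform-specific newline translation
    my $text = do { local $/; <$in> };   # undef $/ reads the whole file
    close $in;
    print normalize_newlines($text);
}
```

Redirecting the output to a new file gives the main program a copy with a single, known line ending.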

        You can do exactly what you asked for with File::Stream. A moderately portable regular expression to use is qr/\r\n?|\n\r?/. That should handle Unix, DOS and MacOS line endings on any of those three platforms.
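
For files small enough to slurp, the same regular expression can be handed to split using only core Perl. This sketch does not use File::Stream or its API, and the sub name split_lines is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split already-slurped text into lines on Unix, DOS or Mac line endings.
sub split_lines {
    my ($text) = @_;
    # The pattern quoted in the post above; split drops trailing empty fields.
    return split /\r\n?|\n\r?/, $text;
}

# Illustrative input mixing all three conventions.
print "[$_]\n" for split_lines("one\r\ntwo\rthree\n");
```

Note that the \n\r? alternative treats "\n\r" as a single line ending, so a file that mixes conventions may lose an empty line at such a boundary.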
        Nope, no regexes for $/. If the file isn't too large or performance isn't too critical, you could do something like (untested, and requires 5.8.0):
        open my $fh, "filename" or die $!;
        while (my $line = <$fh>) {
            local $/ = "\r";
            open my $fh, "<", \$line or die $!;
            while (my $line = <$fh>) {
                # process $line here
            }
        }
        Otherwise, open the file and read a bunch of characters (e.g. by setting $/=\1024; see perldoc perlvar) and check for \n or \r (e.g. with $/ = $1 if $buffer=~/([\r\n])/;) and set $/ appropriately; then seek to the beginning of the file and start the read loop.
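
The read-a-chunk-then-seek approach might look roughly like this. It is a sketch only: detect_eol and read_lines are hypothetical names, it uses read() rather than $/ = \1024 to fetch the chunk, it also checks for "\r\n" so DOS files get a two-character separator, and in-memory filehandles (Perl 5.8 or later) stand in for real files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess the record separator from the first 1024 bytes, then rewind.
# Falls back to "\n" when no line ending is found in that chunk.
sub detect_eol {
    my ($fh) = @_;
    my $buffer = '';
    defined read($fh, $buffer, 1024) or die "Read failed: $!";
    seek($fh, 0, 0) or die "Cannot rewind: $!";
    return $buffer =~ /(\r\n|\n|\r)/ ? $1 : "\n";
}

# Read every line using the detected separator, with line endings removed.
sub read_lines {
    my ($fh) = @_;
    my $eol = detect_eol($fh);
    local $/ = $eol;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# Sample data in Unix, DOS and Mac form.
my @samples = ("a\nb\n", "a\r\nb\r\n", "a\rb\r");
for my $sample (@samples) {
    open my $fh, '<', \$sample or die "Cannot open sample: $!";
    binmode $fh;
    print join('|', read_lines($fh)), "\n";
    close $fh;
}
```

One known gap: if the first line ending is a "\r\n" pair that straddles the 1024-byte boundary, only the "\r" is seen and the file is misread as Mac format. The sketch also assumes a seekable handle, so it will not work on a pipe.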