paul-s- has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Brothers, I use a code repository that works on UNIX, so that the code may be accessed in Windows. We want Windows line-endings. Most of the files have them, but some have developed UNIX line-endings (which we don't want). We don't want to change all of the files - only the ones that are incorrect. So I want a simple algorithm, to run on Windows, that will read the first line of a file and tell me what the line-ending is. The following code seems to report UNIX line-endings in all cases - even when they are Windows. Please help.
my $cr = chr(0x0d);
my $lf = chr(0x0a);
my $infile = "test.txt";
open (IN, "<$infile") or die "Couldn't open input file: $!";
my $first_line = <IN>;
my ($line_text, $line_ending) = ($first_line =~ /(.*)(\015|\012|\015\012)/);

# detect line ending type
if ($line_ending eq "$cr$lf") {
    print "Windows line ending\n";
}
elsif ($line_ending eq "$lf") {
    print "UNIX line ending\n";
}
elsif ($line_ending eq "$cr") {
    print "Mac line ending\n";
}
else {
    print "Unknown line ending\n";
}
close (IN);

Replies are listed 'Best First'.
Re: How to Detect Mixed File Line-Endings
by Corion (Patriarch) on Jan 17, 2014 at 12:28 UTC

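    Your regular expression does not check whether the \015, the \012 or the \015\012 is really the last thing on the line. Because the greedy (.*) also consumes the \015 (a dot matches anything except \012), the lone \012 alternative is what ends up in $line_ending, even for a Windows file.

    This can be fixed by anchoring your regular expression to the end of the string: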

    /(\015|\012|\015\012)\z/
      To try the longer match first, I think the OP should reorder the alternation so that CRLF is tested before a lone LF or CR.

      m!\015\012|\012|\015!
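
The two suggestions above (anchor at the end of the string, try CRLF first) can be combined. What follows is a minimal sketch, not anyone's posted solution. The helper names classify_line_ending and first_line_ending are hypothetical. The sketch also opens the file with the :raw layer, which is an addition not mentioned in the replies above: the script is meant to run on Windows, where Perl's default text mode typically translates CRLF to LF on input, and that would hide the very thing being tested.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the ending of a single line.
# CRLF is listed first so the longer sequence is tried before a bare LF or CR,
# and \z anchors the match to the very end of the string.
sub classify_line_ending {
    my ($line) = @_;
    return 'none' unless $line =~ /(\015\012|\012|\015)\z/;
    return $1 eq "\015\012" ? 'Windows'
         : $1 eq "\012"     ? 'UNIX'
         :                    'Mac';
}

# Hypothetical helper: read the first line of a file as raw bytes
# (assumption: without :raw, a Windows perl would strip the CR on input).
sub first_line_ending {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Couldn't open $path: $!";
    my $first_line = <$fh>;
    close $fh;
    return defined $first_line ? classify_line_ending($first_line) : 'none';
}
```

Calling first_line_ending("test.txt") would then return one of 'Windows', 'UNIX', 'Mac' or 'none'. Note that the (.*) capture from the original code is dropped here, since it is what swallowed the \015 in the first place.
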
Re: How to Detect Mixed File Line-Endings
by kcott (Archbishop) on Jan 17, 2014 at 20:49 UTC

    G'day paul-s-,

    Welcome to the monastery.

    This solution, which requires Perl 5.10.0, will skip your Windows files and allow you to process the Unix files.

    Given these files:

    $ cat -vet pm_1070943_CRLF.txt
    text^M$
    $ cat -vet pm_1070943_LF.txt
    text$

    [Note: "cat -vet" shows special characters in files: '^M' = carriage return; '$' = newline]

    This script:

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;
    use autodie;

    for my $file (qw{pm_1070943_CRLF.txt pm_1070943_LF.txt}) {
        open my $fh, '<', $file;
        my $first_line = <$fh>;
        $first_line =~ /(\R)\z/;

        if ($1 eq "\x0D\x0A") {
            say "Skipping: $file";
        }
        else {
            say "Processing: $file";
        }
    }

    [Note: '\R' is described in "perlrebackslash: Misc"]

    Produces this output:

    Skipping: pm_1070943_CRLF.txt
    Processing: pm_1070943_LF.txt

    -- Ken
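
A slightly more defensive variant of the \R approach might look like the sketch below. It is only an illustration under two assumptions that are not part of the script above: the file is opened with the :raw layer (on Windows, the default text mode would normally turn CRLF into LF before the match is attempted), and an empty file or a first line without any terminator is treated as "nothing to convert". The helper names first_line_terminator and needs_conversion are hypothetical.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: return the terminator of the first line of $path,
# or undef if the file is empty or its first line has no terminator.
sub first_line_terminator {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Couldn't open $path: $!";
    my $first_line = <$fh>;
    close $fh;
    return undef unless defined $first_line;
    # Only trust $1 when the match actually succeeded.
    return $first_line =~ /(\R)\z/ ? $1 : undef;
}

# Hypothetical helper: true when the first line has a terminator
# and that terminator is something other than CRLF.
sub needs_conversion {
    my ($path) = @_;
    my $eol = first_line_terminator($path);
    return defined $eol && $eol ne "\x0D\x0A";
}
```

A loop such as `for my $file (@files) { say "Processing: $file" if needs_conversion($file) }` would then list only the files whose first line is not CRLF-terminated.
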

Re: How to Detect Mixed File Line-Endings
by Anonymous Monk on Jan 17, 2014 at 13:34 UTC
    You need to amend your regular expression with a negative look-behind in the Unix case, so that a \012 preceded by \015 is not accidentally matched as a UNIX line ending. Or just not use a regular expression at all. (substr is a better hammer for this kind of nail.)
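
The substr idea could be sketched roughly as follows. This is one possible reading of the suggestion, not code posted in the thread, and the helper name line_ending_by_substr is hypothetical. It checks the last two characters for CRLF first, then falls back to the last character alone. It assumes the line was read without CRLF translation, for example from a handle opened with the :raw layer.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a line's ending by looking at its last
# one or two characters with substr; no regular expression is involved.
sub line_ending_by_substr {
    my ($line) = @_;
    return 'none' unless defined $line && length $line;

    # Check the two-character ending first so CRLF is not mistaken for LF.
    return 'Windows'
        if length($line) >= 2 && substr($line, -2) eq "\015\012";

    my $last = substr($line, -1);
    return 'UNIX' if $last eq "\012";
    return 'Mac'  if $last eq "\015";
    return 'none';
}
```

Because the CRLF test runs before the single-character tests, no look-behind is needed to keep a Windows ending from being reported as UNIX.
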