fritzvtb has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to determine if a text file is a dos or unix text file. I tried doing this by using the m// regular expression but I can't get it to work. I basically need a piece of code that will "die" if a dos text file (\r\n) is loaded into my perl script. I am new to perl and working my way through it. Any help would be appreciated. Thanks, fritzvtb

Re: Determine whether file is dos or unix format
by thundergnat (Deacon) on Nov 28, 2005 at 21:22 UTC

    Rather than just die()ing, you could auto-convert the file to the native format on load. That's something I do often in my scripts where I could be receiving files in DOS, Unix or Mac format.

    open my $fh, '<', $filename or die "Can't open file. $!\n";
    while (<$fh>) {
        $_ =~ s/\cM\cJ|\cM|\cJ/\n/g;
        # do something with line
    }

    Yeah, it does useless conversions on native format files, but the overhead is nearly negligible usually.
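A minimal sketch of the same conversion as a reusable function, in case it helps. The name normalize_newlines is hypothetical, and hex escapes are swapped in for \cM and \cJ so the replacement is always a literal LF rather than whatever "\n" means on the platform. It works on a string already in memory, so it assumes the data was read with binmode (or on a system that does no newline translation).

```perl
use strict;
use warnings;

# Hypothetical helper: convert DOS (CRLF) and old Mac (CR) line endings to LF.
# CRLF is listed first in the alternation so a CRLF pair becomes one LF,
# not two.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\x0D\x0A|\x0D/\x0A/g;
    return $text;
}

my $mixed = "dos line\x0D\x0Amac line\x0Dunix line\x0A";
my $clean = normalize_newlines($mixed);
print $clean eq "dos line\x0Amac line\x0Aunix line\x0A"
    ? "normalized\n"
    : "unexpected result\n";
```

Lines that already end in LF pass through unchanged, which is the "useless conversion" mentioned above.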

Re: Determine whether file is dos or unix format
by sgifford (Prior) on Nov 28, 2005 at 21:08 UTC
    While you're reading it in, just check to see if it contains any \r\n sequences:
    if (/\x0D\x0A/) { die "DOS files not allowed" }

    If you're reading it a line at a time, you can do this for the first line or for every line, depending on how picky you want to be. If you're reading the whole thing into memory, using that pattern across the whole file will work.
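One way the "first line or every line" choice could look, as a sketch only. The helper name has_crlf and its $first_line_only flag are made up for illustration, and the temporary file exists just so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if the file contains a CRLF pair, else 0.
# With $first_line_only set, only the first line is inspected.
sub has_crlf {
    my ($path, $first_line_only) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    binmode $fh;    # stop Windows from turning CRLF into LF on read
    while (my $line = <$fh>) {
        return 1 if $line =~ /\x0D\x0A/;
        last if $first_line_only;
    }
    return 0;
}

# Demo file: the first line ends in LF, the second in CRLF.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "first\x0Asecond\x0D\x0A";
close $tmp;

print "first line only: ", (has_crlf($tmp_name, 1) ? "DOS" : "no CRLF"), "\n";
print "every line:      ", (has_crlf($tmp_name)    ? "DOS" : "no CRLF"), "\n";
```

The demo file is deliberately mixed, so the picky check catches something the first-line check misses.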

    Update: Use hex escapes instead of \r\n for better portability. Thanks ikegami and GrandFather.

      That won't work on the Mac. Use hex escapes or similar to be portable: /\x0D\x0A/
        Is that still true on OS X? I'd assumed that POSIX mandated \r and \n to be certain values.
        --
        James Antill
Re: Determine whether file is dos or unix format
by GrandFather (Saint) on Nov 28, 2005 at 21:09 UTC

    What was the specific m// that you used? Did you use the /s switch? You may need to specify the hex values for the characters as m/\x0d\x0a/s.


    DWIM is Perl's answer to Gödel
      The "s" switch only affects what "." matches. It is therefore useless (albeit harmless) in your solution.
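A small sketch to illustrate that point, assuming a platform where "\n" is LF (anything but classic Mac OS). The variable names are only for the demonstration.

```perl
use strict;
use warnings;

my $text = "line one\x0D\x0Aline two";

# A pattern with no "." behaves the same with or without /s.
my $plain  = $text =~ /\x0D\x0A/  ? 1 : 0;
my $with_s = $text =~ /\x0D\x0A/s ? 1 : 0;

# /s only matters when "." has to match a newline.
my $dot        = $text =~ /one.+two/  ? 1 : 0;
my $dot_with_s = $text =~ /one.+two/s ? 1 : 0;

print "plain=$plain with_s=$with_s dot=$dot dot_with_s=$dot_with_s\n";
```

The first two results agree; only the patterns containing "." differ between the two forms.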
Re: Determine whether file is dos or unix format
by Anonymous Monk on Nov 28, 2005 at 21:17 UTC
    It's good that you're trying out Perl; congratulations! :-)

    Other people have suggested a regular expression you could use to detect whether a dos text file was loaded; the next step you could try is to change the DOS text file into a UNIX text file, by changing the "\r\n" into a "\n".

    It's often easier to write smarter programs than it is to get smarter users; so making your program handle DOS text files might be easier than teaching all our end users not to use DOS text files in the first place.

    Good Luck! :-)
    --
    Ytrew

Re: Determine whether file is dos or unix format
by adamk (Chaplain) on Nov 29, 2005 at 01:50 UTC
Re: Determine whether file is dos or unix format
by wazzuteke (Hermit) on Nov 28, 2005 at 21:12 UTC
    This simple piece of code should do the trick for you...
    open FILE, 'test_file.txt' || die "$!\n";
    my $text = join( '', @{ [ <FILE> ] } );
    close FILE;
    die "The file supplied is not of unix format!\n" if ( $text =~ /\r\n/ );
    You might not want to take this snippet literally, for it will load the entire contents of the file into memory and test the entire thing. This will be quite inefficient. However, this should give you the general idea of how to do what you are looking for.

    Update:
    I might as well give you an example of something that will be, possibly, a little less memory intensive:
    open FILE, 'test_file.txt' || die "$!\n";
    while ( my $line = <FILE> ) {
        die "File is not of UNIX format!\n" if ( $line =~ /\r\n/ );
        last;
    }
    close FILE;
    This one will only check the first line for Windows format. If it finds it, it will die; otherwise it will simply move on with the rest of the application.

    Good Luck!

    ---hA||ta----
    print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );

      The snippet

      open FILE, 'test_file.txt' || die "$!\n";
      my $text = join( '', @{ [ <FILE> ] } );
      close FILE;
      1. will never die on error,
      2. is inefficient,
      3. uses a global variable (FILE),
      4. would be safer if the 3-parameter open was used, and
      5. will not work on Windows (and other?) machines without binmode.


      It will never die because "||" has higher precedence than ",".

      open FILE, 'test_file.txt' || die "$!\n";
      means
      open FILE, ('test_file.txt' || die "$!\n");
      so use
      open FILE, 'test_file.txt' or die "$!\n";
      or
      open(FILE, 'test_file.txt') || die "$!\n";
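The difference can be checked with a short sketch like the following. It assumes the file named in $missing does not exist, and wraps each open in eval so the script can report whether die was reached.

```perl
use strict;
use warnings;

my $missing = 'no_such_file_for_this_demo.txt';   # assumed not to exist

# "||" binds to the filename, so die is never reached even though open fails.
my $high = eval { open FILE, $missing || die "open failed\n"; 1 }
    ? 'no die' : 'died';

# "or" applies to the whole open call, so the failure is reported.
my $low = eval { open FILE, $missing or die "open failed\n"; 1 }
    ? 'no die' : 'died';

print "with ||: $high\n";
print "with or: $low\n";
```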


      It is inefficient because you create an anonymous array and immediately dereference it.

      join( '', @{ [ <FILE> ] } );
      is equivalent to
      join( '', <FILE> );

      It is also inefficient because join is slower and less memory efficient than undefining $/.

      my $text = join( '', <FILE> );
      is equivalent to the more efficient
      my $text; { local $/; $text = <FILE>; }


      So you end up with

      my $text;
      {
          open(my $fh, '<', 'test_file.txt')
              or die("Unable to open input file: $!\n");
          binmode($fh);
          local $/;
          $text = <$fh>;
      }
        I second all of what you wrote in this detailed and precise post. For completeness, I'd only add that an alternative to binmode is given by layers/disciplines, and IMHO a clear one in terms of readability and intelligibility. And I'm keen on do too, so all in all I'd rewrite the above like
        my $text = do {
            open my $fh, '<:raw', 'test_file.txt'
                or die "Unable to open input file: $!\n";
            local $/;
            <$fh>;
        };
Re: Determine whether file is dos or unix format
by Anonymous Monk on Nov 28, 2005 at 21:52 UTC

    Hello all,

    I have tried the first solution posted, which was similar to what

    my ($input_root) = ($input_file =~ /([^\\\/]+)\.\w+$/);
    open(IN, "<$input_file") or die "Couldn't open file: $!: $input_file";
    while (<IN>) {
        if (/\r\n/) { die "DOS files not allowed" }
        $total_lines++;
    }

    When I run a DOS file through this code it does not "die" on me. See the problem is that I built a validation program that validates UNIX text files. Sometimes when people hand-edit the file they save it in a DOS format via UltraEdit. We don't find out that we sent a DOS file until our Vendor tries to ingest it, which is a pain in the butt.

    That is my situation and code. Any help would be appreciated. I spot check files by hand and do convert them to UNIX via perl, but I really want to return an error that stops validation so the user will fix the file to UNIX.

    Thanks,
    fritzvtb

    Edited by GrandFather to fix formatting

      Please use <c>...</c> around your code.

      On which OS will you run your code?

      • On the Mac, \r is LF and \n is CR, so don't use \r and \n.
      • On the Mac, the line is ended by CR, so checking for CRLF on a line basis won't work
      • On Windows, you need to use binmode. CRLF becomes LF when a file is read without binmode.
      • On unix, what you have should work.

      Fix:

      # Works on unix.
      # Works on Windows.
      # Still doesn't work on Mac.
      open(IN, '<', $input_file)
          or die "Couldn't open file $input_file: $!\n";
      binmode(IN);
      while (<IN>) {
          if (/\x0D\x0A/) {
              die("DOS files not allowed\n");
          }
          $total_lines++;
      }

      Update: Here's something that works everywhere:

      # Works on unix.
      # Works on Windows.
      # Works on Mac.
      {
          open(my $fh, '<', $input_file)
              or die "Couldn't open file $input_file: $!\n";
          binmode($fh);
          my $buf = '';
          while (read($fh, $buf, 1024, length($buf))) {
              # Match against $buf; read() does not set $_.
              if ($buf =~ /\x0D\x0A/) {
                  die("DOS files not allowed\n");
              }
              # Keep the last byte in case a CRLF straddles two reads.
              $buf = substr($buf, -1);
          }
      }

      {
          open(my $fh, '<', $input_file)
              or die "Couldn't open file $input_file: $!\n";
          while (<$fh>) {
              $total_lines++;
          }
      }
Re: Determine whether file is dos or unix format
by kulls (Hermit) on Nov 29, 2005 at 03:51 UTC
    Hi,
    You can convert the DOS file to a Unix file before your script processes it, and here you get the output through Perl.
    -kulls
      This is fritzvtb. Thanks everyone. I finally got it to work from all of your tips. I have passed the dos2unix perl script to my users also. Thanks again, Perl Monks is great! Fritzvtb
Re: Determine whether file is dos or unix format
by planetscape (Chancellor) on Nov 30, 2005 at 01:50 UTC

    While it appears your question has been adequately answered, I would still like to direct your attention to flip: Newline conversion between Unix, Macintosh and MS-DOS ASCII files, which not only converts between DOS and Unix file formats, but also includes a handy -t command-line switch to tell you which format a given file uses. I use it a lot.
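For readers who only want the "which format is this?" report from inside a Perl script, here is a rough sketch. It is not how flip works, just an illustration; the function name newline_style and its return labels are hypothetical. It expects raw data, for example a file opened with the '<:raw' layer shown earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a string's line endings as
# 'dos', 'unix', 'mac', 'mixed', or 'none'.
sub newline_style {
    my ($data) = @_;

    # The "= () =" idiom counts matches without keeping them.
    my $crlf = () = $data =~ /\x0D\x0A/g;         # CR followed by LF
    my $cr   = () = $data =~ /\x0D(?!\x0A)/g;     # CR on its own
    my $lf   = () = $data =~ /(?<!\x0D)\x0A/g;    # LF on its own

    my @seen = grep { $_->[1] } (['dos', $crlf], ['mac', $cr], ['unix', $lf]);
    return 'none'  if !@seen;
    return 'mixed' if @seen > 1;
    return $seen[0][0];
}

for my $sample ("a\x0D\x0Ab\x0D\x0A", "a\x0Ab\x0A", "a\x0Db\x0D", "a\x0D\x0Ab\x0A") {
    print newline_style($sample), "\n";
}
```

A validator like the one discussed above could then die on anything other than 'unix', which also catches files with mixed endings.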

    HTH,

    planetscape