Unix \n vs. DOS \n

greenhorn has asked for the wisdom of the Perl Monks concerning the following question:

Someone at work wrote to an in-house Perl discussion alias:

Does anyone know a clean way to test for
DOS v. UNIX EOL in a text file (using Perl) ? It seems
that (chop($UnixLine) == chop($DOSLine)) returns true =-(

No one replied. I thought: hey, I might be a mere newbie, but I'll bet I can figure this out.

Wrong. :) I made some small test files with both <CR><LF> line endings and <LF>-only line endings. Then I watched the results of extracting only those characters from each line in the Perl Builder watch-window. It appeared as if the same characters were returned each time (never mind that one line-ending was \x0D\x0A and the other was \x0A alone).

It struck me that using == there wasn't right; should he not be using "eq"? Had he in fact
actually compared 0 with 0? (0 == 0 does have a certain ring of truth to it. :)

Does perl for Win32 "internally" convert Unix newlines to <CR><LF>?


Replies are listed 'Best First'.
RE: Unix \n vs. DOS \n by Abigail (Deacon) on Jul 15, 2000 at 15:12 UTC
Somewhere, Tom has a long write-up about this. The basics are that DOS stores `CR LF` only for text files, and only when they are written to a physical device. As soon as you read the file in, the C library turns the physical line ending `CR LF` into the logical newline `\n`. And when you write it to a file, the reverse happens. That is, if you run the program under DOS. If you take your DOS file to a Unix platform, only the `LF` gets mapped to the logical newline `\n` (which happens to be represented by a `LF` character as well). The preceding `CR` byte is considered by Unix to be just another byte. Also note that `chop` chops off the last character of a string. One character, nothing more. So, if you are on Unix, reading a line from either a Unix file or a DOS file, the last character will be `LF`, aka `\x0A`. So, yes, the comparison should have been done with `eq` instead of `==`, but that still doesn't make a difference: "\x0A" eq "\x0A". There is no flawless way to determine whether something is a "Unix line" or a "DOS line". "Unix line"s end with a `LF` character, and "DOS line"s with `CR LF`. However, there is nothing that forbids a "Unix line" from having a `CR` character just before the `LF` character. -- Abigail
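The distinction Abigail describes can be sketched as a small check on the last bytes of a line. This is only an illustration: `eol_kind` is a hypothetical helper name, and it assumes the line was read without any `CR LF` translation. As noted above, it cannot tell a true "DOS line" from a "Unix line" that happens to end in a `CR` byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how a raw (untranslated) line ends.
sub eol_kind {
    my ($line) = @_;
    return 'CRLF' if $line =~ /\x0D\x0A\z/;   # CR immediately before the final LF
    return 'LF'   if $line =~ /\x0A\z/;       # bare LF
    return 'none';                            # e.g. last line without a terminator
}

print eol_kind("dos line\x0D\x0A"), "\n";     # CRLF
print eol_kind("unix line\x0A"), "\n";        # LF
print eol_kind("no terminator"), "\n";        # none
```

The patterns use `\x0D` and `\x0A` rather than `\r` and `\n` so the intent stays explicit regardless of how the platform defines the logical newline.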
(Ovid) RE: Unix \n vs. DOS \n by Ovid (Cardinal) on Jul 15, 2000 at 21:00 UTC
As a side note, don't use `chop` to get rid of newlines. I see this all the time in programs and it makes me cringe. You want to use `chomp`. `chomp` will only remove the last character if it's a newline. Consider the following "harmless" code: `#!/usr/bin/perl -w use strict; while (<DATA>) { chop; print "$_\n"; } __DATA__ this is a test this is another` You can't see it in the above code, but I deliberately did not hit "Enter" after the last line. I even hit backspace a few times to ensure that there was nothing after the word "another". The result? `this is a test this is anothe` `chop` happily removed the "r" in another. `chomp` was designed for situations like that and should be used where appropriate. Cheers, Ovid
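For comparison, here is a minimal sketch that runs both functions over the same two lines Ovid used. The helper names `with_chop` and `with_chomp` are made up for this illustration, and the default `$/` of `"\n"` is assumed.

```perl
use strict;
use warnings;

# Hypothetical helpers: return a copy of the string after chop / chomp.
sub with_chop  { my ($s) = @_; chop $s;  return $s }
sub with_chomp { my ($s) = @_; chomp $s; return $s }

# The second line deliberately has no trailing newline.
for my $line ("this is a test\n", "this is another") {
    printf "chop: [%s]  chomp: [%s]\n", with_chop($line), with_chomp($line);
}
# chop: [this is a test]  chomp: [this is a test]
# chop: [this is anothe]  chomp: [this is another]
```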
Re: Unix \n vs. DOS \n by vkonovalov (Monk) on Jul 15, 2000 at 14:34 UTC
You probably forgot to use the built-in function "binmode", which is what distinguishes text mode from binary mode. Otherwise, if you're inside a Perl script, then perl uses Unix-like line endings, for example in here-doc strings and inside any other strings: `$a=<<"EOS"; abcd efgh EOS` and `$a="abcd efgh ";` and `$a="abcd\nefgh\n";` are the same.
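A minimal sketch of the `binmode` approach, assuming a throwaway file created with the core `File::Temp` module (the file contents are invented for the example). With `binmode` applied to the input handle, the `CR` bytes reach the script unchanged, so a regex test on the raw line can see them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small mixed file in binary mode so the bytes land on disk untouched.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "dos line F\x0D\x0A", "unix line F\x0A";
close $out or die "close failed: $!";

# Read it back with binmode so no CRLF translation happens on Windows.
open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;
my @report;
while (my $line = <$in>) {
    push @report, $line =~ /\x0D\x0A\z/ ? 'CRLF' : 'LF';
}
close $in;

print "@report\n";    # CRLF LF
```

On Unix the `binmode` calls make no difference here; on Windows, leaving them out lets the text-mode layer swallow the `CR` before the script ever sees it.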
RE: Unix \n vs. DOS \n by BigJoe (Curate) on Jul 15, 2000 at 16:34 UTC
To answer your question, Does perl for Win32 "internally" convert Unix newlines to CR-LF? NO. I usually use a regular expression to convert all \n s to \r\n like this: `$mystring =~ s/\n/\r\n/g;` Hope this helps. --BigJoe
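One caveat with a plain `s/\n/\r\n/g` is that text which already contains `CR LF` pairs ends up with doubled `CR` bytes. A hedged variant that tolerates mixed input might look like the following; `to_crlf` is a hypothetical name, and the result is meant to be written through a handle in `binmode`, since a text-mode handle on Windows would add another `CR`.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise every line ending to CR LF,
# leaving existing CR LF pairs alone instead of doubling the CR.
sub to_crlf {
    my ($text) = @_;
    $text =~ s/\x0D?\x0A/\x0D\x0A/g;
    return $text;
}

my $mixed = "one\x0Atwo\x0D\x0Athree\x0A";
my $dos   = to_crlf($mixed);

# Count the CR LF pairs in the converted text.
my $pairs = () = $dos =~ /\x0D\x0A/g;
print "$pairs CRLF pairs\n";    # 3 CRLF pairs
```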
Re: Unix \n vs. DOS \n by greenhorn (Sexton) on Jul 16, 2000 at 15:20 UTC
I believe the fellow at work used chop because he wanted to have Perl return the line-ending character; chomp seems to return only a result code and not the "chomped" character. I created a four-line text file in which two lines had CR/LF line endings and the other two had LF-only line endings. Then, a small script that reads each line of the file. Following is the business end of it. (All lines in the file have "F" immediately before the line boundary.) `# TWO LINES IN THE FILE MEET THE FOLLOWING CRITERIA: print "ends CRLF\n" if /F\x0D\x0A$/; print "ends CRLF\n" if /F\r\n/; print "contains CR\n" if /\x0D/; print "contains CR\n" if /\r/; # AND THE OTHER TWO LINES IN THE FILE MATCH THIS: print "ends LF only\n" if /F\x0A$/;` But the script printed only this: ends LF only. It never did print ends CRLF or contains CR. If perl doesn't make some internal translation of the carriage-return characters when it's reading a file, then why that result? Are the tests above not sufficient?
(chromatic) RE: Re: Unix \n vs. DOS \n by chromatic (Archbishop) on Jul 17, 2000 at 01:51 UTC
chomp returns the number of characters removed. It removes whatever's in $/, so he can just check that. Update: Yes, $/, as jlp pointed out.
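A small sketch of that idea: set `$/` locally and look at what `chomp` returns. The wrapper `chomp_count` is a hypothetical name, and the strings are assumed to hold raw, untranslated line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: how many characters does chomp remove from $line
# when the input record separator $/ is set to $sep?
sub chomp_count {
    my ($line, $sep) = @_;
    local $/ = $sep;          # restored automatically when the sub returns
    return chomp($line);
}

print chomp_count("text\x0D\x0A", "\x0D\x0A"), "\n";   # 2
print chomp_count("text\x0A",     "\x0D\x0A"), "\n";   # 0
print chomp_count("text\x0A",     "\x0A"),     "\n";   # 1
```

A return value of 2 with `$/` set to `CR LF` indicates a DOS-style ending, while 0 means the line did not end in that exact sequence.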