$/ and DOS files

fxia has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: $/ and DOS files by John M. Dlugosz (Monsignor) on Jul 21, 2001 at 02:17 UTC
Even though the file contains "\r\n", the Perl script sees a plain "\n" when reading from the file, by default. The translation of end-of-line marks is done at a lower level, so no matter what OS you are on, you use "\n" in Perl. (To disable that, use `binmode`.)
Re: Re: $/ and DOS files by fxia (Novice) on Jul 23, 2001 at 23:10 UTC
But my case is processing a DOS file on a UNIX system, so perl won't treat \r\n the same as \n. If I strip the \r myself, it does work, but I think that is a performance hit. I just wonder if there is a better way to do it.
Re: Re: Re: $/ and DOS files by John M. Dlugosz (Monsignor) on Jul 23, 2001 at 23:57 UTC
As noted in another message, you can't assign a value to $/ that works like the magic built-in paragraph mode. So you could implement a filter and read from that filter. Or slurp in the whole file and use split, which does allow a regex for the delimiter. That is easy and fast, if your file is small enough that memory is not an issue. Something like: `my @lines = split /(?:\r?\n){2,}/, do { local $/; <INPUT> }; # records already chomped, since the delimiter is not included` -John
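The filter alternative mentioned above could be sketched roughly as follows, for files too large to slurp. This is only an illustration: `paragraph_reader` is a hypothetical helper name, the sample data is made up, and the in-memory filehandle assumes Perl 5.8 or later. It reads line by line with the default $/, strips either LF or CRLF, and groups lines into paragraphs.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that yields one paragraph per call
# (lines joined with "\n", no trailing newline), or undef at end of input.
sub paragraph_reader {
    my ($fh) = @_;
    return sub {
        my @lines;
        while (defined(my $line = <$fh>)) {
            $line =~ s/\r?\n\z//;      # strip a Unix or DOS line ending
            if (length $line) {
                push @lines, $line;    # part of the current paragraph
            }
            elsif (@lines) {
                last;                  # blank line ends the paragraph
            }
            # blank lines before a paragraph starts are skipped
        }
        return @lines ? join("\n", @lines) : undef;
    };
}

# Made-up DOS-style sample: three paragraphs, uneven numbers of blank lines.
my $data = "a1\r\na2\r\n\r\n\r\nb1\r\n\r\nc1\r\n";
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my $next = paragraph_reader($in);
my @records;
while (defined(my $rec = $next->())) {
    push @records, $rec;
}
close $in;

print scalar(@records), " records\n";
```

Only one paragraph is held in memory at a time, which is the trade-off against the shorter split version.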
Re: $/ and DOS files by synapse0 (Pilgrim) on Jul 21, 2001 at 02:19 UTC
You can set $/ to be any delimiter you want. In your case, you probably want to use $/ = "\r\n\r\n"; (or whatever happens to be your delimiter of choice) -Syn0
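A minimal sketch of this suggestion, assuming the data is read without any end-of-line translation (as on UNIX, or after `binmode`) and that records are separated by exactly one blank line. The sample data is made up and the in-memory filehandle assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Made-up DOS-style sample: records separated by exactly one blank line.
my $data = "a1\r\na2\r\n\r\nb1\r\n\r\nc1\r\n";

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
binmode $fh;    # make sure "\r\n" reaches the script untranslated

my @records;
{
    local $/ = "\r\n\r\n";    # literal separator: CRLF, blank line, CRLF
    while (defined(my $rec = <$fh>)) {
        chomp $rec;           # chomp removes the current $/, if present
        push @records, $rec;
    }
}
close $fh;

# The last record ends in a single "\r\n", which is not the separator,
# so chomp leaves it in place.
print scalar(@records), " records\n";
```

Note that the "\r\n" inside a multi-line record is kept as is, so each record may still need its own cleanup.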
Re: Re: $/ and DOS files by John M. Dlugosz (Monsignor) on Jul 22, 2001 at 03:08 UTC
Not quite. The normal behavior of Perl (unless you use `binmode`) is to treat the end-of-line sequence as "\n" no matter what the real form on the platform is. So the input command won't see the "\r" to match it. If you were operating in binmode, you would indeed need to reset $/ to match. But the empty string has a special meaning: it treats any number of consecutive blank lines as the terminator. "\n\n" will blindly take two newlines, even if the third or later line is still blank. There is no way to set $/ to a normal (non-magic) value to accomplish the same thing, since it takes a literal string, not a regex. Perhaps that idea is outdated. Why not allow the record separator to be a regex or even a code ref, and eliminate the special built-in case? -John
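The difference between the magic empty string and a literal "\n\n" can be seen with a small sketch. The helper name `read_records` and the sample text are made up for illustration, and the in-memory filehandle assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from a string using the given $/.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Two paragraphs separated by three blank lines (Unix line endings).
my $text = "a\n\n\n\nb\n";

my @para  = read_records($text, "");      # paragraph mode
my @fixed = read_records($text, "\n\n");  # literal two-newline separator

# Paragraph mode swallows the whole run of blank lines; the literal
# separator stops after every two newlines and yields an extra record
# that contains nothing but newlines.
printf "paragraph mode: %d records\n", scalar @para;
printf "literal mode:   %d records\n", scalar @fixed;
```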