Even though the file contains "\r\n", the Perl script sees a plain "\n" when reading from the file, by default. Translation of EOLN marks is done on a more primitive level, so no matter what OS you are on, you use "\n" in Perl.
(to disable that, use binmode)
But my case is processing a DOS file on a UNIX system, so Perl won't treat \r\n the same as \n. If I strip the \r myself, it does work, but I think that is a performance hit. I just wonder whether there is a better way to do it.
As noted in another message, you can't construct a value for $/ that works like the magic built-in paragraph mode.
So, you could implement a filter, and read from that filter. Or, slurp in the whole file and use split, which does allow regex for the delimiter. That is easy and speedy, if your file is small enough so memory is not an issue.
Something like:
my @lines= split (/(?:\r?\n){2,}/, do { local $/; <INPUT>});
# lines already chomped, since delimiter not included.
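A slightly fuller sketch of the same idea, in case it helps. The sample string is made up and stands in for the slurped file. Two caveats it tries to cover: if the input ends with a single line ending, the last record keeps it (a lone "\r\n" does not match the {2,} delimiter), and lines inside a record still carry their "\r".

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped DOS file.
my $text = "one\r\ntwo\r\n\r\n\r\nthree\r\n";

# Split on runs of two or more line endings, each either "\r\n" or "\n".
my @records = split /(?:\r?\n){2,}/, $text;

for my $rec (@records) {
    $rec =~ s/\r?\n\z//;   # drop a lone trailing line ending (last record)
    $rec =~ s/\r\n/\n/g;   # normalize line endings inside the record
}

print scalar(@records), " records\n";
```

Whether the extra normalization pass matters depends on what you do with the records afterwards; if you only need the record boundaries, the split alone may be enough.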
—John
You can set the $/ to be any delimiter you want. In your case, you probably want to use $/ = "\r\n\r\n"; (or whatever happens to be your delimiter of choice)
-Syn0
Not quite.
The normal behavior of Perl (unless you use binmode) is to treat the end-of-line sequence as "\n" no matter what the real form on the platform is. So the input command won't see the "\r" to match it. If you were operating in binmode, you would indeed need to reset $/ to match.
But the empty string has a special meaning: it treats any number of consecutive blank lines as the terminator. "\n\n" will blindly take exactly two newlines as the terminator, even if more blank lines follow. There is no way to set $/ to a normal (non-magic) value to accomplish the same thing, since it takes a literal string, not a regex.
Perhaps that idea is outdated. Why not allow the record separator to be a regex or even a code ref, and eliminate the special built-in case?
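To make the difference between the magic empty string and a literal "\n\n" concrete, here is a small sketch. It reads the same made-up data twice through an in-memory filehandle; the data and the helper name count_records are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: two paragraphs separated by three blank lines.
my $data = "a\n\n\n\nb\n";

# Read $data with the given record separator and count the records.
sub count_records {
    my ($sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my @recs = <$fh>;
    close $fh;
    return scalar @recs;
}

my $paragraph_mode = count_records("");      # magic: a run of blank lines is one terminator
my $literal_pair   = count_records("\n\n");  # literal: exactly two newlines terminate a record
print "paragraph mode: $paragraph_mode, literal: $literal_pair\n";
```

With the literal separator, the leftover blank lines come back as a record of their own, which is the "blindly take two" behavior described above.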
—John