thedoe has asked for the wisdom of the Perl Monks concerning the following question:
I am using ActiveState Perl in a Windows environment to read and process a file. This file has line terminations in CRLF (carriage return, line feed) format. The problem is that some of the lines contain a rogue LF in them. There is no way I can correct the way this data comes to me.
If the file is simply opened and read using while (<IN>), the lines with a rogue LF are split into two separate lines. This data is therefore not processed and all line counts for further error reporting are thrown off.
To attempt to correct for this, I set the $/ variable to "\x0D\x0A". While processing the files, I then did a quick s/\x0A/ /g to "fix" the data. When tested in Linux, this worked perfectly. Unfortunately, this program must run in Windows.
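For reference, a minimal sketch of the approach just described. The sample data, the temporary file, and the variable names are hypothetical and only there to make the example self-contained; the input handle is opened with a plain '<', as in the original attempt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: three CRLF-terminated records, the second
# one containing a stray LF in the middle.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;    # write the bytes exactly as given, with no translation
print {$out} "first\x0D\x0A", "sec\x0Aond\x0D\x0A", "third\x0D\x0A";
close $out or die "close failed: $!";

my @records;
{
    local $/ = "\x0D\x0A";    # a record ends only at CR followed by LF
    open my $in, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        chomp $line;            # strips the trailing CRLF (the current $/)
        $line =~ s/\x0A/ /g;    # any LF still present is a stray one
        push @records, $line;
    }
    close $in;
}

print scalar(@records), " records\n";
print "$_\n" for @records;
```

Where the handle applies no CRLF translation, as in the Linux test, this reads three records and the stray LF becomes a space, so record counts stay aligned with the real lines.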
When I ran the program under ActiveState in a Windows command window, however, I found that the while (<IN>) would try to read the entire file at once, similar to setting $/ to undef. Upon further investigation by commenting out the $/ assignment, I found that each line read by while (<IN>) under this environment was giving me a line terminated by simply LF, with the CR completely removed. The length was one character less than the true line length.
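One way to see what the handle is actually delivering is to list its I/O layers and hex-dump a line as read. This is only a diagnostic sketch: the one-record test file is hypothetical, and the layer names printed depend on the platform and the Perl build.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical one-record file, written byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "abc\x0D\x0A";
close $out or die "close failed: $!";

open my $in, '<', $path or die "Cannot open $path: $!";

# Names of the I/O layers on the handle. A "crlf" entry means CRLF is
# translated to LF before the program sees the data.
my @layers = PerlIO::get_layers($in);
print "layers: @layers\n";

my $line = <$in>;    # default $/, so this reads up to the first LF
close $in;

# Hex dump of what was read: a trailing 0d0a means the CR survived,
# a bare 0a means it was removed on the way in.
my $hex = unpack 'H*', $line;
print "length ", length($line), ", bytes $hex\n";
```

If the dump ends in a bare 0a and the length is one short, the CR was removed before the program saw the data, which would also explain why a $/ of "\x0D\x0A" never matches and the whole file is read at once.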
I am not sure at what point the CR is being removed. I tried opening the file in different modes (currently I am using utf8, since the file contains Unicode). Under the :raw mode, the while loop did break on lines when $/ was set to "\x0D\x0A". The problem was that it broke on both a CRLF and a lone LF, which brought me right back to my original problem.
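A sketch of one combination that may be worth testing: open with :raw only, split on the CRLF bytes, and decode each record afterwards. It assumes the file is UTF-8 and that :raw leaves the bytes untouched, as the PerlIO documentation describes; whether it behaves this way on the ActiveState build in question is not verified here. The sample data and names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical sample: a UTF-8 record ("caf" plus e-acute) and a record
# with a stray LF, both CRLF-terminated.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "caf\xC3\xA9\x0D\x0A", "sec\x0Aond\x0D\x0A";
close $out or die "close failed: $!";

my @records;
{
    local $/ = "\x0D\x0A";    # split on the raw CRLF byte pair
    # :raw alone: no CRLF translation and no decoding while reading
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    while (my $bytes = <$in>) {
        chomp $bytes;                          # drop the trailing CRLF
        my $text = decode('UTF-8', $bytes);    # bytes to characters
        $text =~ s/\x0A/ /g;                   # replace any stray LF
        push @records, $text;
    }
    close $in;
}

printf "%d records, first has %d characters\n",
    scalar(@records), length $records[0];
```

Splitting before decoding is safe for UTF-8 because the bytes 0D and 0A never occur inside a multi-byte sequence.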
I am aware that the C function read() will remove the CR from a line when reading it in, while the function _read() does not. Could this be the problem that I am dealing with? If so, is there a way to force Perl to use _read()? If I can't force it to do so, does anyone know another way around the read() difficulty? And if this is not the problem at all, but something else, could someone point me in the right direction?
Thanks!
Replies are listed 'Best First'.
Re: Windows file read
by ikegami (Patriarch) on May 01, 2006 at 16:18 UTC
by thedoe (Monk) on May 01, 2006 at 16:54 UTC
by ikegami (Patriarch) on May 01, 2006 at 17:15 UTC
by thedoe (Monk) on May 01, 2006 at 19:02 UTC
by thedoe (Monk) on May 01, 2006 at 16:22 UTC
by ikegami (Patriarch) on May 01, 2006 at 16:27 UTC