in reply to Windows file read

To attempt to correct for this, I set the $/ variable to "\x0D\x0A". While processing the files, I then did a quick s/\x0A/ /g to "fix" the data. When tested in Linux, this worked perfectly. Unfortunately, this program must run in Windows.

That will work in Windows if (and only if) you binmode IN first. For example:

local $/ = "\x0D\x0A";
open(local *IN, '<', $filename)
   or die("Unable to open input file $filename: $!\n");
binmode(IN);
while (<IN>) {
   chomp;
   s/\x0A/ /g;
   ...
}

Without binmode, occurrences of "\x0D\x0A" are converted to "\x0A" before Perl looks for the line ending.
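The effect can be reproduced on any platform by asking for the translating layer explicitly. The following is only a minimal sketch: the sample data and the helper name read_records are made up for illustration, and it assumes that the :crlf PerlIO layer behaves like the default text mode on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: two CRLF-terminated records,
# the first one containing a stray LF.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "ab\x0Acd\x0D\x0Aef\x0D\x0A";
close($tmp_fh) or die("close: $!\n");

# Reads all records from $path through the given layer,
# always using CRLF as the input record separator.
sub read_records {
    my ($layer) = @_;
    local $/ = "\x0D\x0A";
    open(my $fh, "<$layer", $path) or die("open<: $!\n");
    my @records = <$fh>;
    close($fh);
    return @records;
}

my @raw  = read_records(':raw');   # bytes reach the record splitter untouched
my @crlf = read_records(':crlf');  # CRLF becomes LF before $/ is applied

printf("%-6s %d record(s)\n", ':raw',  scalar @raw);
printf("%-6s %d record(s)\n", ':crlf', scalar @crlf);
```

With :raw the file splits into two records and the stray LF stays inside the first one. With :crlf no "\x0D\x0A" is left by the time Perl looks for the separator, so the whole file comes back as a single record.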

Re^2: Windows file read
by thedoe (Monk) on May 01, 2006 at 16:54 UTC

    While this did preserve the CRLF at the end of each line, I still ran into the same problem as when I opened the file simply doing:

    open IN, '<:raw', $file or die "Unable to open file: $file";

    The problem is that, despite setting $/ to "\x0D\x0A", the lines with a rogue LF (\x0A) are still split into two lines. I have used a hex editor to make sure that ONLY a LF is present, so I am not misreading the data.

    Sadly, this puts me right back at the original problem. I appreciate all your assistance so far, ikegami. I now know about layering the file modes, but unfortunately my line break problem still exists.

      It works for me. Show your code, please. Mine is the following:
      {
          open(my $fh, '>:raw', 'file')
              or die("open>: $!\n");
          print $fh ("abc\x{0D}\x{0A}de\x{0A}fg\x{0D}\x{0A}");
          #           [------5------][---------7----------]
          #           [------5------][---3--][------4-----]
      }
      {
          open(my $fh, '<:raw', 'file')
              or die("open<: $!\n");
          local $/ = "\x0D\x0A";
          print length, "\n" while <$fh>;
      }
      outputs
      5
      7

      If you remove the assignment to $/, the output is

      5
      3
      4

      Oddly enough, when I extracted the hex information to a simple text file containing only a problem line and one line before and after it, the same code I was having a problem with worked.

      I am now re-running on the much larger, original file. If I run into another problem now, though, I will know that there must be some type of extra character which, for some reason or another, is not being reported by my hex editor.

      Thank you again for your help ikegami. I will update this post with the results of the larger run.

      Update: Unfortunately, the source file I have is still giving me this problem. I am looking into what could be doing this. I have extracted the lines around it into a temporary file, but do not have this problem with the temp file. It seems to only happen in the main source. Thank you again for your help, as I now know where I need to look for the (hopeful) solution to my dilemma.

      Update 2: Wow...after spending two days on this, I have just learned that someone modified the input file after I looked at it in hex to include a true line break in that position. Why? I have no idea. But the mystery has finally been solved. Thank you again to ikegami for pointing me back towards where I had been looking. At least now I know I'm not too crazy.

Re^2: Windows file read
by thedoe (Monk) on May 01, 2006 at 16:22 UTC

    I am currently opening the file using:

    open IN, '<:utf8', $file or die "Can not open input file: $file";

    It was my understanding that specifying a format while opening the file will open it, then call binmode on it. Is this incorrect?

      I'm not very familiar with PerlIO (:utf8, etc). I suspect that if you do

      open IN, '<:utf8', $fn or ...;
      binmode(IN);  # Short for binmode(IN, ':raw') in v5.8

      you will lose the :utf8 property. You could try

      open IN, '<:raw:utf8', $fn or ...;

      but :raw and :utf8 might be mutually exclusive. Fortunately, it's easy to try these and see if they work.

      Update: This page says the previous snippet will work. Your code would look like:

      local $/ = "\x0D\x0A";
      open(local *IN, '<:raw:utf8', $filename)
         or die("Unable to open input file $filename: $!\n");
      while (<IN>) {
         chomp;
         s/\x0A/ /g;
         ...
      }
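      One way to check the '<:raw:utf8' combination without touching the real input is a throwaway file. This is a rough sketch under assumptions: the sample text is invented, and a lexical filehandle stands in for IN.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample: UTF-8 bytes for "caf\x{E9}", a stray LF inside
# the first record, and CRLF as the real record terminator.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "caf\xC3\xA9\x0Abar\x0D\x0Abaz\x0D\x0A";
close($tmp_fh) or die("close: $!\n");

my @records;
{
    local $/ = "\x0D\x0A";
    open(my $fh, '<:raw:utf8', $path)
        or die("Unable to open input file $path: $!\n");
    while (my $line = <$fh>) {
        chomp($line);          # removes the trailing CRLF (current $/)
        $line =~ s/\x0A/ /g;   # turns the stray LF into a space
        push @records, $line;
    }
    close($fh);
}

print scalar(@records), " records, first has ",
      length($records[0]), " characters\n";
```

      If both layers take effect, the first record is "caf\x{E9} bar": eight characters rather than nine bytes, with the stray LF replaced by a space.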