in reply to Perl Windows vs Cygwin installs

Don't mess with \r\n yourself.

When running the code with the Cygwin perl, add the PerlIO layer ":crlf" to the respective file handles. With a native Windows perl you don't need to, because it already has the :crlf layer enabled by default.  (Actually, you should be able to simply add the layer in both cases: :crlf silently refuses to be pushed directly on top of itself, so it will only ever be on the layer stack once.)
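A minimal sketch of what "adding the layer" looks like, assuming a scratch file created with File::Temp (the file name and the sample strings are made up for illustration). It pushes :crlf on the output handle and then reads the file back raw to look at the bytes on disk:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; the name is generated here, it is not from the thread.
my ($tmp, $fname) = tempfile(UNLINK => 1);
close $tmp;

# Push :crlf explicitly so that "\n" is written as CR LF.
open my $out, ">:crlf", $fname or die "Cannot write $fname: $!";
print $out "first\nsecond\n";
close $out or die "Cannot close $fname: $!";

# Read the file back with no translation at all to see the bytes on disk.
open my $raw, "<:raw", $fname or die "Cannot read $fname: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Hex dump of what actually landed in the file.
print unpack("H*", $bytes), "\n";
```

The same open call should be usable under both perls, since a second :crlf directly on top of an existing one is not kept.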

This presumes the idea is to generate files with the native Windows newline style.  If you just want your scripts to work within either a Unix/Cygwin or a Windows environment (without exchanging files between both "worlds"), simply use \n and be happy.

Re^2: Perl Windows vs Cygwin installs
by Anonymous Monk on Mar 23, 2012 at 21:17 UTC

      It still matters with newer perls, too.

      It's kind of a pity the patch you linked to doesn't really fix the issue it (apparently) set out to fix, i.e. the long-standing bug with encodings like UTF-16 in combination with the :crlf layer.

      I just checked it with 5.15.8, and I still see the same "unexpected" behavior, as it always has been. That is, when naïvely pushing a UTF-16 layer to enable UTF-16 functionality (on Windows), corrupted files are produced on writing, and carriage returns are not being removed upon reading:

      --- writing ---

      #!/usr/local/perl/5.15.8/bin/perl -w

      my $fname = "foo.utf16";
      open my $out, ">:crlf:encoding(UTF-16LE)", $fname or die;
      print $out "\x{feff}\x{1234}\n\x{5678}\n";

      $ ./test-out.pl
      $ hexdump foo.utf16
      0000000 feff 1234 0a0d 7800 0d56 000a
      000000c

      Wrong!  The correct encoding would be:

      $ hexdump foo.utf16
      0000000 feff 1234 000d 000a 5678 000d 000a
      000000e

      --- reading ---

      #!/usr/local/perl/5.15.8/bin/perl -w

      use Devel::Peek;

      my $fname = "foo.utf16";

      # create correct file, using the same old layer mantra
      # (the extra :utf8 is only required with older perls)
      open my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $fname or die;
      print $out "\x{feff}\x{1234}\n\x{5678}\n";
      close $out;

      # read file back in
      open my $in, "<:crlf:encoding(UTF-16LE)", $fname or die;
      $/ = undef;
      Dump <$in>;

      $ ./test-in.pl
      SV = PV(0x77dc60) at 0x953728
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK,UTF8)
        PV = 0x829130 "\357\273\277\341\210\264\r\n\345\231\270\r\n"\0 [UTF8 "\x{feff}\x{1234}\r\n\x{5678}\r\n"]
        CUR = 13                                                                              ^           ^
        LEN = 14

      Wrong!  \r should've been removed.

      (Note that because I tested this on Unix, I had to push :crlf myself. With a native Windows perl, the layer would of course already have been in place — i.e., you'd just say ">:encoding(UTF-16LE)" or "<:encoding(UTF-16LE)" (as anyone unaware of the issue would likely have tried).)
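To check which layers a particular perl puts in place by default, something like the following sketch could be used. It relies on PerlIO::get_layers and a scratch file from File::Temp; the variable names are made up for illustration. It also pushes :crlf twice in a row to show that the layer is not stacked directly on top of itself:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; the name is generated here, it is not from the thread.
my ($tmp, $fname) = tempfile(UNLINK => 1);
close $tmp;

# Plain open, so the handle gets whatever this perl uses by default.
open my $fh, ">", $fname or die "Cannot write $fname: $!";
my @default = PerlIO::get_layers($fh);

# Push :crlf twice in a row; the second push should not add another layer.
binmode($fh, ":crlf") or die "binmode failed: $!";
binmode($fh, ":crlf") or die "binmode failed: $!";
my @after = PerlIO::get_layers($fh);
close $fh;

# Number of crlf entries on the stack after both pushes.
my $crlf_count = grep { $_ eq "crlf" } @after;

print "default layers: @default\n";
print "after pushing :crlf twice: @after\n";
print "crlf layers on the stack: $crlf_count\n";
```

On a Unix or Cygwin perl the default list would normally not contain crlf, while a native Windows perl should already show it.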

      Personally, I think allowing another :crlf to be pushed on the stack (as it is now after the patch) is not the right approach to fix the issue, because you still have to manually rearrange the layers to get correct results.  I fail to see the benefit of being allowed to have two :crlf layers now.

        I don't feel like scrutinizing your post, but the wisdom from my link regarding UTF-16LE was to add :crlf last, as in :raw:perlio:encoding(UTF-16le):crlf
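For comparison, a round trip with :crlf placed after the encoding layer can be sketched as follows. This assumes a reasonably recent perl and a scratch file from File::Temp. The write layers are the ones from the test script earlier in the thread; the read layers are the same order without the trailing :utf8, which is an assumption of this sketch rather than something the thread tested:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; the name is generated here, it is not from the thread.
my ($tmp, $fname) = tempfile(UNLINK => 1);
close $tmp;

# Write: :crlf sits above the encoding layer, so "\n" becomes "\r\n"
# before the text is encoded to UTF-16LE.
open my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $fname
    or die "Cannot write $fname: $!";
print $out "\x{feff}\x{1234}\n\x{5678}\n";
close $out or die "Cannot close $fname: $!";

# Look at the bytes on disk with no layers involved.
open my $raw, "<:raw", $fname or die "Cannot read $fname: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read back with the same layer order, so decoding happens first and
# "\r\n" is collapsed to "\n" afterwards.
open my $in, "<:raw:encoding(UTF-16LE):crlf", $fname
    or die "Cannot read $fname: $!";
my $text = do { local $/; <$in> };
close $in;

# What the "correct" hexdump in the thread corresponds to, as 16-bit
# little-endian code units.
my $expected_bytes = pack "v*",
    0xfeff, 0x1234, 0x000d, 0x000a, 0x5678, 0x000d, 0x000a;
my $expected_text = "\x{feff}\x{1234}\n\x{5678}\n";

print "bytes on disk: ", unpack("H*", $bytes), "\n";
print "carriage returns left after reading: ",
      ($text =~ /\r/ ? "yes" : "no"), "\n";
```

The point of the ordering is that the newline translation has to operate on characters, not on the already encoded 16-bit units.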