in reply to Re^2: CR-LF on UTF-16LE files on Windows
in thread CR-LF on UTF-16LE files on Windows

The :crlf layer converts 0D 0A into 0A on read, and 0A into 0D 0A on write. This translation was being applied to the encoded bytes when it should have been applied to the decoded text.
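To see why the order matters, here is a minimal sketch that uses the core Encode module and s///r substitutions to stand in for the :encoding and :crlf layers (an illustration of what the layers do, not the layers themselves). It applies the CR/LF translation either to the decoded text or to the encoded UTF-16LE bytes. It assumes Perl 5.14 or later for the /r flag.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sketch only: Encode and s///r stand in for the :encoding and :crlf layers.
# Requires Perl 5.14+ for the /r (non-destructive substitution) flag.

# Byte string -> "31 00 32 00 ..."
sub hex_bytes { join ' ', unpack '(H2)*', $_[0] }
# Character string -> "31 32 33 ..."
sub hex_chars { join ' ', map { sprintf('%02x', ord($_)) } split //, $_[0] }

# Writing "123\n".
my $text  = "123\n";
# Right order: LF -> CRLF on the decoded text, then encode.
my $right = encode('UTF-16LE', $text =~ s/\n/\r\n/gr);
# Wrong order: encode first, then LF -> CRLF on the raw bytes.
my $wrong = encode('UTF-16LE', $text) =~ s/\x0A/\x0D\x0A/gr;

print hex_bytes($right), "\n";   # 31 00 32 00 33 00 0d 00 0a 00
print hex_bytes($wrong), "\n";   # 31 00 32 00 33 00 0d 0a 00 (odd length: broken UTF-16)

# Reading a file that holds "123\r\n" in UTF-16LE.
my $bytes = "\x31\x00\x32\x00\x33\x00\x0D\x00\x0A\x00";
# Wrong order: CRLF -> LF on the bytes finds no adjacent 0D 0A, so the CR survives.
my $read_wrong = decode('UTF-16LE', $bytes =~ s/\x0D\x0A/\x0A/gr);
# Right order: decode first, then CRLF -> LF on the characters.
my $read_right = decode('UTF-16LE', $bytes) =~ s/\r\n/\n/gr;

print hex_chars($read_wrong), "\n";   # 31 32 33 0d 0a
print hex_chars($read_right), "\n";   # 31 32 33 0a
```

With the wrong order, a lone 0D byte gets spliced into a two-byte code unit on write, and on read the 0D 00 0A 00 sequence is never recognized as a line ending, which is the "31 32 33 0d 0a" result seen in case #2 of the thread.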

(My earlier post has been edited to integrate this.)

Re^4: CR-LF on UTF-16LE files on Windows
by james28909 (Deacon) on Nov 08, 2018 at 00:04 UTC
    Would binmode() work? I don't have any files like that at my disposal to test.

      binmode would not work.

      When binmode applies :raw, it disables any existing :crlf layer rather than removing it. And a subsequent :crlf re-enables the existing :crlf layer rather than adding a new one. That means that

      binmode($fh, ':raw:encoding(UTF-16LE):crlf')

      is no different than

      binmode($fh, ':encoding(UTF-16LE)')

      It's therefore impossible to apply :encoding(UTF-16LE) to STDIN, STDOUT and STDERR on Windows with binmode (if you also want :crlf translation). You'd need something like the following instead:

      open(my $fh, '<&=:raw:encoding(UTF-16le):crlf', fileno(STDIN)); *STDIN = $fh;

      (Untested)

        binmode would not work

        Is that so? I get the same correct result (cases 3 and 4) regardless of whether the layer stack is built with open or with binmode.

        use strict;
        use warnings;
        use feature 'say';
        use autodie;

        $, = ' ';

        {
            open my $f, '>:raw:encoding(UTF-16LE):crlf', 'test';
            say $f 123;
        }
        {   # 1 "pure binary slurp"
            open my $f, '<:raw', 'test';
            undef local $/;
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }
        {   # 2 OP's case
            open my $f, '<:encoding(UTF-16LE)', 'test';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }
        {   # 3 correct
            open my $f, '<:raw:encoding(UTF-16LE):crlf', 'test';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }
        {   # 4 correct
            open my $f, '<', 'test';
            binmode $f, ':raw:encoding(UTF-16LE):crlf';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }
        {   # 5 same as #2
            open my $f, '<', 'test';
            binmode $f, ':encoding(UTF-16LE)';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        __END__
        unix crlf
        31 00 32 00 33 00 0d 00 0a 00
        unix crlf encoding(UTF-16LE) utf8
        31 32 33 0d 0a
        unix crlf encoding(UTF-16LE) utf8 crlf utf8
        31 32 33 0a
        unix crlf encoding(UTF-16LE) utf8 crlf utf8
        31 32 33 0a
        unix crlf encoding(UTF-16LE) utf8
        31 32 33 0d 0a

        But can the output of PerlIO::get_layers be trusted at all? It lists a few utf8 (pseudo-?) layers that I didn't ask for. Also, the bottommost crlf layer is disabled rather than removed in cases 3 and 4 (and in case 1, too), and it is not re-enabled later.

        However, I can :pop (rather than "disable") existing layers, and here open and binmode behave differently: the latter doesn't let me pop down to the bottom of the stack. I don't know if these factoids are of any value, though.

        {   # 6
            open my $f, '<:pop:pop:unix:encoding(UTF-16LE):crlf', 'test';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }
        {   # 7
            open my $f, '<', 'test';
            binmode $f, ':pop:pop:unix:encoding(UTF-16LE):crlf';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        __END__
        unix encoding(UTF-16LE) utf8 crlf utf8
        31 32 33 0a
        unix encoding(UTF-16LE) utf8 crlf utf8
        Use of uninitialized value in unpack at crlf.pl line 49.
        refcnt_dec: fd 0: 0 <= 0