As I confessed in my other reply, I don't have a Windows box, so I don't know for sure. Maybe the initial ":raw" is needed to defeat the intrinsic ":crlf" layer that is always imposed first by default on that OS.

But I'd be really surprised if there was any real need or impact of final ":utf8" -- I think you can dispense with that. (It certainly looks nonsensical having it there.)

In any case, an attentive reading of the PerlIO manual would be good medicine.
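Besides reading the manual, it can help to ask Perl which layers actually end up on a handle. The following is only a sketch: layers_for is a hypothetical helper name and the scratch file comes from File::Temp, neither of which appears in the thread. PerlIO::get_layers is the documented core function. The exact list it prints depends on the platform and Perl version, so no output is shown.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open a scratch file for writing with the given
# layer spec and return the layers Perl actually stacked on the handle.
sub layers_for {
    my ($layer_spec) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    open my $fh, ">$layer_spec", $path or die "Cannot open $path: $!";
    my @layers = PerlIO::get_layers($fh);
    close $fh;
    return @layers;
}

# Compare the two layer specs discussed in this thread.
for my $spec (":raw:encoding(UTF-16LE):crlf",
              ":raw:encoding(UTF-16LE):crlf:utf8") {
    print "$spec => @{[ layers_for($spec) ]}\n";
}
```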

update: Thanks for the following reply, almut. It seems I shouldn't have been so surprised after all!

Re^3: Unicode problem
by almut (Canon) on Aug 21, 2007 at 04:52 UTC
    I'd be really surprised if there was any real need or impact of final ":utf8" -- I think you can dispense with that.

    The reason you need the final :utf8 is that the crlf layer kind of turns off the UTF8-ness (or whatever you want to call it...). In other words, if you have a string containing non-ASCII characters (which was the reason for inventing Unicode in the first place, wasn't it :), you'd get nonsense, because the utf8 flag will either be ignored (on output) or not be set (on input). Of course, if you're only outputting an ASCII-only string like "hello", you won't see a difference...

    For example, when replacing the "e" in "hello" with an "ä" (a-umlaut, U+00E4), you'd get correct output with

    open my $fh, ">:raw:encoding(UTF-16LE):crlf:utf8", "ok.utf16" or die;
    print $fh "h\x{00e4}llo\n";

    $ od -tx1 -An ok.utf16
    68 00 e4 00 6c 00 6c 00 6f 00 0d 00 0a 00

    but not with

    open my $fh, ">:raw:encoding(UTF-16LE):crlf", "err.utf16" or die;
    print $fh "h\x{00e4}llo\n";

    $ od -tx1 -An err.utf16
    68 00 00 00 6c 00 6c 00 6f 00 0d 00 0a 00
          ^^ wrong

    accompanied by this warning when running the code:

    Malformed UTF-8 character (unexpected non-continuation byte 0x6c, immediately after start byte 0xe4) in null operation at ...
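For comparison, here is a sketch that sidesteps the layer ordering altogether: do the newline translation on the character string, encode it with Encode, and write the bytes through a plain ":raw" handle. This is an alternative, not the layer-based approach discussed above. The helper name utf16le_crlf_bytes and the file name manual.utf16 are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: translate "\n" to "\r\n" on the character string,
# then encode the result as UTF-16LE (this encoding name adds no BOM).
sub utf16le_crlf_bytes {
    my ($text) = @_;
    (my $crlf = $text) =~ s/\n/\r\n/g;
    return encode("UTF-16LE", $crlf);
}

my $bytes = utf16le_crlf_bytes("h\x{00e4}llo\n");

# ":raw" keeps any default ":crlf" layer from touching the encoded bytes.
open my $fh, ">:raw", "manual.utf16" or die "Cannot open manual.utf16: $!";
print {$fh} $bytes;
close $fh or die "Cannot close manual.utf16: $!";

# Hex dump of the bytes, in the same style as od -tx1.
print join(" ", unpack("(H2)*", $bytes)), "\n";
# prints: 68 00 e4 00 6c 00 6c 00 6f 00 0d 00 0a 00
```

The bytes match the ok.utf16 dump above, which gives a reference to check any layer combination against.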