snaporaz has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a problem I can't seem to solve. I use Perl 5.8 under Windows XP. I want to read a Unicode UTF-16 file, do some processing, and write it out again as UTF-16.

I can't print a proper newline. I tried 'pack' and 'chr' (using the Unicode code points, not the "\r" and "\n" names), and many other options. Using a hex editor, I see that I always end up printing "0D 00 0D 0A 00" and not "0D 00 0A 00". It seems that no matter how I refer to the line feed character (code 000A), a "0D" byte automatically gets stuck in front of it. Here is my (non-working) snippet:

use strict;
my $infile=shift;
my $outfile=shift;
open IN, "<:encoding(UTF-16LE)", $infile or die;
open OUT, ">:encoding(UTF-16LE)", $outfile or die;
binmode OUT, ':utf8';
while (<IN>) {
    chomp;
    print OUT "$_";
    #also tried:
    #print OUT "\x{0D}\x{0A}";
    my $r=pack("U", 0x000D);
    my $n=pack("U", 0x000A);
    print OUT $r;
    print OUT $n;
    #also tried:
    #my $n=chr(0x000D); etc.
}
close(IN);
close(OUT);
Any idea? Thanks in advance

Re: newline in unicode under windows
by iburrell (Chaplain) on Jul 12, 2004 at 20:16 UTC
    There is a bug in how the UTF-16 and CRLF layers interact. The CRLF layer, which adds the CR on Windows, works on bytes and is applied after the UTF-16 encoding. It therefore inserts a single 0x0D byte in front of the 0x0A byte, instead of the two bytes 0D 00 that the encoding requires.
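The effect can be reproduced without any file I/O. The following is only a sketch that simulates the faulty ordering with a byte-level substitution (it does not exercise the PerlIO layers themselves); the helper name `hex_bytes` is made up for illustration. It yields the same byte sequences as those quoted in the question.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative helper (not part of any module): render a byte string as hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

# What CR LF should look like in UTF-16LE: two bytes per character.
my $correct = encode('UTF-16LE', "\r\n");

# Simulate the faulty ordering: encode first, then do a byte-level
# LF -> CR LF translation, as a CRLF layer sitting below the encoding would.
(my $broken = $correct) =~ s/\x0A/\x0D\x0A/g;

print hex_bytes($correct), "\n";   # 0D 00 0A 00
print hex_bytes($broken),  "\n";   # 0D 00 0D 0A 00
```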

    One solution is to remove the crlf layer.

    open(my $fh, ">:raw", $file) or die "Cannot open $file: $!";
    binmode($fh, ":encoding(ucs2le)");
    # No comma after the filehandle, otherwise print treats $fh as data.
    print $fh "\r\n";
    Another is to put the crlf before the ucs2le encoding, but I am not sure how to do that.
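One commonly cited way to get that ordering is to name all the layers explicitly in the open mode, so that `:crlf` ends up on top of the encoding layer and sees "\n" as a character before it is encoded. This is a sketch that assumes the PerlIO layers stack as documented; it has not been checked against the specific 5.8 release discussed here, and the file name is hypothetical.

```perl
use strict;
use warnings;

my $file = 'out-utf16le.txt';   # hypothetical output path

# :raw            drops the default layers (including the Windows :crlf),
# :encoding(...)  turns characters into UTF-16LE bytes,
# :crlf           is pushed last, so it translates "\n" to "\r\n" on
#                 characters, above the encoding rather than below it.
open(my $fh, '>:raw:encoding(UTF-16LE):crlf', $file)
    or die "Cannot open $file: $!";
print {$fh} "line one\n";       # plain "\n"; the :crlf layer adds the CR
close($fh) or die "Cannot close $file: $!";
```

With this arrangement the script prints a plain "\n" and leaves the CR to the layer, so no explicit "\r" is needed in the output strings.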
      Thanks for the heads up on the bug. And your solution worked for me. Thanks!
      Unfortunately, AFAICT this isn't going to be fixed for 5.8.5. It was reported by ActiveState, though, so it's possible they came up with a fix for ActivePerl that didn't get submitted back upstream.
Re: newline in unicode under windows
by PodMaster (Abbot) on Jul 12, 2004 at 19:44 UTC
    Now I'm not a unicode expert, but why did you binmode OUT, ':utf8' if you want to write UTF-16LE? That could very well be your problem.