in reply to Re^3: Unicode2ascii
in thread Unicode2ascii

It's UCS-2LE, the fixed-width variant of UTF-16LE.
use strict;
use warnings;

my $file_in  = '...';
my $file_out = '...';

open(my $fh_in, '<:raw:encoding(UCS-2LE)', $file_in)
    or die("Unable to open \"$file_in\": $!\n");
open(my $fh_out, '>:raw:encoding(UCS-2LE)', $file_out)
    or die("Unable to create file \"$file_out\": $!\n");

while (<$fh_in>) {
    ...
    print $fh_out $_;
}

Update: Oops, I originally confirmed that it was UTF-16LE.

Re^5: Unicode2ascii
by shmem (Chancellor) on Nov 28, 2006 at 15:12 UTC
    Glad you corrected this - I'm not that proficient on Windows ;-)

    I wonder about the leading sequence 0xff 0xfe in notepad saved text files - is that some marker indicating the encoding type?

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      Glad you corrected this - I'm not that proficient on Windows ;-)

      UCS-2 and UTF-16 are practically identical. The former is fixed width (like iso-latin-1) and the latter is variable width (like UTF-8). The pros and cons for using UTF-8 over iso-latin-1 also apply to using UTF-16 over UCS-2.
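
To make the fixed-width versus variable-width point concrete, here is a minimal sketch using the core Encode module. The two sample characters are my own choice for illustration: U+00E9 sits inside the 16-bit range, while U+1D11E lies above U+FFFF, so UTF-16 has to spend two 16-bit units on it.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Sample characters picked for illustration (not from the thread):
# U+00E9 fits in one 16-bit unit, U+1D11E does not.
my $bmp    = "\x{E9}";
my $astral = "\x{1D11E}";

# For characters up to U+FFFF, UCS-2LE and UTF-16LE produce the same bytes.
my $ucs2_bytes  = encode('UCS-2LE',  $bmp);
my $utf16_bytes = encode('UTF-16LE', $bmp);

# Above U+FFFF, UTF-16LE falls back to a surrogate pair (two 16-bit units).
my $astral_bytes = encode('UTF-16LE', $astral);

printf "U+00E9  as UCS-2LE : %d bytes\n", length($ucs2_bytes);
printf "U+00E9  as UTF-16LE: %d bytes\n", length($utf16_bytes);
printf "U+1D11E as UTF-16LE: %d bytes\n", length($astral_bytes);
```

The first two lengths should come out the same, and the third should be twice as long, which is the only practical difference between the two encodings.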

      Windows uses UCS-2LE. Not knowing anything about UCS-2 'til today, I've been blindly using UTF-16LE.

      I wonder about the leading sequence 0xff 0xfe in notepad saved text files - is that some marker indicating the encoding type?

      It's a Byte Order Mark (BOM).

      I wonder about the leading sequence 0xff 0xfe in notepad saved text files - is that some marker indicating the encoding type?

      It's called the "byte order mark" (BOM), and it is used to detect whether the data in the file is little-endian or big-endian.

      See BOM.
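
As a rough sketch of how the BOM check could look in Perl, the hypothetical helper below (the name bom_byte_order is made up for this example) inspects the first two bytes of a raw byte string. It assumes the data is some flavour of UTF-16 or UCS-2; a UTF-32LE file also starts with 0xFF 0xFE, so this is not a general-purpose encoding detector.

```perl
use strict;
use warnings;

# Hypothetical helper: report the byte order implied by a leading BOM.
# Returns undef when the data does not start with a 16-bit BOM.
sub bom_byte_order {
    my ($bytes) = @_;
    return 'little-endian' if $bytes =~ /\A\xFF\xFE/;   # what Notepad writes
    return 'big-endian'    if $bytes =~ /\A\xFE\xFF/;
    return undef;
}

# Made-up sample data: "hi" with an LE BOM, with a BE BOM, and with no BOM.
for my $sample ("\xFF\xFEh\x00i\x00", "\xFE\xFF\x00h\x00i", "hi") {
    my $order = bom_byte_order($sample);
    my $label = defined($order) ? $order : 'no BOM';
    print "$label\n";
}
```

In practice the bytes would come from reading the start of a file opened with the :raw layer, before any decoding layer is applied.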


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.