
Glad you corrected this - I'm not that proficient on Windows ;-)

I wonder about the leading sequence 0xff 0xfe in notepad saved text files - is that some marker indicating the encoding type?

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^6: Unicode2ascii
by ikegami (Patriarch) on Nov 28, 2006 at 15:36 UTC

    Glad you corrected this - I'm not that proficient on Windows ;-)

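    UCS-2 and UTF-16 are practically identical: for characters in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF) they produce exactly the same bytes. The difference is that UCS-2 is fixed width (two bytes per character, much as iso-latin-1 is one byte per character) and cannot represent anything above U+FFFF, while UTF-16 is variable width (like UTF-8) and encodes characters above U+FFFF as surrogate pairs of two 16-bit units. The pros and cons of using UTF-8 over iso-latin-1 therefore also apply to using UTF-16 over UCS-2.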
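    A minimal sketch with the core Encode module makes the difference concrete (the two sample characters are arbitrary picks): a BMP character encodes to the same two bytes under UCS-2LE and UTF-16LE, while a character above U+FFFF needs a four-byte surrogate pair in UTF-16LE.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Arbitrary samples: U+00E9 (e acute, inside the BMP)
# and U+1F600 (an emoji, outside the BMP).
my $bmp     = "\x{00E9}";
my $non_bmp = "\x{1F600}";

# Inside the BMP, UCS-2LE and UTF-16LE yield identical bytes.
my $ucs2  = encode('UCS-2LE',  $bmp);
my $utf16 = encode('UTF-16LE', $bmp);
printf "BMP:     UCS-2LE=%s UTF-16LE=%s\n",
    unpack('H*', $ucs2), unpack('H*', $utf16);    # prints "e900" twice

# Outside the BMP, UTF-16 needs a surrogate pair (D83D DE00): 4 bytes, not 2.
my $pair = encode('UTF-16LE', $non_bmp);
printf "Non-BMP: UTF-16LE=%s (%d bytes)\n",
    unpack('H*', $pair), length $pair;            # prints "3dd800de (4 bytes)"
```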

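    Windows uses UCS-2LE (at least historically; Windows 2000 and later handle UTF-16LE). Not knowing anything about UCS-2 'til today, I've been blindly using UTF-16LE, which does no harm for text that stays within the BMP, since the bytes are the same.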

    I wonder about the leading sequence 0xff 0xfe in notepad saved text files - is that some marker indicating the encoding type?

    It's a Byte Order Mark (BOM).
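    A rough sketch of how a script might act on that mark, assuming the file has already been read into a string in :raw mode. sniff_bom is a hypothetical helper, not part of Encode, and its table covers only the common UTF-8/16/32 marks:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: inspect the leading bytes of a raw (undecoded) buffer
# and return the encoding its BOM indicates plus the buffer minus the BOM.
# Four-byte marks are checked first, because the UTF-32LE mark (FF FE 00 00)
# begins with the UTF-16LE mark (FF FE).
sub sniff_bom {
    my ($bytes) = @_;
    my @boms = (
        [ "\x00\x00\xFE\xFF" => 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00" => 'UTF-32LE' ],
        [ "\xEF\xBB\xBF"     => 'UTF-8'    ],
        [ "\xFE\xFF"         => 'UTF-16BE' ],
        [ "\xFF\xFE"         => 'UTF-16LE' ],
    );
    for my $bom (@boms) {
        my ($mark, $enc) = @$bom;
        if (substr($bytes, 0, length $mark) eq $mark) {
            return ($enc, substr($bytes, length $mark));
        }
    }
    return (undef, $bytes);    # no BOM: the caller has to guess
}

# Bytes like the start of a Notepad-saved file: FF FE, then "Hi" in UTF-16LE.
my $raw = "\xFF\xFE" . "H\x00i\x00";
my ($enc, $body) = sniff_bom($raw);
my $text = decode($enc, $body);
print "$enc: $text\n";    # prints "UTF-16LE: Hi"
```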

Re^6: Unicode2ascii
by BrowserUk (Patriarch) on Nov 28, 2006 at 15:31 UTC
    I wonder about the leading sequence 0xff 0xfe in notepad saved text files - is that some marker indicating the encoding type?

    It's called the "byte order mark" (BOM), and is used to detect whether the 16-bit units in the file are stored little-endian or big-endian: 0xFF 0xFE means little-endian, 0xFE 0xFF means big-endian.

    See BOM.
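    Encode's generic 'UTF-16' encoding relies on exactly this mark. A minimal sketch, assuming the core Encode module: decoding picks the byte order from the BOM and drops it (per the Encode::Unicode docs, it dies if no BOM is present), while encoding writes big-endian data with FE FF in front.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The same two characters in both byte orders.
my $le = encode('UTF-16LE', 'Hi');    # 48 00 69 00
my $be = encode('UTF-16BE', 'Hi');    # 00 48 00 69

# With a BOM in front, the generic 'UTF-16' decoder reads the first two
# bytes, sets the byte order from them, and strips the BOM.
my $from_le = decode('UTF-16', "\xFF\xFE" . $le);
my $from_be = decode('UTF-16', "\xFE\xFF" . $be);
print "$from_le $from_be\n";    # prints "Hi Hi"

# Encoding to plain 'UTF-16' goes the other way: big-endian, FE FF first.
printf "%s\n", unpack('H*', encode('UTF-16', 'Hi'));    # prints "feff00480069"
```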


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.