in reply to Re: win32 txt (with a £) -> decode -> encode_entities -> L with stroke

... &#x141, which is the Unicode codepoint for capital L with stroke, (ie the output is correct).

I think wfsp's point is that the pound sign (byte A3) should remain £, i.e. the Unicode codepoint U+00A3 (pound sign) rather than U+0141 (capital L with stroke).  IOW, I don't think the output is correct...

Update: as ikegami points out, cp1250 does not correspond to Latin-1 (ISO 8859-1), as I had wrongly assumed (and maybe wfsp, too?). The difference between cp1250 and cp1252 then of course explains the output...
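
For illustration, here is a minimal sketch of that difference, using only the core Encode and charnames modules. The helper name describe_a3 is made up for this example and is not from the original post; it just decodes the single byte A3 under a given encoding and reports the resulting codepoint and its Unicode name.

```perl
use strict;
use warnings;
use Encode qw( decode );
use charnames ();    # core module; provides charnames::viacode

# Hypothetical helper (not from the thread): decode the byte 0xA3
# under the given encoding and describe the character that results.
sub describe_a3 {
    my ($encoding) = @_;
    my $char = decode( $encoding, "\xA3" );
    return sprintf( "U+%04X %s", ord($char), charnames::viacode( ord($char) ) );
}

# Compare the encoding that was used with the one that was probably intended.
printf( "%-7s %s\n", "$_:", describe_a3($_) ) for qw( cp1250 cp1252 );
```

If the file really is Western European Windows text, decoding it as cp1252 rather than cp1250 should presumably give the pound sign that wfsp was expecting.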

Re^3: win32 txt (with a £) -> decode -> encode_entities -> L with stroke
by ikegami (Patriarch) on Jan 15, 2009 at 17:56 UTC

    According to Wikipedia:

    • iso-8859-1's A3 codepoint is the pound sign (U+00A3).
    • cp1252's A3 codepoint is the pound sign (U+00A3) since it's based on iso-8859-1.
    • iso-8859-2's A3 codepoint is uppercase L with stroke (U+0141).
    • cp1250's A3 codepoint is uppercase L with stroke (U+0141) since it's based on iso-8859-2.

    If this information is accurate, Encode is producing the proper output and wfsp's expectations are wrong.

    use Encode qw( decode );

    for (qw( iso-8859-1 cp1252 iso-8859-2 cp1250 )) {
        printf( "%-11s U+%04X\n", "$_:", ord( decode($_, "\xA3") ) );
    }

    iso-8859-1: U+00A3
    cp1252:     U+00A3
    iso-8859-2: U+0141
    cp1250:     U+0141

    Update: Added to node.

      ...wfsp's expectations are wrong.
      Yup.

      Thanks for straightening out my muddle.