Re^2: win32 txt (with a Ł) -> decode -> encode

... &#x141, which is the Unicode codepoint for capital L with stroke, (ie the output is correct).

I think wfsp's point is that the pound sign (A3) should remain £ — i.e. the Unicode codepoint U+00A3 (pound sign) vs. U+0141 (capital L with stroke). ~~IOW, I don't think the output is correct...~~

Update: as ikegami points out, cp1250 does not correspond to Latin-1 (ISO 8859-1), as I was misled to assume (and maybe wfsp, too?) — the difference between cp1250 and cp1252 then of course explains the output...
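
To make the cp1250/cp1252 difference concrete, here is a minimal sketch of the decode -> encode_entities pipeline from the thread title. It assumes HTML::Entities is installed from CPAN, and the helper name `entity_for` is made up for illustration, not something from the original post.

```perl
use strict;
use warnings;
use Encode qw( decode );
use HTML::Entities qw( encode_entities );

# Hypothetical helper (not from the thread): decode raw bytes using the
# given encoding, then replace non-ASCII characters with HTML entities.
sub entity_for {
    my ($encoding, $bytes) = @_;
    my $chars = decode($encoding, $bytes);   # bytes -> Unicode characters
    return encode_entities($chars);          # characters -> entities
}

# The same byte, A3, interpreted under the two Windows code pages.
for my $encoding (qw( cp1252 cp1250 )) {
    printf "%-7s %s\n", "$encoding:", entity_for($encoding, "\xA3");
}
```

With a stock HTML::Entities this should give `&pound;` for cp1252 and `&#x141;` for cp1250, since U+0141 has no named entity in the module's table and falls back to a numeric one.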

Re^3: win32 txt (with a Ł) -> decode -> encode_entities -> L with stroke by ikegami (Patriarch) on Jan 15, 2009 at 17:56 UTC
According to Wikipedia, iso-8859-1's A3 codepoint is the pound sign (U+00A3). cp1252's A3 codepoint is the pound sign (U+00A3) since it's based on iso-8859-1. iso-8859-2's A3 codepoint is uppercase L with stroke (U+0141). cp1250's A3 codepoint is uppercase L with stroke (U+0141) since it's based on iso-8859-2.

If this information is accurate, Encode is producing the proper output and wfsp's expectations are wrong.

    use Encode qw( decode );

    for (qw( iso-8859-1 cp1252 iso-8859-2 cp1250 )) {
        printf( "%-11s U+%04X\n", "$_:", ord( decode($_, "\xA3") ) );
    }

    iso-8859-1: U+00A3
    cp1252:     U+00A3
    iso-8859-2: U+0141
    cp1250:     U+0141

Update: Added to node.
Re^4: win32 txt (with a Ł) -> decode -> encode_entities -> L with stroke by wfsp (Abbot) on Jan 15, 2009 at 18:35 UTC
...wfsp's expectations are wrong.

Yup. Thanks for straightening out my muddle.