in reply to win32 txt (with a £) -> decode -> encode_entities -> L with stroke
When you open that text file with a hex editor, what are the bytes (or the byte) corresponding to the £?
(If you have a Linux system available, hexdump -C is very helpful).
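If hexdump isn't around, od should do the same job. A minimal sketch, assuming a POSIX shell; the temporary file is a made-up stand-in for your text file, with a single A3 byte where the £ would be:

```shell
# Hypothetical stand-in for the real Windows text file.
# \243 is the octal escape for the byte 0xA3.
sample=$(mktemp)
printf 'cost: \243 5\n' > "$sample"

# Print every byte as two hex digits (-An drops the offset column),
# then squeeze the output onto one line.
bytes=$(od -An -tx1 "$sample" | tr -s ' \n' ' ')
echo "$bytes"

rm -f "$sample"
```

A lone a3 at that position points to a single-byte Windows code page, whereas c2 a3 would mean the £ is stored as UTF-8.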
Update: HTML::Entities does handle decoded strings with high codepoints correctly:
$ perl -MHTML::Entities=encode_entities -wle 'print encode_entities(chr hex "20AC")'
&euro;
Second update: wfsp /msg'ed me that the hexdump showed A3. So let's try to simulate this:
$ perl -we 'print chr(hex "A3")' | perl -MEncode -MHTML::Entities=encode_entities -wle 'my $x = <>; print encode_entities(decode("cp1250", $x))'
&#x141;
So there are no additional characters, just the entity for Ł (U+0141, LATIN CAPITAL LETTER L WITH STROKE), which is what the byte A3 means in cp1250; i.e. the output is correct.
So either the additional characters are already in the file, in which case the output you got is actually correct, or there's an additional IO layer somewhere that you haven't told us about (probably because you don't know about it).
Re^2: win32 txt (with a £) -> decode -> encode_entities -> L with stroke
by almut (Canon) on Jan 15, 2009 at 17:39 UTC
by ikegami (Patriarch) on Jan 15, 2009 at 17:56 UTC
by wfsp (Abbot) on Jan 15, 2009 at 18:35 UTC

Re^2: win32 txt (with a £) -> decode -> encode_entities -> L with stroke
by wfsp (Abbot) on Jan 15, 2009 at 16:18 UTC