Re^2: Handling HTML special characters correctly

Just want to point out that you don't need to convert the code point \xA3 to &pound; when outputting it. If you are only using Latin-1 characters, you shouldn't have to use encode_entities on anything but the special HTML characters: <, >, &, and ".

The code point \xA3 is directly representable in Latin-1 and UTF-8 (and any other reasonable encoding you would use for your web page). You only have to use encode_entities on those code points which are not directly representable in the character set (encoding) used for your page.
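A minimal sketch of that point, assuming HTML::Entities is installed and the page is served as Latin-1 or UTF-8. The sample string and variable names are made up for illustration. The optional second argument to encode_entities limits escaping to the characters listed.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical sample text containing the pound sign (code point U+00A3).
my $text = "\x{A3}5 < \x{A3}10 & \"cheap\"";

# Default behaviour: high-bit characters such as U+00A3 are escaped as well.
my $all = encode_entities($text);

# Restricting the "unsafe" set to the HTML specials leaves U+00A3 untouched.
my $minimal = encode_entities($text, '<>&"');

binmode STDOUT, ':encoding(UTF-8)';   # assumption: output is consumed as UTF-8
print "$all\n";
print "$minimal\n";
```

The second form only makes sense if the page really is sent in an encoding that can represent every character in the text.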

Re^3: Handling HTML special characters correctly by monarch (Priest) on Jul 02, 2008 at 22:11 UTC
..although "`\x{A3}`" in UTF-8 would be encoded as two bytes ("`\x{C2}\x{A3}`"). See UTF-8 encoding table. Update: removed superfluous parenthesis.
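A quick way to check the byte counts, sketched with the core Encode module. The helper name hex_bytes is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $pound = "\x{A3}";                        # one character, U+00A3

my $latin1 = encode('ISO-8859-1', $pound);   # a single byte in Latin-1
my $utf8   = encode('UTF-8', $pound);        # two bytes in UTF-8

print "latin-1: ", hex_bytes($latin1), "\n";
print "utf-8:   ", hex_bytes($utf8), "\n";
```

Either way the character is representable, so no entity is needed; only the bytes on the wire differ.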
Re^4: Handling HTML special characters correctly by cosmicperl (Chaplain) on Jul 03, 2008 at 00:25 UTC
£ never used to cause me a problem on the old RH9. But these days most web servers seem to be set to en_US.UTF-8, where outputting a raw £ will give you a nasty ? in the browser, so it needs to be &pound; these days.
On a side note, I just noticed something annoying about HTML::Entities: if your input is already encoded, such as &pound;, you'll get &amp;pound;. I thought it would have checked for encoded characters and skipped them?
Lyle
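One plausible explanation for the "?" (an assumption, not something confirmed in this thread) is that the single Latin-1 byte A3 is being sent in a page the browser reads as UTF-8. A rough sketch with the core Encode module shows how the same byte is interpreted under each encoding:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw = "\xA3";   # the Latin-1 byte for the pound sign

# Interpreted as Latin-1, the byte is the character U+00A3.
my $as_latin1 = decode('ISO-8859-1', $raw);

# Interpreted as UTF-8, a lone A3 byte is malformed. With no CHECK argument,
# Encode substitutes the replacement character U+FFFD instead of dying.
my $as_utf8 = decode('UTF-8', $raw);

printf "as latin-1: U+%04X\n", ord($as_latin1);
printf "as utf-8:   U+%04X\n", ord($as_utf8);
```

If that is the cause, encoding the output as UTF-8 (or declaring the page as Latin-1) would fix it just as well as using an entity.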
Re^5: Handling HTML special characters correctly by tinita (Parson) on Jul 03, 2008 at 09:01 UTC
"On a side note just noticed something annoying about HTML::Entities, if your input is already encoded, such as &pound;, you'll get &amp;pound;"
Well, how would you encode it? Given an input like `If you write &amp; in HTML, it turns out as &`, the expected output of such a text after encoding would be: `If you write &amp;amp; in HTML, it turns out as &amp;`. Now you're saying only the last ampersand should be escaped, because the first one is already escaped? No, you never know if a text is already escaped.
Re^5: Handling HTML special characters correctly by ikegami (Patriarch) on Jul 03, 2008 at 03:36 UTC
"thought it would have checked for encoded characters and skipped them?"
That would be BAD! Decoding and encoding your post would change "such as &pound;" to "such as £".
"On a side note just noticed something annoying about HTML::Entities, if your input is already encoded"
Duh? The same thing happens if you try to encode characters twice. The same thing happens if you try to encode URL characters twice. The same thing happens if you try to zip a string twice. etc
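The double-encoding behaviour discussed in these replies can be reproduced with a short sketch, assuming HTML::Entities is installed. The variable names are illustrative only.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Literal text that happens to contain something that looks like an entity.
my $text = 'such as &pound;';

# Each pass escapes every ampersand again; nothing is "skipped".
my $once  = encode_entities($text);
my $twice = encode_entities($once);

# One decode undoes exactly one encode, so the round trip is lossless.
my $back = decode_entities($once);

print "$once\n";
print "$twice\n";
print "round trip ok\n" if $back eq $text;
```

If the encoder skipped things that already look like entities, the round trip in the last step would no longer return the original text, which is the objection raised above.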