Re: Handling HTML special characters correctly

Re^2: Handling HTML special characters correctly by LesleyB (Friar) on Jul 02, 2008 at 19:20 UTC
As I did yesterday, using it to convert C code to safe HTML text. As a general principle, always HTML-escape any data received from a form before displaying it again. If any data is to go into a database, or be used to look up data in a database, then it really must be SQL-escaped to limit or prevent SQL injection attacks. These two procedures are not language-specific. Always use the taint flag in Perl CGI scripts, i.e. `#!/usr/bin/perl -T`, or `#!/usr/bin/perl -wT` to also have warnings on. The way to untaint form data is to use regexps, which also verifies that the data is in the expected range.
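A minimal sketch of the regexp untainting described above. The helper name `untaint_username` and the allowed pattern (1 to 32 ASCII letters, digits or underscores) are assumptions for illustration only; pick a pattern that matches what your own form field should contain. Under `-T`, only the text captured by the regexp is considered untainted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the captured value if the
# input matches the whitelist pattern, otherwise undef. Under taint mode the
# captured $1 is untainted, while the original $raw stays tainted.
sub untaint_username {
    my ($raw) = @_;
    return undef unless defined $raw;
    if ($raw =~ /\A([A-Za-z0-9_]{1,32})\z/) {
        return $1;
    }
    return undef;
}

# Sample values standing in for form input.
for my $input ('lesley_b', 'x; rm -rf /') {
    my $clean = untaint_username($input);
    print defined $clean ? "accepted: $clean\n" : "rejected\n";
}
```

Anchoring with `\A` and `\z` matters here: an unanchored pattern would happily capture the "good" part of a bad value and untaint it.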
Re^2: Handling HTML special characters correctly by pc88mxer (Vicar) on Jul 02, 2008 at 19:50 UTC
Just want to point out that you don't need to convert the code point \xA3 to `&pound;` when outputting it. If you are only using latin-1 characters, you shouldn't have to use `encode_entities` on anything but the special HTML characters: <, >, &, and ". The code point \xA3 is directly representable in latin-1 and utf-8 (and any other reasonable encoding you would use for your web page). You only have to use `encode_entities` on those code points which are not directly representable in the character set (encoding) used for your page.
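As a rough illustration of that point, `encode_entities` from HTML::Entities accepts an optional second argument naming the characters to escape. The sample string is made up for the example.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Made-up sample text: the pound sign is the single character \x{A3}.
my $text = "Price: \x{A3}5 & <b>\"tax\"</b>";

# Escape only the four markup characters; \x{A3} passes through untouched.
my $minimal = encode_entities($text, '<>&"');

# With no second argument the default unsafe set applies, which also
# covers high-bit characters, so \x{A3} becomes &pound;.
my $full = encode_entities($text);
```

Here `$minimal` still contains the raw \xA3 character, so the page's declared charset has to match the encoding you actually write out.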
Re^3: Handling HTML special characters correctly by monarch (Priest) on Jul 02, 2008 at 22:11 UTC
..although "`\x{A3}`" in UTF-8 would be encoded as two bytes ("`\x{C2}\x{A3}`"). See UTF-8 encoding table. Update: removed superfluous parenthesis.
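A quick way to check the byte counts yourself, sketched with the core Encode module (variable names are just for the example):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $pound = "\x{A3}";                        # one character, code point U+00A3

my $latin1 = encode('ISO-8859-1', $pound);   # one byte
my $utf8   = encode('UTF-8',      $pound);   # two bytes

# unpack 'H*' renders each byte as two lowercase hex digits.
print "latin-1: ", unpack('H*', $latin1), "\n";   # a3
print "utf-8:   ", unpack('H*', $utf8),   "\n";   # c2a3
```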
Re^4: Handling HTML special characters correctly by cosmicperl (Chaplain) on Jul 03, 2008 at 00:25 UTC
£ never used to cause me a problem on the old RH9. But these days most web servers seem to be set to en_US.UTF-8, where outputting a raw £ will give you a nasty ? in the browser; it needs to be `&pound;` these days. On a side note, I just noticed something annoying about HTML::Entities: if your input is already encoded, such as `&pound;`, you'll get `&amp;pound;`. I thought it would have checked for encoded characters and skipped them? Lyle
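The double-encoding behaviour is easy to reproduce. One possible workaround, assuming you know the input is already HTML, is to decode first and then encode once. This is only a sketch: if a user literally typed the characters of an entity as plain text, escaping the ampersand again is the correct result.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Made-up input that already contains an entity.
my $already = 'Cost: &pound;5';

# encode_entities treats the input as plain text, so the & is escaped again.
my $double = encode_entities($already);

# Decoding first normalises the input back to characters, then we encode once.
my $once = encode_entities(decode_entities($already));

print "$double\n";   # Cost: &amp;pound;5
print "$once\n";     # Cost: &pound;5
```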
Re^5: Handling HTML special characters correctly by tinita (Parson) on Jul 03, 2008 at 09:01 UTC
Re^5: Handling HTML special characters correctly by ikegami (Patriarch) on Jul 03, 2008 at 03:36 UTC