in reply to Untainting text / unicode text
    # assume that "$octets" is the string that has been received
    # from a form, and is purported to be utf8 text:
    ...
    use Encode;
    ...
    my $utf8str;
    # decode() croaks on malformed input when given FB_CROAK;
    # LEAVE_SRC keeps $octets from being modified in place
    eval { $utf8str = decode( 'utf8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC ) };
    if ( $@ ) {
        # $octets was not really a valid utf8 string
    }
    ...

Of course, if you'd rather accept some other form of unicode, such as UTF-16LE or UTF-16BE, just put one of those names in place of 'utf8' above. (Note that the UTF-16 encodings, built from 16-bit units, do contain null bytes when conveying characters in the normal ASCII/Latin1 range, U+0000 - U+00FF.) But just stick with utf8 -- fewer traps.
Since you're not really doing anything "risky" with the text, just the utf8 validation should be a sufficient safeguard -- and it is important to do this, if you want people to post their content in a consistent, meaningful, usable form.
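For anyone who wants to try this outside a CGI script, here is a minimal, self-contained sketch of the same check. The helper name decode_or_undef and the sample byte strings are made up for illustration; only decode, encode, FB_CROAK and LEAVE_SRC come from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (the name is an assumption, not part of Encode):
# returns the decoded character string, or undef if $octets is not
# valid in the named encoding.
sub decode_or_undef {
    my ( $encoding, $octets ) = @_;
    my $text;
    my $ok = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops it from modifying $octets in place.
        $text = decode( $encoding, $octets,
                        Encode::FB_CROAK | Encode::LEAVE_SRC );
        1;
    };
    return $ok ? $text : undef;
}

# "caf\xC3\xA9" is "café" as utf8 bytes; "caf\xE9 au lait" is Latin-1.
my $good = decode_or_undef( 'utf8', "caf\xC3\xA9" );
my $bad  = decode_or_undef( 'utf8', "caf\xE9 au lait" );

print defined $good ? "valid utf8\n" : "rejected\n";
print defined $bad  ? "valid utf8\n" : "rejected\n";

# The null bytes mentioned above: ASCII text in UTF-16LE.
my $utf16 = encode( 'UTF-16LE', 'hi' );    # the bytes "h\0i\0"
my $back  = decode_or_undef( 'UTF-16LE', $utf16 );
```

The Encode documentation also distinguishes the lax 'utf8' from the strict 'UTF-8'; if you want to reject things like surrogate code points as well, the strict name is probably the one to pass.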
Re: Re: Untainting text / unicode text
by fireartist (Chaplain) on Jun 02, 2004 at 08:28 UTC
by graff (Chancellor) on Jun 02, 2004 at 21:15 UTC