in reply to utf8::upgrade is untainting

I'm wondering why you would decide to use utf8::upgrade() on tainted strings -- as opposed to using Encode::decode(), which offers much better controls for handling malformed character data. (If the data is tainted, how can you assume that it can always be treated as valid unicode text?)

If you are doing taint checking at all, and you need to convert a tainted string to utf8 (or validate and flag it as a utf8 string), it would seem much more sensible to handle it like this:

    use Encode;

    my $rawstring = ...;   # coming from a cgi param or whatever
    my $utfstring;
    eval { $utfstring = decode( "utf8", $rawstring, Encode::FB_CROAK ) };
    if ( $@ ) {
        # do something sensible given that $rawstring is invalid
        # (i.e. cannot be converted successfully to utf8)
    }

If your expected input (i.e. the tainted data that should pose no difficulty for proper untainting) is not a utf8 octet stream, then all the more reason to use Encode, because as perldoc utf8 says:

    Note that this function does not handle arbitrary encodings. Therefore Encode.pm is recommended for the general purposes.

(emphasis in the original) But even if well-behaved input is expected to be utf8 octets, the fact that it's tainted means "don't count on that!"
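As a rough sketch of the non-utf8 case: assuming, purely for illustration, that the input is expected to be ISO-8859-1, the same decode() call handles it once you name that encoding. The helper name decode_input and the sample bytes are made up for this example and are not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw octets from a named encoding and
# return undef instead of dying when the octets are malformed.
sub decode_input {
    my ( $octets, $encoding ) = @_;
    my $text = eval {
        decode( $encoding, $octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    return $text;    # undef if decode() croaked
}

# Assumed input: "cafe" with an e-acute, as ISO-8859-1 octets.
my $latin1 = "caf\xE9";
my $text   = decode_input( $latin1, "iso-8859-1" );

# Re-encode the character string as UTF-8 octets for output.
my $utf8 = encode( "UTF-8", $text );
```

Returning undef on failure is just one possible policy; a real application might prefer to log the bad input or reject the request outright.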

In other words, don't use utf8::upgrade() on tainted strings. Period.
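To make the difference concrete, here is a small sketch (the sample byte strings are invented for illustration). It contrasts utf8::upgrade(), which only changes Perl's internal representation and validates nothing, with decode() using the strict "UTF-8" encoding name, which interprets the bytes and croaks on malformed input. It says nothing about taint behaviour; it only shows why upgrade() is not a validation step.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of U+00E9, as two raw bytes.
my $octets = "\xC3\xA9";

# utf8::upgrade() keeps the logical string unchanged: each byte is still
# treated as one Latin-1 character, so there are still two characters.
my $upgraded = $octets;
utf8::upgrade($upgraded);
my $upgraded_len = length $upgraded;

# decode() interprets the two bytes as UTF-8 and yields one character.
my $decoded     = decode( "UTF-8", $octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
my $decoded_len = length $decoded;

# Malformed input: a lone Latin-1 byte in the middle of ASCII text.
my $bad = "abc\xE9def";

# upgrade() accepts it without complaint...
my $bad_upgraded = $bad;
utf8::upgrade($bad_upgraded);

# ...while decode() with FB_CROAK dies, which eval turns into a false value.
my $decode_ok = eval {
    decode( "UTF-8", $bad, Encode::FB_CROAK | Encode::LEAVE_SRC );
    1;
} ? 1 : 0;
```

LEAVE_SRC is included because decode() may otherwise modify its source argument when a CHECK value is given.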

(update: added the "perldoc utf8" link, to clarify the source of the quotation)

Re^2: utf8::upgrade is untainting
by tinita (Parson) on Aug 16, 2007 at 07:13 UTC
    I'm wondering why you would decide to use utf8::upgrade() on tainted strings
    it's not my code - it's in a framework i'm using, and i decided to find out what happens if i add -T to my scripts, and nothing happened, and searching for the reason i stumbled over this upgrade().
    so thanks for your comments, i'll check if Encode could be used instead of utf8::upgrade.