in reply to Re: CGI hidden params vs. character encoding
in thread CGI hidden params vs. character encoding

Taking another look at the "utf8 security" issue, here's what I'd treat as the primary reference (at least, the one here at PM): UTF8 related proof of concept exploit released at T-DOSE.

The key point, I think, is this:

Once the UTF8 flag is set, Perl does not check the validity of the UTF8 sequences further. Typically, this is okay, because it was Perl that set the flag in the first place. However, some people set the UTF8 flag manually. They circumvent protection built into encoding/decoding functions and PerlIO layers, either because it's easier (less typing), for performance reasons, or even because they don't know they're doing something wrong.

This problem is unrelated to the use of "decode()" shown in the OP script here. The "decode()" function takes a string of bytes (ignoring its utf8 flag) and tries to interpret it as UTF-8-encoded characters. With its default behavior (as shown in the OP), any input bytes that cannot be interpreted as UTF-8 are replaced by the Unicode replacement character (U+FFFD), and the result is always a valid utf8 string (with the utf8 flag set by perl).
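
To make that concrete, here is a minimal sketch (not taken from the OP's script; the byte values are invented for illustration) of decode()'s default lenient behavior next to a strict FB_CROAK call:

    use strict;
    use warnings;
    use Encode qw(decode FB_CROAK);

    my $bytes = "caf\xC3\xA9 \xFF";   # valid UTF-8 for "café", then a stray 0xFF byte

    # Default (lenient): malformed bytes become U+FFFD, and the result is
    # always a well-formed character string with the utf8 flag set by perl.
    my $lenient = decode('UTF-8', $bytes);

    # Strict: croak on malformed input instead of substituting.
    my $strict = eval { decode('UTF-8', $bytes, FB_CROAK) };
    print defined $strict ? "decoded cleanly\n" : "rejected: $@";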

My reading of the exploit is that you only get into trouble when you deliberately twiddle the utf8 flag of a scalar yourself, without first checking that its contents really are valid UTF-8. So I would conclude that the OP script is not a case that poses a security problem involving the use of utf8 data.
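
As a rough illustration of that difference (again with an invented byte string), manually flipping the flag skips exactly the validation that decode() performs:

    use strict;
    use warnings;
    use Encode qw(decode is_utf8 _utf8_on);

    my $bytes = "\xC0\xAF";    # overlong encoding of '/', not valid UTF-8

    # Dangerous: set the flag by hand. Perl now treats the scalar as
    # character data, but the byte sequence was never validated.
    my $bogus = $bytes;
    _utf8_on($bogus);
    printf "flag is on: %d, yet the bytes are not valid UTF-8\n", is_utf8($bogus) ? 1 : 0;

    # Safe: decode() validates, and the malformed bytes become U+FFFD.
    my $checked = decode('UTF-8', $bytes);
    printf "decoded length: %d\n", length $checked;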


Replies are listed 'Best First'.
Re^3: CGI hidden params vs. character encoding
by ikegami (Patriarch) on May 27, 2008 at 23:46 UTC

    This problem is unrelated to the use of "decode()" shown in the OP script here

    You're right. I thought binmode($untrusted_fh, ':utf8') was the same as decode('utf8', $untrusted), but it's the same as _utf8_on($untrusted).
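
    A short sketch of that distinction, assuming a filehandle reading untrusted bytes (the filename here is made up): the ':utf8' layer only marks the data as UTF-8, while ':encoding(UTF-8)' actually pushes it through Encode.

        use strict;
        use warnings;

        # Hypothetical file of untrusted bytes.
        open my $untrusted_fh, '<', 'upload.bin' or die "open: $!";

        # Like _utf8_on(): flags each read as UTF-8 without validating it.
        # binmode($untrusted_fh, ':utf8');

        # Like decode(): malformed sequences are warned about and replaced
        # instead of being silently accepted.
        binmode($untrusted_fh, ':encoding(UTF-8)');

        my $line = <$untrusted_fh>;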