I'd be interested to know the risks involved.
The most obvious risk involved is that your program can halt if you have malformed internal data. The error "Malformed UTF-8 character" is fatal. Less obvious risks include security bugs because things may be interpreted differently at different levels: something may pass an untainting regex, but still be unsafe in a library call. This is because there is no single standard way of dealing with malformed byte sequences. With naive (yet common) C code it can even lead to data corruption.
The following change is in current blead:
--- perl-current/pod/perldiag.pod 2007-01-02 19:17:01.000000000 ++0100 +++ mijn/pod/perldiag.pod 2007-03-03 18:12:23.000000000 +0100 @@ -2263,12 +2263,19 @@ =item Malformed UTF-8 character (%s) -(S utf8) (F) Perl detected something that didn't comply with UTF-8 -encoding rules. +(S utf8) (F) Perl detected a string that didn't comply with UTF-8 +encoding rules, even though it had the UTF8 flag on. -One possible cause is that you read in data that you thought to be in -UTF-8 but it wasn't (it was for example legacy 8-bit data). Another -possibility is careless use of utf8::upgrade(). +One possible cause is that you set the UTF8 flag yourself for data th +at +you thought to be in UTF-8 but it wasn't (it was for example legacy +8-bit data). To guard against this, you can use Encode::decode_utf8. + +If you use the C<:encoding(UTF-8)> PerlIO layer for input, invalid by +te +sequences are handled gracefully, but if you use C<:utf8>, the flag i +s +set without validating the data, possibly resulting in this error +message. + +See also L<Encode/"Handling Malformed Data">.
In reply to Re^4: A UTF8 round trip with MySQL
by Juerd
in thread A UTF8 round trip with MySQL
by clinton
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |