in reply to Re^3: A UTF8 round trip with MySQL
in thread A UTF8 round trip with MySQL
I'd be interested to know the risks involved.
The most obvious risk involved is that your program can halt if you have malformed internal data. The error "Malformed UTF-8 character" is fatal. Less obvious risks include security bugs because things may be interpreted differently at different levels: something may pass an untainting regex, but still be unsafe in a library call. This is because there is no single standard way of dealing with malformed byte sequences. With naive (yet common) C code it can even lead to data corruption.
The following change is in current blead:
--- perl-current/pod/perldiag.pod 2007-01-02 19:17:01.000000000 ++0100 +++ mijn/pod/perldiag.pod 2007-03-03 18:12:23.000000000 +0100 @@ -2263,12 +2263,19 @@ =item Malformed UTF-8 character (%s) -(S utf8) (F) Perl detected something that didn't comply with UTF-8 -encoding rules. +(S utf8) (F) Perl detected a string that didn't comply with UTF-8 +encoding rules, even though it had the UTF8 flag on. -One possible cause is that you read in data that you thought to be in -UTF-8 but it wasn't (it was for example legacy 8-bit data). Another -possibility is careless use of utf8::upgrade(). +One possible cause is that you set the UTF8 flag yourself for data th +at +you thought to be in UTF-8 but it wasn't (it was for example legacy +8-bit data). To guard against this, you can use Encode::decode_utf8. + +If you use the C<:encoding(UTF-8)> PerlIO layer for input, invalid by +te +sequences are handled gracefully, but if you use C<:utf8>, the flag i +s +set without validating the data, possibly resulting in this error +message. + +See also L<Encode/"Handling Malformed Data">.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^5: A UTF8 round trip with MySQL
by Joost (Canon) on Jun 13, 2007 at 21:01 UTC | |
by Juerd (Abbot) on Jun 13, 2007 at 21:12 UTC | |
|
Re^5: A UTF8 round trip with MySQL
by mje (Curate) on Mar 31, 2009 at 10:24 UTC |