You don't say what version of Perl you have (5.6.1? 5.8.0?); see whether you have the Encode module, and if you have it, try something like this:
    use Encode;
    ...
    my ( $stringFromDB, $utf8string );
    #
    # do whatever it is that queries the database and
    # assigns a string to $stringFromDB...
    #
    # LEAVE_SRC keeps decode() from overwriting $stringFromDB in place
    eval {
        $utf8string = decode( 'utf8', $stringFromDB,
                              Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    if ( $@ ) {
        warn "DB value $stringFromDB is malformed UTF-8\n";
    }
    ...

This tries to "convert" the "octets" in $stringFromDB from utf8 into an "official" utf8 (Perl-internal) string. In effect, if the data is already valid utf8, nothing changes, but the variable being assigned to will have its "utf8 flag" set (whereas this flag is probably not set in the "octet" string). If the data is malformed, the FB_CROAK argument tells decode to die, so you can trap the failure with eval.
(As shown above, the "warn" usage might cause some other sort of warning as well, about "wide characters in print statement" or some such, but I haven't tested this specifically.)
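To see both outcomes side by side, here is a small self-contained sketch. The sample strings and the decode_or_undef helper are made up for illustration (they stand in for whatever your database returns), and I am assuming an Encode recent enough to provide LEAVE_SRC:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data standing in for values fetched from the database.
my $good = "caf\xC3\xA9 au lait";   # valid UTF-8 octets (e-acute is two bytes)
my $bad  = "caf\xE9 au lait";       # Latin-1 octet \xE9, not valid UTF-8

# Hypothetical helper: returns the decoded character string,
# or undef if decode() croaked on malformed input.
sub decode_or_undef {
    my ($octets) = @_;
    # LEAVE_SRC stops decode() from modifying $octets in place.
    my $text = eval {
        decode( 'utf8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    return $text;
}

my $decoded = decode_or_undef($good);
my $failed  = decode_or_undef($bad);

# The octet string counts bytes; the decoded string counts characters.
printf "octets:  length %d, utf8 flag %s\n",
    length($good), utf8::is_utf8($good) ? 'on' : 'off';
printf "decoded: length %d, utf8 flag %s\n",
    length($decoded), utf8::is_utf8($decoded) ? 'on' : 'off';
print "malformed input was rejected\n" unless defined $failed;
```

The block form of eval is used here so the decode call is compiled once and no variables need escaping; the error text from decode is still available in $@ right after the eval if you want to log it.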
In reply to "Re: Malformed UTF-8 characters in Regular Expressions" by graff, in thread "Malformed UTF-8 characters in Regular Expressions" by Wonko El Sano