in reply to What's the best way to detect character encodings, Windows-1252 v. UTF-8?
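I agree with bartmoritz. Because of how UTF-8 is structured, it's very unlikely that cp1252-encoded text would happen to be valid UTF-8*. In UTF-8, every byte above 0x7F must be part of a multi-byte sequence: a lead byte (0xC2 to 0xF4) followed by one to three continuation bytes (0x80 to 0xBF). In cp1252, an accented letter is a single high byte, usually surrounded by ASCII. For example, "café" in cp1252 ends with the lone byte 0xE9, which is not valid UTF-8.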
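To make that rule concrete, here is a minimal sketch that checks well-formedness directly with a regular expression. The alternatives follow the UTF-8 byte-sequence syntax in RFC 3629. `looks_like_utf8` is a hypothetical helper for illustration only. For real code, the Encode-based approach that follows is simpler and is what I'd recommend.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): returns 1 if the byte string
# $bytes is well-formed UTF-8, else 0. Each alternative matches one whole
# character. A lone high byte, or a lead byte without the right number of
# continuation bytes (0x80-0xBF), makes the match fail.
sub looks_like_utf8 {
    my ($bytes) = @_;
    return $bytes =~ m/\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequence
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*\z/x ? 1 : 0;
}

# "café" in cp1252: 0xE9 would need two continuation bytes after it.
print looks_like_utf8("caf\xE9"), "\n";      # 0
# "café" in UTF-8: 0xC3 0xA9 is a valid 2-byte sequence.
print looks_like_utf8("caf\xC3\xA9"), "\n";  # 1
# "naïve" in cp1252: 0xEF is followed by ASCII 'v', not a continuation byte.
print looks_like_utf8("na\xEFve"), "\n";     # 0
```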
    use Encode qw( decode );

    my $bytes = '...';
    my $txt;
    if (!eval {
        $txt = decode('UTF-8', $bytes, Encode::FB_CROAK|Encode::LEAVE_SRC);
        1  # No exception
    }) {
        $txt = decode('Windows-1252', $bytes);
    }
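A minimal sketch of the same approach wrapped in a reusable function, with comments on what the two CHECK flags do. The function name `decode_utf8_or_cp1252` and the sample inputs are hypothetical.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical wrapper: try strict UTF-8 first and fall back to
# Windows-1252 if the bytes are not valid UTF-8.
sub decode_utf8_or_cp1252 {
    my ($bytes) = @_;
    my $txt;
    if (!eval {
        # FB_CROAK: die on malformed input instead of substituting U+FFFD.
        # LEAVE_SRC: stop decode() from consuming $bytes when CHECK is set.
        $txt = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;  # No exception
    }) {
        $txt = decode('Windows-1252', $bytes);
    }
    return $txt;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_utf8_or_cp1252("caf\xC3\xA9"), "\n";  # valid UTF-8 input
print decode_utf8_or_cp1252("caf\xE9"), "\n";      # falls back to cp1252
print decode_utf8_or_cp1252("\x80 5"), "\n";       # 0x80 is the euro sign in cp1252
```

Both "café" inputs come out as the same character string. Only the byte string that fails strict UTF-8 decoding takes the Windows-1252 path.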
* Unless the text contains no bytes above 0x7F. Such text is plain ASCII, which Windows-1252 and UTF-8 encode identically, so it doesn't matter which of the two you decode it as.
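A quick sketch that checks this claim by decoding every byte from 0x00 to 0x7F both ways. The variable names are arbitrary.

```perl
use strict;
use warnings;
use Encode qw( decode );

# A byte string containing every value 0x00-0x7F and nothing above 0x7F.
my $ascii = join '', map { chr } 0x00 .. 0x7F;

# Both encodings map these bytes to U+0000-U+007F, so the decoder chosen
# makes no difference for such input.
my $as_utf8   = decode('UTF-8',        $ascii);
my $as_cp1252 = decode('Windows-1252', $ascii);

print $as_utf8 eq $as_cp1252 ? "identical\n" : "different\n";  # identical
print $as_utf8 eq $ascii     ? "unchanged\n" : "changed\n";    # unchanged
```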