if there are no bytes with the 8th bit set then
    there's no problem -- never mind
else
    if ( any bytes match /[\xc0\xc1\xc4-\xff]/, or
         an odd number of bytes match /[\x80-\xff]/ ) then
        it must be Latin1
    else
        make a copy
        delete everything that could be utf8 forms of Latin1 characters:
            s/\xc2[\xa0-\xbf]|\xc3[\x80-\xbf]//g;
        if this removes all bytes with 8th-bit set, then
            the original data is almost certainly utf8
        else
            the original data is definitely Latin1
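
The pseudocode above could be sketched in Perl roughly as follows. The function name guess_encoding and its return values are hypothetical, and the sketch assumes the input is a raw byte string that is either ISO-8859-1 or the utf8 encoding of characters in U+00A0..U+00FF.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 'ascii', 'latin1', or 'utf8' for a byte string.
# Assumes the data holds nothing beyond the Latin1 repertoire.
sub guess_encoding {
    my ($bytes) = @_;

    # No byte has the 8th bit set: plain ASCII, nothing to decide.
    return 'ascii' unless $bytes =~ /[\x80-\xff]/;

    # These bytes never appear when Latin1 characters are encoded as utf8.
    return 'latin1' if $bytes =~ /[\xc0\xc1\xc4-\xff]/;

    # Each utf8-encoded Latin1 character uses exactly two high-bit bytes,
    # so an odd count rules utf8 out. tr/// without /d only counts.
    my $high = ($bytes =~ tr/\x80-\xff//);
    return 'latin1' if $high % 2;

    # Work on a copy and strip every well-formed two-byte sequence.
    (my $copy = $bytes) =~ s/\xc2[\xa0-\xbf]|\xc3[\x80-\xbf]//g;

    # Any high-bit byte left over was not part of a valid pair.
    return $copy =~ /[\x80-\xff]/ ? 'latin1' : 'utf8';
}

print guess_encoding("hello"), "\n";          # ascii
print guess_encoding("caf\xe9"), "\n";        # latin1
print guess_encoding("caf\xc3\xa9"), "\n";    # utf8
```

The last case is why the result is only "almost certainly" utf8: the bytes \xc3\xa9 are also a legal, if unlikely, Latin1 string.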
####
if any bytes match /[\x80-\x9f]/ then
    it's almost certainly not Latin1
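
The reasoning behind this check: ISO-8859-1 assigns 0x80..0x9f to C1 control codes, which rarely occur in real text, whereas in utf8 the same values are ordinary continuation bytes. A minimal sketch, with has_c1_bytes as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the byte string contains any byte in 0x80..0x9f.
sub has_c1_bytes {
    my ($bytes) = @_;
    return $bytes =~ /[\x80-\x9f]/ ? 1 : 0;
}

print has_c1_bytes("caf\xe9"), "\n";        # 0: consistent with Latin1
print has_c1_bytes("\xe2\x82\xac"), "\n";   # 1: utf8 euro sign contains \x82
```

A hit does not prove the data is utf8. Windows-1252 also uses this range for printable characters, for example \x80 for the euro sign.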
####
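use Encode;
# Block eval avoids recompiling a string; LEAVE_SRC keeps decode()
# from modifying $orig_data in place when a check value is given.
eval { $_ = decode('utf8', $orig_data, Encode::FB_CROAK | Encode::LEAVE_SRC) };
if ($@) {
    # decoding failed: it's not utf8, and so must be iso-8859-1
}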
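
For completeness, the decode-and-fall-back idea might be wrapped up like this. The function name decode_utf8_or_latin1 is hypothetical, and the sketch swaps in the strict 'UTF-8' encoding name instead of the lax 'utf8' used in the snippet above, on the assumption that stricter validation is preferable when guessing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode as strict UTF-8 if the bytes are well formed,
# otherwise treat them as ISO-8859-1 (which accepts any byte sequence).
sub decode_utf8_or_latin1 {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes intact so it can be reused for the fallback.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };

    return defined $text ? $text : decode('iso-8859-1', $bytes);
}

# Both inputs decode to the same four-character string.
print length(decode_utf8_or_latin1("caf\xc3\xa9")), "\n";   # 4
print length(decode_utf8_or_latin1("caf\xe9")), "\n";       # 4
```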