if there are no bytes with the 8th bit set
then
    there's no problem, so never mind
else if ( any bytes match /[\xc0\xc1\xc4-\xff]/,
          or an odd number of bytes match /[\x80-\xff]/ )
then
    it must be Latin1
else
    make a copy
    delete everything that could be utf8 forms of Latin1 characters:
        s/\xc2[\xa0-\xbf]|\xc3[\x80-\xbf]//g;
    if this removes all bytes with the 8th bit set
    then
        the original data is almost certainly utf8
    else
        the original data is definitely Latin1

If any bytes match /[\x80-\x9f]/, then the data is pretty sure not to be Latin1 (in ISO-8859-1 that range holds only C1 control codes).

    use Encode;

    # Block eval instead of string eval; LEAVE_SRC keeps decode() from
    # overwriting $orig_data in place when a CHECK value is given.
    eval { $_ = decode('utf8', $orig_data, Encode::FB_CROAK | Encode::LEAVE_SRC) };
    if ($@) {
        # it's not utf8, and so must be iso-8859-1
    }
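The byte-pattern heuristic can be sketched as a small Perl subroutine. This is only an illustration under stated assumptions: the name `guess_encoding` and its return values ('ascii', 'latin1', 'utf8') are hypothetical, and it assumes the input is a byte string whose text uses only characters from the Latin1 repertoire.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): classify a byte
# string as 'ascii', 'latin1', or 'utf8' with the byte-pattern heuristic.
# Assumption: the text contains only Latin1-repertoire characters.
sub guess_encoding {
    my ($bytes) = @_;

    # No byte has the 8th bit set: plain ASCII, nothing to decide.
    return 'ascii' unless $bytes =~ /[\x80-\xff]/;

    # UTF-8 forms of U+0080..U+00FF start with \xc2 or \xc3 only, and
    # each takes exactly two high-bit bytes, so any other lead byte or
    # an odd count of high-bit bytes rules out UTF-8.
    my $high_count = () = $bytes =~ /[\x80-\xff]/g;
    return 'latin1'
        if $bytes =~ /[\xc0\xc1\xc4-\xff]/ or $high_count % 2;

    # Work on a copy: delete every two-byte sequence that could be the
    # UTF-8 form of a Latin1 character in U+00A0..U+00FF.
    (my $copy = $bytes) =~ s/\xc2[\xa0-\xbf]|\xc3[\x80-\xbf]//g;

    # Nothing high-bit left means the data is almost certainly UTF-8.
    return $copy =~ /[\x80-\xff]/ ? 'latin1' : 'utf8';
}
```

For instance, `guess_encoding("caf\xe9")` yields 'latin1' because \xe9 can never appear in the UTF-8 form of a Latin1 character, while `guess_encoding("caf\xc3\xa9")` yields 'utf8' because the only high-bit bytes form a valid \xc3 pair that the substitution removes.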