in reply to By the shine on my bald pate, I dislike this encoding stuff

Strictly speaking, a file containing "\xA3" is not ASCII, since ASCII only consists of the characters from "\x00" to "\x7F". Maybe it's ISO Latin-1?

Also, your logic double-decodes the file. Assuming it is UTF-8, opening it with the '<:encoding(UTF-8)' layer already decodes it, and then your decode() decodes it again.
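To make the double-decoding point concrete, here is a minimal sketch. The sample bytes are made up, and I decode twice by hand rather than going through an I/O layer, but the effect on the data should be the same: the second decode() sees already-decoded characters, treats them as bytes, and chokes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the two bytes that encode U+00A3 (pound sign) in UTF-8.
my $bytes = "\xC2\xA3";

# First decode: bytes -> characters. LEAVE_SRC keeps $bytes intact.
my $chars = decode( 'UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC() );

# Second decode of already-decoded text: U+00A3 is treated as the lone
# byte 0xA3, which is not valid UTF-8, so FB_CROAK makes decode() die.
my $again = eval {
    decode( 'UTF-8', $chars, Encode::FB_CROAK() | Encode::LEAVE_SRC() );
};
my $second_decode_failed = !defined $again;

printf "first decode: %d character(s)\n", length $chars;
print "second decode ", ( $second_decode_failed ? "failed" : "succeeded" ), "\n";
```

Without FB_CROAK the second decode would not die but would quietly substitute replacement characters, which is arguably worse.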

My knee-jerk reaction would be to apply Encode::Guess to the problem, since that way somebody else has worked out this mess for you, and since if you are going to convert the file to UTF-8 you need to know what its encoding currently is. If I just wanted to know whether the file decoded as UTF-8, I might be lazy and do something like:

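use Encode;

# Read the whole file as raw bytes, with no decoding layer.
open my $orderfile, '<:raw', $emailfile
    or return( @err, "Could not open $emailfile: $!" );
my $filedata = do { local $/; <$orderfile> };    # slurp; $/ restored afterwards
close $orderfile;

# Strict decode: FB_CROAK dies on the first malformed sequence;
# LEAVE_SRC keeps decode() from modifying $filedata in place.
eval {
    decode( 'UTF-8', $filedata, Encode::FB_CROAK() | Encode::LEAVE_SRC() );
    1;
} or return( @err, "File was not encoded in UTF-8" );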
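For the Encode::Guess route mentioned above, a rough sketch might look like the following. The helper name guess_name and the sample byte strings are my own inventions for illustration; guess_encoding() is the function Encode::Guess exports, and it returns an encoding object on success or a plain error string when it cannot decide.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: returns the name of the guessed encoding,
# or undef when Encode::Guess cannot settle on exactly one.
sub guess_name {
    my ( $bytes, @suspects ) = @_;
    my $guess = guess_encoding( $bytes, @suspects );

    # Success yields an encoding object; failure yields a string.
    return ref $guess ? $guess->name : undef;
}

my $utf8_pound   = "\xC2\xA3";    # U+00A3 encoded as UTF-8
my $latin1_pound = "\xA3";        # U+00A3 encoded as ISO-8859-1

print "UTF-8 sample:   ",
    ( guess_name($utf8_pound) // 'undecided' ), "\n";
print "Latin-1 sample: ",
    ( guess_name( $latin1_pound, 'iso-8859-1' ) // 'undecided' ), "\n";
```

Be warned that guessing is only as good as the list of suspects. Nearly any byte string is valid in a single-byte code page such as ISO-8859-1, so adding one as a suspect can make data that is really UTF-8 come back as ambiguous.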

One possible source of confusion in this horrible mess is that the ASCII encoding is a subset of the UTF-8 encoding: every pure-ASCII file is also a valid UTF-8 file, byte for byte. So technically there is no way to distinguish a file encoded in ASCII from a UTF-8 file that happens to contain only ASCII characters.
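A small sketch of that subset relationship, using a made-up sample string: decoding the same bytes as ASCII and as UTF-8 gives identical results, and the only question you can really answer is whether any byte falls outside the ASCII range.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample containing only bytes in the range 0x00-0x7F.
my $ascii_bytes = "Total: 42 GBP\n";

# Decoding as ASCII and as UTF-8 gives the same characters.
my $flags    = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $as_ascii = decode( 'ascii', $ascii_bytes, $flags );
my $as_utf8  = decode( 'UTF-8', $ascii_bytes, $flags );
print "identical\n" if $as_ascii eq $as_utf8;

# What can be tested is whether any byte lies outside the ASCII range.
my $has_non_ascii = $ascii_bytes =~ /[^\x00-\x7F]/ ? 1 : 0;
print "non-ASCII bytes present: $has_non_ascii\n";
```

If that last check comes back false, the ASCII-versus-UTF-8 question is moot and the file can be treated as either.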

Re^2: By the shine on my bald pate, I dislike this encoding stuff
by Anonymous Monk on Mar 05, 2018 at 03:39 UTC
    Yep. Betcha the real problem is that the files which contain "non-ASCII characters" didn't use Unicode (UTF-8, UTF-16) to encode those characters, but instead used old-style code pages. But the program's logic assumes that it's Unicode without checking the entire file. I didn't see the OP ever describing what the nature of the "crash" actually is.