In addition to the issue poj pointed out with reading from <ORDERFILE> twice (my(@LINES) = <ORDERFILE> reads all lines from the file, so $filedata would normally be empty), I just wanted to point out that the pattern eval {...}; if ($@) {...} has issues: $@ can be clobbered or reset before you get to check it (especially on older Perls), so a failed eval can go unnoticed. The pattern eval {...; 1} or do {...}, or a module like Try::Tiny, is better. Also, nowadays lexical filehandles (open my $fh, ...) are generally preferred over bareword filehandles (open ORDERFILE, ...). (Update: The AM also made a good point that you appear to be decoding the data twice.)
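Here is a minimal sketch of the eval {...; 1} or do {...} pattern. The names parse_number and safe_parse are made up for this illustration and are not from the original code; the point is only that the trailing 1 makes eval return true on success, so the check does not depend on the state of $@ alone.

```perl
use warnings;
use strict;

# Hypothetical helper for illustration: dies on non-numeric input.
sub parse_number {
    my ($str) = @_;
    die "not a number: '$str'\n" unless $str =~ /\A-?\d+(?:\.\d+)?\z/;
    return 0 + $str;
}

# eval returns the value of the block's last statement, so the trailing "1"
# signals success even when the guarded code yields a false value such as 0.
sub safe_parse {
    my ($str) = @_;
    my $num;
    eval { $num = parse_number($str); 1 }
        or do {
            my $err = $@ || 'unknown error';  # $@ may be empty if it was clobbered
            chomp $err;
            return (undef, $err);             # returns from safe_parse
        };
    return ($num, undef);
}

my ($num, $err) = safe_parse('abc');
print defined $num ? "ok: $num\n" : "failed: $err\n";
```

Try::Tiny wraps the same idea in try/catch blocks, with the difference that a return inside those blocks only leaves the block and not the enclosing sub.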

Really, the best way to go is to know in advance what encoding your files are in, and then to open them with the appropriate encoding layer, as in open my $fh, '<:encoding(...)', $filename or die $!;
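As a rough sketch of opening a file with a known encoding: the helper name read_text_file and the temporary demo file are assumptions of mine, and UTF-8 is just an example of an encoding you would know in advance.

```perl
use warnings;
use strict;
use File::Temp qw/tempfile/;

# Hypothetical helper: read all lines of a file whose encoding is known,
# letting the :encoding layer turn the bytes into characters.
sub read_text_file {
    my ($filename, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $filename
        or die "$filename: $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

# Demo only: create a small UTF-8 encoded file, then read it back.
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
binmode $tmpfh, ':encoding(UTF-8)';
print $tmpfh "caf\x{E9}\n", "na\x{EF}ve\n";
close $tmpfh;

my @lines = read_text_file($tmpname, 'UTF-8');
printf "%d lines, first has %d characters\n", scalar @lines, length $lines[0];
```

With the layer in place, length counts characters rather than bytes, and no separate decode call is needed afterwards, which also avoids decoding the data twice.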

You may want to have a look at The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

If you are sure that your files can only ever be UTF-8 or "ANSI" (by which I'm going to assume you mean Windows-1252, but see also), then you could use the following to guess which of the two the file is encoded in. Be aware that this may still get things wrong: for example, if you also have files encoded in, say, Latin-1 or Latin-9 (see also), this may not throw an error, because those encodings are so similar to CP1252!

use warnings;
use strict;
use Encode qw/decode/;

sub guess_utf8_cp1252 { # WARNING: Does NOT work for other encodings
    my ($fn) = @_;
    open my $fh, '<:raw', $fn or die "$fn: $!";
    my $raw = do { local $/; <$fh> }; # slurp
    close $fh;
    my $decoded;
       eval { $decoded = decode('UTF-8',  $raw, Encode::FB_CROAK ); 1}
    or eval { $decoded = decode('CP1252', $raw, Encode::FB_CROAK ); 1}
    or die "$fn: Could not decode";
    return $decoded;
}
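To see the fallback in action without needing files on disk, here is a small sketch that applies the same try-UTF-8-then-CP1252 idea to byte strings and also reports which encoding was accepted. The name guess_decode_bytes is made up for this illustration, and adding Encode::LEAVE_SRC is my own cautious choice, since the Encode documentation notes that the input may be modified in place depending on the CHECK argument.

```perl
use warnings;
use strict;
use Encode qw/decode/;

# Hypothetical variant of the guessing logic that works on a byte string.
# WARNING: like the file-based version, it only knows UTF-8 and CP1252.
sub guess_decode_bytes {
    my ($bytes) = @_;
    for my $enc (qw/UTF-8 CP1252/) {
        my $decoded;
        eval {
            $decoded = decode($enc, $bytes,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        } and return ($enc, $decoded);
    }
    die "Could not decode";
}

my ($enc1) = guess_decode_bytes("caf\xC3\xA9 au lait"); # valid UTF-8 bytes
my ($enc2) = guess_decode_bytes("caf\xE9 au lait");     # 0xE9 + space is not valid UTF-8
print "$enc1, $enc2\n";
```

Note that the guess is only as good as the data: bytes that were meant as CP1252 but happen to form valid UTF-8 will be accepted as UTF-8 by the first attempt.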

In reply to Re: By the shine on my bald pate, I dislike this encoding stuff by haukex
in thread By the shine on my bald pate, I dislike this encoding stuff by jfrm
