in reply to Converting UTF8 to ANSI

\x{EF}\x{BB}\x{BF} is the UTF-8 BOM. The fact that you're looking for the UTF-8 encoded bytes EF BB BF instead of the Unicode character U+FEFF tells me you haven't opened the file with the right encoding, and personally I think decoding afterwards is more of a pain than opening the file with the right encoding in the first place. Also, by "ANSI" I assume you mean Windows-1252. Anyway, if you are certain that all of your UTF-8 encoded files begin with a BOM, you can use File::BOM. The following will open files that have a BOM with the proper encoding, but fall back to CP-1252 if they don't:

use File::BOM qw/open_bom/;
open_bom(my $fh, $filename, ':encoding(cp1252)');

Otherwise, if you have no sure way of telling the files apart, you may have to use Encode::Guess, with the caveat that it's just a guess. Something like this maybe:

use Encode::Guess;
open my $fh, '<:raw', $filename or die $!;
read $fh, my $buf, 1024; # may need bigger buffer for better guess?
close $fh;
my $enc = guess_encoding($buf, qw/cp1252 utf8 UTF-16/);
ref($enc) or die "Can't guess $filename: $enc";
print "$filename: guessed ",$enc->name,"\n"; #Debug
open $fh, '<:encoding('.$enc->name.')', $filename or die $!;

In both cases, you may want to strip the BOM off the beginning of the data read from the file via $data =~ s/\A\x{FEFF}//;
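For readers who already have the raw bytes in memory, here is a minimal sketch of the whole round trip: decode, strip the BOM, and re-encode as CP-1252. It only illustrates the idea and is not code from the original question. The sample string and the variable names are assumptions made up for the example, and it uses only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: the UTF-8 BOM (EF BB BF) followed by
# "café" encoded as UTF-8 (the "é" is the two bytes C3 A9).
my $utf8_bytes = "\xEF\xBB\xBF" . "caf\xC3\xA9";

# Decode the bytes into Perl characters. The three BOM bytes
# become the single character U+FEFF at the start of the string.
my $text = decode('UTF-8', $utf8_bytes);

# Strip the BOM character, if there is one.
$text =~ s/\A\x{FEFF}//;

# Encode the characters as Windows-1252; "é" becomes the single byte E9.
my $cp1252_bytes = encode('cp1252', $text);

# Show the resulting bytes in hex.
print join(' ', map { sprintf '%02X', ord } split //, $cp1252_bytes), "\n";
# prints "63 61 66 E9"
```

When reading from a file, opening it with the right :encoding(...) layer as shown above replaces the explicit decode step.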

Re^2: Converting UTF8 to ANSI
by ikegami (Patriarch) on Aug 28, 2017 at 04:26 UTC

    by "ANSI" I assume you mean Windows-1252.

    To find out a machine's actual "ANSI" encoding, you can use the following:

    use Win32 qw( );
    my $ansi_enc = "cp".Win32::GetACP();

      Excellent tip. FWIW, I had to update to use it.

      PS C:\Users\moo> perl -MWin32 -E 'say Win32::GetACP()'
      Undefined subroutine &Win32::GetACP called at -e line 1.
      PS C:\Users\moo> cpanm Win32
      --> Working on Win32
      Fetching http://www.cpan.org/authors/id/J/JD/JDB/Win32-0.52.tar.gz ... OK
      Configuring Win32-0.52 ... OK
      Building and testing Win32-0.52 ... OK
      Successfully installed Win32-0.52 (upgraded from 0.44)
      1 distribution installed
      PS C:\Users\moo> perl -MWin32 -E 'say Win32::GetACP()'
      1252
      Thank you.
      I didn't even know a machine could have an "actual" ANSI encoding, as opposed to some other one.
      I'll look into it as soon as I can.
Re^2: Converting UTF8 to ANSI
by palkia (Monk) on Aug 29, 2017 at 18:02 UTC
    Thank you for your reply.

    You are correct, my understanding of the differences between specific encoding forms is strictly theoretical and limited at the moment, especially in the Perl context (this is the first time I've ever encountered the term BOM).
    I hope to learn more about it as soon as I can.

    As for what I mean by ANSI, I really don't know.
    All I know is what the encoding line says when I "save as" a file with "Notepad" (Win-xp).

    Unfortunately I'm currently preoccupied with the fallout of attempting to install File::BOM as you can see here.
    Any assistance with this bigger issue will be most appreciated.
      As for what I mean by ANSI, I really don't know.

      Welcome to the wonderful world of character encodings!

      What you may mean is the ASCII character encoding. This is an old, 7-bit encoding; when it is stored in 8-bit bytes, the most significant bit (bit 7) is always 0. One neat thing about the newer UTF-8 encoding (some people say it's the only neat thing) is that every valid ASCII string is automatically valid UTF-8, byte for byte. Unfortunately, things quickly go to pieces after that: most UTF-8 characters have no ASCII equivalent, and any mapping of them to ASCII is totally arbitrary. Oh, well...
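
The ASCII/UTF-8 relationship described above can be checked with a small sketch using the core Encode module. The sample strings and variable names are assumptions chosen for illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Pure ASCII text encodes to exactly the same bytes in ASCII and in UTF-8.
my $ascii_text = "plain text";
my $same = encode('ascii', $ascii_text) eq encode('UTF-8', $ascii_text);
print "ASCII and UTF-8 bytes identical: ", ($same ? "yes" : "no"), "\n";

# A string containing a non-ASCII character (U+00E9, "é").
my $non_ascii = "caf\x{E9}";

# In UTF-8 the "é" takes two bytes, so four characters become five bytes.
my $utf8_len = length encode('UTF-8', $non_ascii);
print "UTF-8 length in bytes: $utf8_len\n";

# With the default (lenient) check, Encode substitutes "?" for
# characters that ASCII cannot represent, so information is lost.
my $lossy = encode('ascii', $non_ascii);
print "Lossy ASCII: $lossy\n";

# With FB_CROAK, the same conversion dies instead of silently substituting.
# Encode may modify its input when a check is set, so work on a copy.
my $copy = $non_ascii;
my $ok = eval { encode('ascii', $copy, FB_CROAK); 1 };
print "Strict ASCII encode ", ($ok ? "succeeded" : "failed"), "\n";
```

Whether silent substitution or a hard failure is preferable depends on the data; the strict form at least makes the loss visible.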


      Give a man a fish:  <%-{-{-{-<