in reply to Don't want BOM in output file

From perlio (emphasis mine):
    :utf8
        Declares that the stream accepts perl's *internal* encoding of
        characters. (Which really is UTF-8 on ASCII machines, but is
        UTF-EBCDIC on EBCDIC machines.) ...

        Note that this layer does not validate byte sequences. For reading
        input, using ":encoding(utf8)" instead of bare ":utf8" is strongly
        recommended.

I recommend looking at the utf8::all module, which wraps all of these confusing UTF-8 machinations in one pragma and lets you simply use '<' or '>' as the mode when opening text files (see that module's synopsis).
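
For comparison, here is a rough sketch of the same idea using only the core `open` pragma. This is not utf8::all itself, and it covers only the default I/O layers rather than everything that module sets up. The temporary file and the sample string are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                             # this source file contains a UTF-8 literal
use open qw(:std :encoding(UTF-8));   # core pragma: default layers for open() and the STD handles
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my (undef, $file) = tempfile(UNLINK => 1);

# A plain '>' mode picks up the :encoding(UTF-8) default set above.
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "foo bär\n";
close $out or die "Cannot close $file: $!";

# Likewise, a plain '<' mode decodes (and validates) UTF-8 on input.
open my $in, '<', $file or die "Cannot read $file: $!";
my $line = <$in>;
close $in;
chomp $line;

my $chars = length $line;             # counts characters, not bytes
```

Note that the `open` pragma is lexically scoped, so it only affects open() calls in the file or block where it appears.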

Re^2: Don't want BOM in output file
by Eliya (Vicar) on Oct 14, 2011 at 19:27 UTC
    For reading input, using ":encoding(utf8)" instead of bare ":utf8" is strongly recommended.

    While this is correct (for security reasons), it's unlikely to help with the OP's (presumed) problem of getting rid of a BOM in the input data. In other words, :encoding(utf8) (just like :utf8) does not filter out the BOM:

    my $file = "somefile.utf8";

    # create a UTF-8 encoded test file, explicitly adding a BOM
    open my $out, ">:utf8", $file or die $!;
    print $out "\x{feff}foo bär";
    close $out;

    # read it back in
    open my $in, "<:encoding(utf8)", $file or die $!;
    $_ = <$in>;

    use Devel::Peek;
    Dump $_;

    SV = PV(0x793cd0) at 0x7c53e0
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x7c9088 "\357\273\277foo b\303\244r"\0 [UTF8 "\x{feff}foo b\x{e4}r"]
      CUR = 11                                               ^^^^
      LEN = 80
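
Since the decoding layer leaves the BOM in place, one way to drop it is to strip a leading U+FEFF from the first line after reading. The following is only a minimal sketch of that idea, not something from the posts above. The temporary file and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my (undef, $file) = tempfile(UNLINK => 1);

# Write a test file that starts with a BOM (U+FEFF), as in the example above.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} "\x{FEFF}foo b\x{E4}r\n";
close $out or die "Cannot close $file: $!";

# Read it back through the validating layer; the BOM is still there.
open my $in, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!";
my $first = <$in>;
close $in;

# Remove the BOM, but only if it is the very first character of the data.
my $had_bom = ($first =~ s/\A\x{FEFF}//) ? 1 : 0;
chomp $first;
```

Anchoring the substitution with `\A` keeps any U+FEFF that appears later in the data (where it is a zero-width no-break space, not a BOM) untouched.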