Great! Thanks for reporting your solution back. :)

I think the issue was solely due to the input handle not being UTF-8 aware: it read the BOM bytes as ISO-8859-1 (i.e. the three characters "ï»¿"). Then, when you wrote with UTF-8 awareness, each of those characters was translated into its own UTF-8 sequence (C3 AF C2 BB C2 BF), which, when read back as UTF-8, gives the codepoints for "ï»¿" ..!
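
The double encoding can be reproduced without any files. This is only a minimal sketch, assuming the core Encode module is available; the variable names are mine and not from the original script:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 BOM as raw bytes.
my $bom_bytes = "\xEF\xBB\xBF";

# Misread the bytes as ISO-8859-1: each byte becomes one character.
my $chars = decode('ISO-8859-1', $bom_bytes);

# Encode those three characters as UTF-8: each becomes a two-byte sequence.
my $double = encode('UTF-8', $chars);

# Format the resulting bytes as upper-case hex pairs.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split(//, $double);
print "$hex\n";
```

If I have this right, it prints the same six bytes quoted above.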

I tested with this:

our $/;
open(my $in, "<", "myfile");
open(my $out, ">", "myoutfile");
my $d = <$in>;
print $out $d;
close $out;
close $in;

"myfile" has the content:

0000000: efbb bf68 656c 6c6f 2c20 776f 726c 640a  ...hello, world.

With the code above, Perl does not try to interpret the BOM as a BOM when either reading or writing, and "myoutfile" winds up like this:

0000000: efbb bf68 656c 6c6f 2c20 776f 726c 64    ...hello, world

(identical!) If we decide to interpret the input (only) as UTF-8, however, the BOM is decoded as a UTF-8 sequence into a single wide character, and we get a warning about "Wide character in print" when trying to print it out to a filehandle that doesn't know about UTF-8:

$ perl test.pl
Wide character in print at test.pl line 10, <$in> line 1.
$

"myoutfile" still has the BOM prepended (is Perl just trying a UTF-8 representation?) in this case. The other notable thing when reading in with "<:utf8" is the value of ord($d): 0xFEFF. If we didn't use utf8, it comes out as 0xEF.

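The difference in ord($d) can be checked without touching the disk. A rough sketch, assuming in-memory filehandles behave like real files here; first_ord is a made-up helper, not part of my test script:

```perl
use strict;
use warnings;

# Same bytes as "myfile": UTF-8 BOM followed by "hello, world\n".
my $bytes = "\xEF\xBB\xBFhello, world\n";

# Hypothetical helper: read the first line through the given open mode
# and return ord() of its first character.
sub first_ord {
    my ($mode) = @_;
    open(my $in, $mode, \$bytes) or die "open: $!";
    my $line = <$in>;
    close $in;
    return ord($line);
}

printf "raw:  0x%X\n", first_ord('<');       # first byte of the BOM
printf "utf8: 0x%X\n", first_ord('<:utf8');  # the BOM decoded as one character
```
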
Using utf8 on both streams causes the BOM to be faithfully read in and written out; using utf8 only on output writes the individual bytes, as they would be interpreted in ISO-8859-1, re-encoded in UTF-8:

0000000: c3af c2bb c2bf 6865 6c6c 6f2c 2077 6f72  ......hello, wor
0000010: 6c64                                     ld
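
For anyone who wants to compare the four layer combinations side by side, here is a sketch along the same lines. It assumes in-memory filehandles stand in for "myfile" and "myoutfile", and copy_hex is a hypothetical helper rather than the script I actually ran:

```perl
use strict;
use warnings;

# Same bytes as "myfile": UTF-8 BOM followed by "hello, world\n".
my $input = "\xEF\xBB\xBFhello, world\n";

# Hypothetical helper: copy $input through the given read/write layers
# and return the bytes that were written, as lower-case hex pairs.
sub copy_hex {
    my ($in_layer, $out_layer) = @_;
    my $src    = $input;
    my $output = '';
    open(my $in,  "<$in_layer",  \$src)    or die "open in: $!";
    open(my $out, ">$out_layer", \$output) or die "open out: $!";
    my $d = <$in>;
    {
        # The input-only case triggers "Wide character in print"; silenced here.
        no warnings 'utf8';
        print {$out} $d;
    }
    close $out;
    close $in;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $output);
}

# No layers, input only, both, output only.
for my $case (['', ''], [':utf8', ''], [':utf8', ':utf8'], ['', ':utf8']) {
    my ($in_layer, $out_layer) = @$case;
    printf "in=%-7s out=%-7s %s\n",
        "'$in_layer'", "'$out_layer'", copy_hex($in_layer, $out_layer);
}
```

Only the last combination (utf8 on output alone) should start with c3 af c2 bb c2 bf; the other three should start with the plain ef bb bf.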

Fun times!


In reply to Re^3: Don't want BOM in output file by anneli
in thread Don't want BOM in output file by beerman
