
Or is it as simple as writing \x{efbbbf} as the first thing after the HTTP headers?

The string my $str = "\x{efbbbf}"; does not contain the BOM character; it contains U+EFBBBF, which is not a valid Unicode character (AFAIK, Unicode only goes up to U+10FFFF). The string my $str = "\x{feff}"; contains the BOM character.
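To make the distinction concrete, here is a minimal sketch (the variable names are just for illustration) showing that the BOM is the single code point U+FEFF, that EF BB BF is only its UTF-8 encoding, and that 0xEFBBBF lies beyond the Unicode range:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The BOM is one character: code point U+FEFF.
my $bom = "\x{FEFF}";

# Encoding that one character as UTF-8 yields three octets.
my $bytes = encode('UTF-8', $bom);
my $hex   = join ' ', unpack '(H2)*', $bytes;

# 0xEFBBBF is those three octets read as one number, not a code point.
my $beyond_unicode = 0xEFBBBF > 0x10FFFF ? 'yes' : 'no';

printf "characters: %d, bytes: %d, hex: %s\n",
    length($bom), length($bytes), $hex;
print "0xEFBBBF above U+10FFFF: $beyond_unicode\n";
```

So \x{efbbbf} asks Perl for a character numbered 0xEFBBBF, which is not the same thing as the three bytes EF BB BF.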

If you did use the string you suggested, whether with raw mode or with UTF-8 output encoding, you would not get what you expected:

C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\x{efbbbf})" | xxd
Wide character in print at -e line 1.
00000000: f8bb bbae bf                             .....

C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{efbbbf})" | xxd
Code point 0xEFBBBF is not Unicode, may not be portable in print at -e line 1.
00000000: 5c78 7b45 4642 4242 467d                 \x{EFBBBF}

Neither of those outputs the UTF-8 bytes for the BOM U+FEFF character.

Instead, you need to do one of three things: send the three octets manually in raw mode; use raw mode and manually encode a Perl string into UTF-8 bytes; or use UTF-8 output encoding and print the U+FEFF character from the string directly:

C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\xef\xbb\xbf)" | xxd
00000000: efbb bf                                  ...

C:\Users\Peter> perl -MEncode -e "binmode STDOUT, ':raw'; print Encode::encode('UTF-8', qq(\x{feff}));" | xxd
00000000: efbb bf                                  ...

C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{feff})" | xxd
00000000: efbb bf                                  ...
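The same three approaches can be sketched as a script, which sidesteps the shell-quoting differences between Windows and Linux. The helper written_bytes and the hash keys are hypothetical names invented for this sketch; it prints to an in-memory filehandle in place of STDOUT so the bytes can be inspected without xxd:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: print $payload through the given PerlIO layer into
# an in-memory buffer and return the resulting bytes as a hex string.
sub written_bytes {
    my ($layer, $payload) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open failed: $!";
    print {$fh} $payload;
    close $fh or die "close failed: $!";
    return unpack 'H*', $buf;
}

my %result = (
    # 1. raw mode, three octets written by hand
    raw_octets  => written_bytes(':raw', "\xEF\xBB\xBF"),
    # 2. raw mode, U+FEFF encoded to UTF-8 bytes manually
    raw_encoded => written_bytes(':raw', encode('UTF-8', "\x{FEFF}")),
    # 3. UTF-8 encoding layer, U+FEFF printed as a character
    layer_utf8  => written_bytes(':encoding(UTF-8)', "\x{FEFF}"),
);

print "$_: $result{$_}\n" for sort keys %result;
```

All three entries should come out as efbbbf, matching the xxd output above.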

Whether that would "work" in your use case, I don't know. My guess is that it won't help, because anything that consumes HTTP headers should be honoring the encoding declared in those headers rather than requiring a BOM in the message body. If something is saving the HTTP message body to a file and then using that file later, though, maybe the BOM would help. I don't know on that, sorry.
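For comparison, declaring the encoding in the headers might look like the sketch below. It assumes a CGI-style script that writes its own headers and an HTML body; http_response is a hypothetical helper, and the text/html content type is an assumption, not something taken from the thread:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a CGI-style response whose Content-Type header
# names the charset, with the body encoded to match and no BOM prepended.
sub http_response {
    my ($text) = @_;
    my $body = encode('UTF-8', $text);
    return "Content-Type: text/html; charset=UTF-8\r\n"
         . 'Content-Length: ' . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

# The response is already bytes, so STDOUT must not re-encode it.
binmode STDOUT, ':raw';
print http_response("<p>Smile: \x{1F600}</p>\n");
```

A client that reads the charset from the Content-Type header has what it needs to decode the body without a BOM.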

--
warning: Windows quoting used in code blocks; swap quotes around if you're on Linux

In reply to Re: BOM (was: Re^2: Another Unicode/emoji question) by pryrt
in thread Another Unicode/emoji question by Bod
