in reply to BOM (was: Re^2: Another Unicode/emoji question)
in thread Another Unicode/emoji question

Or is it as simple as writing \x{efbbbf} as the first thing after the HTTP headers?

The string my $str = "\x{efbbbf}"; does not contain the BOM character. It contains the single code point U+EFBBBF, which is not a valid Unicode character (AFAIK, Unicode only goes up to U+10FFFF). The string my $str = "\x{feff}"; contains the BOM character.

If you used the string you suggested, whether in raw mode or with UTF-8 output encoding, you would not get what you expected:

C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\x{efbbbf})" | xxd
Wide character in print at -e line 1.
00000000: f8bb bbae bf  .....

C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{efbbbf})" | xxd
Code point 0xEFBBBF is not Unicode, may not be portable in print at -e line 1.
00000000: 5c78 7b45 4642 4242 467d  \x{EFBBBF}

Neither of those outputs the UTF-8 bytes for the BOM U+FEFF character.

Instead, you need to do one of three things: send the three octets yourself in raw mode, use raw mode and manually encode a Perl string into UTF-8 bytes, or use UTF-8 output encoding and print the U+FEFF character from the string directly:

C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\xef\xbb\xbf)" | xxd
00000000: efbb bf  ...

C:\Users\Peter> perl -MEncode -e "binmode STDOUT, ':raw'; print Encode::encode('UTF-8', qq(\x{feff}));" | xxd
00000000: efbb bf  ...

C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{feff})" | xxd
00000000: efbb bf  ...
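
If you would rather check this from a script than from one-liners piped through xxd, here is a minimal sketch of the same three approaches. It is only an illustration: the variable names are mine, and the third approach writes to an in-memory filehandle (an assumption made so the bytes can be inspected) rather than to STDOUT.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Approach 1: the three BOM octets written out by hand.
my $by_hand = "\xEF\xBB\xBF";

# Approach 2: explicitly encode the U+FEFF character into UTF-8 bytes.
my $encoded = encode('UTF-8', "\x{FEFF}");

# Approach 3: let an :encoding(UTF-8) layer do the encoding on output.
# An in-memory filehandle stands in for STDOUT so the bytes can be inspected.
my $via_layer = '';
open my $fh, '>:encoding(UTF-8)', \$via_layer or die "open failed: $!";
print {$fh} "\x{FEFF}";
close $fh;

# Show each result as hex octets.
for my $bytes ($by_hand, $encoded, $via_layer) {
    print join(' ', map { sprintf('%02x', ord($_)) } split(//, $bytes)), "\n";
}
```

All three lines of output should read ef bb bf, matching the xxd dumps above.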

Whether or not that would "work" in your use-case is something I don't know: my guess is that it won't help, because anything that's using HTTP headers should be paying attention to the encoding listed in the headers, and not requiring a BOM in the message body. Though I guess if it's saving the HTTP message body into a file, and then later using that file, maybe the BOM would help. I don't know on that, sorry.
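
If you do want to try it, this is roughly what "the first thing after the HTTP headers" could look like. It is a sketch under assumptions, not something I have tested against a calendar client: build_response is a hypothetical helper name, the text/calendar content type and the placeholder body are guesses based on your mention of a calendar feed, and it assumes a CGI-style script that writes the whole response to STDOUT.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the full response as bytes.
# The headers are plain ASCII and declare the charset; the body is
# encoded to UTF-8 with U+FEFF prepended as its first character.
sub build_response {
    my ($body) = @_;
    my $headers = "Content-Type: text/calendar; charset=utf-8\r\n\r\n";  # assumed content type
    return $headers . encode('UTF-8', "\x{FEFF}" . $body);
}

# Raw mode, because the string already holds encoded bytes.
binmode STDOUT, ':raw';
print build_response("BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n");  # placeholder body
```

Encoding the body once and printing in raw mode avoids mixing layers on the same handle, so the BOM cannot end up double-encoded.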

--
warning: Windows quoting used in code blocks; swap the quotes around if you're on Linux

Re^2: BOM (was: Re^2: Another Unicode/emoji question)
by Bod (Parson) on Dec 25, 2023 at 21:04 UTC
    my guess is that it won't help, because anything that's using HTTP headers should be paying attention to the encoding listed in the headers, and not requiring a BOM in the message body

    This article implies that a BOM is needed. But, I'm not sure if that applies to a URL calendar feed or just an imported file.