BOM (was: Re^2: Another Unicode/emoji question)

Re: BOM (was: Re^2: Another Unicode/emoji question) by pryrt (Abbot) on Dec 24, 2023 at 16:10 UTC
> Or is it as simple as writing `\x{efbbbf}` as the first thing after the HTTP headers?

The string `my $str = "\x{efbbbf}";` does not contain the BOM character. It contains the single code point U+EFBBBF, which is not a valid Unicode character (AFAIK, Unicode only goes up to U+10FFFF). The string `my $str = "\x{feff}";` does contain the BOM character.

If you did use the string you suggested, whether with raw mode or with UTF-8 output encoding, you would not get what you expected:

`C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\x{efbbbf})" | xxd`
`Wide character in print at -e line 1.`
`00000000: f8bb bbae bf  .....`

`C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{efbbbf})" | xxd`
`Code point 0xEFBBBF is not Unicode, may not be portable in print at -e line 1.`
`00000000: 5c78 7b45 4642 4242 467d  \x{EFBBBF}`

Neither of those outputs the UTF-8 bytes for the BOM character U+FEFF. Instead, you need to do one of three things: send the three octets individually in raw mode; use raw mode and explicitly encode a Perl string into UTF-8 bytes; or use a UTF-8 output encoding layer and print the U+FEFF character directly:

`C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\xef\xbb\xbf)" | xxd`
`00000000: efbb bf  ...`

`C:\Users\Peter> perl -MEncode -e "binmode STDOUT, ':raw'; print Encode::encode('UTF-8', qq(\x{feff}));" | xxd`
`00000000: efbb bf  ...`

`C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{feff})" | xxd`
`00000000: efbb bf  ...`

Whether or not that would "work" in your use case is something I don't know. My guess is that it won't help, because anything that reads HTTP headers should be paying attention to the encoding listed in those headers, not requiring a BOM in the message body. Though I guess if it's saving the HTTP message body into a file and then using that file later, maybe the BOM would help. I don't know on that, sorry.
-- warning: Windows quoting used in code blocks; swap quotes around if you're on Linux
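For anyone who wants to check the three working approaches without `xxd` or shell quoting, here is a minimal sketch as a script. It is only an illustration: the variable names are mine, and the in-memory filehandle stands in for STDOUT with an `:encoding(UTF-8)` layer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Approach 1: spell out the three BOM octets by hand.
my $octets = "\xEF\xBB\xBF";

# Approach 2: start from the character U+FEFF and encode it to UTF-8 bytes.
my $encoded = encode('UTF-8', "\x{FEFF}");

# Approach 3: print the character U+FEFF through a UTF-8 encoding layer.
# An in-memory filehandle is used here (an assumption for the demo) so the
# resulting bytes can be inspected instead of being sent to the terminal.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer)
    or die "Cannot open in-memory filehandle: $!";
print {$fh} "\x{FEFF}";
close($fh) or die "Cannot close in-memory filehandle: $!";

# Show each result as hex; all three lines should show the same three bytes.
printf "%-16s %s\n", 'raw octets:',     unpack('H*', $octets);
printf "%-16s %s\n", 'Encode::encode:', unpack('H*', $encoded);
printf "%-16s %s\n", 'encoding layer:', unpack('H*', $buffer);
```

The point of the comparison is that `"\x{FEFF}"` is one character that becomes three bytes only when it is encoded, whereas `"\xEF\xBB\xBF"` is already three bytes and must not be encoded again.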
Re^2: BOM (was: Re^2: Another Unicode/emoji question) by Bod (Parson) on Dec 25, 2023 at 21:04 UTC
> my guess is that it won't help, because anything that's using HTTP headers should be paying attention to the encoding listed in the headers, and not requiring a BOM in the message body

This article implies that a BOM is needed. But I'm not sure if that applies to a URL calendar feed or just an imported file.
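If it helps to try both variants against the client, below is a rough sketch of a CGI-style response that declares the encoding in the header and can optionally prepend the BOM to the body. Everything specific here is an assumption for illustration: `build_feed_response` is a hypothetical helper, `text/calendar` is a guess at the feed's content type, and the body is a placeholder rather than a valid calendar.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the complete response (headers + body) as bytes.
# The header block is plain ASCII; the body is a Perl character string that is
# encoded to UTF-8 here, with U+FEFF prepended when with_bom is true.
sub build_feed_response {
    my ($body, %opt) = @_;
    my $headers = "Content-Type: text/calendar; charset=utf-8\r\n\r\n";
    my $bom     = $opt{with_bom} ? "\x{FEFF}" : '';
    return $headers . encode('UTF-8', $bom . $body);
}

# Placeholder body containing an emoji (U+1F600); not a real calendar feed.
my $body = "placeholder feed body \x{1F600}\r\n";

# Raw mode, because the response has already been encoded to bytes.
binmode STDOUT, ':raw';
print build_feed_response($body, with_bom => 1);
```

Calling it with `with_bom => 0` gives the header-only variant, so the two can be compared in whatever client consumes the feed.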