in reply to How to set the UTF8 flag?

utf8::is_utf8($string) will tell you whether a string is stored internally as UTF-8 or as single-byte characters (no require/use needed). utf8::upgrade($string) will convert a string stored as single-byte characters to being stored as UTF-8. But that's not usually what you want; you want a layer on the filehandle that will convert whichever form is being output to UTF-8 (or whatever other encoding you choose). You can set this with open, or with binmode after the file is opened.
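A minimal sketch of both approaches, assuming a scratch file from File::Temp and a sample string of my own choosing (neither comes from the original question). It only illustrates that the flag describes internal storage, while the layer decides what ends up on disk:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample character string (an assumption for this demo): "caf" plus U+00E9.
my $string = "caf\x{e9}";

# The flag reflects internal storage only; the string's value is unchanged.
my $before = utf8::is_utf8($string) ? 1 : 0;   # stored as single bytes
utf8::upgrade($string);
my $after  = utf8::is_utf8($string) ? 1 : 0;   # now stored as UTF-8

# Hypothetical scratch file, removed automatically at program exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Option 1: attach the encoding layer when opening the file.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
print {$out} $string;
close $out or die "Cannot close $path: $!";

# Option 2: open first, then add the layer with binmode.
open($out, '>>', $path) or die "Cannot open $path: $!";
binmode($out, ':encoding(UTF-8)');
print {$out} $string;
close $out or die "Cannot close $path: $!";

# Each write produced 5 bytes: "caf" plus a two-byte UTF-8 sequence.
my $size = -s $path;
print "flag before=$before after=$after, bytes on disk=$size\n";
```

Either way the layer does the conversion on output, so the state of the flag does not matter to the code doing the printing.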

But some actual sample code/data would be very helpful; when you say "writing two characters for every Unicode byte" it makes me think you have some misconceptions that we could help clear up.

Re^2: How to set the UTF8 flag?
by dissident (Beadle) on Aug 18, 2025 at 05:28 UTC
    Great thanks!
    When searching the web, I only found some (obviously outdated) information claiming that there is no reliable way to check the UTF-8 flag.
    So is_utf8() was exactly what I needed to narrow down the cause of the problem.
    Turned out that HTTP::Tiny does not support Unicode, just raw text.
    Thus the issue was resolved by decoding its UTF-8 byte string data with decode().
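A rough sketch of that fix, assuming decode() here means Encode::decode and using a hard-coded byte string as a stand-in for the content that HTTP::Tiny returns (the sample data is my own, not from the thread):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for $response->{content}: raw UTF-8 bytes for "caf" plus U+00E9,
# where U+00E9 is encoded as the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# On the undecoded string, length counts bytes.
my $byte_len = length $bytes;

# Decode the bytes into a Perl character string.
my $text = decode('UTF-8', $bytes);

# On the decoded string, length counts characters.
my $char_len = length $text;

printf "bytes=%d chars=%d last code point=U+%04X\n",
    $byte_len, $char_len, ord(substr($text, -1));
```

Printing $text afterwards still needs an encoding layer on the output handle; otherwise the conversion problem just moves to the other end.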
      Turned out that HTTP::Tiny does not support Unicode, just raw text.
      That's correct. The statement in the documentation of HTTP::Tiny might deserve more prominent placement:
      Content data in the request/response is handled as "raw bytes". Any encoding/decoding (with associated headers) are the responsibility of the caller.

      Turned out that HTTP::Tiny does not support Unicode, just raw text.

      HTTP has no concept of encoding. It's just a file transfer protocol.

      By definition, text files don't have an encoding defined within, so HTTP headers can be used to communicate the encoding of the text file. But that doesn't mean that the HTTP agent should automatically decode the file. And that doesn't apply to binary files such as XHTML. Even modern HTML is really a binary file.
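One way a caller might act on this, as a hedged sketch: decode_if_plain_text is a hypothetical helper (not part of HTTP::Tiny or Encode) that decodes only text/plain bodies whose Content-Type header names a charset, and hands everything else back as bytes. The header strings and sample bodies are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode only text/plain content with a declared charset.
# Anything else (XHTML, HTML, images, no charset given) is returned as raw bytes.
sub decode_if_plain_text {
    my ($content_type, $bytes) = @_;
    return $bytes unless defined $content_type;

    # Pull the charset parameter out of e.g. "text/plain; charset=UTF-8".
    my ($charset) = $content_type
        =~ m{^\s*text/plain\s*;.*?\bcharset\s*=\s*"?([^\s;"]+)}i;
    return $bytes unless defined $charset;

    # Unknown charset names: leave the bytes alone rather than guess.
    my $enc = find_encoding($charset) or return $bytes;
    return $enc->decode($bytes);
}

# Same text, "caf" plus U+00E9, arriving in two different encodings.
my $from_utf8   = decode_if_plain_text('text/plain; charset=UTF-8',      "caf\xC3\xA9");
my $from_latin1 = decode_if_plain_text('text/plain; charset=ISO-8859-1', "caf\xE9");

# Not text/plain, so the bytes pass through untouched.
my $xhtml = decode_if_plain_text('application/xhtml+xml; charset=UTF-8', "caf\xC3\xA9");

printf "utf8=%d latin1=%d xhtml=%d\n",
    length $from_utf8, length $from_latin1, length $xhtml;
```

With HTTP::Tiny the arguments would presumably be $response->{headers}{'content-type'} and $response->{content}; the decision to decode stays with the caller.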