in reply to How to tell if a stream is already in UTF8 mode?

To me, this looks like an XY Problem. It doesn't sound like a good thing to make a "lower level routine" distinguish between different kinds of filehandles with different I/O layers applied to them. Where does your routine get its filehandles from? Why are they opened in different modes? Wouldn't it be a better idea to binmode the handle with an appropriate layer, such as :utf8 or :encoding(...), at a higher level of your code?

Anyway, you can use PerlIO::get_layers($fh) to get the list of layer names on a filehandle (see the PerlIO perldoc page). Checking whether a filehandle is in UTF-8 mode then reduces to grepping that list for the string "utf8" (usually the last element).
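
A minimal sketch of that check follows. The helper name is_utf8_handle is made up for illustration; PerlIO::get_layers is part of the perl core and, as far as I know, needs no module to be loaded. Passing output => 1 asks for the output-side layers, which can differ from the input side on some handles.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the handle's output side carries the utf8 flag.
sub is_utf8_handle {
    my ($fh) = @_;
    # get_layers returns names such as "unix", "perlio", "encoding(...)", "utf8".
    my @layers = PerlIO::get_layers($fh, output => 1);
    my $hits = grep { $_ eq 'utf8' } @layers;
    return $hits ? 1 : 0;
}
```

A caller might then write something like: binmode(STDOUT, ':encoding(UTF-8)') unless is_utf8_handle(\*STDOUT);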

Re^2: How to tell if a stream is already in UTF8 mode?
by perl-diddler (Chaplain) on Jan 03, 2014 at 21:18 UTC
    Where does your routine get its filehandles from? Why are they opened in different modes?

    It's a lower-level library formatting routine. Think of asking, about "printf FH ...": "where does printf get its filehandles from? Why would printf get FHs opened in different modes?"

    It gets the FH from user programs, with the FH being STDOUT, STDERR, or some other opened destination. By the time printf gets it, it doesn't know whether the FH was set for Unicode or binary. The lower-level layers 'know': they will emit a warning if they detect chars > 255 on a stream NOT marked as UTF8, AND they will not encode chars between 128 and 255 as UTF8 unless the stream was previously marked as UTF8.
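
    The behavior described above can be observed with a small experiment. This is only a sketch: bytes_written is a hypothetical helper, and it prints to in-memory handles so that the resulting bytes can be inspected.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: print $text to an in-memory handle opened with
    # $mode, then return the bytes written followed by any warnings raised.
    sub bytes_written {
        my ($mode, $text) = @_;
        my @warnings;
        local $SIG{__WARN__} = sub { push @warnings, @_ };
        my $buf = '';
        open(my $fh, $mode, \$buf) or die "open failed: $!";
        print {$fh} $text;
        close($fh);
        return ($buf, @warnings);
    }

    # U+00E9 on a handle without :utf8 goes out as a single byte.
    my ($raw)  = bytes_written('>', "\x{e9}");
    # The same character on a :utf8 handle is encoded as two bytes.
    my ($utf8) = bytes_written('>:utf8', "\x{e9}");
    # A character above 255 on a handle without :utf8 triggers a
    # "Wide character" warning.
    my ($wide, @warned) = bytes_written('>', "\x{263A}");

    printf "raw: %d byte(s), utf8: %d byte(s), warnings: %d\n",
        length($raw), length($utf8), scalar @warned;
    ```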

    It doesn't sound like a good thing to make a "lower level routine" distinguish between different kinds of filehandles with different I/O layers applied to them.

    The problem isn't that it is a lower-level routine, but that it isn't "low enough"... I.e. the lower I/O layers know whether binmode was called on the stream.

    Just guessing now, but 'get_layers' is likely the way, combined with a for loop to match. Matching only on the 1st char to eliminate possibilities, then checking whether the name (UTF-8 or utf8) is in a hash, might give the cheapest check; the result could then be cached as the state for that stream.
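
    A rough sketch of the caching idea, under some assumptions: the names %is_utf8, handle_key, cached_is_utf8 and forget_handle are all hypothetical, and the cache is keyed by the handle's reference address. Note that the cached answer goes stale if someone calls binmode on the handle afterwards, or if a closed handle's address gets reused, so the caller has to invalidate it.

    ```perl
    use strict;
    use warnings;
    use Scalar::Util qw(refaddr);

    my %is_utf8;    # hypothetical cache: handle key -> 0 or 1

    # Key a handle by its reference address; fall back to the glob's name
    # for non-reference handles such as *STDOUT.
    sub handle_key {
        my ($fh) = @_;
        my $addr = refaddr($fh);
        return defined $addr ? $addr : "$fh";
    }

    # Ask PerlIO::get_layers only on the first call for a given handle.
    sub cached_is_utf8 {
        my ($fh) = @_;
        my $key = handle_key($fh);
        unless (defined $is_utf8{$key}) {
            my $hits = grep { $_ eq 'utf8' } PerlIO::get_layers($fh, output => 1);
            $is_utf8{$key} = $hits ? 1 : 0;
        }
        return $is_utf8{$key};
    }

    # Drop the cached state, e.g. after calling binmode on the handle.
    sub forget_handle {
        my ($fh) = @_;
        delete $is_utf8{ handle_key($fh) };
        return;
    }
    ```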

    It's a one way trip -- i.e. if the routine detects > 255-valued chars in the stream, it knows the stream "needs" to be in utf8 mode, but there aren't any single-byte values that would force a reverse (since all bytes can be part of a UTF-8 encoded data stream).

    Thanks for the pointer to get_layers...it's not documented on its own manpage...

      The lower-level layers 'know': they will emit a warning if they detect chars > 255 on a stream NOT marked as UTF8, AND they will not encode chars between 128 and 255 as UTF8 unless the stream was previously marked as UTF8.

      The lower level always expects bytes. (Files are blocks/streams of bytes.) It will ALWAYS emit a warning if it detects chars >255.