in reply to FCGI, tied handles and wide characters

STDOUT isn't really a file handle in your case. It's a tied object that presents the interface of a handle, but isn't actually one. And layers (such as :encoding(UTF-8)) aren't supported by tied handles.

So, rather than relying on an encoding layer, encode explicitly.


I could already hear the lynch mobs baying about the scandalous use of _utf8_off

And rightly so, since you're effectively encoding the scalar using utf8 when is_utf8 is true, but you fail to encode it when is_utf8 is false.

sub my_print($) {
    my($string) = @_;
    my $is_utf8 = is_utf8(${$string});
    _utf8_off(${$string}) if($is_utf8);
    print ${$string};
    _utf8_on(${$string}) if($is_utf8);
}

my_print( \$string );

should be

sub my_print($) {
    my $string_ref = shift;
    my $string = $$string_ref;
    utf8::encode( $string );
    print $string;
}

my_print( \$string );

Better yet,

sub my_print {
    my $s = join( $,, @_ ) . $\;
    utf8::encode( $s );
    print( $s );
}

my_print( $string );
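
For anyone who wants to check the behaviour without an FCGI setup, here is a minimal sketch of the same idea with the encoding step split into its own function so it can be tested. The name encode_for_print is made up for this illustration, and the defined() guards are an addition of mine: $, and $\ are undef by default, which can trigger "uninitialized" warnings under use warnings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build the output string the way
# print would, then turn it into UTF-8 bytes.
sub encode_for_print {
    my $sep = defined($,) ? $, : '';    # output field separator, may be undef
    my $end = defined($\) ? $\ : '';    # output record separator, may be undef
    my $s   = join( $sep, @_ ) . $end;
    utf8::encode( $s );                 # characters -> UTF-8 bytes, on our copy
    return $s;
}

sub my_print {
    print( encode_for_print( @_ ) );
}

# Works the same whether or not the string is internally flagged as UTF-8.
my_print( "caf\x{E9} ", "\x{263A}", "\n" );
```

Because utf8::encode works on a copy, the caller's string is left untouched, so there is no need to flip the flag back afterwards.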

Re^2: FCGI, tied handles and wide characters
by cavac (Prior) on Sep 09, 2024 at 15:00 UTC

    STDOUT isn't really a file handle in your case. It's a tied object that presents the interface of a handle, but isn't actually one. And layers (such as :encoding(UTF-8)) aren't supported by tied handles.

    I rather suspect that a similar problem could lurk in incoming data as well. I would certainly check if incoming Umlauts, Emojis and other Unicode stuff gets decoded correctly. In the long run, it might also pay to run some Unicode normalization to make sure the same text is always encoded the same way (especially for usernames, passwords and such). Unicode equivalence can be rather annoying sometimes, see also: incorrect length of strings with diphthongs.

      Definitely. FCGI::Request replaces STDIN, STDOUT and STDERR by default.
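
Since STDIN is replaced as well, the input side probably needs the same explicit treatment. A rough sketch of decoding and normalizing incoming bytes, using only the core modules Encode and Unicode::Normalize, could look like this. The helper name decode_input is hypothetical, and the choice of NFC (rather than NFD or NFKC) is an assumption; pick whichever form suits your data.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize qw(NFC);

# Hypothetical helper: UTF-8 bytes in, NFC-normalized characters out.
# Dies on malformed UTF-8 instead of silently substituting characters.
sub decode_input {
    my ($bytes) = @_;
    my $chars = Encode::decode( 'UTF-8', $bytes,
        Encode::FB_CROAK() | Encode::LEAVE_SRC() );
    return NFC( $chars );
}

my $composed   = decode_input( "\xC3\xA9" );     # U+00E9 as a single code point
my $decomposed = decode_input( "e\xCC\x81" );    # "e" followed by U+0301

# After normalization both spellings compare equal and have length 1.
print length( $decomposed ), "\n";                               # prints 1
print( ( $composed eq $decomposed ? "same" : "different" ), "\n" );   # prints same
```

Without the normalization step the two inputs would decode to different strings of length 1 and 2, which is the kind of mismatch that bites with usernames and passwords.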