in reply to using binmode() to override default encoding specified in "use open"

I see the same as you in my local Perl (v5.20.3). Inspecting the layers shows that binmode adds to the defaults rather than replacing them. You need to call it first with no layers in order to do a full reset. (Note: I've also corrected the argument to encoding.)

#!/usr/bin/perl -w

use open qw/:std :encoding(iso-8859-1)/; # default I/O encoding

my $s = "A \N{WHITE SMILING FACE} for you\n";

open (FILE, '> fpo'); # in the actual code, may open one of several things, or assign STDOUT to FILE
my @layers = PerlIO::get_layers(FILE);
print "Layers before binmode: @layers\n";

binmode(FILE, ':encoding(UTF-8)') if 1; # override the default encoding under certain conditions
@layers = PerlIO::get_layers(FILE);
print "Layers after binmode: @layers\n";

binmode(FILE) if 1; # reset to raw
binmode(FILE, ':encoding(UTF-8)') if 1; # add our new encoding
@layers = PerlIO::get_layers(FILE);
print "Layers after reset: @layers\n";

warn "About to print"; # primitive trace statement
print FILE "$s";

And the output to terminal is

Layers before binmode: unix perlio encoding(iso-8859-1) utf8
Layers after binmode: unix perlio encoding(iso-8859-1) utf8 encoding(utf-8-strict) utf8
Layers after reset: unix perlio encoding(utf-8-strict) utf8
About to print at /tmp/11119633.pl line 20.

There may be a neater way but this is at least a working solution, AFAICT.

Re^2: using binmode() to override default encoding specified in "use open"
by choroba (Cardinal) on Jul 22, 2020 at 13:24 UTC
    binmode(FILE) if 1; # reset to raw
    binmode(FILE, ':encoding(UTF-8)') if 1; # add our new encoding
    This can be shortened to
    binmode(FILE, ':raw:encoding(UTF-8)') if 1;
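One way to check the single-call form is to inspect the layers before and after and then look at the bytes that reach the file. The following is only a sketch under some assumptions that are not from the thread: an explicit `:encoding(iso-8859-1)` in the open mode stands in for the `use open` default, a lexical handle replaces FILE, and the scratch file comes from File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; the name and location are arbitrary.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Stand-in for the "use open" default: start with an ISO 8859-1 layer.
open my $fh, '>:encoding(iso-8859-1)', $path or die "open: $!";
my $before = join ' ', PerlIO::get_layers($fh);

# Pop the existing encoding layer and push the new one in a single call.
binmode($fh, ':raw:encoding(UTF-8)') or die "binmode: $!";
my $after = join ' ', PerlIO::get_layers($fh);

print {$fh} "A \x{263A} for you\n";    # U+263A WHITE SMILING FACE
close $fh or die "close: $!";

# Read the file back as raw bytes to see what was actually written.
open my $in, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$in> };
close $in;

print "Before: $before\n";
print "After:  $after\n";
print "Bytes:  ", join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";
```

If the reset worked, the "After" line should list only one encoding layer, and the smiley should appear in the byte dump as the UTF-8 sequence e2 98 ba.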
Re^2: using binmode() to override default encoding specified in "use open"
by raygun (Scribe) on Jul 22, 2020 at 11:34 UTC

    Thank you — working solutions highly appreciated!

    I confess I don't understand this bit:

    binmode adds to the defaults, rather than replacing them.

    What does it even mean for a stream to have more than one encoding associated with it? The point of associating an encoding with a stream is to reduce ambiguity; a stream saying "eh, the encoding might be this or that" only increases ambiguity.

      I tend to agree with you that having multiple ":encoding" layers doesn't make a lot of sense. There may be a scenario where it does but I can't think of one right now.

      The docs for perliol include this gem:

      binmode() operates similarly to open(): by default the specified layers are pushed on top of the existing stack.

      So that agrees with what we see, and might well make sense for non-competing layers. It does rather appear that to override a specified default encoding with binmode you will need to do the reset first.

      The :encoding(...) that you pass to binmode is an example of a PerlIO layer, and the reason you can have more than one is that the layer system is supposed to be generic and usable for more than just encodings.

      In practice, associating multiple :encoding(...) layers to a read stream would mean that the data gets "decoded" more than once. This is almost certainly an error, but might be just what you need to fix some bizarre cases of mis-encoded data.
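To make "decoded more than once" concrete, here is a small sketch of repairing double-encoded UTF-8. Note that it deliberately uses explicit Encode calls instead of stacked PerlIO layers, so that each step is visible; the sample bytes are an illustrative assumption, not data from this thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+00E9 (e with acute) encoded as UTF-8, then mistakenly read as
# ISO 8859-1 and encoded as UTF-8 a second time.
my $double_encoded = "\xC3\x83\xC2\xA9";

my $once  = decode('UTF-8', $double_encoded);   # characters U+00C3 U+00A9
my $bytes = encode('iso-8859-1', $once);        # back to the bytes C3 A9
my $fixed = decode('UTF-8', $bytes);            # the single character U+00E9

printf "U+%04X\n", ord $fixed;                  # prints "U+00E9"
```

The middle step maps the wrongly decoded characters back to bytes, so that the second decode sees the original UTF-8 sequence.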

        Makes sense — thank you.

        The question is purely academic at this point, as I have my solution, but I'm curious what multiple encoding layers mean in an output stream. In my original example, if I replace \N{WHITE SMILING FACE} with the ISO 8859-1 character \N{REGISTERED SIGN}, the output file contains this character in 8859-1 encoding (the single byte \xAE). But if I then reverse the order of the encodings, the output file replaces this character with the (all-ASCII) string \xFFFD. \xFFFD seems completely unrelated to the REGISTERED SIGN character's encoding in either ISO 8859-1 or UTF-8.

        In other words, while I can see the use case you speak of for dealing with malformed input, I can't really see the use case for generating output unrelated to the content of the string. Perl does throw a warning, upon write, about being unable to properly handle the character, but it seems like it really ought to be warning at the moment a second encoding is put on the output stream, telling the user this is likely to generate garbage.
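For anyone who wants to repeat this experiment, the sketch below writes REGISTERED SIGN through each layer order and dumps the resulting bytes in hex. The helper name `bytes_written`, the case labels, and the use of File::Temp are all hypothetical choices of mine. The two stacked cases may emit warnings, and I make no claim about their output beyond what is reported above, since it may vary between Perl versions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given layers (pushed in
# order) and return the bytes that reached the file, as a hex string.
sub bytes_written {
    my ($text, @layers) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open my $out, '>:raw', $path or die "open: $!";
    binmode($out, $_) or die "binmode $_: $!" for @layers;
    print {$out} $text;
    close $out or warn "close: $!";

    open my $in, '<:raw', $path or die "open: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes // '';
}

my $s = "\x{AE}";    # U+00AE REGISTERED SIGN

my @cases = (
    [ 'ISO 8859-1 only',        ':encoding(iso-8859-1)' ],
    [ 'UTF-8 only',             ':encoding(UTF-8)' ],
    [ 'ISO 8859-1, then UTF-8', ':encoding(iso-8859-1)', ':encoding(UTF-8)' ],
    [ 'UTF-8, then ISO 8859-1', ':encoding(UTF-8)', ':encoding(iso-8859-1)' ],
);

my %result;
for my $case (@cases) {
    my ($label, @layers) = @$case;
    $result{$label} = bytes_written($s, @layers);
    printf "%-24s %s\n", $label, $result{$label};
}
```

The two single-layer cases give a baseline (ae for ISO 8859-1, c2 ae for UTF-8) against which the stacked results can be compared.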