McA has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have searched here and on the web for hours and found many snippets, but not the solution to my problem. Hopefully you can give me wisdom.

If I push a so-called encoding layer onto a file handle with

binmode $fh, ":encoding(cp1252)";

knowing that the source character set (in this case UTF-8) is a superset of the target character set, is there a way to instruct the encoding layer to ignore encoding problems by writing a substitution character (like '?') to the output stream?

Thank you in advance.

Best regards
Andreas

Replies are listed 'Best First'.
Re: binmode, encoding layer, ignoring encoding problems
by ikegami (Patriarch) on May 17, 2010 at 14:27 UTC
    See $PerlIO::encoding::fallback
    use strict;
    use warnings;
    use PerlIO::encoding;
    {
        local $PerlIO::encoding::fallback = Encode::PERLQQ();
        binmode STDOUT, ":encoding(cp1252)";
    }
    print("\xc9\x{2660}\n");

    It doesn't seem to work properly if you pass a custom handler. (There is repetition in the output!)
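
    For comparison, here is a minimal sketch of a custom handler used with Encode's encode() function directly, bypassing the layer. It only illustrates the documented code-reference form of the CHECK argument; it does not show how the layer behaves with a handler in $PerlIO::encoding::fallback. The variable names and the <U+XXXX> replacement format are chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00C9 exists in cp1252; U+2660 (BLACK SPADE SUIT) does not.
my $text = "\x{c9}\x{2660}\n";

# A code reference as CHECK is called once per unmappable character
# and receives that character's code point as its argument.
my $bytes = encode("cp1252", $text, sub { sprintf "<U+%04X>", shift });

# $bytes is already encoded, so write it without any encoding layer.
binmode STDOUT;
print $bytes;
```

    Encoding by hand like this keeps the handle free of an encoding layer, so the handler is only ever called by encode() itself.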

      Hi ikegami,

      your information was a hint in the right direction. Thank you for that.

      But now I face a different error. I made a little program to show the problem:

      #!/usr/bin/perl

      use strict;
      use warnings;
      use PerlIO::encoding;
      use Encode;
      use Data::Dumper;

      my $a = "\xe2\x82\xac"; # Euro sign in UTF-8
      my $euro_in_unicode = decode("UTF-8", $a);
      print Dumper(\$euro_in_unicode), "\n";
      open my $fh, ">", "output.txt" or die $!;
      {
          local $PerlIO::encoding::fallback = Encode::FB_DEFAULT;
          binmode $fh, ":encoding(latin1)";
      }
      print $fh $euro_in_unicode, "\n";
      close $fh;

      The output of Dumper shows the right code point, $VAR1 = \"\x{20ac}";. The file output.txt does contain a ?, since the Euro sign cannot be represented in real Latin-1, but I also get the following output

      Close with partial character at am85.pl line 18. Close with partial character.
      on the console. Line 18 is the close statement.

      Can someone explain that? What do I have to do to get rid of that? What is wrong?

      Best regards
      Andreas

        I've encountered some problems too, as I mentioned. It appears to be a flaky or poorly documented interface. I don't have the time to hunt down the problem, but you could file a bug report and encode manually for now.
        open my $fh, ">", "output.txt" or die $!;
        print $fh encode('latin-1', $euro_in_unicode), "\n";
        close $fh;

        Update: Taking a hint from the source, the following seems to work. No idea what STOP_AT_PARTIAL means.

        #!/usr/bin/perl

        use strict;
        use warnings;
        use PerlIO::encoding;
        use Encode;

        my $euro_in_unicode = chr(0x20AC);
        open my $fh, ">", "output.txt" or die $!;
        {
            local $PerlIO::encoding::fallback = Encode::FB_DEFAULT|Encode::STOP_AT_PARTIAL;
            binmode $fh, ":encoding(latin1)";
        }
        print $fh "$euro_in_unicode\n";
        close $fh;
Re: binmode, encoding layer, ignoring encoding problems
by choroba (Cardinal) on May 17, 2010 at 13:58 UTC
    If I understand your question, you want to convert a string and replace non-convertible characters with ?. See Handling Malformed Data in the Encode documentation.
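
    As a rough sketch of what that section describes, the following calls encode() directly with and without a CHECK argument. It covers only the function interface of Encode, not the :encoding layer; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00C9 exists in cp1252; U+2660 (BLACK SPADE SUIT) does not.
my $text = "\x{c9}\x{2660}";

# No CHECK argument (FB_DEFAULT): the unmappable character becomes "?".
my $substituted = encode("cp1252", $text);

# FB_PERLQQ: the unmappable character becomes a \x{HHHH} escape instead.
my $escaped = encode("cp1252", $text, Encode::FB_PERLQQ());

# Show the resulting bytes as hex so the output is terminal-independent.
printf "%vX\n", $substituted;
printf "%vX\n", $escaped;
```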
      Hi choroba,

      I know this piece of documentation. But how does it help me?

      Probably just some connecting information is missing, but at the moment I don't see how this document helps me.

      a) Encode says that the default for encoding is Encode::FB_DEFAULT, which means that a substitution character is inserted when a character can't be encoded. That is not what happens when an encoding layer is pushed onto a file handle.

      b) Are the encoding mechanisms with the module Encode somehow interrelated with the encoding done by layers?

      Clarification needed. :-)

      Best regards
      Andreas