karlberry has asked for the wisdom of the Perl Monks concerning the following question:

I want to catch the warning "Unicode non-character U+xxxx is illegal for open interchange", so I can output something else (that's valid).

The best approach I've been able to devise is to output to a string inside an eval, as follows. That works ok (as far as I can see), but seems excessively complex. I seek wisdom as to whether there is a simpler way. Here is the code I have:

my $arg = "10FFFF"; # for instance

# simpler way?
eval {
    use warnings FATAL => qw(all);
    my ($fh, $string);
    open($fh, ">", \$string) || die "open(string) failed: $!";
    binmode($fh, ":utf8");
    print $fh chr(hex("$arg"));
};
die "good, caught it from eval: $@" if $@;

# don't want to get here
binmode(STDOUT, ":utf8");
print chr(hex($arg));

I've attempted to grok tchrist's exhaustive answer at stackexchange. As near as I can make out from that, the warning explicitly and correctly only happens on output.

Looking at the source (utf8.c), I lack the brainpower to comprehend whether the Perl_uvoffuni_to_utf8_flags() function which emits the warning is called under other circumstances.

Blindly looking at that code, it seems that what I'm imagining is Perl-level access to the UNICODE_IS_NONCHAR() macro, but it seems that that, as such, doesn't exist. Looking at its definition in utf8.h, I certainly don't want to reimplement it, even if I could.
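
For reference, the Unicode Standard's definition of a noncharacter is small enough to write down directly: the 32 code points U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes (U+nFFFE and U+nFFFF). The following is only a minimal sketch of that definition in pure Perl; `is_nonchar` is a hypothetical helper name, not a core function, and it is not claimed to be identical to the UNICODE_IS_NONCHAR() macro.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of core Perl): true if $cp is one of the
# 66 noncharacter code points defined by the Unicode Standard.
sub is_nonchar {
    my ($cp) = @_;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;               # contiguous block of 32
    return 1 if $cp <= 0x10FFFF && ($cp & 0xFFFE) == 0xFFFE;  # U+nFFFE / U+nFFFF in each plane
    return 0;
}

my $arg = "10FFFF";   # same input format as in the question
my $cp  = hex($arg);

binmode(STDOUT, ":utf8");
# Output U+FFFD instead of a noncharacter; the substitute is an assumption,
# any valid character would do.
print is_nonchar($cp) ? "\x{FFFD}" : chr($cp);
print "\n";
```

Because the check works on the number rather than on output, no eval or string filehandle is needed for this particular warning.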

Any ideas welcome. Thanks.

Re: catching Unicode non-character ... illegal for open interchange
by Laurent_R (Canon) on Feb 18, 2015 at 08:07 UTC
    And what's wrong with binmode, or with using a UTF-8 I/O layer? Or the Encode module?

    Beware that the answer from Tom that you mentioned is most probably very useful for general understanding, but it might be somewhat outdated for modern Perl versions. From at least 5.6 through 5.16, each version of Perl has significantly improved Unicode processing.

    Je suis Charlie.
Re: catching Unicode non-character ... illegal for open interchange
by Anonymous Monk on Feb 18, 2015 at 01:26 UTC
    Looking at the source (utf8.c)
    Come on. perluniprops.
    binmode STDOUT, ':encoding(utf-8)';
    my $arg = "\x{10FFFF}"; # for instance
    # simpler way?
    $arg =~ s/\p{NChar}/\x{FFFD}/; # replacement char
    print $arg, "\n";
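
The substitution above replaces only the first match, which is enough for a one-character string. For longer text, a sketch along the same lines might add the /g modifier and wrap the substitution in a small function. `replace_nonchars` is a hypothetical name chosen for illustration, and this assumes a Perl recent enough that noncharacters may be held in strings without complaint (the warning in question is raised on output).

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text in which every noncharacter
# is replaced by U+FFFD REPLACEMENT CHARACTER.
sub replace_nonchars {
    my ($text) = @_;
    # \p{NChar} is the short form of \p{Noncharacter_Code_Point} (see perluniprops)
    $text =~ s/\p{NChar}/\x{FFFD}/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
my $arg = "A\x{FFFF}\x{10FFFF}B";   # "A", two noncharacters, "B"
print replace_nonchars($arg), "\n";
```

The function works on a copy, so the caller's string is left untouched and the original code points remain available for logging or diagnostics.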