Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have some invalid utf-8. I want to catch the warning messages, and just un/redefine the string.

For example:
print Dumper $text;
$VAR1 = { 'string' => "Text\x{daed}", };

print $text->{'string'};
Unicode surrogate U+DAED is illegal in UTF-8 at line 5.
I've tried something like:
local $SIG{__WARN__} = sub { print "WARNING!\n"; };
Which catches the warning, but I lose context. I want to detect the warning then just set $text->{'string'} = "bad utf-8 encoding" or better yet, remove the offending character (regex?) so I end up with $text->{string} = "Text".

Thanks for your help.

Re: Unicode surrogate is illegal in UTF-8
by choroba (Cardinal) on Aug 03, 2015 at 18:40 UTC
    You can use eval with FATAL warnings:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use open OUT => ':encoding(UTF-8)', ':std';
    use warnings FATAL => 'utf8';

    my $text = { string => "t\x{daed}\x{ffff}\x{daee}\x{c8}\n" };

    1 until eval {
        print $text->{string};
        1;
    } or do {
        my ($charcode) = $@ =~ /U\+(\S+)/ or die $@;
        print STDERR "Removing $charcode because of $@";
        $text->{string} =~ s/\x{$charcode}//g;
        0; # Try again!
    };

    Update: handles both "non-character" and "surrogate" cases. I wasn't able to trigger the "non_unicode" warnings.
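If the goal is only to drop the offending characters, a regex on Unicode properties may be enough, with no need to trigger the warning first. The following is a minimal sketch, not choroba's method: the helper name `strip_illegal_utf8` is hypothetical, and it assumes that surrogates and non-characters are the only code points you need to remove.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove code points that
# trigger the "surrogate" and "non-character" UTF-8 warnings on output.
sub strip_illegal_utf8 {
    my ($string) = @_;
    # \p{Cs} matches surrogates (U+D800..U+DFFF);
    # \p{Noncharacter_Code_Point} matches non-characters such as U+FFFF.
    $string =~ s/[\p{Cs}\p{Noncharacter_Code_Point}]//g;
    return $string;
}

my $text = { string => "t\x{daed}\x{ffff}\x{daee}\x{c8}\n" };
$text->{string} = strip_illegal_utf8($text->{string});
```

Applied to the string from the original question, "Text\x{daed}" becomes "Text". Code points above U+10FFFF (the "non_unicode" case) are not covered by this sketch.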

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      This doesn't seem to work for me. It reports the warning, but $@ does not get set.
        Never mind, this worked for me (unrelated typo). Thanks!
      Could I generate/detect this warning without using "print" (i.e. so I could fix/replace silently)?
        You can print to a filehandle that doesn't lead anywhere:
        open my $VOID, '>', \ my $void;
        1 until eval {
            print {$VOID} $text->{string};
            # ...
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Could I generate/detect this warning without using "print" (i.e. so I could fix/replace silently)?

        Have a look at Handling Malformed Data.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
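"Handling Malformed Data" is a section of the Encode documentation. One way to apply it without printing anything might look like the sketch below. It assumes that the strict "UTF-8" encoding rejects surrogates when given the `FB_CROAK` check value; whether non-characters such as U+FFFF are also rejected can depend on the Encode version. The helper name `is_valid_utf8_output` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns 1 if $string can be encoded as strict
# UTF-8, 0 otherwise. Nothing is printed and no warning is emitted.
sub is_valid_utf8_output {
    my ($string) = @_;
    # FB_CROAK makes encode() die on the first unencodable character;
    # LEAVE_SRC keeps encode() from modifying its source argument.
    my $ok = eval {
        encode('UTF-8', $string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $text = { string => "Text\x{daed}" };
$text->{string} = 'bad utf-8 encoding'
    unless is_valid_utf8_output($text->{string});
```

Calling `encode` with no check argument instead substitutes a replacement character for each malformed one, which may be preferable if you want to keep the rest of the string.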