Re: Unicode surrogate is illegal in UTF-8

You can use eval with FATAL warnings:

#!/usr/bin/perl
use strict;
use warnings;

use open OUT => ':encoding(UTF-8)', ':std';
use warnings FATAL => 'utf8';

my $text = { string => "t\x{daed}\x{ffff}\x{daee}\x{c8}\n" };

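# Retry the print until it succeeds, removing one offending code point per pass.
1 until eval {
    print $text->{string};
    1;
} or do {
    # The fatal warning text names the code point, e.g. "U+DAED".
    my ($charcode) = $@ =~ /U\+(\S+)/ or die $@;
    print STDERR "Removing $charcode because of $@";
    $text->{string} =~ s/\x{$charcode}//g;
    0; # Try again!
};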

Update: handles both "non-character" and "surrogate" cases. I wasn't able to trigger the "non_unicode" warnings.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^2: Unicode surrogate is illegal in UTF-8 by Rodster001 (Pilgrim) on Aug 03, 2015 at 19:11 UTC
This doesn't seem to work for me. It reports the warning, but $@ does not get set.
Re^3: Unicode surrogate is illegal in UTF-8 by Rodster001 (Pilgrim) on Aug 03, 2015 at 20:05 UTC
Never mind, this worked for me (the problem was an unrelated typo). Thanks!
Re^2: Unicode surrogate is illegal in UTF-8 by Rodster001 (Pilgrim) on Aug 03, 2015 at 20:33 UTC
Could I generate/detect this warning without using "print" (i.e. so I could fix/replace silently)?
Re^3: Unicode surrogate is illegal in UTF-8 by choroba (Cardinal) on Aug 03, 2015 at 21:23 UTC
You can print to a filehandle that doesn't lead anywhere:

open my $VOID, '>', \ my $void;
1 until eval {
    print {$VOID} $text->{string};
    # ...

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
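For readers who want the in-memory filehandle idea as a complete, silent routine, here is a minimal sketch. The helper name strip_unencodable is hypothetical, and the explicit '>:encoding(UTF-8)' layer is an assumption made so the sketch does not depend on the "use open" pragma from the original post. It relies on the same fatal utf8 warnings and the same "U+XXXX" pattern in $@ as the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use warnings FATAL => 'utf8';   # utf8 warnings become exceptions

# Hypothetical helper (not from the thread): returns a copy of $string
# with every code point that triggers a fatal utf8 warning removed.
sub strip_unencodable {
    my ($string) = @_;

    # In-memory filehandle with an explicit encoding layer (assumption:
    # this stands in for the "use open" pragma), so nothing reaches STDOUT.
    open my $VOID, '>:encoding(UTF-8)', \ my $void
        or die "Cannot open in-memory handle: $!";

    # Each failed print names one offending code point in $@;
    # remove all of its occurrences and try again.
    until (eval { print {$VOID} $string; 1 }) {
        my ($charcode) = $@ =~ /U\+([0-9A-Fa-f]+)/ or die $@;
        $string =~ s/\x{$charcode}//g;
    }

    close $VOID;
    return $string;
}
```

Calling strip_unencodable($text->{string}) on the sample string from the original post should leave only the "t", the U+00C8 character and the newline, with nothing written to STDOUT or STDERR.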
Re^3: Unicode surrogate is illegal in UTF-8 by afoken (Chancellor) on Aug 05, 2015 at 03:33 UTC
> Could I generate/detect this warning without using "print" (i.e. so I could fix/replace silently)?

Have a look at Handling Malformed Data.

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
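Assuming "Handling Malformed Data" refers to the section of that name in the Encode documentation, detection without print could look roughly like the sketch below. The helper name is_strict_utf8_encodable is hypothetical. Which code points Encode rejects for strict UTF-8 (for example, non-characters such as U+FFFF) may differ between Encode versions, so this only illustrates the CHECK argument, not a drop-in replacement for the loop above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the thread): true if $string can be
# encoded as strict UTF-8 without hitting an unmappable code point.
sub is_strict_utf8_encodable {
    my ($string) = @_;

    # FB_CROAK makes encode() die on the first code point it cannot map;
    # LEAVE_SRC keeps encode() from modifying $string in place.
    my $ok = eval {
        encode('UTF-8', $string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}
```

When the check fails, $@ holds Encode's own error message, which is worded differently from the core "U+XXXX" warning used in the original post, so the regex from that code would need adapting.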