in reply to Remove u200b unicode From String
```
perl -we '$chr = "Ǟ"; $s = "abc" . $chr . "xyz"; print "$s\n"; $s =~ s/$chr/ /g; print "$s\n"'
```

outputs

```
abcǞxyz
abc xyz
```

Alternatively, you can do something like the following to find characters outside the ASCII range:
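```perl
use strict;
use warnings;
use Encode;

# get_s_from_somewhere() stands in for wherever your UTF-8 bytes come from
my $s = get_s_from_somewhere();

# Decode the bytes into a character string so that each multi-byte
# UTF-8 sequence becomes a single character
my $chars = decode("UTF-8", $s);

# Count every character whose code point is above 127 (i.e. non-ASCII)
my %non_ascii;
for my $i (0 .. length($chars) - 1) {
    my $c = substr($chars, $i, 1);
    if (ord($c) > 127) {
        $non_ascii{$c}++;
    }
}

do_something_with_non_ascii(\%non_ascii);
```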
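Putting the two ideas together for the original question, here is a minimal sketch that assumes your input is UTF-8 bytes (the sample string `$bytes` below is made up for illustration). After decoding, U+200B is a single character, so it can be matched directly as `\x{200B}`. Printing the tally by code point helps here, because a zero-width space is invisible when printed as-is. The non-ASCII count uses a regex instead of the `substr` loop above, but it counts the same thing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample input: UTF-8 bytes containing two zero-width
# spaces (U+200B, encoded as the three bytes E2 80 8B)
my $bytes = "abc\xE2\x80\x8Bxyz\xE2\x80\x8B";

# Decode to characters so each U+200B is one character, not three bytes
my $chars = decode("UTF-8", $bytes);

# Same tally as the loop above, written as a regex over non-ASCII chars
my %non_ascii;
$non_ascii{$1}++ while $chars =~ /([^\x00-\x7F])/g;

# Report by code point, since U+200B prints as nothing visible
for my $c (sort keys %non_ascii) {
    printf "U+%04X seen %d time(s)\n", ord($c), $non_ascii{$c};
}

# Delete every zero-width space; s///g returns the number of substitutions
my $removed = ($chars =~ s/\x{200B}//g);

# Encode back to UTF-8 bytes before printing
print encode("UTF-8", $chars), "\n";
print "removed $removed\n";
```

This prints `U+200B seen 2 time(s)`, then `abcxyz`, then `removed 2`. If you would rather not decode at all, matching the raw byte sequence with `s/\xE2\x80\x8B//g` on the undecoded string should also work, as long as the input really is UTF-8.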