No idea, short of printing out every character. There are millions of them, though, so going through the list could take time.
Isn't that kind of arbitrary? Why would you remove characters if you have no idea what those characters are? It would make more sense to find out what the character is and add support for it.
You could do that as follows:
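    # Decode the input file as UTF-8 while reading it.
    open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
        or die("Can't open input file \"$ARGV[0]\": $!\n");

    # Slurp the whole file into $_.
    $_ = do { local $/; <$fh> };

    # Replace everything except line feeds and printable ASCII
    # with the character's code point, e.g. <U+00C9>.
    s/([^\x0A\x20-\x7E])/ sprintf '<U+%04X>', ord($1) /eg;

    print;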
For example, the line
My name is Éric. I don't speak 日本語.
would show up as
My name is <U+00C9>ric. I don't speak <U+65E5><U+672C><U+8A9E>.
(Replace the encoding as appropriate.)
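If the code point alone isn't enough to tell what a character is, its Unicode name can be printed alongside it. The following is only a sketch of that idea: `describe_chars` is a made-up helper name, and it assumes Perl 5.10 or later (for `//`) and that the text has already been decoded. `charnames::viacode` is part of the core `charnames` module and returns undef for code points that have no name, which is why the sketch falls back to 'unnamed'.

```perl
use strict;
use warnings;
use charnames ();   # only needed for charnames::viacode

# Hypothetical helper: replace every character other than line feed
# and printable ASCII with <U+XXXX NAME>.
sub describe_chars {
    my ($text) = @_;
    $text =~ s{([^\x0A\x20-\x7E])}{
        my $cp   = ord($1);                               # numeric code point
        my $name = charnames::viacode($cp) // 'unnamed';  # undef if no name
        sprintf '<U+%04X %s>', $cp, $name;
    }eg;
    return $text;
}

print describe_chars("My name is \x{C9}ric.\n");
# My name is <U+00C9 LATIN CAPITAL LETTER E WITH ACUTE>ric.
```

The output is plain ASCII, so no output encoding layer is needed.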
Update: Added means of identifying characters.
In reply to Re^3: noobie control char removal by ikegami, in thread noobie control char removal by desertman