
Any guidance on how to do that?

Re^3: noobie control char removal
by ikegami (Patriarch) on Nov 18, 2009 at 23:54 UTC

    No idea, short of printing out every character. There are over a million of them, though, so going through the list could take time.
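To make that brute-force approach concrete, here is a minimal sketch that scans the whole Unicode code space and prints every code point in General Category Cc ("Control"). It assumes that "control character" means Cc, which may not be what the original code treats as a control character. Name lookup uses `charnames::viacode` from Perl's core `charnames` module; some control characters have no formal name on older Perls, so the sketch falls back to a placeholder. Surrogates are skipped, and the scan takes a moment to run.

```perl
use strict;
use warnings;
use charnames ();

# Visit every code point except the surrogate block (U+D800..U+DFFF)
# and keep those whose General Category is Cc.
my @controls = grep { chr($_) =~ /\p{Cc}/ } (0 .. 0xD7FF, 0xE000 .. 0x10FFFF);

for my $cp (@controls) {
    # viacode may return undef for unnamed control characters.
    my $name = charnames::viacode($cp) // '(no name)';
    printf "U+%04X %s\n", $cp, $name;
}
```

Cc turns out to be small (U+0000 to U+001F and U+007F to U+009F), which is why scoping the search to a category is far more practical than listing everything.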

    Isn't that kind of arbitrary? Why would you remove characters if you have no idea what those characters are? It would make more sense to find out what the characters are and add support for them.

    You could find out what they are as follows:

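    open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
        or die("Can't open input file \"$ARGV[0]\": $!\n");

    # Slurp the whole file into $_.
    $_ = do { local $/; <$fh> };

    # Replace everything except newline and printable ASCII with its code point.
    s/([^\x0A\x20-\x7E])/ sprintf '<U+%04X>', ord($1) /eg;

    print;

    A UTF-8 input file containing

    My name is Éric.
    I don't speak 日本語.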

    would show up as

    My name is <U+00C9>ric.
    I don't speak <U+65E5><U+672C><U+8A9E>.

    (Replace the encoding as appropriate.)
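Building on the same substitution, a sketch that lists each distinct unusual character once, with its Unicode name and how often it occurs, can make it easier to decide what support to add. `describe_unusual_chars` is a hypothetical helper, not part of any module. It uses the same character class as the substitution (anything other than newline and printable ASCII). Names come from the core `charnames` module's `viacode`, with a fallback for characters that have no name on a given Perl.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: one "<U+XXXX> NAME (count)" line per distinct character
# that is neither a newline nor printable ASCII, in order of first appearance.
sub describe_unusual_chars {
    my ($text) = @_;
    my (%count, @order);
    for my $ch ($text =~ /([^\x0A\x20-\x7E])/g) {
        push @order, $ch unless $count{$ch}++;
    }
    return map {
        my $cp = ord($_);
        sprintf '<U+%04X> %s (%d)', $cp,
            charnames::viacode($cp) // '(no name)', $count{$_};
    } @order;
}

# Only read a file when one is named on the command line.
if (@ARGV) {
    open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
        or die("Can't open input file \"$ARGV[0]\": $!\n");
    my $text = do { local $/; <$fh> };
    print "$_\n" for describe_unusual_chars($text);
}
```

Collapsing repeats into counts keeps the report short even for large files, and the names tell you whether you are looking at, say, accented letters that need proper decoding or genuine control characters.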

    Update: Added means of identifying characters.