in reply to noobie control char removal

I doubt they're control chars. Most likely, you don't have a font installed that can display that character. It could also indicate some kind of error (e.g. a malformed character, or Notepad using the wrong encoding).

If you want to delete the characters for which you have no font support, you'll have to be more specific concerning what those characters are.
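Once the characters are known by code point, deleting them is short. The following is only a sketch: the sub name strip_code_points is made up for illustration, and the code points in the usage comment (U+FEFF and U+200B) are assumptions, not characters known to be in your file.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with every character whose
# code point appears in @code_points removed.
sub strip_code_points {
    my ($text, @code_points) = @_;
    my %unwanted = map { $_ => 1 } @code_points;

    # Visit each character (/s lets . match newlines too) and drop the
    # ones whose code point is in the unwanted set.
    $text =~ s/(.)/ $unwanted{ ord $1 } ? '' : $1 /gse;
    return $text;
}

# Usage (assumed code points; substitute the ones you actually identified):
#   my $clean = strip_code_points($text, 0xFEFF, 0x200B);
```

The text must already be decoded (e.g. read through an :encoding layer), otherwise ord sees bytes rather than characters.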

Re^2: noobie control char removal
by desertman (Acolyte) on Nov 18, 2009 at 23:45 UTC
    any guidance on how to do that?

      No idea, short of printing out every character. There are over a million possible code points, though, so going through the list could take time.

      Isn't that kind of arbitrary? Why would you remove characters if you have no idea what those characters are? It would make more sense to find out what the character is and add support for it.

      You could do that as follows:

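      open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
         or die("Can't open input file \"$ARGV[0]\": $!\n");

      # Slurp the whole file.
      $_ = do { local $/; <$fh> };

      # Show everything except newline and printable ASCII as its code point.
      s/([^\x0A\x20-\x7E])/ sprintf '<U+%04X>', ord($1) /eg;

      print;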
      The input

      My name is Éric.
      I don't speak 日本語.

      would show up as

      My name is <U+00C9>ric.
      I don't speak <U+65E5><U+672C><U+8A9E>.

      (Replace the encoding as appropriate.)
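      If the file is large, a per-character summary may be easier to read than the marked-up text. Here's a rough sketch of that idea; summarize_chars is a name I made up, and it leans on the core charnames module to look up the Unicode name for each code point (needs Perl 5.10+ for //).

```perl
use strict;
use warnings;
use charnames ();   # for charnames::viacode

# Hypothetical helper: return one line per distinct character that is
# neither a newline nor printable ASCII, sorted by code point, giving the
# code point, its Unicode name (if it has one) and how often it occurs.
sub summarize_chars {
    my ($text) = @_;

    my %count;
    $count{ ord $1 }++ while $text =~ /([^\x0A\x20-\x7E])/g;

    return map {
        sprintf 'U+%04X %-40s x%d',
            $_, charnames::viacode($_) // '(unnamed)', $count{$_}
    } sort { $a <=> $b } keys %count;
}

# Usage, reading the file the same way as above:
#   open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
#      or die("Can't open input file \"$ARGV[0]\": $!\n");
#   print "$_\n" for summarize_chars(do { local $/; <$fh> });
```

      As before, the encoding in the open call is an assumption about the file.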

      Update: Added means of identifying characters.