in reply to regex match unicode characters in ascii string

Hi 3dbc,

If I'm understanding you correctly, you want to remove anything that's not printable ASCII?

# Keep only printable ASCII plus CR, LF, TAB
$string =~ tr/\x09\x0A\x0D\x20-\x7E//cd;

# Keep only alphanumeric plus space
$string =~ tr/A-Za-z0-9 //cd;
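As a quick illustration of what the two deletions do, here is a minimal sketch. The sample string is made up for the demo (it is not from the original question), and the variable names $printable and $alnum are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample: "Group", a non-breaking space (U+00A0),
# "Role", an e-acute (U+00E9), and a trailing newline.
my $string = "Group\x{A0}Role\x{E9}\n";

# /c complements the search list, /d deletes everything that matched
# the complement (i.e. everything NOT listed).
(my $printable = $string) =~ tr/\x09\x0A\x0D\x20-\x7E//cd;
(my $alnum     = $string) =~ tr/A-Za-z0-9 //cd;

print $printable;    # "GroupRole" followed by the original newline
print "$alnum\n";    # "GroupRole" (the newline was deleted too)
```

Note that the second form also removes the newline, since LF is not in its search list.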

Hope this helps,
-- Hauke D

Re^2: regex match unicode characters in ascii string
by 3dbc (Monk) on Jan 27, 2017 at 20:24 UTC
    Thank you! I thought about this, but if I don't replace the extended ASCII character with a space, how will I differentiate the group name from the role identifier?
    - 3dbc

      Hi 3dbc,

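      "replace the extended ascii character with a space"

      Drop the /d modifier and give a single space as the replacement list. With /c, every character not in the search list is then replaced by a space: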
      $string =~ tr/\x09\x0A\x0D\x20-\x7E/ /c;

      See the documentation of tr/SEARCHLIST/REPLACEMENTLIST/cdsr under "Quote-Like Operators" in perlop.
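For the group name / role identifier case, a small sketch of the difference between /c and /cs may help. The input string and the names $spaced, $squeezed, $group and $role are assumptions for the demo, not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical input: a group name and a role identifier separated
# by two non-ASCII characters (U+00A0 twice).
my $string = "Admins\x{A0}\x{A0}R42";

# /c alone: each character outside the list becomes one space.
(my $spaced = $string) =~ tr/\x09\x0A\x0D\x20-\x7E/ /c;

# /cs: a run of replaced characters is squeezed into a single space.
(my $squeezed = $string) =~ tr/\x09\x0A\x0D\x20-\x7E/ /cs;

print "[$spaced]\n";      # [Admins  R42]  (two spaces)
print "[$squeezed]\n";    # [Admins R42]   (one space)

# Splitting on whitespace then separates the two fields.
my ($group, $role) = split ' ', $squeezed;
print "$group / $role\n"; # Admins / R42
```

Whether squeezing is appropriate depends on whether the fields themselves can legitimately contain spaces; if they can, a different separator character in the replacement list may be safer.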

      Hope this helps,
      -- Hauke D