In perlre, see the character classes, namely print, cntrl, and graph, and strip accordingly. Or, to put it another way, strip everything that does not match what you want to preserve.
I've been loading the contents of an Excel spreadsheet (which was supposed to be just plain text); however, it contained probably 50 random occurrences of the vertical tab within 3000 rows of 50 columns. This character imparted a line break when inspecting DB rows in a terminal but was otherwise invisible. I considered stripping everything except a subset, but lots of Unicode characters were necessary.
But, regarding my original question, is there some relationship between the non-printing character formatting used with "cat -vte <file>" and determining the correct character class to use in a regex?
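For the vertical-tab cleanup described above, here is a minimal sketch of the "strip only the control characters" approach, which leaves accented and other Unicode characters alone. It is written in Python rather than Perl purely for illustration. The function name `strip_controls`, the `KEEP` set, and the sample string are assumptions, not anything from the original data.

```python
import unicodedata

# Control characters we choose to preserve (an assumption; adjust as needed).
KEEP = {"\t", "\n", "\r"}

def strip_controls(text: str) -> str:
    """Drop characters in Unicode category Cc (control) unless listed in KEEP."""
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) != "Cc"
    )

# Hypothetical cell value containing a vertical tab (\x0b) and accented letters.
cell = "na\u00efve caf\u00e9\x0b second line\tend"
print(repr(strip_controls(cell)))
```

Filtering on the control category rather than on an allow-list of "printable" characters means the accented letters survive without having to enumerate them.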
For the future (at least in the context of things Unix), note that '^' is used to denote 'Ctrl', except in '^[', which denotes 'Escape'. '^I' denotes a tab (in vim it can be changed to another character sequence; see ':help listchars').
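On the relationship to regexes: the letter after the caret identifies the control code, because caret notation is the character's code with bit 0x40 flipped. So '^K' is 0x0B (vertical tab), and every character shown this way falls in the cntrl class. A small Python sketch of that mapping follows. The helper name `caret_to_char` is hypothetical.

```python
def caret_to_char(notation: str) -> str:
    """Convert caret notation such as '^K' to the character it stands for."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError(f"not caret notation: {notation!r}")
    # XOR with 0x40 maps '@'..'_' to 0x00..0x1F and '?' to 0x7F (DEL).
    return chr(ord(notation[1]) ^ 0x40)

for shown in ("^I", "^K", "^[", "^?"):
    ch = caret_to_char(shown)
    print(f"{shown} -> 0x{ord(ch):02X} -> regex escape \\x{ord(ch):02x}")
```

The hex value printed in the last column is what you would plug into a pattern, for example `\x0b` for the vertical tab.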
I think you would be better off dumping the file through a hexdump(1)-like program, which can show you the numeric representation of each character. Then note the values of the characters that you want to replace and plug them into the substitution.
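If a full hex dump is too noisy, the same idea can be narrowed to a tally of the suspicious byte values. The sketch below is Python and assumes the file is ASCII or UTF-8 (where multi-byte characters never use bytes below 0x80). The function name `control_byte_counts` and the sample data are made up for illustration.

```python
from collections import Counter

def control_byte_counts(data: bytes) -> Counter:
    """Count bytes below 0x20 (plus 0x7F), ignoring tab, LF and CR."""
    ignore = {0x09, 0x0A, 0x0D}
    return Counter(
        b for b in data
        if (b < 0x20 or b == 0x7F) and b not in ignore
    )

# Hypothetical sample standing in for the exported spreadsheet contents.
sample = "row one\x0bstill row one\tcol\nrow two\x0b\n".encode("utf-8")
for value, count in sorted(control_byte_counts(sample).items()):
    print(f"0x{value:02X} occurs {count} time(s)")
```

For a real file you would read it in binary mode and pass the bytes in. The values it reports are the ones to put in the substitution.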