in reply to regex question

Have you considered something other than a regex? For instance, HTML::Entities can handle this for you. The following code should encode the characters in question:
use HTML::Entities;
$foo = encode_entities($foo, "\x80-\xff");

Given the information you provided, the range \x80-\xff should cover the replacements you need. If other characters need to be replaced, removed, or modified, you may want to use a conversion table or lookup table for the appropriate codes.
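
If you do end up going the lookup-table route, here is a rough sketch of what I mean, using core Perl only. The table entries and the sub name encode_high_chars are made up for illustration (I don't know which characters you actually need), and the fallback to a numeric entity is my own assumption about what you'd want for anything missing from the table:

```perl
use strict;
use warnings;

# Hypothetical lookup table: these characters and entity names are
# illustrative only, not taken from the original question.
my %entity_for = (
    "\xA9" => '&copy;',
    "\xE9" => '&eacute;',
    "\xFC" => '&uuml;',
);

# Hypothetical helper: replace each character in \x80-\xff with its
# table entry, falling back to a numeric entity when the table has none.
sub encode_high_chars {
    my ($text) = @_;
    $text =~ s{([\x80-\xff])}{
        exists $entity_for{$1} ? $entity_for{$1} : sprintf('&#%d;', ord $1)
    }gex;
    return $text;
}

print encode_high_chars("caf\xE9 \xA9 \xB1"), "\n";
# prints: caf&eacute; &copy; &#177;
```

The numeric fallback means nothing in the range slips through unencoded, so you can grow the table as you find characters that deserve a named entity.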

Update: thanks to ideas from ww.