Sounds tough. You might try a genetic programming approach: breed programs that apply transformations until one reproduces the original STRING-to-NORMALIZED_STRING mapping. That's a lot easier said than done, though, and it would be especially tough in Perl. A code-is-data language like Scheme is a more natural fit.
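To make the idea a bit more concrete, here is a minimal sketch in Python (chosen only for brevity). It is a simplified, linear flavor of genetic programming: each "program" is just a list of transformation names applied in order, and the population is bred by crossover and mutation. The primitive set, the function names, and the parameters are all made up for illustration; a real problem would need its own building blocks, and there is no guarantee the search finds a full solution.

```python
import random
import re

# Hypothetical building blocks; these are assumptions, not taken from the
# original problem. Each one maps a string to a string.
PRIMITIVES = {
    "lower": str.lower,
    "strip": str.strip,
    "squeeze_spaces": lambda s: re.sub(r"\s+", " ", s),
    "drop_punct": lambda s: re.sub(r"[^\w\s]", "", s),
}


def run_program(program, text):
    """Apply each named transformation in order."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def fitness(program, examples):
    """Count the (STRING, NORMALIZED_STRING) pairs the program reproduces."""
    return sum(run_program(program, raw) == want for raw, want in examples)


def mutate(program, rng):
    """Return a copy with one step inserted, replaced, or deleted."""
    child = list(program)
    choice = rng.choice(["insert", "replace", "delete"])
    if choice == "insert" or not child:
        child.insert(rng.randint(0, len(child)), rng.choice(list(PRIMITIVES)))
    elif choice == "replace":
        child[rng.randrange(len(child))] = rng.choice(list(PRIMITIVES))
    else:
        del child[rng.randrange(len(child))]
    return child


def crossover(a, b, rng):
    """Join a prefix of one parent to a suffix of the other."""
    return a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def evolve(examples, generations=200, pop_size=30, max_len=6, seed=0):
    """Breed programs and return the best one found (possibly imperfect)."""
    rng = random.Random(seed)
    names = list(PRIMITIVES)

    def score(program):
        return fitness(program, examples)

    population = [[rng.choice(names)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        if score(population[0]) == len(examples):
            break  # every example is reproduced
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            children.append(mutate(child, rng)[:max_len])
        population = parents + children
    return max(population, key=score)


examples = [("  Hello,   World! ", "hello world"), ("FOO  bar.", "foo bar")]
best = evolve(examples)
print(best, fitness(best, examples), "of", len(examples))
```

Counting exact matches is a very coarse fitness measure, since a program that is almost right scores the same as one that is hopeless. A graded measure would likely give the search more to work with.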
You might also look at how String::Approx works. It does approximate (fuzzy) string matching, which is a similar problem, although I don't think you'll be able to use it directly.
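For reference, the notion of "approximate" in this kind of matching is usually Levenshtein edit distance. The sketch below is the textbook dynamic-programming version in Python, not String::Approx's actual implementation, and the function name is my own. Something like it could also act as a graded measure of how close a candidate output is to the NORMALIZED_STRING you want.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```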
-sam