in reply to regexing for non-standard characters...

Not exactly on topic, but when I'm dealing with lots of old, weird files and data containing characters that kill my scripts or cause other odd behavior, I often just eliminate every character I don't need. That's faster than trying to pinpoint which character is causing the problem.
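If all I need is letters and numbers:
$string =~ s/[^A-Za-z0-9]//g;
Note that this also strips spaces and punctuation, so add anything else you want to keep inside the brackets.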
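Here is a minimal sketch of the same whitelist idea run against some made-up "dirty" input containing a NUL, a DEL, a carriage return and a hyphen. The tr/// version is an alternative I'm adding, not part of the original tip: the c flag complements the character list and d deletes whatever matches, which gives the same result as the substitution.

```perl
use strict;
use warnings;

# Made-up sample input: NUL, DEL, hyphen, CR and LF mixed into the text.
my $raw = "abc\x00 12\x{7f}3-def\r\n";

# Whitelist approach: delete every character that is not an ASCII letter or digit.
(my $alnum = $raw) =~ s/[^A-Za-z0-9]//g;

# Equivalent with tr///: c complements the list, d deletes the matches.
(my $alnum_tr = $raw) =~ tr/A-Za-z0-9//cd;

# Variant that also keeps ordinary spaces.
(my $with_spaces = $raw) =~ s/[^A-Za-z0-9 ]//g;

print "$alnum\n";         # abc123def
print "$alnum_tr\n";      # abc123def
print "$with_spaces\n";   # abc 123def
```

Keep in mind that [A-Za-z0-9] covers ASCII only, so accented or other non-ASCII letters get thrown away along with the junk.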