in reply to work with unicode data and regexps

You might want to try setting your locale to one that supports Unicode:
    use locale;
    use POSIX 'locale_h';
    my $loc = "en_US.UTF-8";   # must be a locale installed on your system (see `locale -a`)
    setlocale(LC_CTYPE, $loc) or die "Invalid locale $loc";
Or use a Unix/Linux command-line tool such as recode or iconv to convert your input to the character set you are working in. Hope that helps.
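
If you would rather do the conversion inside Perl than shell out to recode or iconv, the core Encode module can do roughly the same job. This is only a minimal sketch and swaps in a different technique from the two commands above: the helper name convert_bytes and the ISO-8859-1 input encoding are assumptions made for illustration, so substitute whatever encoding your data really uses.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a byte string from one encoding to another,
# roughly what `iconv -f FROM -t TO` does on the command line.
sub convert_bytes {
    my ($bytes, $from, $to) = @_;
    # FB_CROAK makes decode/encode die on malformed input instead of
    # silently substituting replacement characters.
    my $chars = decode($from, $bytes, Encode::FB_CROAK);
    return encode($to, $chars, Encode::FB_CROAK);
}

# "café" as ISO-8859-1 bytes (assumed input encoding for this example).
my $latin1 = "caf\xE9";
my $utf8   = convert_bytes($latin1, 'ISO-8859-1', 'UTF-8');

# Decode to a character string before matching, so that \w treats
# the accented letter as a single word character.
my $text = decode('UTF-8', $utf8);
print "matched\n" if $text =~ /^\w{4}$/;
```

The point of the last two lines is that regexps behave best on decoded character strings, not on raw bytes, whichever tool did the conversion.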

Addendum: I'm not sure this will work on Windows XP, but perldoc perllocale should be able to help you, and there is a Windows version of recode.