regex dies on international characters

87C751 has asked for the wisdom of the Perl Monks concerning the following question:

Oh monks of the mighty Perl:

I am trying to apply a regex to lines from a textfile that include some international characters, all from within a CGI.pm-based web module. \xe9, for example, is an accented 'e'. But it just won't happen. If wrapped in eval, the error log says: "translateStrings.cgi: Unrecognized character \\xE9 at (eval 32) line 6, <fh00001strings.txt> line 6930." Without the eval, the script simply dies with no output (even from use CGI::Carp qw(fatalsToBrowser);). The int'l characters are in a position to be matched by ([^\n]+)\n (and I even tried (.*?)\n). No joy.

I'm sure there is a simple incantation I've missed that will make this work in 5.0.8. (oh, I forgot to mention... the problem doesn't show up with 5.0.6) Would one of you have a clue to spare?

Replies are listed 'Best First'.
Re: regex dies on international characters by wolv (Pilgrim) on Jun 16, 2004 at 16:47 UTC
Perl 5.8.0 had deficiencies in its Unicode support, and it is recommended that you upgrade to a newer Perl, 5.8.4 preferably. It should fix the problems. I assume your 'Perl 5.0.8' is a brainfart or something. :)
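
For reference, here is a minimal sketch of the kind of match described in the question, run on data that has been decoded explicitly. It is not the original poster's code, which was never shown. The sample string is hypothetical, and treating the file as ISO-8859-1 is an assumption based only on \xE9 being an accented 'e' in that encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data: the first line contains the byte 0xE9.
my $bytes = "caf\xE9 au lait\nsecond line\n";

# Assumption: the input is ISO-8859-1. Decode it so Perl treats the
# data as characters rather than raw bytes.
my $text = decode('ISO-8859-1', $bytes);

# The pattern from the question: capture everything up to the newline.
my ($first) = $text =~ /([^\n]+)\n/;

printf "captured %d characters\n", length $first;
```

When reading from a file, the same decoding can be requested at open time with an I/O layer, e.g. open(my $fh, '<:encoding(ISO-8859-1)', $path), where $path is a placeholder for the real filename.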
Re: regex dies on international characters by dave_the_m (Monsignor) on Jun 16, 2004 at 15:42 UTC
How about showing us the offending line(s) of code? Dave.