Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've got a really simple template module that reads in a file and replaces tokens in the template (imagine an HTML page).

When I use a template containing Japanese (Unicode) characters, the regular expression does not match. I don't need to match the Unicode characters, just the tokens, which look like this: '{{token}}'. Does anyone know how to make Perl ignore/skip the Unicode characters in a regex? I've tried use utf8; but it still didn't work.

Any ideas? Thanks, Matt
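Here is a minimal sketch of the template step, assuming Perl 5.8 or later and a UTF-8 encoded template file. Note that use utf8; only tells Perl that the script's own source code is UTF-8. It does not decode data read from files; the :encoding(UTF-8) layer on open does that here. The helper names read_template and render_template are hypothetical, not part of any existing module.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a template file, decoding it from UTF-8
# into a Perl character string (needs Perl 5.8 or later).
sub read_template {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Hypothetical helper: replace each {{name}} with $vars->{name}.
# Unknown tokens are left as they are. Japanese text is never touched,
# because the pattern only matches literal braces around a word.
sub render_template {
    my ($text, $vars) = @_;
    $text =~ s/\{\{(\w+)\}\}/exists $vars->{$1} ? $vars->{$1} : '{{' . $1 . '}}'/ge;
    return $text;
}

# Demo on an in-memory string; \x{65E5}\x{672C}\x{8A9E} is "Nihongo" in kanji.
binmode STDOUT, ':encoding(UTF-8)';
my $page = "<h1>{{title}}</h1><p>\x{65E5}\x{672C}\x{8A9E}</p>";
print render_template($page, { title => 'Hello' }), "\n";
```

With a real file you would call something like render_template(read_template('page.html'), \%vars) (the file name is made up) and keep the binmode on STDOUT so the decoded characters are encoded again on output.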

Re: Reading a file containing UNICODE and Regex matching
by mirod (Canon) on Jan 31, 2002 at 15:17 UTC

    The first version of Perl to support Unicode in regexps will be 5.8 ;--( In the meantime you can try 5.7.2, but it is a development version.

    You will have to match the Unicode characters explicitly (I guess they are UTF-8) so that your expression skips them; otherwise they might interfere with the regexps matching the tokens. A UTF-8 encoded character might be matched by something like this (cargo-cult programming warning here): ([\xC0-\xDF].|[\xE0-\xEF]..|[\xF0-\xFF]...)
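For Perls older than 5.8, here is a sketch of the byte-level approach from this reply, assuming the template is UTF-8 and is read without any decoding. The pattern is a slightly stricter variant of the one above: continuation bytes are limited to \x80-\xBF and 4-byte lead bytes to \xF0-\xF7. Multi-byte sequences are matched first and copied through unchanged, so a {{token}} match can only start on an ASCII byte. The names $utf8_multibyte and render_bytes are hypothetical.

```perl
use strict;
use warnings;

# One UTF-8 multi-byte sequence: a lead byte plus the matching number of
# continuation bytes. Stricter than the reply's pattern, but it still does
# not reject every invalid sequence (e.g. overlong encodings).
my $utf8_multibyte = qr/
      [\xC0-\xDF][\x80-\xBF]       # 2-byte sequence
    | [\xE0-\xEF][\x80-\xBF]{2}    # 3-byte sequence
    | [\xF0-\xF7][\x80-\xBF]{3}    # 4-byte sequence
/x;

# Hypothetical helper on raw bytes: copy multi-byte sequences through
# unchanged, replace {{name}} with $vars->{name}, keep unknown tokens.
sub render_bytes {
    my ($bytes, $vars) = @_;
    $bytes =~ s/($utf8_multibyte)|\{\{(\w+)\}\}/
          defined $1         ? $1
        : exists $vars->{$2} ? $vars->{$2}
        :                      '{{' . $2 . '}}'
    /ge;
    return $bytes;
}

# Demo: "\xE6\x97\xA5" is the UTF-8 encoding of U+65E5.
my $page = "<h1>{{title}}</h1>\xE6\x97\xA5";
print render_bytes($page, { title => 'Hello' }), "\n";
```

Strictly speaking, every byte of a valid UTF-8 multi-byte character is \x80 or above, so it cannot be mistaken for { or }; matching the sequences explicitly is a safety net against the interference mentioned above rather than something a well-formed UTF-8 file should need.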