Unless you're using an ancient version of Perl, \w should match any Unicode word character. According to perlre, it matches over 100,000 characters.
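The exact total depends on the Unicode version your perl was built with. Here is a brute-force sketch for checking it on your own installation. It assumes perl 5.14 or later (for the /u modifier), and because it tests every code point one at a time it may take a second or two:

```perl
use v5.14;        # 'use v5.12' or later also turns on strict and say
use warnings;

# Count every code point that \w matches under Unicode rules.
# The result varies with the Unicode version bundled with your perl.
my $count = 0;
for my $cp (0 .. 0x10FFFF) {
    # UTF-16 surrogates are never word characters, so skip them
    next if $cp >= 0xD800 && $cp <= 0xDFFF;
    no warnings 'utf8';    # silence any noncharacter warnings in this block
    $count++ if chr($cp) =~ /\A\w\z/u;
}
say "\\w matches $count code points";
```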
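    use 5.010;
    use strict;
    use warnings;
    use utf8::all;    # CPAN module: turns on the utf8 pragma and UTF-8 I/O layers

    my $string = "the café";

    # "the" is only three word characters, so the first run of four is "café"
    say "GOT: $1" if $string =~ /(\w{4})/;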
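utf8::all comes from CPAN rather than the core distribution. If it isn't installed, a rough core-only equivalent for this example is the utf8 pragma plus an encoding layer on STDOUT. The word list below is just illustrative input, assumed to be saved in a UTF-8 source file:

```perl
use v5.14;
use warnings;
use utf8;                               # source is UTF-8, so "é" is one character
binmode STDOUT, ':encoding(UTF-8)';     # encode characters back to UTF-8 on output

# Illustrative words from several scripts
my @words = ('café', 'naïve', 'Ελλάδα', 'Москва', '東京');

for my $word (@words) {
    # \w+ should consume the whole word when it is a character string
    say "GOT: $1" if $word =~ /\A(\w+)\z/;
}
```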
Make sure your strings are being interpreted as character strings rather than byte strings though. (See perlunicode and utf8.)
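To illustrate the difference, here is a small sketch using the core Encode module. The input is a hypothetical byte string holding the UTF-8 encoding of "café", as you might get from a handle opened without an :encoding layer. Undecoded, "é" is two separate bytes, and the second one (0xA9, which Perl treats as the © sign) is not a word character, so the match only succeeds after decoding:

```perl
use v5.14;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes for "café"
my $bytes = "caf\xC3\xA9";
my $byte_len = length $bytes;            # "é" counts as two bytes here

# Decoding turns the bytes into a character string
my $chars = decode('UTF-8', $bytes);
my $char_len = length $chars;            # "é" is now a single character

# Does the whole string consist of word characters?
my $byte_match = $bytes =~ /\A\w+\z/ ? 1 : 0;
my $char_match = $chars =~ /\A\w+\z/ ? 1 : 0;

say "bytes: length $byte_len, all \\w? $byte_match";
say "chars: length $char_len, all \\w? $char_match";
```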
In reply to "Re: match utf8" by tobyink, in thread "match utf8" by glassel