in reply to UTF8 versus \w in pattern matching
Is your source file encoded as UTF-8?
Personally, I prefer to load charnames and then use \N{...} escapes for non-ASCII constants in my source code:
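```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';    # enables \N{CHARACTER NAME} escapes (loaded implicitly since Perl 5.16)
binmode STDOUT, ':utf8';  # print characters as UTF-8

# $a is reserved for sort(), so a lexical named $path is used instead
my $path = "/i/"
         . "\N{LATIN SMALL LETTER A WITH ACUTE}\N{LATIN SMALL LETTER E WITH ACUTE}"
         . "\N{LATIN SMALL LETTER I WITH ACUTE}\N{LATIN SMALL LETTER O WITH ACUTE}"
         . "\N{LATIN SMALL LETTER U WITH ACUTE}z/pl";
print qq(1: ), $path, qq(\n);

# keep the leading run of slashes and word characters
($path) = ($path =~ m/^([\/\p{Word}]+)/);
print qq(2: ), $path, qq(\n);
```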
This prints the following for me:
```
1: /i/áéíóúz/pl
2: /i/áéíóúz/pl
```
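If your source file *is* saved as UTF-8, the other route is to tell Perl so with `use utf8;`. Literal non-ASCII characters in the source are then decoded into characters instead of being left as raw bytes, so `\w` sees "á" as one word character. Here is a minimal sketch, assuming the file is saved as UTF-8. The variable names and the `/u` regex flag are illustrative choices, not taken from the script above:

```perl
#!/usr/bin/perl
# Sketch: assumes this file itself is saved as UTF-8.
use strict;
use warnings;
use utf8;                            # decode string literals in this file as UTF-8
binmode STDOUT, ':encoding(UTF-8)';  # encode characters back to UTF-8 on output

my $path = "/i/áéíóúz/pl";           # literal characters, no \N{...} escapes needed

# /u forces Unicode rules for \w regardless of the string's internal storage
my ($prefix) = $path =~ m{^([/\w]+)}u;

print "length: ", length($path), "\n";   # counts characters, not bytes
print "match:  $prefix\n";
```

The `/u` flag is there because, without it, whether `\w` matches characters in the 128-255 range can depend on how the string happens to be stored internally. `\p{Word}`, as used in the script above, always follows Unicode rules.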