Mur has asked for the wisdom of the Perl Monks concerning the following question:
How can I coerce Perl's \b pattern to recognize and respect accented characters?
    my $land = 'Mexico';
    my @words = split(/\b/,$land);

This does what you'd expect, and scalar(@words)==1. However, if you change the 'e' in Mexico to an accented e (México), then it splits before and after the accented 'e', and scalar(@words)==3.
use utf8; works as long as the literal appears in the program text itself; text read in from outside the code is still split at the accent. E.g.,
    $ perl -Mutf8 -e "print join(q{,},split(/\b/,\$ARGV[0])),qq{\n}" méxico
    m,é,xico
    $ perl -Mutf8 -e "print join(q{,},split(/\b/,q{méxico})),qq{\n}"
    Wide character in print at -e line 1.
    méxico
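One likely explanation for the difference above is that use utf8 only affects how the source code itself is read, so a command-line argument still arrives as undecoded bytes, and the two bytes of a UTF-8 'é' do not count as \w. The following is a minimal sketch, not a definitive fix, assuming the input really is UTF-8 encoded. It uses the core Encode module to decode the bytes into characters before splitting. The variable names ($bytes, $chars, @raw_words) are illustrative only, and the byte string stands in for what $ARGV[0] would hold.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: external input (e.g. $ARGV[0]) arrives as raw UTF-8 bytes.
# "m\xC3\xA9xico" is the UTF-8 byte sequence for the word méxico.
my $bytes = "m\xC3\xA9xico";

# On the undecoded byte string, \xC3 and \xA9 are not word characters,
# so \b matches on both sides of them.
my @raw_words = split /\b/, $bytes;

# Decode the bytes into a character string; 'é' is now a single
# character (U+00E9) that \w recognizes.
my $chars = decode('UTF-8', $bytes);
my @words = split /\b/, $chars;

# Encode output as UTF-8 so the accented character prints cleanly.
binmode STDOUT, ':encoding(UTF-8)';
print scalar(@raw_words), "\n";   # 3
print scalar(@words), "\n";       # 1
print join(',', @words), "\n";    # méxico
```

On Perl 5.8.1 and later, the -C command-line switch can do the decoding instead: -CA treats the elements of @ARGV as UTF-8 and -CS applies UTF-8 to the standard streams, so perl -CAS -e "..." méxico is worth trying as an alternative to an explicit decode call.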
Replies are listed 'Best First'.

Re: Stuck in accent-land
by bart (Canon) on Dec 30, 2003 at 18:27 UTC
by Mur (Pilgrim) on Dec 30, 2003 at 18:47 UTC

Re: Stuck in accent-land
by ysth (Canon) on Dec 30, 2003 at 21:04 UTC

Re: Stuck in accent-land
by pg (Canon) on Dec 30, 2003 at 18:45 UTC
by Roy Johnson (Monsignor) on Dec 30, 2003 at 19:10 UTC

Re: Stuck in accent-land
by dominix (Deacon) on Dec 31, 2003 at 11:36 UTC