I slipped into a deep meditation for the last several months, and neglected to stay informed about the most recent best practices for pattern matching of alphabetic characters in multiple languages. Thus I have returned and am seeking any enlightenment you may be able to provide.
In English, one can say:
if (/^(\w+)$/) { print "found [$1]\n"; }
or if you don't want underscores:
if (/^([A-Za-z]+)$/) { print "found [$1]\n"; }
I also seem to recall that POSIX character classes are supported, though maybe not on older Perls (not a problem; I have a newish Perl):
if (/^([[:alpha:]]+)$/) { print "found [$1]\n"; }
I'm sure it's a FAQ, but I'm looking for the current best practices. The question is: will the above work for any alphabetic language? Or does it only work for the language of my current locale setting? (Whatever that is. I've been fuzzy on locales ever since learning that one legal setting is 'C', which seems odd.)
Or should I hand-construct regular expressions for each language using a list of characters from that language?
The data I'm working with will all be UTF-8, if that makes a difference.
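To make the question concrete, here is a minimal sketch of the Unicode-property approach as I understand it. It assumes the input has been decoded from UTF-8 into Perl character strings (here via the utf8 and open pragmas) and that "alphabetic" means the Unicode general category Letter, \p{L}. The helper name all_letters is made up for illustration. As far as I know, \p{L} follows Unicode rules regardless of the locale setting, but I would welcome corrections.

```perl
use strict;
use warnings;
use utf8;                                # string literals in this file are UTF-8
use open ':std', ':encoding(UTF-8)';     # decode/encode STDIN, STDOUT, STDERR as UTF-8

# Hypothetical helper: returns the string if it consists only of
# Unicode letters (general category L), otherwise undef.
sub all_letters {
    my ($s) = @_;
    return $s =~ /^(\p{L}+)$/ ? $1 : undef;
}

for my $word ('hello', 'Ελληνικά', 'русский', 'foo_bar', 'abc123') {
    my $m = all_letters($word);
    print defined $m ? "found [$m]\n" : "no match for [$word]\n";
}
```

One caveat with this sketch: accented letters stored in decomposed form carry combining marks, which fall under \p{M} rather than \p{L}, so such input may need normalizing (for example with Unicode::Normalize) before matching.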
In reply to Modern best practices for multilingual regexp alphabetical character matching? by dmorgo