http://qs1969.pair.com?node_id=735804

dmorgo has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I slipped into a deep meditation for the last several months, and neglected to stay informed about the most recent best practices for pattern matching of alphabetic characters in multiple languages. Thus I have returned and am seeking any enlightenment you may be able to provide.

For English text, one can say:

if (/^(\w+)$/) { print "found [$1]\n"; }

or if you don't want underscores:

if (/^([A-Za-z]+)$/) { print "found [$1]\n"; }

I also seem to recall that POSIX character classes are supported, though maybe not on older Perls (not a problem; I have a newish Perl):

if (/^([[:alpha:]]+)$/) { print "found [$1]\n"; }

I'm sure it's a FAQ, but I'm looking for the current best practice. The question is: will the above work for any alphabetic language? Or does it only work for the language of my current locale setting (whatever that is; I've been fuzzy on locales ever since learning that 'C' is a legal locale setting, which seems odd)?

Or should I hand-construct regular expressions for each language using a list of characters from that language?

The data I'm working with will all be UTF-8, if that makes a difference.
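
To make the question concrete, here is a minimal sketch of the kind of test I have in mind. It assumes the input arrives as raw UTF-8 bytes and is decoded with the core Encode module before matching. The sample words and the helper name is_all_alpha are made up for illustration, and I'm not claiming this is the right approach, only that this is what I would try first:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample words, written as raw UTF-8 bytes so the script
# does not depend on the encoding of the source file itself.
my @samples = (
    "hello",                       # English
    "caf\xC3\xA9",                 # French: "cafe" with e-acute
    "\xD0\xBC\xD0\xB8\xD1\x80",    # Russian: "mir"
    "abc123",                      # contains digits
);

# Illustrative helper (my own name, not a library function):
# decode UTF-8 bytes into a Perl character string, then test
# whether the whole string matches [[:alpha:]]+.
sub is_all_alpha {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    return $text =~ /^[[:alpha:]]+$/ ? 1 : 0;
}

# Encode output as UTF-8 so non-ASCII samples print cleanly.
binmode(STDOUT, ':encoding(UTF-8)');

for my $bytes (@samples) {
    my $text = decode('UTF-8', $bytes);
    printf "%s: %s\n", $text, is_all_alpha($bytes) ? "found" : "no match";
}
```

What I'd like to know is whether the results of a test like this depend on my locale, or only on the data having been decoded properly.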