Unicode operations

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi
I use the following to match Swedish words:

$sentence =~ /([A-Z\N{LATIN CAPITAL LETTER A WITH RING ABOVE}\N{LATIN CAPITAL LETTER A WITH DIAERESIS}\N{LATIN CAPITAL LETTER O WITH DIAERESIS}\N{LATIN CAPITAL LETTER E WITH ACUTE}]+)/ig)

Is this efficient?
Also, how can I do the following for Swedish words?

my $word = ucfirst(lc($word));

Thanks!!


Replies are listed 'Best First'.
Re: Unicode operations by ikegami (Patriarch) on Jan 03, 2010 at 21:22 UTC
"Is this efficient?"

When compared with what?

"Also, how can I do [`ucfirst(lc($word))`] for Swedish words?"

It should work as-is for Swedish words.

use open ':std', ':locale';
use charnames ':full';
my $word = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}"
         . "\N{LATIN CAPITAL LETTER A WITH DIAERESIS}"
         . "\N{LATIN CAPITAL LETTER O WITH DIAERESIS}"
         . "\N{LATIN CAPITAL LETTER E WITH ACUTE}";
print($word, "\n");
print(ucfirst(lc($word)), "\n");

Output:

ÅÄÖÉ
Åäöé

Of course, if the words are coming to you encoded (i.e. from a file handle), you need to decode them first.

You probably won't run into this problem, but if the characters in the range U+0080..U+00FF are left unchanged, precede the expression with `utf8::upgrade( $word );`

That bug will be fixed in 5.12 (although it might require `use 5.012;`).
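
To make the last two points concrete, here is a minimal sketch. It assumes the input arrives as UTF-8 encoded bytes (the byte string below is made up for illustration and spells "ÅÄÖÉ"); a different encoding would need a different name passed to `decode`. The second half illustrates the `utf8::upgrade` workaround on a string holding a character in the U+0080..U+00FF range.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: UTF-8 bytes as they might come from a file handle that has
# no encoding layer. These eight bytes spell "ÅÄÖÉ".
my $bytes = "\xC3\x85\xC3\x84\xC3\x96\xC3\x89";

# Decode the bytes into characters before changing case.
my $word  = decode('UTF-8', $bytes);
my $fixed = ucfirst(lc($word));

# Encode again before printing to a handle without an encoding layer.
print encode('UTF-8', $fixed), "\n";

# The utf8::upgrade workaround: this string was never decoded, so Perl may
# store it as plain 8-bit data. On Perls affected by the bug, ucfirst can
# then leave the first character (U+00E5) unchanged.
my $native = "\xE5ke";
my $before = ucfirst($native);

# After upgrading, the case functions apply Unicode rules.
utf8::upgrade($native);
my $after = ucfirst($native);

print encode('UTF-8', $after), "\n";
```

If the data is read from a file, opening the handle with an encoding layer (for example `'<:encoding(UTF-8)'`) does the decoding step for you, so the explicit `decode` call is only needed for raw bytes.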
Re: Unicode operations by Khen1950fx (Canon) on Jan 03, 2010 at 20:46 UTC
Efficient? Try running the script like this:

`perl -MDevel::SimpleTrace sentence.pl`

See Devel::SimpleTrace.
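
If the aim is to compare speed against an alternative, the core Benchmark module is one option. The sketch below is only an illustration: the sample sentence is invented, the `count_words` helper is hypothetical, and `\p{L}+` is an assumed alternative that matches any Unicode letter, so it is broader than the original character class.

```perl
use strict;
use warnings;
use charnames ':full';
use Benchmark qw(cmpthese);

# Hypothetical sample text ("Åke äter ÖRTER på café"), repeated so the
# regex has some work to do.
my $sentence = "\x{C5}ke \x{E4}ter \x{D6}RTER p\x{E5} caf\x{E9} " x 200;
utf8::upgrade($sentence);

# The character class from the question.
my $class = qr/([A-Z\N{LATIN CAPITAL LETTER A WITH RING ABOVE}\N{LATIN CAPITAL LETTER A WITH DIAERESIS}\N{LATIN CAPITAL LETTER O WITH DIAERESIS}\N{LATIN CAPITAL LETTER E WITH ACUTE}]+)/i;

# Assumed alternative: any Unicode letter (matches more than Swedish).
my $prop = qr/(\p{L}+)/;

# Hypothetical helper: count the matches of a pattern in a text.
sub count_words {
    my ($re, $text) = @_;
    my $n = 0;
    $n++ while $text =~ /$re/g;
    return $n;
}

# Run each variant for at least one CPU second and print a comparison.
cmpthese(-1, {
    named_class => sub { count_words($class, $sentence) },
    letter_prop => sub { count_words($prop,  $sentence) },
});
```

The numbers depend on the Perl version and the input, so it is worth running this against real sentences before drawing conclusions.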